
How to Write a Hypothesis for Correlation

A hypothesis for correlation predicts a statistically significant relationship.


A hypothesis is a testable statement about how something works in the natural world. While some hypotheses predict a causal relationship between two variables, other hypotheses predict a correlation between them. According to the Research Methods Knowledge Base, a correlation is a single number that describes the relationship between two variables. If you do not predict a causal relationship or cannot measure one objectively, state clearly in your hypothesis that you are merely predicting a correlation.

Research the topic in depth before forming a hypothesis. Without adequate knowledge about the subject matter, you will not be able to decide whether to write a hypothesis for correlation or causation. Read the findings of similar experiments before writing your own hypothesis.

Identify the independent variable and dependent variable. Your hypothesis will be concerned with what happens to the dependent variable when a change is made in the independent variable. In a correlation, the two variables undergo changes at the same time in a significant number of cases. However, this does not mean that the change in the independent variable causes the change in the dependent variable.

Construct an experiment to test your hypothesis. In a correlative experiment, you must be able to measure the exact relationship between two variables. This means you will need to find out how often a change occurs in both variables in terms of a specific percentage.

Establish the requirements of the experiment with regard to statistical significance. State exactly how strongly the variables must correlate for the result to count as statistically significant. This threshold varies considerably by field. In a highly technical scientific study, for instance, the variables may need to correlate 98 percent of the time; in a sociological study, 90 percent correlation may suffice. Look at other studies in your particular field to determine the requirements for statistical significance.

State the null hypothesis. The null hypothesis gives an exact value that implies there is no correlation between the two variables. If the results show a correlation at or below the value specified by the null hypothesis, the variables are not shown to correlate.
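If you carry out the analysis with software, this test takes only a few lines. The sketch below uses SciPy's `pearsonr` function; the variable names and data are invented purely for illustration.

```python
# Hypothetical data: hours studied and exam scores for 8 students.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_scores = [52, 55, 61, 64, 70, 72, 79, 83]

# pearsonr returns the correlation coefficient r and a two-tailed
# p-value for the null hypothesis of no correlation.
r, p_value = stats.pearsonr(hours_studied, exam_scores)

alpha = 0.05  # significance level chosen before running the test
if p_value < alpha:
    print(f"r = {r:.3f}: reject the null hypothesis of no correlation")
else:
    print(f"r = {r:.3f}: fail to reject the null hypothesis")
```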

Record and summarize the results of your experiment. State whether or not the experiment met the minimum requirements of your hypothesis in terms of both percentage and significance.


  • University of New England; Steps in Hypothesis Testing for Correlation; 2000
  • Research Methods Knowledge Base; Correlation; William M.K. Trochim; 2006
  • Science Buddies; Hypothesis

About the Author

Brian Gabriel has been a writer and blogger since 2009, contributing to various online publications. He earned his Bachelor of Arts in history from Whitworth University.



Correlational Research | Guide, Design & Examples

Published on 5 May 2022 by Pritha Bhandari. Revised on 5 December 2022.

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them.

A correlation reflects the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative.

Table of contents

  • Correlational vs experimental research
  • When to use correlational research
  • How to collect correlational data
  • How to analyse correlational data
  • Correlation and causation
  • Frequently asked questions about correlational research

Correlational and experimental research both use quantitative methods to investigate relationships between variables. But there are important differences in how data is collected and the types of conclusions you can draw.


Correlational research is ideal for gathering data quickly from natural settings. That helps you generalise your findings to real-life situations in an externally valid way.

There are a few situations where correlational research is an appropriate choice.

To investigate non-causal relationships

You want to find out if there is an association between two variables, but you don’t expect to find a causal relationship between them.

Correlational research can provide insights into complex real-world relationships, helping researchers develop theories and make predictions.

To explore causal relationships between variables

You think there is a causal relationship between two variables, but it is impractical, unethical, or too costly to conduct experimental research that manipulates one of the variables.

Correlational research can provide initial indications or additional support for theories about causal relationships.

To test new measurement tools

You have developed a new instrument for measuring your variable, and you need to test its reliability or validity.

Correlational research can be used to assess whether a tool consistently or accurately captures the concept it aims to measure.

There are many different methods you can use in correlational research. In the social and behavioural sciences, the most common data collection methods for this type of research include surveys, observations, and secondary data.

It’s important to carefully choose and plan your methods to ensure the reliability and validity of your results. You should carefully select a representative sample so that your data reflects the population you’re interested in without bias.

In survey research, you can use questionnaires to measure your variables of interest. You can conduct surveys online, by post, by phone, or in person.

Surveys are a quick, flexible way to collect standardised data from many participants, but it’s important to ensure that your questions are worded in an unbiased way and capture relevant insights.

Naturalistic observation

Naturalistic observation is a type of field research where you gather data about a behaviour or phenomenon in its natural environment.

This method often involves recording, counting, describing, and categorising actions and events. Naturalistic observation can include both qualitative and quantitative elements, but to assess correlation, you collect data that can be analysed quantitatively (e.g., frequencies, durations, scales, and amounts).

Naturalistic observation lets you easily generalise your results to real-world contexts, and you can study experiences that aren’t replicable in lab settings. But data analysis can be time-consuming and unpredictable, and researcher bias may skew the interpretations.

Secondary data

Instead of collecting original data, you can also use data that has already been collected for a different purpose, such as official records, polls, or previous studies.

Using secondary data is inexpensive and fast, because data collection is complete. However, the data may be unreliable, incomplete, or not entirely relevant, and you have no control over the reliability or validity of the data collection procedures.

After collecting data, you can statistically analyse the relationship between variables using correlation or regression analyses, or both. You can also visualise the relationships between variables with a scatterplot.

Different types of correlation coefficients and regression analyses are appropriate for your data based on their levels of measurement and distributions.

Correlation analysis

Using a correlation analysis, you can summarise the relationship between variables into a correlation coefficient: a single number that describes the strength and direction of the relationship between variables. With this number, you’ll quantify the degree of the relationship between variables.

The Pearson product-moment correlation coefficient, also known as Pearson’s r, is commonly used for assessing a linear relationship between two quantitative variables.

Correlation coefficients are usually found for two variables at a time, but you can use a multiple correlation coefficient for three or more variables.
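As a minimal sketch of this step (with invented data), NumPy's `corrcoef` computes Pearson's r directly; a scatterplot of the same two arrays could then be drawn with any plotting library.

```python
import numpy as np

# Invented example data: two variables measured on the same observations.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.5, 3.8, 4.9, 6.5, 9.1])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is Pearson's r for x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")
```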

Regression analysis

With a regression analysis, you can predict how much a change in one variable will be associated with a change in the other variable. The result is a regression equation that describes the line on a graph of your variables.

You can use this equation to predict the value of one variable based on the given value(s) of the other variable(s). It’s best to perform a regression analysis after testing for a correlation between your variables.
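A minimal regression sketch, again with invented data: `np.polyfit` with degree 1 fits the least-squares line, and the resulting equation predicts y for a new value of x.

```python
import numpy as np

# Invented data: predictor x and outcome y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a least-squares line: y is approximately slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Use the regression equation to predict y at a new x value.
x_new = 6.0
y_pred = slope * x_new + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}; predicted y at x = 6: {y_pred:.2f}")
```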

It’s important to remember that correlation does not imply causation. Just because you find a correlation between two things doesn’t mean you can conclude one of them causes the other, for a few reasons.

Directionality problem

If two variables are correlated, it could be because one of them is a cause and the other is an effect. But the correlational research design doesn’t allow you to infer which is which. To err on the side of caution, researchers don’t conclude causality from correlational studies.

Third variable problem

A confounding variable is a third variable that influences other variables to make them seem causally related even though they are not. Instead, there are separate causal links between the confounder and each variable.

In correlational research, there’s limited or no researcher control over extraneous variables. Even if you statistically control for some potential confounders, there may still be other hidden variables that disguise the relationship between your study variables.

Although a correlational study can’t demonstrate causation on its own, it can help you develop a causal hypothesis that’s tested in controlled experiments.

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design, you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design, you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity.

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions. The Pearson product-moment correlation coefficient (Pearson’s r) is commonly used to assess a linear relationship between two quantitative variables.

Cite this Scribbr article


Bhandari, P. (2022, December 05). Correlational Research | Guide, Design & Examples. Scribbr. Retrieved 22 April 2024, from https://www.scribbr.co.uk/research-methods/correlational-research-design/



Statistics LibreTexts

12.1.2: Hypothesis Test for a Correlation

  • Page ID 34784

  • Rachel Webb
  • Portland State University

One should perform a hypothesis test to determine if there is a statistically significant correlation between the independent and the dependent variables. The population correlation coefficient \(\rho\) (this is the Greek letter rho, which sounds like “row” and is not a \(p\)) is the correlation among all possible pairs of data values \((x, y)\) taken from a population.

We will only be using the two-tailed test for a population correlation coefficient \(\rho\). The hypotheses are:

\(H_{0}: \rho = 0\) \(H_{1}: \rho \neq 0\)

The null hypothesis of a two-tailed test states that there is no correlation (there is no linear relation) between \(x\) and \(y\). The alternative hypothesis states that there is a significant correlation (there is a linear relation) between \(x\) and \(y\).

The t-test is a statistical test for the correlation coefficient. It can be used when \(x\) and \(y\) are linearly related, the variables are random variables, and when the population of the variable \(y\) is normally distributed.

The formula for the t-test statistic is \(t = r \sqrt{\left( \dfrac{n-2}{1-r^{2}} \right)}\).

Use the t-distribution with degrees of freedom equal to \(df = n - 2\).

Note the \(df = n - 2\) since we have two variables, \(x\) and \(y\).
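The test statistic formula translates directly into code. The sketch below uses a hypothetical r = 0.87 from n = 12 pairs; the resulting t would then be compared against a t-table value with \(n - 2\) degrees of freedom.

```python
import math

# t-test statistic for a correlation coefficient:
# t = r * sqrt((n - 2) / (1 - r^2))
def correlation_t_stat(r, n):
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.87 from n = 12 paired observations.
t = correlation_t_stat(0.87, 12)
df = 12 - 2  # degrees of freedom: n - 2
print(f"t = {t:.3f} with df = {df}")
```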

Example: test whether the correlation between hours studied for the exam and grade on the exam is statistically significant. Use \(\alpha\) = 0.05.

Correlation is Not Causation

Just because two variables are significantly correlated does not imply a cause-and-effect relationship. Several relationships are possible. It could be that \(x\) causes \(y\) to change. But you can swap \(x\) and \(y\) and get the same \(r\) value, so \(y\) could instead be causing \(x\) to change. There could also be other variables affecting the two variables of interest. For instance, you can usually show a high correlation between ice cream sales and home burglaries. Selling more ice cream does not “cause” burglars to rob homes, and more home burglaries do not cause more ice cream sales. We would probably notice that the temperature outside may be causing both ice cream sales to increase and more people to leave their windows open. This third variable is called a lurking variable: it causes both \(x\) and \(y\) to change, making it look like the relationship is just between \(x\) and \(y\).
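The ice cream scenario can be simulated. In this hypothetical sketch, temperature (the lurking variable) drives both series, so ice cream sales and burglaries come out strongly correlated even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lurking variable: daily temperature over 200 hypothetical days.
temperature = rng.uniform(10, 35, size=200)

# Both quantities depend on temperature, plus independent noise;
# neither one causally affects the other.
ice_cream_sales = 50 + 3.0 * temperature + rng.normal(0, 5, size=200)
burglaries = 2 + 0.4 * temperature + rng.normal(0, 2, size=200)

r = np.corrcoef(ice_cream_sales, burglaries)[0, 1]
print(f"r = {r:.2f}")  # strongly positive despite no causal link
```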

There are also highly correlated variables that seemingly have nothing to do with one another. Correlations between such seemingly unrelated variables are called spurious correlations.

The following website has some examples of spurious correlations (a slight caution that the author has some gloomy examples): http://www.tylervigen.com/spurious-correlations . Figure 12-7 is one of their examples:

Chart from tylervigen.com, showing the correlation from 2000 to 2009 between per-capita mozzarella cheese consumption and the number of civil engineering doctorates awarded.

If we were to take each pair of yearly measurements from the time-series plot in Figure 12-7, we would get the following data.

Using Excel to find a scatterplot and compute a correlation coefficient, we get the scatterplot shown in Figure 12-8 and a correlation of \(r = 0.9586\).

Excel-generated scatterplot of the spurious correlation example, with mozzarella cheese consumption on the x-axis and engineering doctorates on the y-axis.

With \(r = 0.9586\), there is strong correlation between the number of engineering doctorate degrees earned and mozzarella cheese consumption over time, but earning your doctorate degree does not cause one to go eat more cheese. Nor does eating more cheese cause people to earn a doctorate degree. Most likely these items are both increasing over time and therefore show a spurious correlation to one another.

When two variables are correlated, it does not imply that one variable causes the other variable to change.

“Correlation is causation” is the incorrect assumption that because two things correlate, there must be a causal relationship between them. Causality is the area of statistics most commonly misused and misinterpreted. Media, advertising, politicians, and lobby groups often seize on a perceived correlation and use it to “prove” their own agenda, failing to understand that a correlation alone is not proof of an underlying causality. Many people assume that because a poll or statistic contains many numbers, it must be scientific and therefore correct. The human brain subconsciously tries to establish links between many pieces of information at once; it often constructs patterns from randomness, jumps to conclusions, and assumes that a cause-and-effect relationship exists. Relationships may be accidental or due to other unmeasured variables. Overcoming the tendency to jump to a cause-and-effect conclusion is part of academic training in most fields, from statistics to the arts.

When looking at correlations, start with a scatterplot to see whether there is a linear relationship before finding a correlation coefficient. If there is a linear relationship in the scatterplot, then the correlation coefficient tells you the strength and direction of the relationship. Clusters of dots forming a linear uphill pattern from left to right indicate a positive correlation; the closer the dots are to a straight line, the closer \(r\) is to \(1\). If the cluster of dots goes downhill from left to right in a linear pattern, there is a negative relationship; the closer those dots are to a straight line going downhill, the closer \(r\) is to \(-1\). Use a t-test to see if the correlation is statistically significant. As sample sizes get larger, smaller values of \(r\) become statistically significant. Be careful with outliers, which can heavily influence correlations. Most importantly, correlation is not causation: when \(x\) and \(y\) are significantly correlated, this does not mean that \(x\) causes \(y\) to change.

What is Correlational Research? (+ Design, Examples)

Appinio Research · 04.03.2024 · 30 min read


Ever wondered how researchers explore connections between different factors without manipulating them? Correlational research offers a window into understanding the relationships between variables in the world around us. From examining the link between exercise habits and mental well-being to exploring patterns in consumer behavior, correlational studies help us uncover insights that shape our understanding of human behavior, inform decision-making, and drive innovation. In this guide, we'll dive into the fundamentals of correlational research, exploring its definition, importance, ethical considerations, and practical applications across various fields. Whether you're a student delving into research methods or a seasoned researcher seeking to expand your methodological toolkit, this guide will equip you with the knowledge and skills to conduct and interpret correlational studies effectively.

What is Correlational Research?

Correlational research is a methodological approach used in scientific inquiry to examine the relationship between two or more variables. Unlike experimental research, which seeks to establish cause-and-effect relationships through manipulation and control of variables, correlational research focuses on identifying and quantifying the degree to which variables are related to one another. This method allows researchers to investigate associations, patterns, and trends in naturalistic settings without imposing experimental manipulations.

Importance of Correlational Research

Correlational research plays a crucial role in advancing scientific knowledge across various disciplines. Its importance stems from several key factors:

  • Exploratory Analysis: Correlational studies provide a starting point for exploring potential relationships between variables. By identifying correlations, researchers can generate hypotheses and guide further investigation into causal mechanisms and underlying processes.
  • Predictive Modeling:  Correlation coefficients can be used to predict the behavior or outcomes of one variable based on the values of another variable. This predictive ability has practical applications in fields such as economics, psychology, and epidemiology, where forecasting future trends or outcomes is essential.
  • Diagnostic Purposes:  Correlational analyses can help identify patterns or associations that may indicate the presence of underlying conditions or risk factors. For example, correlations between certain biomarkers and disease outcomes can inform diagnostic criteria and screening protocols in healthcare.
  • Theory Development:  Correlational research contributes to theory development by providing empirical evidence for proposed relationships between variables. Researchers can refine and validate theoretical models in their respective fields by systematically examining correlations across different contexts and populations.
  • Ethical Considerations:  In situations where experimental manipulation is not feasible or ethical, correlational research offers an alternative approach to studying naturally occurring phenomena. This allows researchers to address research questions that may otherwise be inaccessible or impractical to investigate.

Correlational vs. Causation in Research

It's important to distinguish between correlation and causation in research. While correlational studies can identify relationships between variables, they cannot establish causal relationships on their own. Several factors contribute to this distinction:

  • Directionality: Correlation does not imply the direction of causation. A correlation between two variables does not indicate which variable is causing the other; it merely suggests that they are related in some way. Additional evidence, such as experimental manipulation or longitudinal studies, is needed to establish causality.
  • Third Variables:  Correlations may be influenced by third variables, also known as confounding variables, that are not directly measured or controlled in the study. These third variables can create spurious correlations or obscure true causal relationships between the variables of interest.
  • Temporal Sequence:  Causation requires a temporal sequence, with the cause preceding the effect in time. Correlational studies alone cannot establish the temporal order of events, making it difficult to determine whether one variable causes changes in another or vice versa.

Understanding the distinction between correlation and causation is critical for interpreting research findings accurately and drawing valid conclusions about the relationships between variables. While correlational research provides valuable insights into associations and patterns, establishing causation typically requires additional evidence from experimental studies or other research designs.

Key Concepts in Correlation

Understanding key concepts in correlation is essential for conducting meaningful research and interpreting results accurately.

Correlation Coefficient

The correlation coefficient is a statistical measure that quantifies the strength and direction of the relationship between two variables. It's denoted by the symbol r and ranges from -1 to +1.

  • A correlation coefficient of -1 indicates a perfect negative correlation, meaning that as one variable increases, the other decreases in a perfectly predictable manner.
  • A coefficient of +1 signifies a perfect positive correlation, where both variables increase or decrease together in perfect sync.
  • A coefficient of 0 implies no correlation, indicating no systematic relationship between the variables.
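All three cases are easy to verify numerically. This sketch uses invented arrays; any exact linear relationship gives r = ±1, and a pattern with no linear trend gives r = 0.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]    # exact increasing line: r = +1
r_neg = np.corrcoef(x, -3 * x + 10)[0, 1]  # exact decreasing line: r = -1

# A pattern with no linear trend relative to x: r = 0.
y_flat = np.array([2.0, 4.0, 2.0, 4.0, 2.0])
r_zero = np.corrcoef(x, y_flat)[0, 1]

print(r_pos, r_neg, r_zero)
```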

Strength and Direction of Correlation

The strength of correlation refers to how closely the data points cluster around a straight line on the scatterplot. A correlation coefficient close to -1 or +1 indicates a strong relationship between the variables, while a coefficient close to 0 suggests a weak relationship.

  • Strong correlation:  When the correlation coefficient approaches -1 or +1, it indicates a strong relationship between the variables. For example, a correlation coefficient of -0.9 suggests a strong negative relationship, while a coefficient of +0.8 indicates a strong positive relationship.
  • Weak correlation:  A correlation coefficient close to 0 indicates a weak or negligible relationship between the variables. For instance, a coefficient of -0.1 or +0.1 suggests a weak correlation where the variables are minimally related.

The direction of correlation determines how the variables change relative to each other.

  • Positive correlation:  When one variable increases, the other variable also tends to increase. Conversely, when one variable decreases, the other variable tends to decrease. This is represented by a positive correlation coefficient.
  • Negative correlation:  In a negative correlation, as one variable increases, the other variable tends to decrease. Similarly, when one variable decreases, the other variable tends to increase. This relationship is indicated by a negative correlation coefficient.

Scatterplots

A scatterplot is a graphical representation of the relationship between two variables. Each data point on the plot represents the values of both variables for a single observation. By plotting the data points on a Cartesian plane, you can visualize patterns and trends in the relationship between the variables.

  • Interpretation:  When examining a scatterplot, observe the pattern of data points. If the points cluster around a straight line, it indicates a strong correlation. However, if the points are scattered randomly, it suggests a weak or no correlation.
  • Outliers:  Identify any outliers or data points that deviate significantly from the overall pattern. Outliers can influence the correlation coefficient and may warrant further investigation to determine their impact on the relationship between variables.
  • Line of Best Fit:  In some cases, you may draw a line of best fit through the data points to visually represent the overall trend in the relationship. This line can help illustrate the direction and strength of the correlation between the variables.
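The warning about outliers can be illustrated with a quick sketch (invented data): a perfectly linear set of points gives r = +1, but appending a single discordant point collapses the correlation toward zero.

```python
import numpy as np

# Five perfectly linear points: r is exactly +1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
r_clean = np.corrcoef(x, y)[0, 1]

# Append one outlier far from the overall pattern.
x_out = np.append(x, 10.0)
y_out = np.append(y, 2.0)
r_out = np.corrcoef(x_out, y_out)[0, 1]

print(f"without outlier: r = {r_clean:.2f}; with outlier: r = {r_out:.2f}")
```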

Understanding these key concepts will enable you to interpret correlation coefficients accurately and draw meaningful conclusions from your data.

How to Design a Correlational Study?

When embarking on a correlational study, careful planning and consideration are crucial to ensure the validity and reliability of your research findings.

Research Question Formulation

Formulating clear and focused research questions is the cornerstone of any successful correlational study. Your research questions should articulate the variables you intend to investigate and the nature of the relationship you seek to explore. When formulating your research questions:

  • Be Specific:  Clearly define the variables you are interested in studying and the population to which your findings will apply.
  • Be Testable:  Ensure that your research questions are empirically testable using correlational methods. Avoid vague or overly broad questions that are difficult to operationalize.
  • Consider Prior Research:  Review existing literature to identify gaps or unanswered questions in your area of interest. Your research questions should build upon prior knowledge and contribute to advancing the field.

For example, if you're interested in examining the relationship between sleep duration and academic performance among college students, your research question might be: "Is there a significant correlation between the number of hours of sleep per night and GPA among undergraduate students?"

Participant Selection

Selecting an appropriate sample of participants is critical to ensuring the generalizability and validity of your findings. Consider the following factors when selecting participants for your correlational study:

  • Population Characteristics:  Identify the population of interest for your study and ensure that your sample reflects the demographics and characteristics of this population.
  • Sampling Method:  Choose a sampling method that is appropriate for your research question and feasible given your resources and constraints. Standard sampling methods include random sampling, stratified sampling, and convenience sampling.
  • Sample Size:  Determine the appropriate sample size based on factors such as the effect size you expect to detect, the desired level of statistical power, and practical considerations such as time and budget constraints.

For example, if you're studying the relationship between exercise habits and mental health outcomes in adults aged 18-65, you might use stratified random sampling to ensure representation from different age groups within the population.
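As an illustrative sketch of that sampling choice, stratified sampling can be implemented in a few lines of Python; the participant pool, age groups, and group sizes below are all hypothetical:

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=1):
    """Draw an equal-size simple random sample from each stratum."""
    random.seed(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, per_stratum))
    return sample

# Hypothetical participant pool: 60 adults tagged with an age-group stratum
pool = [{"id": i, "age_group": group}
        for i, group in enumerate(["18-30", "31-45", "46-65"] * 20)]
sample = stratified_sample(pool, lambda p: p["age_group"], per_stratum=5)
print(len(sample))  # prints 15 (5 participants from each of the 3 strata)
```

In a real study the per-stratum counts would usually be proportional to each group's share of the population rather than equal.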

Variables Identification

Identifying and operationalizing the variables of interest is essential for conducting a rigorous correlational study. When identifying variables for your research:

  • Independent and Dependent Variables:  Clearly distinguish between independent variables (factors that are hypothesized to influence the outcome) and dependent variables (the outcomes or behaviors of interest).
  • Control Variables:  Identify any potential confounding variables or extraneous factors that may influence the relationship between your independent and dependent variables. These variables should be controlled for in your analysis.
  • Measurement Scales:  Determine the appropriate measurement scales for your variables (e.g., nominal, ordinal, interval, or ratio) and select valid and reliable measures for assessing each construct.

For instance, if you're investigating the relationship between socioeconomic status (SES) and academic achievement, SES would be your independent variable, while academic achievement would be your dependent variable. You might measure SES using a composite index based on factors such as income, education level, and occupation.

Data Collection Methods

Selecting appropriate data collection methods is essential for obtaining reliable and valid data for your correlational study. When choosing data collection methods:

  • Quantitative vs. Qualitative:  Determine whether quantitative or qualitative methods are best suited to your research question and objectives. Correlational studies typically involve quantitative data collection methods like surveys, questionnaires, or archival data analysis.
  • Instrument Selection:  Choose measurement instruments that are valid, reliable, and appropriate for your variables of interest. Pilot test your instruments to ensure clarity and comprehension among your target population.
  • Data Collection Procedures:  Develop clear and standardized procedures for data collection to minimize bias and ensure consistency across participants and time points.

For example, if you're examining the relationship between smartphone use and sleep quality among adolescents, you might administer a self-report questionnaire assessing smartphone usage patterns and sleep quality indicators such as sleep duration and sleep disturbances.

Crafting a well-designed correlational study is essential for yielding meaningful insights into the relationships between variables. By meticulously formulating research questions, selecting appropriate participants, identifying relevant variables, and employing effective data collection methods, researchers can ensure the validity and reliability of their findings.

With Appinio, conducting correlational research becomes even more seamless and efficient. Our intuitive platform empowers researchers to gather real-time consumer insights in minutes, enabling them to make informed decisions with confidence.


How to Analyze Correlational Data?

Once you have collected your data in a correlational study, the next crucial step is to analyze it effectively to draw meaningful conclusions about the relationship between variables.

How to Calculate Correlation Coefficients?

The correlation coefficient is a numerical measure that quantifies the strength and direction of the relationship between two variables. There are different types of correlation coefficients, including Pearson's correlation coefficient (for linear relationships), Spearman's rank correlation coefficient (for ordinal data), and Kendall's tau (for non-parametric data). Here, we'll focus on calculating Pearson's correlation coefficient (r), which is commonly used for interval or ratio-level data.

To calculate Pearson's correlation coefficient (r), you can use statistical software such as SPSS, R, or Excel. However, if you prefer to calculate it manually, you can use the following formula:

r = Σ((X - X̄)(Y - Ȳ)) / ((n - 1) * s_X * s_Y)

where:

  • X  and  Y  are the scores of the two variables,
  • X̄  and  Ȳ  are the means of X and Y, respectively,
  • n  is the number of data points,
  • s_X  and  s_Y  are the standard deviations of X and Y, respectively.
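As a sketch, the same calculation can be done in plain Python; the sleep-hours and GPA values below are invented purely for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval/ratio scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (n - 1 in the denominator)
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Sum of cross-products of the deviations from each mean
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

# Hypothetical data: hours of sleep per night and GPA for six students
sleep = [5, 6, 6, 7, 8, 9]
gpa = [2.4, 2.8, 3.0, 3.1, 3.5, 3.6]
print(round(pearson_r(sleep, gpa), 3))  # prints 0.965
```

A value this close to 1 would indicate a strong positive relationship in this made-up sample; real data are rarely so tidy.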

Interpreting Correlation Results

Once you have calculated the correlation coefficient (r), it's essential to interpret the results correctly. When interpreting correlation results:

  • Magnitude:  The absolute value of the correlation coefficient (r) indicates the strength of the relationship between the variables. A coefficient close to 1 or -1 suggests a strong correlation, while a coefficient close to 0 indicates a weak or no correlation.
  • Direction:  The sign of the correlation coefficient (positive or negative) indicates the direction of the relationship between the variables. A positive correlation coefficient indicates a positive relationship (as one variable increases, the other tends to increase), while a negative correlation coefficient indicates a negative relationship (as one variable increases, the other tends to decrease).
  • Statistical Significance:  Assess the statistical significance of the correlation coefficient to determine whether the observed relationship is likely to be due to chance. This is typically done using hypothesis testing, where you compare the calculated correlation coefficient to a critical value based on the sample size and desired level of significance (e.g., α = 0.05).

Statistical Significance

Determining the statistical significance of the correlation coefficient involves conducting hypothesis testing to assess whether the observed correlation is likely to occur by chance. The most common approach is to use a significance level (alpha, α) of 0.05, which corresponds to a 5% chance of obtaining the observed correlation coefficient if there is no true relationship between the variables.

To test the null hypothesis that the correlation coefficient is zero (i.e., no correlation), you can use inferential statistics such as the t-test or z-test. If the calculated p-value is less than the chosen significance level (e.g., p < 0.05), you can reject the null hypothesis and conclude that the correlation coefficient is statistically significant.
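The t-test approach can be sketched as follows. The values of r and n are hypothetical, and the critical value 2.101 is the standard two-tailed α = 0.05 entry for 18 degrees of freedom; in practice, statistical software reports an exact p-value instead:

```python
import math

def correlation_t_test(r, n, t_critical):
    """Convert r to a t statistic with n - 2 df and test it against t_critical."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, abs(t) > t_critical

# Hypothetical result: r = 0.45 from a sample of n = 20 participants (df = 18)
t_stat, significant = correlation_t_test(0.45, 20, t_critical=2.101)
print(round(t_stat, 3), significant)  # prints 2.138 True
```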

Remember that statistical significance does not necessarily imply practical significance or the strength of the relationship. Even a statistically significant correlation with a small effect size may not be meaningful in practical terms.

By understanding how to calculate correlation coefficients, interpret correlation results, and assess statistical significance, you can effectively analyze correlational data and draw accurate conclusions about the relationships between variables in your study.

Correlational Research Limitations

As with any research methodology, correlational studies have inherent considerations and limitations that researchers must acknowledge and address to ensure the validity and reliability of their findings.

Third Variables

One of the primary considerations in correlational research is the presence of third variables, also known as confounding variables. These are extraneous factors that may influence or confound the observed relationship between the variables under study. Failing to account for third variables can lead to spurious correlations or erroneous conclusions about causality.

For example, consider a correlational study examining the relationship between ice cream consumption and drowning incidents. While these variables may exhibit a positive correlation during the summer months, the true causal factor is likely to be a third variable—such as hot weather—that influences both ice cream consumption and swimming activities, thereby increasing the risk of drowning.

To address the influence of third variables, researchers can employ various strategies, such as statistical control techniques, experimental designs (when feasible), and careful operationalization of variables.
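One such statistical control technique is the first-order partial correlation, which removes the linear influence of a third variable Z from the correlation between X and Y. A minimal sketch, with hypothetical coefficients for the ice cream example:

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return ((r_xy - r_xz * r_yz)
            / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2)))

# Hypothetical values: X = ice cream sales, Y = drownings, Z = temperature
r_controlled = partial_correlation(r_xy=0.70, r_xz=0.80, r_yz=0.85)
print(round(r_controlled, 3))  # prints 0.063
```

Once temperature is controlled for, the apparent ice cream/drowning relationship in this made-up example all but disappears.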

Causal Inferences

Correlation does not imply causation—a fundamental principle in correlational research. While correlational studies can identify relationships between variables, they cannot determine causality. This is because correlation merely describes the degree to which two variables co-vary; it does not establish a cause-and-effect relationship between them.

For example, consider a correlational study that finds a positive relationship between the frequency of exercise and self-reported happiness. While it may be tempting to conclude that exercise causes happiness, it's equally plausible that happier individuals are more likely to exercise regularly. Without experimental manipulation and control over potential confounding variables, causal inferences cannot be made.

To strengthen causal inferences in correlational research, researchers can employ longitudinal designs, experimental methods (when ethical and feasible), and theoretical frameworks to guide their interpretations.

Sample Size and Representativeness

The size and representativeness of the sample are critical considerations in correlational research. A small or non-representative sample may limit the generalizability of findings and increase the risk of sampling bias.

For example, if a correlational study examines the relationship between socioeconomic status (SES) and educational attainment using a sample composed primarily of high-income individuals, the findings may not accurately reflect the broader population's experiences. Similarly, an undersized sample may lack the statistical power to detect meaningful correlations or relationships.

To mitigate these issues, researchers should aim for adequate sample sizes based on power analyses, employ random or stratified sampling techniques to enhance representativeness and consider the demographic characteristics of the target population when interpreting findings.
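One standard approximation for such a power analysis uses Fisher's z transformation of the expected correlation. A sketch, assuming the conventional two-tailed α = .05 (z = 1.96) and 80% power (z ≈ 0.84):

```python
import math

def sample_size_for_r(r, z_alpha=1.96, z_beta=0.84):
    """Approximate n needed to detect a population correlation of r."""
    c = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z transform of r
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)

# n required to detect r = 0.30 at alpha = .05 (two-tailed) with 80% power
print(sample_size_for_r(0.30))  # prints 85
```

Smaller expected correlations demand sharply larger samples; halving r roughly quadruples the required n.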

Ensure your survey delivers accurate insights by using our Sample Size Calculator. With customizable options for margin of error, confidence level, and standard deviation, you can determine the optimal sample size to ensure representative results. Make confident decisions backed by robust data.

Reliability and Validity

Ensuring the reliability and validity of measures is paramount in correlational research. Reliability refers to the consistency and stability of measurement over time, whereas validity pertains to the accuracy and appropriateness of measurement in capturing the intended constructs.

For example, if a correlational study utilizes self-report measures of depression and anxiety, it's essential to assess the measures' reliability (e.g., internal consistency, test-retest reliability) and validity (e.g., content validity, criterion validity) to ensure that they accurately reflect participants' mental health status.

To enhance reliability and validity in correlational research, researchers can employ established measurement scales, pilot-test instruments, use multiple measures of the same construct, and assess convergent and discriminant validity.

By addressing these considerations and limitations, researchers can enhance the robustness and credibility of their correlational studies and make more informed interpretations of their findings.

Correlational Research Examples and Applications

Correlational research is widely used across various disciplines to explore relationships between variables and gain insights into complex phenomena. We'll examine examples and applications of correlational studies, highlighting their practical significance and impact on understanding human behavior and societal trends across various industries and use cases.

Psychological Correlational Studies

In psychology, correlational studies play a crucial role in understanding various aspects of human behavior, cognition, and mental health. Researchers use correlational methods to investigate relationships between psychological variables and identify factors that may contribute to or predict specific outcomes.

For example, a psychological correlational study might examine the relationship between self-esteem and depression symptoms among adolescents. By administering self-report measures of self-esteem and depression to a sample of teenagers and calculating the correlation coefficient between the two variables, researchers can assess whether lower self-esteem is associated with higher levels of depression symptoms.

Other examples of psychological correlational studies include investigating the relationship between:

  • Parenting styles and academic achievement in children
  • Personality traits and job performance in the workplace
  • Stress levels and coping strategies among college students

These studies provide valuable insights into the factors influencing human behavior and mental well-being, informing interventions and treatment approaches in clinical and counseling settings.

Business Correlational Studies

Correlational research is also widely utilized in the business and management fields to explore relationships between organizational variables and outcomes. By examining correlations between different factors within an organization, researchers can identify patterns and trends that may impact performance, productivity, and profitability.

For example, a business correlational study might investigate the relationship between employee satisfaction and customer loyalty in a retail setting. By surveying employees to assess their job satisfaction levels and analyzing customer feedback and purchase behavior, researchers can determine whether higher employee satisfaction is correlated with increased customer loyalty and retention.

Other examples of business correlational studies include examining the relationship between:

  • Leadership styles and employee motivation
  • Organizational culture and innovation
  • Marketing strategies and brand perception

These studies provide valuable insights for organizations seeking to optimize their operations, improve employee engagement, and enhance customer satisfaction.

Marketing Correlational Studies

In marketing, correlational studies are instrumental in understanding consumer behavior, identifying market trends, and optimizing marketing strategies. By examining correlations between various marketing variables, researchers can uncover insights that drive effective advertising campaigns, product development, and brand management.

For example, a marketing correlational study might explore the relationship between social media engagement and brand loyalty among millennials. By collecting data on millennials' social media usage, brand interactions, and purchase behaviors, researchers can analyze whether higher levels of social media engagement correlate with increased brand loyalty and advocacy.

Another example of a marketing correlational study could focus on investigating the relationship between pricing strategies and customer satisfaction in the retail sector. By analyzing data on pricing fluctuations, customer feedback, and sales performance, researchers can assess whether pricing strategies such as discounts or promotions impact customer satisfaction and repeat purchase behavior.

Other potential areas of inquiry in marketing correlational studies include examining the relationship between:

  • Product features and consumer preferences
  • Advertising expenditures and brand awareness
  • Online reviews and purchase intent

These studies provide valuable insights for marketers seeking to optimize their strategies, allocate resources effectively, and build strong relationships with consumers in an increasingly competitive marketplace. By leveraging correlational methods, marketers can make data-driven decisions that drive business growth and enhance customer satisfaction.

Correlational Research Ethical Considerations

Ethical considerations are paramount in all stages of the research process, including correlational studies. Researchers must adhere to ethical guidelines to ensure the rights, well-being, and privacy of participants are protected. Key ethical considerations to keep in mind include:

  • Informed Consent:  Obtain informed consent from participants before collecting any data. Clearly explain the purpose of the study, the procedures involved, and any potential risks or benefits. Participants should have the right to withdraw from the study at any time without consequence.
  • Confidentiality:  Safeguard the confidentiality of participants' data. Ensure that any personal or sensitive information collected during the study is kept confidential and is only accessible to authorized individuals. Use anonymization techniques when reporting findings to protect participants' privacy.
  • Voluntary Participation:  Ensure that participation in the study is voluntary and not coerced. Participants should not feel pressured to take part in the study or feel that they will suffer negative consequences for declining to participate.
  • Avoiding Harm:  Take measures to minimize any potential physical, psychological, or emotional harm to participants. This includes avoiding deceptive practices, providing appropriate debriefing procedures (if necessary), and offering access to support services if participants experience distress.
  • Deception:  If deception is necessary for the study, it must be justified and minimized. Deception should be disclosed to participants as soon as possible after data collection, and any potential risks associated with the deception should be mitigated.
  • Researcher Integrity:  Maintain integrity and honesty throughout the research process. Avoid falsifying data, manipulating results, or engaging in any other unethical practices that could compromise the integrity of the study.
  • Respect for Diversity:  Respect participants' cultural, social, and individual differences. Ensure that research protocols are culturally sensitive and inclusive, and that participants from diverse backgrounds are represented and treated with respect.
  • Institutional Review:  Obtain ethical approval from institutional review boards or ethics committees before commencing the study. Adhere to the guidelines and regulations set forth by the relevant governing bodies and professional organizations.

Adhering to these ethical considerations ensures that correlational research is conducted responsibly and ethically, promoting trust and integrity in the scientific community.

Correlational Research Best Practices and Tips

Conducting a successful correlational study requires careful planning, attention to detail, and adherence to best practices in research methodology. Here are some tips and best practices to help you conduct your correlational research effectively:

  • Clearly Define Variables:  Clearly define the variables you are studying and operationalize them into measurable constructs. Ensure that your variables are accurately and consistently measured to avoid ambiguity and ensure reliability.
  • Use Valid and Reliable Measures:  Select measurement instruments that are valid and reliable for assessing your variables of interest. Pilot test your measures to ensure clarity, comprehension, and appropriateness for your target population.
  • Consider Potential Confounding Variables:  Identify and control for potential confounding variables that could influence the relationship between your variables of interest. Consider including control variables in your analysis to isolate the effects of interest.
  • Ensure Adequate Sample Size:  Determine the appropriate sample size based on power analyses and considerations of statistical power. Larger sample sizes increase the reliability and generalizability of your findings.
  • Random Sampling:  Whenever possible, use random sampling techniques to ensure that your sample is representative of the population you are studying. If random sampling is not feasible, carefully consider the characteristics of your sample and the extent to which findings can be generalized.
  • Statistical Analysis:  Choose appropriate statistical techniques for analyzing your data, taking into account the nature of your variables and research questions. Consult with a statistician if necessary to ensure the validity and accuracy of your analyses.
  • Transparent Reporting:  Transparently report your methods, procedures, and findings in accordance with best practices in research reporting. Clearly articulate your research questions, methods, results, and interpretations to facilitate reproducibility and transparency.
  • Peer Review:  Seek feedback from colleagues, mentors, or peer reviewers throughout the research process. Peer review helps identify potential flaws or biases in your study design, analysis, and interpretation, improving your research's overall quality and credibility.

By following these best practices and tips, you can conduct your correlational research with rigor, integrity, and confidence, leading to valuable insights and contributions to your field.

Conclusion for Correlational Research

Correlational research serves as a powerful tool for uncovering connections between variables in the world around us. By examining the relationships between different factors, researchers can gain valuable insights into human behavior, health outcomes, market trends, and more. While correlational studies cannot establish causation on their own, they provide a crucial foundation for generating hypotheses, predicting outcomes, and informing decision-making in various fields.

Understanding the principles and practices of correlational research empowers researchers to explore complex phenomena, advance scientific knowledge, and address real-world challenges. Moreover, embracing ethical considerations and best practices in correlational research ensures the integrity, validity, and reliability of study findings. By prioritizing informed consent, confidentiality, and participant well-being, researchers can conduct studies that uphold ethical standards and contribute meaningfully to the body of knowledge.

Incorporating transparent reporting, peer review, and continuous learning further enhances the quality and credibility of correlational research. Ultimately, by leveraging correlational methods responsibly and ethically, researchers can unlock new insights, drive innovation, and make a positive impact on society.

How to Collect Data for Correlational Research in Minutes?

Discover the revolutionary power of Appinio, the real-time market research platform. With Appinio, conducting your own correlational research has never been easier or more exciting. Gain access to real-time consumer insights, empowering you to make data-driven decisions in minutes. Here's why Appinio stands out:

  • From questions to insights in minutes:  Say goodbye to lengthy research processes. With Appinio, you can gather valuable insights swiftly, allowing you to act on them immediately.
  • Intuitive platform for everyone:  No need for a PhD in research. Appinio's user-friendly interface makes it accessible to anyone, empowering you to conduct professional-grade research effortlessly.
  • Extensive reach, global impact:  Define your target group from over 1200 characteristics and survey consumers in over 90 countries. With Appinio, the world is your research playground.




7.2 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of nonexperimental research.

What Is Correlational Research?

Correlational research is a type of nonexperimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are essentially two reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

The other reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, Allen Kanner and his colleagues thought that the number of “daily hassles” (e.g., rude salespeople, heavy traffic) that people experience affects the number of physical and psychological symptoms they have (Kanner, Coyne, Schaefer, & Lazarus, 1981). But because they could not manipulate the number of daily hassles their participants experienced, they had to settle for measuring the number of daily hassles—along with the number of symptoms—using self-report questionnaires. Although the strong positive relationship they found between these two variables is consistent with their idea that hassles cause symptoms, it is also consistent with the idea that symptoms cause hassles or that some third variable (e.g., neuroticism) causes both.

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.

Figure 7.2 “Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists” shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. It is how the study is conducted.

Figure 7.2 Results of a Hypothetical Study on Whether People Who Make Daily To-Do Lists Experience Less Stress Than People Who Do Not Make Such Lists

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. However, because some approaches to data collection are strongly associated with correlational research, it makes sense to discuss them here. The two we will focus on are naturalistic observation and archival data. A third, survey research, is discussed in its own chapter.

Naturalistic Observation

Naturalistic observation is an approach to data collection that involves observing people’s behavior in the environment in which it typically occurs. Thus naturalistic observation is a type of field research (as opposed to a type of laboratory research). It could involve observing shoppers in a grocery store, children on a school playground, or psychiatric inpatients in their wards. Researchers engaged in naturalistic observation usually make their observations as unobtrusively as possible so that participants are often not aware that they are being studied. Ethically, this is considered to be acceptable if the participants remain anonymous and the behavior occurs in a public setting where people would not normally have an expectation of privacy. Grocery shoppers putting items into their shopping carts, for example, are engaged in public behavior that is easily observable by store employees and other shoppers. For this reason, most researchers would consider it ethically acceptable to observe them for a study. On the other hand, one of the arguments against the ethicality of the naturalistic observation of “bathroom behavior” discussed earlier in the book is that people have a reasonable expectation of privacy even in a public restroom and that this expectation was violated.

Researchers Robert Levine and Ara Norenzayan used naturalistic observation to study differences in the “pace of life” across countries (Levine & Norenzayan, 1999). One of their measures involved observing pedestrians in a large city to see how long it took them to walk 60 feet. They found that people in some countries walked reliably faster than people in other countries. For example, people in the United States and Japan covered 60 feet in about 12 seconds on average, while people in Brazil and Romania took close to 17 seconds.

Because naturalistic observation takes place in the complex and even chaotic “real world,” there are two closely related issues that researchers must deal with before collecting data. The first is sampling. When, where, and under what conditions will the observations be made, and who exactly will be observed? Levine and Norenzayan described their sampling process as follows:

Male and female walking speed over a distance of 60 feet was measured in at least two locations in main downtown areas in each city. Measurements were taken during main business hours on clear summer days. All locations were flat, unobstructed, had broad sidewalks, and were sufficiently uncrowded to allow pedestrians to move at potentially maximum speeds. To control for the effects of socializing, only pedestrians walking alone were used. Children, individuals with obvious physical handicaps, and window-shoppers were not timed. Thirty-five men and 35 women were timed in most cities. (p. 186)

Precise specification of the sampling process in this way makes data collection manageable for the observers, and it also provides some control over important extraneous variables. For example, by making their observations on clear summer days in all countries, Levine and Norenzayan controlled for effects of the weather on people’s walking speeds.

The second issue is measurement. What specific behaviors will be observed? In Levine and Norenzayan’s study, measurement was relatively straightforward. They simply measured out a 60-foot distance along a city sidewalk and then used a stopwatch to time participants as they walked over that distance. Often, however, the behaviors of interest are not so obvious or objective. For example, researchers Robert Kraut and Robert Johnston wanted to study bowlers’ reactions to their shots, both when they were facing the pins and then when they turned toward their companions (Kraut & Johnston, 1979). But what “reactions” should they observe? Based on previous research and their own pilot testing, Kraut and Johnston created a list of reactions that included “closed smile,” “open smile,” “laugh,” “neutral face,” “look down,” “look away,” and “face cover” (covering one’s face with one’s hands). The observers committed this list to memory and then practiced by coding the reactions of bowlers who had been videotaped. During the actual study, the observers spoke into an audio recorder, describing the reactions they observed. Among the most interesting results of this study was that bowlers rarely smiled while they still faced the pins. They were much more likely to smile after they turned toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.

Naturalistic observation has revealed that bowlers tend to smile when they turn away from the pins and toward their companions, suggesting that smiling is not purely an expression of happiness but also a form of social communication.

sieneke toering – bowling big lebowski style – CC BY-NC-ND 2.0.

When the observations require a judgment on the part of the observers—as in Kraut and Johnston’s study—this process is often described as coding. Coding generally requires clearly defining a set of target behaviors. The observers then categorize participants individually in terms of which behavior they have engaged in and the number of times they engaged in each behavior. The observers might even record the duration of each behavior. The target behaviors must be defined in such a way that different observers code them in the same way. This is the issue of interrater reliability. Researchers are expected to demonstrate the interrater reliability of their coding procedure by having multiple raters code the same behaviors independently and then showing that the different observers are in close agreement. Kraut and Johnston, for example, video recorded a subset of their participants’ reactions and had two observers independently code them. The two observers showed that they agreed on the reactions that were exhibited 97% of the time, indicating good interrater reliability.
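Percent agreement of the kind Kraut and Johnston reported can be computed directly from two observers’ code lists. The sketch below uses invented codes for five hypothetical reactions (the labels are borrowed from their coding scheme, but the data are illustrative, not theirs):

```python
# Invented example: two observers independently code the same five
# bowling reactions using Kraut and Johnston's category labels.
coder_a = ["open smile", "neutral face", "laugh", "look down", "open smile"]
coder_b = ["open smile", "neutral face", "laugh", "look away", "open smile"]

# Percent agreement: the share of observations both coders labeled identically.
matches = sum(a == b for a, b in zip(coder_a, coder_b))
percent_agreement = 100 * matches / len(coder_a)
print(f"Interrater agreement: {percent_agreement:.0f}%")  # 80% for these codes
```

One caveat worth noting: simple percent agreement can overstate reliability when one code is far more common than the others, which is why researchers sometimes also report chance-corrected measures such as Cohen’s kappa.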

Archival Data

Another approach to correlational research is the use of archival data, which are data that have already been collected for some other purpose. An example is a study by Brett Pelham and his colleagues on “implicit egotism”—the tendency for people to prefer people, places, and things that are similar to themselves (Pelham, Carvallo, & Jones, 2005). In one study, they examined Social Security records to show that women with the names Virginia, Georgia, Louise, and Florence were especially likely to have moved to the states of Virginia, Georgia, Louisiana, and Florida, respectively.

As with naturalistic observation, measurement can be more or less straightforward when working with archival data. For example, counting the number of people named Virginia who live in various states based on Social Security records is relatively straightforward. But consider a study by Christopher Peterson and his colleagues on the relationship between optimism and health using data that had been collected many years before for a study on adult development (Peterson, Seligman, & Vaillant, 1988). In the 1940s, healthy male college students had completed an open-ended questionnaire about difficult wartime experiences. In the late 1980s, Peterson and his colleagues reviewed the men’s questionnaire responses to obtain a measure of explanatory style—their habitual ways of explaining bad events that happen to them. More pessimistic people tend to blame themselves and expect long-term negative consequences that affect many aspects of their lives, while more optimistic people tend to blame outside forces and expect limited negative consequences. To obtain a measure of explanatory style for each participant, the researchers used a procedure in which all negative events mentioned in the questionnaire responses, and any causal explanations for them, were identified and written on index cards. These were given to a separate group of raters who rated each explanation in terms of three separate dimensions of optimism-pessimism. These ratings were then averaged to produce an explanatory style score for each participant. The researchers then assessed the statistical relationship between the men’s explanatory style as college students and archival measures of their health at approximately 60 years of age. The primary result was that the more optimistic the men were as college students, the healthier they were as older men. Pearson’s r was +.25.

This is an example of content analysis—a family of systematic approaches to measurement using complex archival data. Just as naturalistic observation requires specifying the behaviors of interest and then noting them as they occur, content analysis requires specifying keywords, phrases, or ideas and then finding all occurrences of them in the data. These occurrences can then be counted, timed (e.g., the amount of time devoted to entertainment topics on the nightly news show), or analyzed in a variety of other ways.
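As a minimal sketch of this counting step—with invented keywords and an invented passage, not Peterson and colleagues’ actual materials or procedure—a first pass of content analysis can be as simple as tallying keyword occurrences in a text:

```python
from collections import Counter
import re

# Invented illustration: crude "pessimism marker" keywords and a
# hypothetical questionnaire response. Real content analysis would use
# a validated coding scheme and trained raters.
keywords = {"blame", "failure", "hopeless"}
passage = """I blame myself for the failure. It felt hopeless,
and I expected the failure to follow me everywhere."""

# Tokenize to lowercase words, then count only the target keywords.
tokens = re.findall(r"[a-z']+", passage.lower())
counts = Counter(t for t in tokens if t in keywords)
print(counts)  # Counter({'failure': 2, 'blame': 1, 'hopeless': 1})
```

In practice these counts (or human ratings derived from the flagged passages, as in Peterson’s index-card procedure) become the scores that are then correlated with other variables.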

Key Takeaways

  • Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
  • Correlational research is not defined by where or how the data are collected. However, some approaches to data collection are strongly associated with correlational research. These include naturalistic observation (in which researchers observe people’s behavior in the context in which it normally occurs) and the use of archival data that were already collected for some other purpose.

Discussion: For each of the following, decide whether it is most likely that the study described is experimental or correlational and explain why.

  • An educational researcher compares the academic performance of students from the “rich” side of town with that of students from the “poor” side of town.
  • A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
  • A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
  • An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
  • A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
  • A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.

Kanner, A. D., Coyne, J. C., Schaefer, C., & Lazarus, R. S. (1981). Comparison of two modes of stress measurement: Daily hassles and uplifts versus major life events. Journal of Behavioral Medicine, 4, 1–39.

Kraut, R. E., & Johnston, R. E. (1979). Social and emotional messages of smiling: An ethological approach. Journal of Personality and Social Psychology, 37, 1539–1553.

Levine, R. V., & Norenzayan, A. (1999). The pace of life in 31 countries. Journal of Cross-Cultural Psychology, 30, 178–205.

Pelham, B. W., Carvallo, M., & Jones, J. T. (2005). Implicit egotism. Current Directions in Psychological Science, 14, 106–110.

Peterson, C., Seligman, M. E. P., & Vaillant, G. E. (1988). Pessimistic explanatory style is a risk factor for physical illness: A thirty-five year longitudinal study. Journal of Personality and Social Psychology, 55, 23–27.

Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Correlation Hypothesis

Understanding the relationships between variables is pivotal in research. Correlation hypotheses predict the degree of association between two or more variables. This guide presents an array of correlation hypothesis examples that explore such connections, followed by guidance on crafting these hypothesis statements effectively, along with practical tips for investigating correlations in your own research.

What is a Correlation Hypothesis?

A correlation hypothesis is a statement that predicts a specific relationship between two or more variables based on the assumption that changes in one variable are associated with changes in another variable. It suggests that there is a correlation or statistical relationship between the variables, meaning that when one variable changes, the other variable is likely to change in a consistent manner.

What is an example of a Correlation Hypothesis Statement?

Example: “If the amount of exercise increases, then the level of physical fitness will also increase.”

In this example, the correlation hypothesis suggests that there is a positive correlation between the amount of exercise a person engages in and their level of physical fitness. As exercise increases, the hypothesis predicts that physical fitness will increase as well. This hypothesis can be tested by collecting data on exercise levels and physical fitness levels and analyzing the relationship between the two variables using statistical methods.
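Once such data are collected, the predicted relationship can be checked by computing Pearson’s r. The Python sketch below uses invented exercise and fitness numbers purely for illustration; a value near +1 would be consistent with the hypothesized positive correlation, a value near 0 would not:

```python
import math

# Invented illustrative data for six hypothetical participants:
# weekly exercise hours and a physical fitness score.
exercise = [1, 3, 5, 2, 6, 4]
fitness = [52, 60, 74, 55, 80, 66]

n = len(exercise)
mean_x = sum(exercise) / n
mean_y = sum(fitness) / n

# Pearson's r: the sum of cross-products of deviations, divided by the
# square root of the product of the sums of squared deviations.
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(exercise, fitness))
ss_x = sum((x - mean_x) ** 2 for x in exercise)
ss_y = sum((y - mean_y) ** 2 for y in fitness)
r = cross / math.sqrt(ss_x * ss_y)

print(f"Pearson's r = {r:+.2f}")
```

A real test of the hypothesis would also require an adequate sample size and a significance test (or confidence interval) for r, not just the point estimate.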

100 Correlation Hypothesis Statement Examples

Discover the intriguing world of correlation through a collection of examples that illustrate how variables can be linked in research. Explore diverse scenarios where changes in one variable may correspond to changes in another, forming the basis of correlation hypotheses. These real-world instances shed light on the essence of correlation analysis and its role in uncovering connections between different aspects of data.

  • Study Hours and Exam Scores : If students study more hours per week, then their exam scores will show a positive correlation, indicating that increased study time might lead to better performance.
  • Income and Education : If the level of education increases, then income levels will also rise, demonstrating a positive correlation between education attainment and earning potential.
  • Social Media Usage and Well-being : If individuals spend more time on social media platforms, then their self-reported well-being might exhibit a negative correlation, suggesting that excessive use could impact mental health.
  • Temperature and Ice Cream Sales : If temperatures rise, then the sales of ice cream might increase, displaying a positive correlation due to the weather’s influence on consumer behavior.
  • Physical Activity and Heart Rate : If the intensity of physical activity rises, then heart rate might increase, signifying a positive correlation between exercise intensity and heart rate.
  • Age and Reaction Time : If age increases, then reaction time might show a positive correlation, indicating that as people age, their reaction times might slow down.
  • Smoking and Lung Capacity : If the number of cigarettes smoked daily increases, then lung capacity might decrease, suggesting a negative correlation between smoking and respiratory health.
  • Stress and Sleep Quality : If stress levels elevate, then sleep quality might decline, reflecting a negative correlation between psychological stress and restorative sleep.
  • Rainfall and Crop Yield : If the amount of rainfall decreases, then crop yield might also decrease, illustrating a negative correlation between precipitation and agricultural productivity.
  • Screen Time and Academic Performance : If screen time usage increases among students, then academic performance might show a negative correlation, suggesting that excessive screen time could be detrimental to studies.
  • Exercise and Body Weight : If individuals engage in regular exercise, then their body weight might exhibit a negative correlation, implying that physical activity can contribute to weight management.
  • Income and Crime Rates : If income levels decrease in a neighborhood, then crime rates might show a positive correlation, indicating a potential link between socio-economic factors and crime.
  • Social Support and Mental Health : If the level of social support increases, then individuals’ mental health scores may exhibit a positive correlation, highlighting the potential positive impact of strong social networks on psychological well-being.
  • Study Time and GPA : If students spend more time studying, then their Grade Point Average (GPA) might display a positive correlation, suggesting that increased study efforts may lead to higher academic achievement.
  • Parental Involvement and Academic Success : If parents are more involved in their child’s education, then the child’s academic success may show a positive correlation, emphasizing the role of parental support in shaping student outcomes.
  • Alcohol Consumption and Reaction Time : If alcohol consumption increases, then reaction time might slow down, indicating a negative correlation between alcohol intake and cognitive performance.
  • Social Media Engagement and Loneliness : If time spent on social media platforms increases, then feelings of loneliness might show a positive correlation, suggesting a potential connection between excessive online interaction and emotional well-being.
  • Temperature and Insect Activity : If temperatures rise, then the activity of certain insects might increase, demonstrating a potential positive correlation between temperature and insect behavior.
  • Education Level and Voting Participation : If education levels rise, then voter participation rates may also increase, showcasing a positive correlation between education and civic engagement.
  • Work Commute Time and Job Satisfaction : If work commute time decreases, then job satisfaction might show a positive correlation, indicating that shorter commutes could contribute to higher job satisfaction.
  • Sleep Duration and Cognitive Performance : If sleep duration increases, then cognitive performance scores might also rise, suggesting a potential positive correlation between adequate sleep and cognitive functioning.
  • Healthcare Access and Mortality Rate : If access to healthcare services improves, then the mortality rate might decrease, highlighting a potential negative correlation between healthcare accessibility and mortality.
  • Exercise and Blood Pressure : If individuals engage in regular exercise, then their blood pressure levels might exhibit a negative correlation, indicating that physical activity can contribute to maintaining healthy blood pressure.
  • Social Media Use and Academic Distraction : If students spend more time on social media during study sessions, then their academic focus might show a negative correlation, suggesting that excessive online engagement can hinder concentration.
  • Age and Technological Adaptation : If age increases, then the speed of adapting to new technologies might exhibit a negative correlation, suggesting that younger individuals tend to adapt more quickly.
  • Temperature and Plant Growth : If temperatures rise, then the rate of plant growth might increase, indicating a potential positive correlation between temperature and biological processes.
  • Music Exposure and Mood : If individuals listen to upbeat music, then their reported mood might show a positive correlation, suggesting that music can influence emotional states.
  • Income and Healthcare Utilization : If income levels increase, then the frequency of healthcare utilization might decrease, suggesting a potential negative correlation between income and healthcare needs.
  • Distance and Communication Frequency : If physical distance between individuals increases, then their communication frequency might show a negative correlation, indicating that proximity tends to facilitate communication.
  • Study Group Attendance and Exam Scores : If students regularly attend study groups, then their exam scores might exhibit a positive correlation, suggesting that collaborative study efforts could enhance performance.
  • Temperature and Disease Transmission : If temperatures rise, then the transmission of certain diseases might increase, pointing to a potential positive correlation between temperature and disease spread.
  • Interest Rates and Consumer Spending : If interest rates decrease, then consumer spending might show a positive correlation, suggesting that lower interest rates encourage increased economic activity.
  • Digital Device Use and Eye Strain : If individuals spend more time on digital devices, then the occurrence of eye strain might show a positive correlation, suggesting that prolonged screen time can impact eye health.
  • Parental Education and Children’s Educational Attainment : If parents have higher levels of education, then their children’s educational attainment might display a positive correlation, highlighting the intergenerational impact of education.
  • Social Interaction and Happiness : If individuals engage in frequent social interactions, then their reported happiness levels might show a positive correlation, indicating that social connections contribute to well-being.
  • Temperature and Energy Consumption : If temperatures decrease, then energy consumption for heating might increase, suggesting a potential positive correlation between temperature and energy usage.
  • Physical Activity and Stress Reduction : If individuals engage in regular physical activity, then their reported stress levels might display a negative correlation, indicating that exercise can help alleviate stress.
  • Diet Quality and Chronic Diseases : If diet quality improves, then the prevalence of chronic diseases might decrease, suggesting a potential negative correlation between healthy eating habits and disease risk.
  • Social Media Use and Body Image Dissatisfaction : If time spent on social media increases, then feelings of body image dissatisfaction might show a positive correlation, suggesting that online platforms can influence self-perception.
  • Income and Access to Quality Education : If household income increases, then access to quality education for children might improve, suggesting a potential positive correlation between financial resources and educational opportunities.
  • Workplace Diversity and Innovation : If workplace diversity increases, then the rate of innovation might show a positive correlation, indicating that diverse teams often generate more creative solutions.
  • Physical Activity and Bone Density : If individuals engage in weight-bearing exercises, then their bone density might exhibit a positive correlation, suggesting that exercise contributes to bone health.
  • Screen Time and Attention Span : If screen time increases, then attention span might show a negative correlation, indicating that excessive screen exposure can impact sustained focus.
  • Social Support and Resilience : If individuals have strong social support networks, then their resilience levels might display a positive correlation, suggesting that social connections contribute to coping abilities.
  • Weather Conditions and Mood : If sunny weather persists, then individuals’ reported mood might exhibit a positive correlation, reflecting the potential impact of weather on emotional states.
  • Nutrition Education and Healthy Eating : If individuals receive nutrition education, then their consumption of fruits and vegetables might show a positive correlation, suggesting that knowledge influences dietary choices.
  • Physical Activity and Cognitive Aging : If adults engage in regular physical activity, then their cognitive decline with aging might show a slower rate, indicating a potential negative correlation between exercise and cognitive aging.
  • Air Quality and Respiratory Illnesses : If air quality deteriorates, then the incidence of respiratory illnesses might increase, suggesting a potential positive correlation between air pollutants and health impacts.
  • Reading Habits and Vocabulary Growth : If individuals read regularly, then their vocabulary size might exhibit a positive correlation, suggesting that reading contributes to language development.
  • Sleep Quality and Stress Levels : If sleep quality improves, then reported stress levels might display a negative correlation, indicating that sleep can impact psychological well-being.
  • Social Media Engagement and Academic Performance : If students spend more time on social media, then their academic performance might exhibit a negative correlation, suggesting that excessive online engagement can impact studies.
  • Exercise and Blood Sugar Levels : If individuals engage in regular exercise, then their blood sugar levels might display a negative correlation, indicating that physical activity can influence glucose regulation.
  • Screen Time and Sleep Duration : If screen time before bedtime increases, then sleep duration might show a negative correlation, suggesting that screen exposure can affect sleep patterns.
  • Environmental Pollution and Health Outcomes : If exposure to environmental pollutants increases, then the occurrence of health issues might show a positive correlation, suggesting that pollution can impact well-being.
  • Time Management and Academic Achievement : If students improve time management skills, then their academic achievement might exhibit a positive correlation, indicating that effective planning contributes to success.
  • Physical Fitness and Heart Health : If individuals improve their physical fitness, then their heart health indicators might display a positive correlation, indicating that exercise benefits cardiovascular well-being.
  • Weather Conditions and Outdoor Activities : If weather is sunny, then outdoor activities might show a positive correlation, suggesting that favorable weather encourages outdoor engagement.
  • Media Exposure and Body Image Perception : If exposure to media images increases, then body image dissatisfaction might show a positive correlation, indicating media’s potential influence on self-perception.
  • Community Engagement and Civic Participation : If individuals engage in community activities, then their civic participation might exhibit a positive correlation, indicating an active citizenry.
  • Social Media Use and Productivity : If individuals spend more time on social media, then their productivity levels might exhibit a negative correlation, suggesting that online distractions can affect work efficiency.
  • Income and Stress Levels : If income levels increase, then reported stress levels might exhibit a negative correlation, suggesting that financial stability can impact psychological well-being.
  • Social Media Use and Interpersonal Skills : If individuals spend more time on social media, then their interpersonal skills might show a negative correlation, indicating potential effects on face-to-face interactions.
  • Parental Involvement and Academic Motivation : If parents are more involved in their child’s education, then the child’s academic motivation may exhibit a positive correlation, highlighting the role of parental support.
  • Technology Use and Sleep Quality : If screen time increases before bedtime, then sleep quality might show a negative correlation, suggesting that technology use can impact sleep.
  • Outdoor Activity and Mood Enhancement : If individuals engage in outdoor activities, then their reported mood might display a positive correlation, suggesting the potential emotional benefits of nature exposure.
  • Income Inequality and Social Mobility : If income inequality increases, then social mobility might exhibit a negative correlation, suggesting that higher inequality can hinder upward mobility.
  • Vegetable Consumption and Heart Health : If individuals increase their vegetable consumption, then heart health indicators might show a positive correlation, indicating the potential benefits of a nutritious diet.
  • Online Learning and Academic Achievement : If students engage in online learning, then their academic achievement might display a positive correlation, highlighting the effectiveness of digital education.
  • Emotional Intelligence and Workplace Performance : If emotional intelligence improves, then workplace performance might exhibit a positive correlation, indicating the relevance of emotional skills.
  • Community Engagement and Mental Well-being : If individuals engage in community activities, then their reported mental well-being might show a positive correlation, emphasizing social connections’ impact.
  • Rainfall and Agriculture Productivity : If rainfall levels increase, then agricultural productivity might exhibit a positive correlation, indicating the importance of water for crops.
  • Social Media Use and Body Posture : If screen time increases, then poor body posture might show a positive correlation, suggesting that screen use can influence physical habits.
  • Marital Satisfaction and Relationship Length : If relationship length increases, then marital satisfaction might show a negative correlation, indicating potential challenges over time.
  • Exercise and Anxiety Levels : If individuals engage in regular exercise, then reported anxiety levels might exhibit a negative correlation, indicating the potential benefits of physical activity on mental health.
  • Music Listening and Concentration : If individuals listen to instrumental music, then their concentration levels might display a positive correlation, suggesting music’s impact on focus.
  • Internet Usage and Attention Deficits : If screen time increases, then attention deficits might show a positive correlation, implying that excessive internet use can affect concentration.
  • Financial Literacy and Debt Levels : If financial literacy improves, then personal debt levels might exhibit a negative correlation, suggesting better financial decision-making.
  • Time Spent Outdoors and Vitamin D Levels : If time spent outdoors increases, then vitamin D levels might show a positive correlation, indicating sun exposure’s role in vitamin synthesis.
  • Family Meal Frequency and Nutrition : If families eat meals together frequently, then nutrition quality might display a positive correlation, emphasizing family dining’s impact on health.
  • Temperature and Allergy Symptoms : If temperatures rise, then allergy symptoms might increase, suggesting a potential positive correlation between temperature and allergen exposure.
  • Social Media Use and Academic Distraction : If students spend more time on social media, then their academic focus might exhibit a negative correlation, indicating that online engagement can hinder studies.
  • Financial Stress and Health Outcomes : If financial stress increases, then the occurrence of health issues might show a positive correlation, suggesting potential health impacts of economic strain.
  • Study Hours and Test Anxiety : If students study more hours, then test anxiety might show a negative correlation, suggesting that increased preparation can reduce anxiety.
  • Music Tempo and Exercise Intensity : If music tempo increases, then exercise intensity might display a positive correlation, indicating music’s potential to influence workout vigor.
  • Green Space Accessibility and Stress Reduction : If access to green spaces improves, then reported stress levels might exhibit a negative correlation, highlighting nature’s stress-reducing effects.
  • Parenting Style and Child Behavior : If authoritative parenting increases, then positive child behaviors might display a positive correlation, suggesting parenting’s influence on behavior.
  • Sleep Quality and Productivity : If sleep quality improves, then work productivity might show a positive correlation, emphasizing the connection between rest and efficiency.
  • Media Consumption and Political Beliefs : If media consumption increases, then alignment with specific political beliefs might exhibit a positive correlation, suggesting media’s influence on ideology.
  • Workplace Satisfaction and Employee Retention : If workplace satisfaction increases, then employee retention rates might show a positive correlation, indicating the link between job satisfaction and tenure.
  • Digital Device Use and Eye Discomfort : If screen time increases, then reported eye discomfort might show a positive correlation, indicating potential impacts of screen exposure.
  • Age and Adaptability to Technology : If age increases, then adaptability to new technologies might exhibit a negative correlation, indicating generational differences in tech adoption.
  • Physical Activity and Mental Health : If individuals engage in regular physical activity, then reported mental health scores might exhibit a positive correlation, showcasing exercise’s impact.
  • Video Gaming and Attention Span : If time spent on video games increases, then attention span might display a negative correlation, indicating potential effects on focus.
  • Social Media Use and Empathy Levels : If social media use increases, then reported empathy levels might show a negative correlation, suggesting possible effects on emotional understanding.
  • Reading Habits and Creativity : If individuals read diverse genres, then their creative thinking might exhibit a positive correlation, emphasizing reading’s cognitive benefits.
  • Weather Conditions and Outdoor Exercise : If weather is pleasant, then outdoor exercise might show a positive correlation, suggesting weather’s influence on physical activity.
  • Parental Involvement and Bullying Prevention : If parents are actively involved, then instances of bullying might exhibit a negative correlation, emphasizing parental impact on behavior.
  • Digital Device Use and Sleep Disruption : If screen time before bedtime increases, then sleep disruption might show a positive correlation, indicating technology’s influence on sleep.
  • Friendship Quality and Psychological Well-being : If friendship quality increases, then reported psychological well-being might show a positive correlation, highlighting social support’s impact.
  • Income and Environmental Consciousness : If income levels increase, then environmental consciousness might also rise, indicating potential links between affluence and sustainability awareness.

Correlational Hypothesis Interpretation Statement Examples

Explore the art of interpreting correlation hypotheses with these illustrative examples. Understand the implications of positive, negative, and zero correlations, and learn how to deduce meaningful insights from data relationships.

  • Relationship Between Exercise and Mood : A positive correlation between exercise frequency and mood scores suggests that increased physical activity might contribute to enhanced emotional well-being.
  • Association Between Screen Time and Sleep Quality : A negative correlation between screen time before bedtime and sleep quality indicates that higher screen exposure could lead to poorer sleep outcomes.
  • Connection Between Study Hours and Exam Performance : A positive correlation between study hours and exam scores implies that increased study time might correspond to better academic results.
  • Link Between Stress Levels and Meditation Practice : A negative correlation between stress levels and meditation frequency suggests that engaging in meditation could be associated with lower perceived stress.
  • Relationship Between Social Media Use and Loneliness : A positive correlation between social media engagement and feelings of loneliness implies that excessive online interaction might contribute to increased loneliness.
  • Association Between Income and Happiness : A positive correlation between income and self-reported happiness indicates that higher income levels might be linked to greater subjective well-being.
  • Connection Between Parental Involvement and Academic Performance : A positive correlation between parental involvement and students’ grades suggests that active parental engagement might contribute to better academic outcomes.
  • Link Between Time Management and Stress Levels : A negative correlation between effective time management and reported stress levels implies that better time management skills could lead to lower stress.
  • Relationship Between Outdoor Activities and Vitamin D Levels : A positive correlation between time spent outdoors and vitamin D levels suggests that increased outdoor engagement might be associated with higher vitamin D concentrations.
  • Association Between Water Consumption and Skin Hydration : A positive correlation between water intake and skin hydration indicates that higher fluid consumption might lead to improved skin moisture levels.

Alternative Correlational Hypothesis Statement Examples

Explore alternative scenarios and potential correlations in these examples. Learn to articulate different hypotheses that could explain data relationships beyond the conventional assumptions.

  • Alternative to Exercise and Mood : An alternative hypothesis could suggest a non-linear relationship between exercise and mood, indicating that moderate exercise might have the most positive impact on emotional well-being.
  • Alternative to Screen Time and Sleep Quality : An alternative hypothesis might propose that screen time has a curvilinear relationship with sleep quality, suggesting that moderate screen exposure leads to optimal sleep outcomes.
  • Alternative to Study Hours and Exam Performance : An alternative hypothesis could propose that there’s an interaction effect between study hours and study method, influencing the relationship between study time and exam scores.
  • Alternative to Stress Levels and Meditation Practice : An alternative hypothesis might consider that the relationship between stress levels and meditation practice is moderated by personality traits, resulting in varying effects.
  • Alternative to Social Media Use and Loneliness : An alternative hypothesis could posit that the relationship between social media use and loneliness depends on the quality of online interactions and content consumption.
  • Alternative to Income and Happiness : An alternative hypothesis might propose that the relationship between income and happiness differs based on cultural factors, leading to varying happiness levels at different income ranges.
  • Alternative to Parental Involvement and Academic Performance : An alternative hypothesis could suggest that the relationship between parental involvement and academic performance varies based on students’ learning styles and preferences.
  • Alternative to Time Management and Stress Levels : An alternative hypothesis might explore the possibility of a curvilinear relationship between time management and stress levels, indicating that extreme time management efforts might elevate stress.
  • Alternative to Outdoor Activities and Vitamin D Levels : An alternative hypothesis could consider that the relationship between outdoor activities and vitamin D levels is moderated by sunscreen usage, influencing vitamin synthesis.
  • Alternative to Water Consumption and Skin Hydration : An alternative hypothesis might propose that the relationship between water consumption and skin hydration is mediated by dietary factors, influencing fluid retention and skin health.

Correlational Hypothesis Pearson Interpretation Statement Examples

Discover how the Pearson correlation coefficient enhances your understanding of data relationships with these examples. Learn to interpret correlation strength and direction using this valuable statistical measure.

  • Strong Positive Correlation : A Pearson correlation coefficient of +0.85 between study time and exam scores indicates a strong positive relationship, suggesting that increased study time is strongly associated with higher grades.
  • Moderate Negative Correlation : A Pearson correlation coefficient of -0.45 between screen time and sleep quality reflects a moderate negative correlation, implying that higher screen exposure is moderately linked to poorer sleep outcomes.
  • Weak Positive Correlation : A Pearson correlation coefficient of +0.25 between social media use and loneliness suggests a weak positive correlation, indicating that increased online engagement is weakly related to higher loneliness.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.75 between stress levels and meditation practice indicates a strong negative relationship, implying that engaging in meditation is strongly associated with lower stress.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.60 between income and happiness signifies a moderate positive correlation, suggesting that higher income is moderately linked to greater happiness.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.30 between parental involvement and academic performance represents a weak negative correlation, indicating that higher parental involvement is weakly associated with lower academic performance.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.80 between time management and stress levels reveals a strong negative relationship, suggesting that effective time management is strongly linked to lower stress.
  • Weak Negative Correlation : A Pearson correlation coefficient of -0.20 between outdoor activities and vitamin D levels signifies a weak negative correlation, implying that higher outdoor engagement is weakly related to lower vitamin D levels.
  • Moderate Positive Correlation : A Pearson correlation coefficient of +0.50 between water consumption and skin hydration denotes a moderate positive correlation, suggesting that increased fluid intake is moderately linked to better skin hydration.
  • Strong Negative Correlation : A Pearson correlation coefficient of -0.70 between screen time and attention span indicates a strong negative relationship, implying that higher screen exposure is strongly associated with shorter attention spans.
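The verbal labels above follow a common rule of thumb for the magnitude of r (the exact cutoffs vary between textbooks). A minimal Python sketch of that classification:

```python
def describe_r(r: float) -> str:
    """Label a Pearson correlation coefficient by strength and direction.

    Uses a common rule of thumb (|r| >= 0.7 strong, |r| >= 0.4 moderate,
    otherwise weak); exact cutoffs vary between textbooks.
    """
    direction = "positive" if r >= 0 else "negative"
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

print(describe_r(0.85))   # strong positive
print(describe_r(-0.45))  # moderate negative
print(describe_r(0.25))   # weak positive
```

Remember that these labels describe only the strength of the linear association in the sample, not its practical or causal significance.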

Correlational Hypothesis Statement Examples in Psychology

Explore how correlation hypotheses apply to psychological research with these examples. Understand how psychologists investigate relationships between variables to gain insights into human behavior.

  • Sleep Patterns and Cognitive Performance : There is a positive correlation between consistent sleep patterns and cognitive performance, suggesting that individuals with regular sleep schedules exhibit better cognitive functioning.
  • Anxiety Levels and Social Media Use : There is a positive correlation between anxiety levels and excessive social media use, indicating that individuals who spend more time on social media might experience higher anxiety.
  • Self-Esteem and Body Image Satisfaction : There is a positive correlation between self-esteem and body image satisfaction, implying that individuals with higher self-esteem tend to be more satisfied with their physical appearance.
  • Parenting Styles and Child Aggression : There is a negative correlation between authoritative parenting styles and child aggression, suggesting that children raised by authoritative parents might exhibit lower levels of aggression.
  • Emotional Intelligence and Conflict Resolution : There is a positive correlation between emotional intelligence and effective conflict resolution, indicating that individuals with higher emotional intelligence tend to resolve conflicts more successfully.
  • Personality Traits and Career Satisfaction : There is a positive correlation between certain personality traits (e.g., extraversion, openness) and career satisfaction, suggesting that individuals with specific traits experience higher job contentment.
  • Stress Levels and Coping Mechanisms : There is a negative correlation between stress levels and adaptive coping mechanisms, indicating that individuals with lower stress levels are more likely to employ effective coping strategies.
  • Attachment Styles and Romantic Relationship Quality : There is a positive correlation between secure attachment styles and higher romantic relationship quality, suggesting that individuals with secure attachments tend to have healthier relationships.
  • Social Support and Mental Health : There is a negative correlation between perceived social support and mental health issues, indicating that individuals with strong social support networks tend to experience fewer mental health challenges.
  • Motivation and Academic Achievement : There is a positive correlation between intrinsic motivation and academic achievement, implying that students who are internally motivated tend to perform better academically.

Does Correlational Research Have a Hypothesis?

Correlational research involves examining the relationship between two or more variables to determine whether they are related and how they change together. While correlational studies do not establish causation, they still utilize hypotheses to formulate expectations about the relationships between variables. These hypotheses predict the presence, direction, and strength of correlations. However, in correlational research, the focus is on measuring and analyzing the degree of association rather than establishing cause-and-effect relationships.

How Do You Write a Null-Hypothesis for a Correlational Study?

The null hypothesis in a correlational study states that there is no significant correlation between the variables being studied. It assumes that any observed correlation is due to chance and lacks meaningful association. When writing a null hypothesis for a correlational study, follow these steps:

  • Identify the Variables: Clearly define the variables you are studying and their relationship (e.g., “There is no significant correlation between X and Y”).
  • Specify the Population: Indicate the population from which the data is drawn (e.g., “In the population of [target population]…”).
  • Include the Direction of Correlation: If relevant, specify the direction of correlation (positive, negative, or zero) that you are testing (e.g., “…there is no significant positive/negative correlation…”).
  • State the Hypothesis: Write the null hypothesis as a clear statement that there is no significant correlation between the variables (e.g., “…there is no significant correlation between X and Y”).
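The steps above can be sketched in Python. `scipy.stats.pearsonr` returns both the sample correlation and the P-value for testing the null hypothesis that the population correlation is zero (the variables and data here are hypothetical, for illustration only):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical sample: nightly sleep (X) and a stress score (Y)
sleep_hours = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.2, 7.5, 8.0, 8.5, 9.0])
stress_score = np.array([8.2, 8.0, 7.5, 7.0, 6.1, 6.0, 5.8, 5.0, 4.9, 4.0])

# r is the sample correlation; p_value tests H0: rho = 0
r, p_value = pearsonr(sleep_hours, stress_score)

# Decision against the null hypothesis at alpha = 0.05
alpha = 0.05
if p_value < alpha:
    decision = "reject H0: there is a significant correlation"
else:
    decision = "fail to reject H0: no significant correlation detected"
```

Note that a small P-value only supports the conclusion that the correlation is nonzero; it does not show that less sleep causes more stress.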

What Is the Correlation Hypothesis Formula?

The correlation hypothesis is often expressed in the form of a statement that predicts the presence and nature of a relationship between two variables. It typically follows the “If-Then” structure, indicating the expected change in one variable based on changes in another. The correlation hypothesis formula can be written as:

“If [Variable X] changes, then [Variable Y] will also change [in a specified direction] because [rationale for the expected correlation].”

For example, “If the amount of exercise increases, then mood scores will improve because physical activity has been linked to better emotional well-being.”

What Is a Correlational Hypothesis in Research Methodology?

A correlational hypothesis in research methodology is a testable hypothesis statement that predicts the presence and nature of a relationship between two or more variables. It forms the basis for conducting a correlational study, where the goal is to measure and analyze the degree of association between variables. Correlational hypotheses are essential in guiding the research process, collecting relevant data, and assessing whether the observed correlations are statistically significant.

How Do You Write a Hypothesis for Correlation? – A Step-by-Step Guide

Writing a hypothesis for correlation involves crafting a clear and testable statement about the expected relationship between variables. Here’s a step-by-step guide:

  • Identify Variables : Clearly define the variables you are studying and their nature (e.g., “There is a relationship between X and Y…”).
  • Specify Direction : Indicate the expected direction of correlation (positive, negative, or zero) based on your understanding of the variables and existing literature.
  • Formulate the If-Then Statement : Write an “If-Then” statement that predicts the change in one variable based on changes in the other variable (e.g., “If [Variable X] changes, then [Variable Y] will also change [in a specified direction]…”).
  • Provide Rationale : Explain why you expect the correlation to exist, referencing existing theories, research, or logical reasoning.
  • Quantitative Prediction (Optional) : If applicable, provide a quantitative prediction about the strength of the correlation (e.g., “…for every one unit increase in [Variable X], [Variable Y] is predicted to increase by [numerical value].”).
  • Specify Population : Indicate the population to which your hypothesis applies (e.g., “In a sample of [target population]…”).

Tips for Writing Correlational Hypothesis

  • Base on Existing Knowledge : Ground your hypothesis in existing literature, theories, or empirical evidence to ensure it’s well-informed.
  • Be Specific : Clearly define the variables and direction of correlation you’re predicting to avoid ambiguity.
  • Avoid Causation Claims : Remember that correlational hypotheses do not imply causation. Focus on predicting relationships, not causes.
  • Use Clear Language : Write in clear and concise language, avoiding jargon that may confuse readers.
  • Consider Alternative Explanations : Acknowledge potential confounding variables or alternative explanations that could affect the observed correlation.
  • Be Open to Results : Correlation results can be unexpected. Be prepared to interpret findings even if they don’t align with your initial hypothesis.
  • Test Statistically : Once you collect data, use appropriate statistical tests to determine if the observed correlation is statistically significant.
  • Revise as Needed : If your findings don’t support your hypothesis, revise it based on the data and insights gained.

Crafting a well-structured correlational hypothesis is crucial for guiding your research, conducting meaningful analysis, and contributing to the understanding of relationships between variables.


1.9 - Hypothesis Test for the Population Correlation Coefficient

There is one more point we haven't stressed yet in our discussion about the correlation coefficient r and the coefficient of determination \(R^{2}\) — namely, the two measures summarize the strength of a linear relationship in samples only . If we obtained a different sample, we would obtain different correlations, different \(R^{2}\) values, and therefore potentially different conclusions. As always, we want to draw conclusions about populations , not just samples. To do so, we either have to conduct a hypothesis test or calculate a confidence interval. In this section, we learn how to conduct a hypothesis test for the population correlation coefficient \(\rho\) (the Greek letter "rho").

In general, a researcher should use the hypothesis test for the population correlation \(\rho\) to learn of a linear association between two variables, when it isn't obvious which variable should be regarded as the response. Let's clarify this point with examples of two different research questions.

Consider evaluating whether or not a linear relationship exists between skin cancer mortality and latitude. We will see in Lesson 2 that we can perform either of the following tests:

  • t -test for testing \(H_{0} \colon \beta_{1}= 0\)
  • ANOVA F -test for testing \(H_{0} \colon \beta_{1}= 0\)

For this example, it is fairly obvious that latitude should be treated as the predictor variable and skin cancer mortality as the response.

By contrast, suppose we want to evaluate whether or not a linear relationship exists between a husband's age and his wife's age ( Husband and Wife data ). In this case, one could treat the husband's age as the response:

[Scatterplot: husband's age vs. wife's age]

...or one could treat the wife's age as the response:

[Scatterplot: wife's age vs. husband's age]

In cases such as these, we answer our research question concerning the existence of a linear relationship by using the t -test for testing the population correlation coefficient \(H_{0}\colon \rho = 0\).

Let's jump right to it! We follow standard hypothesis test procedures in conducting a hypothesis test for the population correlation coefficient \(\rho\).

Steps for Hypothesis Testing for \(\boldsymbol{\rho}\)

Step 1: Hypotheses

First, we specify the null and alternative hypotheses:

  • Null hypothesis \(H_{0} \colon \rho = 0\)
  • Alternative hypothesis \(H_{A} \colon \rho \neq 0\) or \(H_{A} \colon \rho < 0\) or \(H_{A} \colon \rho > 0\)

Step 2: Test Statistic

Second, we calculate the value of the test statistic using the following formula:

Test statistic:  \(t^*=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\) 

Step 3: P-Value

Third, we use the resulting test statistic to calculate the P -value. As always, the P -value is the answer to the question "how likely is it that we’d get a test statistic t* as extreme as we did if the null hypothesis were true?" The P -value is determined by referring to a t- distribution with n -2 degrees of freedom.

Step 4: Decision

Finally, we make a decision:

  • If the P -value is smaller than the significance level \(\alpha\), we reject the null hypothesis in favor of the alternative. We conclude that "there is sufficient evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."
  • If the P -value is larger than the significance level \(\alpha\), we fail to reject the null hypothesis. We conclude that "there is not enough evidence at the \(\alpha\) level to conclude that there is a linear relationship in the population between the predictor x and response y."

Example 1-5: Husband and Wife Data

Let's perform the hypothesis test on the husband's age and wife's age data in which the sample correlation based on n = 170 couples is r = 0.939. To test \(H_{0} \colon \rho = 0\) against the alternative \(H_{A} \colon \rho \neq 0\), we obtain the following test statistic:

\begin{align} t^*&=\dfrac{r\sqrt{n-2}}{\sqrt{1-R^2}}\\ &=\dfrac{0.939\sqrt{170-2}}{\sqrt{1-0.939^2}}\\ &=35.39\end{align}

To obtain the P -value, we need to compare the test statistic to a t -distribution with 168 degrees of freedom (since 170 - 2 = 168). In particular, we need to find the probability that we'd observe a test statistic more extreme than 35.39, and then, since we're conducting a two-sided test, multiply the probability by 2. Minitab helps us out here:

[Minitab output: Student's t distribution with 168 DF]

The output tells us that the probability of getting a test statistic smaller than 35.39 is greater than 0.999. Therefore, the probability of getting a test statistic greater than 35.39 is less than 0.001. Multiplying by 2 for the two-sided test, we determine that the P-value is less than 0.002.

Since the P -value is small — smaller than 0.05, say — we can reject the null hypothesis. There is sufficient statistical evidence at the \(\alpha = 0.05\) level to conclude that there is a significant linear relationship between a husband's age and his wife's age.
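The calculation in this example can be reproduced with a few lines of Python (a sketch using `scipy.stats`; the numbers come from the example above):

```python
import math
from scipy.stats import t as t_dist

n = 170    # number of couples
r = 0.939  # sample correlation between husbands' and wives' ages

# Test statistic for H0: rho = 0
t_star = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided P-value from a t-distribution with n - 2 = 168 degrees of freedom
p_value = 2 * t_dist.sf(abs(t_star), df=n - 2)

print(round(t_star, 2))  # 35.39
print(p_value < 0.002)   # True
```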

Incidentally, we can let statistical software like Minitab do all of the dirty work for us. In doing so, Minitab reports:

Correlation: WAge, HAge

Pearson correlation of WAge and HAge = 0.939

P-Value = 0.000

Final Note

One final note ... as always, we should clarify when it is okay to use the t -test for testing \(H_{0} \colon \rho = 0\). The guidelines are a straightforward extension of the "LINE" assumptions made for the simple linear regression model. It's okay:

  • When it is not obvious which variable is the response.
  • When, for each x, the y's are normal with equal variances.
  • When, for each y, the x's are normal with equal variances.
  • When either y can be considered a linear function of x, or x can be considered a linear function of y.
  • When the (x, y) pairs are independent.


Correlational Study Examples: AP® Psychology Crash Course

  • The Albert Team
  • Last Updated On: March 1, 2022


Do you remember what a correlational study is? Knowing the main types of psychology research is a key point for the Advanced Placement (AP) Psychology exam, as it makes up 8-10% of the content in the multiple-choice and free-response questions. However, understanding the characteristics, advantages and disadvantages of each research method is only half of mastering this subject. The other half is understanding in concrete and practical terms how the research methods have been applied to studies in different fields of psychology. In this AP® Psychology crash course review, we will see three correlational study examples that have contributed to the history of psychology, changing the way we perceive our nature, our personality, and our health.

Review: What is a Correlational Study and why is it Important?

Psychology is a science, and like any other, its knowledge must be scientifically obtained, verified and validated. For this, psychologists conduct three types of research:

  • Experimental research – the most empirical type of research, where variables can be manipulated in laboratory conditions and different situations can be studied and compared to establish relations of cause and effect between variables.
  • Clinical research – done through case studies under the premise that certain individual characteristics can be generalized to the rest of the population.
  • Correlational research – seeks the relationship between two variables. The necessary data is gathered through surveys (questionnaires and interviews), archival research (past studies that present the data) and naturalistic observation (observation of the phenomena as they naturally happen, without intervening). The data is then statistically analyzed to verify the relationship between the variables.

The correlation between the variables is expressed as a value that goes from -1.00 to +1.00. This value is called the correlation coefficient . When the correlation coefficient is close to +1.00, there is a positive correlation between the variables: an increase in X accompanies an increase in Y. When the correlation coefficient is close to -1.00, there is a negative correlation between the variables: an increase in X is accompanied by a decrease in Y. And when the correlation coefficient is close to 0.00, there is no relationship between the variables. The closer the value is to +1.00 or -1.00, the stronger the relationship. We will see real examples of this later in this post.
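A quick computation illustrates both signs of the coefficient (the numbers are made up so the relationships are perfectly linear; real data would fall between these extremes):

```python
import numpy as np

hours_outdoors = np.array([1, 2, 3, 4, 5, 6])
vitamin_d = np.array([12, 15, 18, 21, 24, 27])  # rises with X: positive correlation
tv_hours = np.array([10, 8, 6, 4, 2, 0])        # falls with X: negative correlation

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the coefficient
r_pos = np.corrcoef(hours_outdoors, vitamin_d)[0, 1]
r_neg = np.corrcoef(hours_outdoors, tv_hours)[0, 1]

print(round(r_pos, 2))  # 1.0
print(round(r_neg, 2))  # -1.0
```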


Now, the most important thing to remember about correlational studies is that correlation does not imply causation . For example, let’s say that “marriage” has a negative correlation with “cancer,” meaning that people who are married are less likely to develop cancer throughout their lives than those who remain single. This doesn’t necessarily mean that one causes the other or that marriage directly prevents cancer. Maybe one variable does cause the other, but even if it does, in correlational studies it is not possible to determine the direction of causation or what is causing what. It could also be that a third, unknown variable is what causes the correlation. Keep this in mind as we look at the correlational study examples.

You might be wondering: if correlational studies only show this – correlations – why are they important in the first place, when you could just conduct an experiment, manipulate the relevant variables, and reach more solid conclusions?

Indeed, the disadvantages of correlational studies are that they cannot establish causal relationships or the direction of causal influence, there is no control of the variables, they don’t explain behavior, and they can result in illusory correlations. An illusory correlation is a perceived relationship between variables that does not actually exist, such as “higher ice cream consumption leads to a higher crime rate.”

On the other hand, one of the main advantages of a correlational study is that it is a useful way to describe and analyze data, especially in cases where experimental research would raise ethical issues. Take, for instance, a study that aims to investigate the relationship between child abuse and coping abilities later in adulthood. You obviously can’t take a random group of healthy children and expose them to abusive or traumatic situations to compare them with a control group. In the earlier stages of psychology, researchers could get away with teaching a phobia to a baby or leading participants to think they had electrocuted someone to death, all in the name of science. Such practices are no longer acceptable, and correlational studies play an important role in developing knowledge in psychology.

Other advantages are that correlational studies are usually less expensive and easier to conduct than experiments and they allow for general predictions. They can also represent the first steps in a new field of research, leading to further studies and advances.

Now that you’ve reviewed the main concepts of correlational studies and why they matter, let’s see three important research examples in different fields of psychology and understand how all of this comes to life!

Study #1: Biological Basis of Behavior – A Debate on Nature Versus Nurture

We can easily think of how our genetics influence physical traits like height, hair and eye color. But have you ever considered that your genetics might also play a big role in psychological traits like personality and interests? In 1990, psychologists Thomas Bouchard, David Lykken, and their associates investigated the influence our genes have on psychological attributes. This research was hard to accept at the time, considering that for the previous fifty years psychology had mainly focused on behaviorism and how the environment determines behavior. Bouchard and Lykken’s study brought the debate of nature versus nurture back to the spotlight, determined to clarify the roles of genes and environment in who we are.

For this, Bouchard and Lykken conducted a study with monozygotic twins (identical twins) who had been separated at birth and raised in different environments and compared the results with identical twins who had been raised together. Note that this is a study in which one couldn’t simply replicate the situation in laboratory conditions, so a correlational study was the best way to analyze the data of real individuals in this situation.


Bouchard and Lykken gathered a huge amount of data from each pair of twins. They used a variety of personality trait scales, aptitude and occupational interest inventories, intelligence tests, family environment scales and interviews. At the end of the first part of the research, Bouchard and Lykken had information concerning the twins’ physiological traits, intelligence, personality, psychological interests and social attitudes. Next, Bouchard and Lykken analyzed the correlation between the twins in all these fields.

The results were surprising. If the environment were responsible for individual differences, identical twins reared together should be more similar than identical twins reared apart. However, that was not what the results showed. Both categories of twins had very similar correlation coefficients that neared +1.00. This means that regardless of having been raised in the same or different environments, each person was very similar to his or her twin in all traits.

Based on this, we can say that genetic factors strongly influence human behavior in a variety of ways, both physiological and psychological. This could be seen as a troubling conclusion, since we place so much importance on environmental factors like education and parenting, as if they alone determined who we grow up to be, what interests we develop, what careers we choose, and so on. However, this is no reason to give up on our efforts in life, thinking that the genes will eventually take over and determine our fate.

Bouchard and Lykken emphasize that although intelligence is mainly determined by genetic factors, it can still be enhanced by experiences. Approximately 70% of intelligence is genetically determined, which means there is still 30% that can be cultivated, or neglected, in the environment, whether at home with parents or at school with teachers and mentors.

The same can be applied to the other traits. For example, even if your genes hold a natural strength in communication skills, it won’t matter if your environment never gives you an opportunity for that skill to emerge and develop. Recent research on identical twins shows that the older the twins, the more similar they are. Another way to say this is that the more experiences you have, the more your genes can be expressed.

As human beings, we are determined by a combination of genetic and environmental influences. We are nature and nurture. Genes don’t mean destiny, but that doesn’t mean we can ignore their influences on our physiological and psychological characteristics. Let’s truly understand the components of our behavior and overcome the genes versus environment dichotomy.

Study #2: Personality – Who is in Control of Your Life?

Do you think your actions are what matter most for the outcome of your life? Or do you think that external forces like fate and luck have a major influence on the paths you take? This kind of personal belief, called locus of control, is associated with all sorts of behaviors we show in different areas of life. The locus of control and its influence on behavior was first studied by the influential psychologist and behaviorist Julian Rotter in 1966.

Rotter proposed that the way individuals interpret what happens to them, and where they place the responsibility for the events in their lives, is an important part of personality that can be used to predict tendencies toward certain behaviors. When a person attributes the consequences of their behavior to factors such as luck, fate, and other greater forces, this person believes in an external locus of control. On the other hand, a person who attributes the consequences of her behavior to her own actions believes in an internal locus of control.

To measure locus of control, Rotter developed a scale called the I-E Scale, where “I” stands for “Internal” and “E” for “External.” The scale contains many pairs of statements, and the participant must choose the one that best fits his beliefs. A few examples of these pairs are “Many of the unhappy things in people’s lives are partly due to bad luck” versus “People’s misfortunes result from the mistakes they make,” and “Becoming a success is a matter of hard work; luck has little or nothing to do with it” versus “Getting a good job depends mainly on being in the right place at the right time.”

After measuring the locus of control of a large number of participants, Rotter analyzed the correlation between an internal or external locus of control and behaviors such as gambling, persuasion, smoking, and achievement motivation. His findings demonstrated that:

• External individuals are more likely to gamble on risky bets, while internal individuals prefer “sure things” and moderate odds in the long run.

• Internal individuals are more effective at persuading peers to change their attitudes and more resistant to manipulation than external individuals.

• Because an internal locus of control is related to self-control, smokers tend to be significantly more external oriented. Those who successfully quit smoking are more internally oriented.

• Internal individuals are more motivated to achieve success than those who believe their lives are ruled by forces outside of their control. Examples of achievements included plans to attend college and time spent on homework.

So translating into terms of correlational studies, there was, for example, a strong correlation between “internal locus of control” and “achievement motivation,” as the correlation coefficient between these two variables neared +1.00.

Furthermore, Rotter identified three sources for the development of an external or internal locus of control: cultural differences, socioeconomic differences, and parenting style. In conclusion, Rotter proposed that locus of control is an important component of personality that explains the differences in behavior between two people who are faced with the same situation. This belief determines the way we interpret the consequences of our behavior and influences the actions we take in our lives.

Study #3: Motivation and Emotion – The Effects of Stress on Our Health


Nowadays it’s almost common sense that stress has an impact on our health, but this was not always an easily accepted idea. In 1967, Thomas Holmes and Richard Rahe studied the correlation between stress and illness. This was psychosomatic research because it studied the connection between psychological factors and physical problems.

Since it wouldn’t be ethical to put people under stressful situations to study whether or not they developed more health problems than a comfortable control group, this research was conducted using the correlational method. First, Holmes and Rahe designed a scale to measure the stress of a variety of life situations, which included both happy and unhappy events, like Christmas and the death of a spouse. This was because, according to Holmes and Rahe, stress happens in any situation where there is a need for psychological readjustment. The scale was called the Social Readjustment Rating Scale (SRRS). After having a large number of participants answer the scale, Holmes and Rahe studied the correlations between high levels of stress and illness.

As you may have already predicted, a strong positive correlation between stress and illness was found. The participants who had had a low level of stress in the past six months reported an average of 1.4 illnesses for the same period. A medium level of stress had an average of 1.9 illnesses and a high level of stress, 2.1 illnesses.
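To see how such a positive correlation is computed, here is a minimal sketch in Python. The stress scores and illness counts below are invented for illustration, not Holmes and Rahe’s actual data; the point is only that when higher SRRS-style scores pair with more illnesses, Pearson’s coefficient comes out strongly positive.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: stress scores and illness counts for six people.
stress = [50, 120, 200, 260, 310, 400]
illnesses = [1, 1, 2, 2, 3, 3]

r = pearson_r(stress, illnesses)
print(round(r, 2))  # → 0.95, a strong positive correlation
```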

However, we also know that stress is only one component that influences health, and the connection between stress and illness is way more complex than a correlational study can show. Aware of that, Holmes and Rahe cited other factors that must be taken into consideration to help predict psychosomatic problems. They are:

• Your experience with stressful events

• Your coping skills

• The strength of your immune system

• Your way of dealing with health problems when they occur

Psychologists and doctors now recognize that the vast majority of illnesses are influenced by psychological factors, either in their development or in the way they are treated. This puts an end to Descartes’ classical view of the split between mind and body. Humans are complex beings who must be understood and treated in their wholeness for the efficient prevention of illness and promotion of health.

So what do you think of each of these correlational study examples? They are in different areas of psychology (Biological Bases of Behavior, Personality, and Motivation and Emotion), so you can encounter this type of research in many questions of the AP® Psychology exam. How do you understand the influence of genetics on your behavior? Is your locus of control more internal or external? What examples of psychosomatic problems have you seen in your day to day experience? Share in the comments below!

Let’s put everything into practice. Try this AP® Psychology practice question:

Types of Research Methods AP® Psychology Practice Question

Looking for more AP® Psychology practice?

Check out our other articles on AP® Psychology.

You can also find thousands of practice questions on Albert.io. Albert.io lets you customize your learning experience to target practice where you need the most help. We’ll give you challenging practice questions to help you achieve mastery of AP Psychology.

Start practicing here.

Are you a teacher or administrator interested in boosting AP® Psychology student outcomes?

Learn more about our school licenses here.


6.2 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables and, if there is a relationship between the variables, the researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression).
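As a minimal sketch of that describe-then-predict workflow (the study-hours data are invented, not from any real study), a least-squares regression line turns scores on one variable into predictions of the other:

```python
# Hypothetical data: hours studied per week and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 58, 61, 65]

n = len(hours)
mx = sum(hours) / n
my = sum(scores) / n

# Least-squares slope and intercept for simple linear regression.
slope = sum((x - mx) * (y - my) for x, y in zip(hours, scores)) / sum(
    (x - mx) ** 2 for x in hours
)
intercept = my - slope * mx

# Predict the exam score of someone who studies six hours.
predicted = intercept + slope * 6
print(round(slope, 2), round(predicted, 1))  # → 3.3 67.9
```

In practice a library routine such as scipy.stats.linregress performs the same fit, but the arithmetic above is all that simple regression requires.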

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while I might be interested in the relationship between the frequency people use cannabis and their memory abilities, I cannot ethically manipulate the frequency that people use cannabis. As such, I must rely on the correlational research strategy; I must simply measure the frequency that people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether the frequency of cannabis use is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled, but they often have high external validity. Since nothing is manipulated or controlled by the experimenter, the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1] .  These converging results provide strong evidence that there is a real relationship (indeed a causal relationship) between watching violent television and aggressive behavior.

Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots . Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship , in which higher scores on one variable tend to be associated with higher scores on the other. A  negative relationship  is one in which higher scores on one variable tend to be associated with lower scores on the other. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.


Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms. The circled point represents a person whose stress score was 10 and who had three physical symptoms. Pearson’s r for these data is +.51.

The strength of a correlation between quantitative variables is typically measured using a statistic called  Pearson’s Correlation Coefficient (or Pearson’s  r ) . As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s  r  is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ± .30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s  r  is unrelated to its strength. Pearson’s  r  values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in Psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/ , created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.
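The sign-versus-strength point can be checked with a short computation on made-up scores: mirroring one variable flips the sign of Pearson’s r while leaving its magnitude unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_pos = pearson_r(x, y)                # positive relationship
r_neg = pearson_r(x, [-v for v in y])  # same scores, mirrored

print(round(r_pos, 2), round(r_neg, 2))  # → 0.8 -0.8
```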


Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.
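That inverted-U example is easy to reproduce with invented numbers. Below, depression is lowest at eight hours of sleep and rises symmetrically on either side, so Pearson’s r comes out essentially zero even though the relationship is strong.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

sleep = [4, 5, 6, 7, 8, 9, 10, 11, 12]      # hours per night
depression = [(h - 8) ** 2 for h in sleep]  # lowest at 8 hours, high at both extremes

r = pearson_r(sleep, depression)
print(r)  # essentially zero, despite the clear curved relationship
```

This is exactly why the text recommends making a scatterplot first: the plot reveals the curve that the coefficient hides.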


Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression

The other common situation in which the value of Pearson’s  r  can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as  restriction of range . Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson’s  r  here is −.77. However, if we were to collect data only from 18- to 24-year-olds—represented by the shaded area of Figure 6.6—then the relationship would seem to be quite weak. In fact, Pearson’s  r  for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s  r  in light of it. (There are also statistical methods to correct Pearson’s  r  for restriction of range, but they are beyond the scope of this book).
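Restriction of range can be reproduced with a few invented data points (the ages and ratings below are illustrative, not the figure’s actual data): a correlation that is strong across ages 18 to 60 becomes much weaker when the sample is cut down to 18- to 24-year-olds.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical enjoyment-of-hip-hop ratings (0-100) at various ages.
ages      = [18, 19, 20, 21, 22, 23, 24, 30, 40, 50, 60]
enjoyment = [92, 71, 80, 79, 78, 67, 86, 62, 55, 48, 38]

r_full = pearson_r(ages, enjoyment)

# Restrict the sample to 18- to 24-year-olds and recompute.
young = [(a, e) for a, e in zip(ages, enjoyment) if a <= 24]
r_young = pearson_r([a for a, _ in young], [e for _, e in young])

print(round(r_full, 2), round(r_young, 2))  # → -0.91 -0.26
```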


Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range. The overall correlation here is −.77, but the correlation for the 18- to 24-year-olds (in the blue box) is 0.

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the  directionality problem . Two variables,  X  and  Y , can be statistically related because X  causes  Y  or because  Y  causes  X . Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the  third-variable problem . Two variables,  X  and  Y , can be statistically related not because  X  causes  Y , or because  Y  causes  X , but because some third variable,  Z , causes both  X  and  Y . For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third-variable are often referred to as  spurious correlations.
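The third-variable problem can be simulated directly. In the sketch below (all numbers invented), a hidden variable z (say, physical health) drives both exercise and happiness; the two end up strongly correlated even though neither causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# z is the hidden third variable (e.g., physical health), on an arbitrary scale.
z = list(range(10))

# Exercise and happiness each depend on z plus their own small quirks;
# neither variable depends on the other.
exercise  = [2 * v + (1 if i % 2 == 0 else -1) for i, v in enumerate(z)]
happiness = [3 * v + (-1 if i % 2 == 0 else 1) for i, v in enumerate(z)]

r_xy = pearson_r(exercise, happiness)
print(round(r_xy, 2))  # → 0.96, a spurious correlation created entirely by z
```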

Some excellent and funny examples of spurious correlations can be found at http://www.tylervigen.com  (Figure 6.7  provides one such example).

Figure 6.7 Example of a Spurious Correlation. Source: http://tylervigen.com/spurious-correlations (CC-BY 4.0)

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who determined how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in (because, again, it was the researcher who determined how much they exercised). Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.

Key Takeaways

  • Correlational research involves measuring two variables and assessing the relationship between them, with no manipulation of an independent variable.
  • Correlation does not imply causation. A statistical relationship between two variables,  X  and  Y , does not necessarily mean that  X  causes  Y . It is also possible that  Y  causes  X , or that a third variable,  Z , causes both  X  and  Y .
  • While correlational research cannot be used to establish causal relationships between variables, correlational research does allow researchers to achieve many other important objectives (establishing reliability and validity, providing converging evidence, describing relationships and making predictions)
  • Correlation coefficients can range from -1 to +1. The sign indicates the direction of the relationship between the variables and the numerical value indicates the strength of the relationship.
Exercises

1. Practice: For each of the following studies, decide whether it is most likely experimental or correlational, and explain why.

  • A cognitive psychologist compares the ability of people to recall words that they were instructed to “read” with their ability to recall words that they were instructed to “imagine.”
  • A manager studies the correlation between new employees’ college grade point averages and their first-year performance reports.
  • An automotive engineer installs different stick shifts in a new car prototype, each time asking several people to rate how comfortable the stick shift feels.
  • A food scientist studies the relationship between the temperature inside people’s refrigerators and the amount of bacteria on their food.
  • A social psychologist tells some research participants that they need to hurry over to the next building to complete a study. She tells others that they can take their time. Then she observes whether they stop to help a research assistant who is pretending to be hurt.

2. Practice: For each of the following statistical relationships, decide whether the directionality problem is present and think of at least one plausible third variable.

  • People who eat more lobster tend to live longer.
  • People who exercise more tend to weigh less.
  • College students who drink more alcohol tend to have poorer grades.

References

1. Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage.
2. Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562–1564.

Correlation Studies in Psychology Research

Determining the relationship between two or more variables.

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Emily is a board-certified science editor who has worked with top digital publishing brands like Voices for Biodiversity, Study.com, GoodTherapy, Vox, and Verywell.


A correlational study is a type of research design that looks at the relationships between two or more variables. Correlational studies are non-experimental, which means that the experimenter does not manipulate or control any of the variables.

A correlation refers to a relationship between two variables. Correlations can be strong or weak and positive or negative. Sometimes, there is no correlation.

There are three possible outcomes of a correlation study: a positive correlation, a negative correlation, or no correlation. Researchers can present the results using a numerical value called the correlation coefficient, a measure of the correlation strength. It can range from –1.00 (negative) to +1.00 (positive). A correlation coefficient of 0 indicates no correlation.

  • Positive correlations : Both variables increase or decrease at the same time. A correlation coefficient close to +1.00 indicates a strong positive correlation.
  • Negative correlations : As the amount of one variable increases, the other decreases (and vice versa). A correlation coefficient close to -1.00 indicates a strong negative correlation.
  • No correlation : There is no relationship between the two variables. A correlation coefficient of 0 indicates no correlation.

Characteristics of a Correlational Study

Correlational studies are often used in psychology, as well as other fields like medicine. Correlational research is a preliminary way to gather information about a topic. The method is also useful if researchers are unable to perform an experiment.

Researchers use correlations to see if a relationship between two or more variables exists, but the variables themselves are not under the control of the researchers.

While correlational research can demonstrate a relationship between variables, it cannot prove that changing one variable will change another. In other words, correlational studies cannot prove cause-and-effect relationships.

When you encounter research that refers to a "link" or an "association" between two things, it is most likely describing a correlational study.

Types of Correlational Research

There are three types of correlational research: naturalistic observation, the survey method, and archival research. Each type has its own purpose, as well as its pros and cons.

Naturalistic Observation

The naturalistic observation method involves observing and recording variables of interest in a natural setting without interference or manipulation.

Pros:

  • Can inspire ideas for further research
  • Option if lab experiment not available
  • Variables are viewed in natural setting

Cons:

  • Can be time-consuming and expensive
  • Extraneous variables can't be controlled
  • No scientific control of variables
  • Subjects might behave differently if aware of being observed

This method is well-suited to studies where researchers want to see how variables behave in their natural setting or state. Inspiration can then be drawn from the observations to inform future avenues of research.

In some cases, it might be the only method available to researchers; for example, if lab experimentation is precluded by access, resources, or ethics. It might be preferable to not being able to conduct research at all, but the method can be costly and usually takes a lot of time.

Naturalistic observation presents several challenges for researchers. For one, it does not allow them to control or influence the variables in any way, nor can they change any possible extraneous variables.

Observation also offers no guarantee that the data researchers gather will be reliable, or that the information will be free from bias.

For example, study subjects might act differently if they know that they are being watched. The researchers might not be aware that the behavior that they are observing is not necessarily the subject's natural state (i.e., how they would act if they did not know they were being watched).

Researchers also need to be aware of their biases, which can affect the observation and interpretation of a subject's behavior.  

The Survey Method

Surveys and questionnaires are some of the most common methods used for psychological research. The survey method involves having a random sample of participants complete a survey, test, or questionnaire related to the variables of interest. Random sampling is vital to the generalizability of a survey's results.

Pros:

  • Cheap, easy, and fast
  • Can collect large amounts of data in a short amount of time

Cons:

  • Results can be affected by poor survey questions
  • Results can be affected by unrepresentative sample
  • Outcomes can be affected by participants

If researchers need to gather a large amount of data in a short period of time, a survey is likely to be the fastest, easiest, and cheapest option.  

It's also a flexible method because it lets researchers create data-gathering tools that will help ensure they get the information they need (survey responses) from all the sources they want to use (a random sample of participants taking the survey).

Survey data might be cost-efficient and easy to get, but it has its downsides. For one, the data is not always reliable, particularly if the survey questions are poorly written or the overall design or delivery is weak. Data is also affected by specific faults, such as unrepresentative or underrepresented samples.

The use of surveys relies on participants to provide useful data. Researchers need to be aware of the specific factors related to the people taking the survey that will affect its outcome.

For example, some people might struggle to understand the questions. A person might answer a particular way to try to please the researchers or to try to control how the researchers perceive them (such as trying to make themselves "look better").

Sometimes, respondents might not even realize that their answers are incorrect or misleading because of mistaken memories .

Archival Research

Many areas of psychological research benefit from analyzing studies that were conducted long ago by other researchers, as well as reviewing historical records and case studies.

For example, in a study known as "The Irritable Heart," researchers used digitized records containing information on American Civil War veterans to learn more about post-traumatic stress disorder (PTSD).

Pros:

  • Large amount of data
  • Can be less expensive
  • Researchers cannot change participant behavior

Cons:

  • Can be unreliable
  • Information might be missing
  • No control over data collection methods

Using records, databases, and libraries that are publicly accessible or accessible through their institution can help researchers who might not have a lot of money to support their research efforts.

Free and low-cost resources are available to researchers at all levels through academic institutions, museums, and data repositories around the world.

Another potential benefit is that these sources often provide an enormous amount of data that was collected over a very long period of time, which can give researchers a way to view trends, relationships, and outcomes related to their research.

While the inability to change variables can be a disadvantage of some methods, it can be a benefit of archival research. That said, using historical records or information that was collected a long time ago also presents challenges. For one, important information might be missing or incomplete and some aspects of older studies might not be useful to researchers in a modern context.

A primary issue with archival research is reliability. When reviewing old research, little information might be available about who conducted the research, how a study was designed, who participated in the research, as well as how data was collected and interpreted.

Researchers can also be presented with ethical quandaries—for example, should modern researchers use data from studies that were conducted unethically or with questionable ethics?

You've probably heard the phrase, "correlation does not equal causation." This means that while correlational research can suggest that there is a relationship between two variables, it cannot prove that one variable will change another.

For example, researchers might perform a correlational study that suggests there is a relationship between academic success and a person's self-esteem. However, the study cannot show that academic success changes a person's self-esteem.

To determine why the relationship exists, researchers would need to consider and experiment with other variables, such as the subject's social relationships, cognitive abilities, personality, and socioeconomic status.

The difference between a correlational study and an experimental study involves the manipulation of variables. Researchers do not manipulate variables in a correlational study, but they do control and systematically vary the independent variables in an experimental study. Correlational studies allow researchers to detect the presence and strength of a relationship between variables, while experimental studies allow researchers to look for cause-and-effect relationships.

If the study involves the systematic manipulation of the levels of a variable, it is an experimental study. If researchers are measuring what is already present without actually changing the variables, then it is a correlational study.

The variables in a correlational study are what the researcher measures. Once measured, researchers can then use statistical analysis to determine the existence, strength, and direction of the relationship. However, while correlational studies can say that variable X and variable Y have a relationship, it does not mean that X causes Y.

The goal of correlational research is often to look for relationships, describe these relationships, and then make predictions. Such research can also often serve as a jumping off point for future experimental research. 

Heath W. Psychology Research Methods. Cambridge University Press; 2018:134-156.

Schneider FW. Applied Social Psychology. 2nd ed. SAGE; 2012:50-53.

Curtis EA, Comiskey C, Dempsey O. Importance and use of correlational research. Nurse Researcher. 2016;23(6):20-25. doi:10.7748/nr.2016.e1382

Carpenter S. Visualizing Psychology. 3rd ed. John Wiley & Sons; 2012:14-30.

Pizarro J, Silver RC, Prause J. Physical and mental health costs of traumatic war experiences among civil war veterans. Arch Gen Psychiatry. 2006;63(2):193. doi:10.1001/archpsyc.63.2.193

Post SG. The echo of Nuremberg: Nazi data and ethics. J Med Ethics. 1991;17(1):42-44. doi:10.1136/jme.17.1.42

Lau F. Chapter 12: Methods for correlational studies. In: Lau F, Kuziemsky C, eds. Handbook of eHealth Evaluation: An Evidence-Based Approach. University of Victoria.

Akoglu H. User's guide to correlation coefficients. Turk J Emerg Med. 2018;18(3):91-93. doi:10.1016/j.tjem.2018.08.001

Price PC. Research Methods in Psychology. California State University.

By Kendra Cherry, MSEd Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Non-Experimental Research

29 Correlational Research

Learning Objectives

  • Define correlational research and give several examples.
  • Explain why a researcher might choose to conduct correlational research rather than experimental research or another type of non-experimental research.
  • Interpret the strength and direction of different correlation coefficients.
  • Explain why correlation does not imply causation.

What Is Correlational Research?

Correlational research is a type of non-experimental research in which the researcher measures two variables (binary or continuous) and assesses the statistical relationship (i.e., the correlation) between them with little or no effort to control extraneous variables. There are many reasons that researchers interested in statistical relationships between variables would choose to conduct a correlational study rather than an experiment. The first is that they do not believe that the statistical relationship is a causal one or are not interested in causal relationships. Recall that two goals of science are to describe and to predict, and the correlational research strategy allows researchers to achieve both of these goals. Specifically, this strategy can be used to describe the strength and direction of the relationship between two variables, and if there is a relationship between the variables then the researchers can use scores on one variable to predict scores on the other (using a statistical technique called regression, which is discussed further in the section on Complex Correlation in this chapter).
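That describe-then-predict use can be sketched with a least-squares regression line fit to two measured variables. The data and variable names below are hypothetical, invented for illustration:

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical measured scores: hours studied and exam score (neither is manipulated)
hours = [1, 2, 3, 4]
score = [3, 5, 7, 9]
slope, intercept = fit_line(hours, score)

# Predict the score of a new person who studied 5 hours
print(slope * 5 + intercept)  # 11.0
```

The prediction is only as good as the strength of the underlying correlation, which is why the two uses go together.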

Another reason that researchers would choose to use a correlational study rather than an experiment is that the statistical relationship of interest is thought to be causal, but the researcher cannot manipulate the independent variable because it is impossible, impractical, or unethical. For example, while a researcher might be interested in the relationship between the frequency people use cannabis and their memory abilities, they cannot ethically manipulate the frequency that people use cannabis. As such, they must rely on the correlational research strategy; they must simply measure the frequency that people use cannabis, measure their memory abilities using a standardized test of memory, and then determine whether the frequency people use cannabis is statistically related to memory test performance.

Correlation is also used to establish the reliability and validity of measurements. For example, a researcher might evaluate the validity of a brief extraversion test by administering it to a large group of participants along with a longer extraversion test that has already been shown to be valid. This researcher might then check to see whether participants’ scores on the brief test are strongly correlated with their scores on the longer one. Neither test score is thought to cause the other, so there is no independent variable to manipulate. In fact, the terms independent variable and dependent variable do not apply to this kind of research.

Another strength of correlational research is that it is often higher in external validity than experimental research. Recall there is typically a trade-off between internal validity and external validity. As greater controls are added to experiments, internal validity is increased but often at the expense of external validity as artificial conditions are introduced that do not exist in reality. In contrast, correlational studies typically have low internal validity because nothing is manipulated or controlled, but they often have high external validity. Since nothing is manipulated or controlled by the experimenter, the results are more likely to reflect relationships that exist in the real world.

Finally, extending upon this trade-off between internal and external validity, correlational research can help to provide converging evidence for a theory. If a theory is supported by a true experiment that is high in internal validity as well as by a correlational study that is high in external validity then the researchers can have more confidence in the validity of their theory. As a concrete example, correlational studies establishing that there is a relationship between watching violent television and aggressive behavior have been complemented by experimental studies confirming that the relationship is a causal one (Bushman & Huesmann, 2001) [1] .

Does Correlational Research Always Involve Quantitative Variables?

A common misconception among beginning researchers is that correlational research must involve two quantitative variables, such as scores on two extraversion tests or the number of daily hassles and number of symptoms people have experienced. However, the defining feature of correlational research is that the two variables are measured—neither one is manipulated—and this is true regardless of whether the variables are quantitative or categorical. Imagine, for example, that a researcher administers the Rosenberg Self-Esteem Scale to 50 American college students and 50 Japanese college students. Although this “feels” like a between-subjects experiment, it is a correlational study because the researcher did not manipulate the students’ nationalities. The same is true of the study by Cacioppo and Petty comparing college faculty and factory workers in terms of their need for cognition. It is a correlational study because the researchers did not manipulate the participants’ occupations.

Figure 6.2 shows data from a hypothetical study on the relationship between whether people make a daily list of things to do (a “to-do list”) and stress. Notice that it is unclear whether this is an experiment or a correlational study because it is unclear whether the independent variable was manipulated. If the researcher randomly assigned some participants to make daily to-do lists and others not to, then it is an experiment. If the researcher simply asked participants whether they made daily to-do lists, then it is a correlational study. The distinction is important because if the study was an experiment, then it could be concluded that making the daily to-do lists reduced participants’ stress. But if it was a correlational study, it could only be concluded that these variables are statistically related. Perhaps being stressed has a negative effect on people’s ability to plan ahead (the directionality problem). Or perhaps people who are more conscientious are more likely to make to-do lists and less likely to be stressed (the third-variable problem). The crucial point is that what defines a study as experimental or correlational is not the variables being studied, nor whether the variables are quantitative or categorical, nor the type of graph or statistics used to analyze the data. What defines a study is how the study is conducted.


Data Collection in Correlational Research

Again, the defining feature of correlational research is that neither variable is manipulated. It does not matter how or where the variables are measured. A researcher could have participants come to a laboratory to complete a computerized backward digit span task and a computerized risky decision-making task and then assess the relationship between participants’ scores on the two tasks. Or a researcher could go to a shopping mall to ask people about their attitudes toward the environment and their shopping habits and then assess the relationship between these two variables. Both of these studies would be correlational because no independent variable is manipulated. 

Correlations Between Quantitative Variables

Correlations between quantitative variables are often presented using scatterplots . Figure 6.3 shows some hypothetical data on the relationship between the amount of stress people are under and the number of physical symptoms they have. Each point in the scatterplot represents one person’s score on both variables. For example, the circled point in Figure 6.3 represents a person whose stress score was 10 and who had three physical symptoms. Taking all the points into account, one can see that people under more stress tend to have more physical symptoms. This is a good example of a positive relationship , in which higher scores on one variable tend to be associated with higher scores on the other. In other words, they move in the same direction, either both up or both down. A negative relationship is one in which higher scores on one variable tend to be associated with lower scores on the other. In other words, they move in opposite directions. There is a negative relationship between stress and immune system functioning, for example, because higher stress is associated with lower immune system functioning.

Figure 6.3 Scatterplot Showing a Hypothetical Positive Relationship Between Stress and Number of Physical Symptoms

The strength of a correlation between quantitative variables is typically measured using a statistic called  Pearson’s Correlation Coefficient (or Pearson's  r ) . As Figure 6.4 shows, Pearson’s r ranges from −1.00 (the strongest possible negative relationship) to +1.00 (the strongest possible positive relationship). A value of 0 means there is no relationship between the two variables. When Pearson’s  r  is 0, the points on a scatterplot form a shapeless “cloud.” As its value moves toward −1.00 or +1.00, the points come closer and closer to falling on a single straight line. Correlation coefficients near ±.10 are considered small, values near ± .30 are considered medium, and values near ±.50 are considered large. Notice that the sign of Pearson’s  r  is unrelated to its strength. Pearson’s  r  values of +.30 and −.30, for example, are equally strong; it is just that one represents a moderate positive relationship and the other a moderate negative relationship. With the exception of reliability coefficients, most correlations that we find in Psychology are small or moderate in size. The website http://rpsychologist.com/d3/correlation/ , created by Kristoffer Magnusson, provides an excellent interactive visualization of correlations that permits you to adjust the strength and direction of a correlation while witnessing the corresponding changes to the scatterplot.

Figure 6.4 Range of Pearson’s r, From −1.00 (Strongest Possible Negative Relationship), Through 0 (No Relationship), to +1.00 (Strongest Possible Positive Relationship)
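The size benchmarks just given (±.10 small, ±.30 medium, ±.50 large) can be wrapped in a small helper. The exact cutoffs below are one common reading of those guidelines, and the function is an illustrative sketch rather than any standard library API:

```python
def describe_r(r):
    """Label a Pearson's r by direction and conventional size."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    magnitude = abs(r)
    if magnitude >= 0.50:
        size = "large"
    elif magnitude >= 0.30:
        size = "medium"
    elif magnitude >= 0.10:
        size = "small"
    else:
        size = "negligible"
    return direction, size

print(describe_r(0.30))   # ('positive', 'medium')
print(describe_r(-0.30))  # ('negative', 'medium') -- the sign does not affect strength
```

Note that +.30 and -.30 receive the same size label, mirroring the point that the sign of Pearson's r is unrelated to its strength.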

There are two common situations in which the value of Pearson’s  r  can be misleading. Pearson’s  r  is a good measure only for linear relationships, in which the points are best approximated by a straight line. It is not a good measure for nonlinear relationships, in which the points are better approximated by a curved line. Figure 6.5, for example, shows a hypothetical relationship between the amount of sleep people get per night and their level of depression. In this example, the line that best approximates the points is a curve—a kind of upside-down “U”—because people who get about eight hours of sleep tend to be the least depressed. Those who get too little sleep and those who get too much sleep tend to be more depressed. Even though Figure 6.5 shows a fairly strong relationship between depression and sleep, Pearson’s  r  would be close to zero because the points in the scatterplot are not well fit by a single straight line. This means that it is important to make a scatterplot and confirm that a relationship is approximately linear before using Pearson’s  r . Nonlinear relationships are fairly common in psychology, but measuring their strength is beyond the scope of this book.

Figure 6.5 Hypothetical Nonlinear Relationship Between Sleep and Depression
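The sleep example can be reproduced with a toy dataset: a perfectly U-shaped (quadratic) relationship yields a Pearson's r of essentially zero. The numbers here are made up for illustration, not taken from the chapter's figure:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sleep = list(range(4, 13))                  # 4 to 12 hours per night
depression = [(h - 8) ** 2 for h in sleep]  # lowest at 8 hours, higher at both extremes
print(pearson_r(sleep, depression))  # ≈ 0, despite a clear (nonlinear) relationship
```

A scatterplot of these points would show an obvious upside-down "U", which is exactly why plotting before computing r matters.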

The other common situation in which the value of Pearson’s r can be misleading is when one or both of the variables have a limited range in the sample relative to the population. This problem is referred to as restriction of range. Assume, for example, that there is a strong negative correlation between people’s age and their enjoyment of hip hop music as shown by the scatterplot in Figure 6.6. Pearson’s r here is −.77. However, if we were to collect data only from 18- to 24-year-olds—represented by the shaded area of Figure 6.6—then the relationship would seem to be quite weak. In fact, Pearson’s r for this restricted range of ages is 0. It is a good idea, therefore, to design studies to avoid restriction of range. For example, if age is one of your primary variables, then you can plan to collect data from people of a wide range of ages. Because restriction of range is not always anticipated or easily avoidable, however, it is good practice to examine your data for possible restriction of range and to interpret Pearson’s r in light of it. (There are also statistical methods to correct Pearson’s r for restriction of range, but they are beyond the scope of this book).

Figure 6.6 Hypothetical Data Showing How a Strong Overall Correlation Can Appear to Be Weak When One Variable Has a Restricted Range
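Restriction of range can be simulated with deterministic made-up data (the "scatter" term below is a simple arithmetic pattern standing in for real survey noise): the full-range correlation is strong, while the same relationship measured only among 18- to 24-year-olds looks much weaker:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: enjoyment declines with age, plus deterministic scatter
ages = list(range(15, 66))
enjoyment = [100 - a + ((a * 7) % 11 - 5) for a in ages]
r_full = pearson_r(ages, enjoyment)

# Restrict the sample to 18- to 24-year-olds
pairs = [(a, e) for a, e in zip(ages, enjoyment) if 18 <= a <= 24]
r_restricted = pearson_r([a for a, _ in pairs], [e for _, e in pairs])

print(round(r_full, 2), round(r_restricted, 2))  # the restricted slice is far weaker
```

The underlying relationship is identical in both computations; only the sampled range of ages changes, yet the coefficient shrinks substantially.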

Correlation Does Not Imply Causation

You have probably heard repeatedly that “Correlation does not imply causation.” An amusing example of this comes from a 2012 study that showed a positive correlation (Pearson’s r = 0.79) between the per capita chocolate consumption of a nation and the number of Nobel prizes awarded to citizens of that nation [2] . It seems clear, however, that this does not mean that eating chocolate causes people to win Nobel prizes, and it would not make sense to try to increase the number of Nobel prizes won by recommending that parents feed their children more chocolate.

There are two reasons that correlation does not imply causation. The first is called the  directionality problem . Two variables,  X  and  Y , can be statistically related because X  causes  Y  or because  Y  causes  X . Consider, for example, a study showing that whether or not people exercise is statistically related to how happy they are—such that people who exercise are happier on average than people who do not. This statistical relationship is consistent with the idea that exercising causes happiness, but it is also consistent with the idea that happiness causes exercise. Perhaps being happy gives people more energy or leads them to seek opportunities to socialize with others by going to the gym. The second reason that correlation does not imply causation is called the  third-variable problem . Two variables,  X  and  Y , can be statistically related not because  X  causes  Y , or because  Y  causes  X , but because some third variable,  Z , causes both  X  and  Y . For example, the fact that nations that have won more Nobel prizes tend to have higher chocolate consumption probably reflects geography in that European countries tend to have higher rates of per capita chocolate consumption and invest more in education and technology (once again, per capita) than many other countries in the world. Similarly, the statistical relationship between exercise and happiness could mean that some third variable, such as physical health, causes both of the others. Being physically healthy could cause people to exercise and cause them to be happier. Correlations that are a result of a third-variable are often referred to as  spurious correlations .
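The third-variable problem is easy to demonstrate by construction. In this made-up sketch, Z (say, physical health) generates both X (exercise) and Y (happiness); X and Y end up strongly correlated even though, by construction, neither influences the other:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

z = list(range(20))  # the hidden third variable (e.g., physical health)
x = [v + ((i * 3) % 7 - 3) for i, v in enumerate(z)]      # X depends only on Z, plus scatter
y = [2 * v + ((i * 5) % 9 - 4) for i, v in enumerate(z)]  # Y also depends only on Z

print(round(pearson_r(x, y), 2))  # strongly positive: a spurious correlation
```

A researcher who measured only X and Y would see a large coefficient and might be tempted to infer causation; only measuring or controlling Z reveals the correlation to be spurious.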

Some excellent and amusing examples of spurious correlations can be found at http://www.tylervigen.com  (Figure 6.7  provides one such example).

Figure 6.7 Example of a Spurious Correlation: Nicholas Cage Films and Pool Drownings

“Lots of Candy Could Lead to Violence”

Although researchers in psychology know that correlation does not imply causation, many journalists do not. One website about correlation and causation, http://jonathan.mueller.faculty.noctrl.edu/100/correlation_or_causation.htm , links to dozens of media reports about real biomedical and psychological research. Many of the headlines suggest that a causal relationship has been demonstrated when a careful reading of the articles shows that it has not because of the directionality and third-variable problems.

One such article is about a study showing that children who ate candy every day were more likely than other children to be arrested for a violent offense later in life. But could candy really “lead to” violence, as the headline suggests? What alternative explanations can you think of for this statistical relationship? How could the headline be rewritten so that it is not misleading?

As you have learned by reading this book, there are various ways that researchers address the directionality and third-variable problems. The most effective is to conduct an experiment. For example, instead of simply measuring how much people exercise, a researcher could bring people into a laboratory and randomly assign half of them to run on a treadmill for 15 minutes and the rest to sit on a couch for 15 minutes. Although this seems like a minor change to the research design, it is extremely important. Now if the exercisers end up in more positive moods than those who did not exercise, it cannot be because their moods affected how much they exercised (because it was the researcher who used random assignment to determine how much they exercised). Likewise, it cannot be because some third variable (e.g., physical health) affected both how much they exercised and what mood they were in. Thus experiments eliminate the directionality and third-variable problems and allow researchers to draw firm conclusions about causal relationships.

Media Attributions

  • Nicholas Cage and Pool Drownings © Tyler Vigen is licensed under a CC BY (Attribution) license

References

  1. Bushman, B. J., & Huesmann, L. R. (2001). Effects of televised violence on aggression. In D. Singer & J. Singer (Eds.), Handbook of children and the media (pp. 223–254). Thousand Oaks, CA: Sage. ↵
  2. Messerli, F. H. (2012). Chocolate consumption, cognitive function, and Nobel laureates. New England Journal of Medicine, 367, 1562-1564. ↵

A graph that presents correlations between two quantitative variables, one on the x-axis and one on the y-axis. Scores are plotted at the intersection of the values on each axis.

A relationship in which higher scores on one variable tend to be associated with higher scores on the other.

A relationship in which higher scores on one variable tend to be associated with lower scores on the other.

A statistic that measures the strength of a correlation between quantitative variables.

When one or both variables have a limited range in the sample relative to the population, making the value of the correlation coefficient misleading.

The problem where two variables, X and Y, are statistically related either because X causes Y, or because Y causes X, and thus the causal direction of the effect cannot be known.

Two variables, X and Y, can be statistically related not because X causes Y, or because Y causes X, but because some third variable, Z, causes both X and Y.

Correlations that are a result not of the two variables being measured, but rather because of a third, unmeasured, variable that affects both of the measured variables.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

Correlation in Psychology: Meaning, Types, Examples & Coefficient

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Correlation means association – more precisely, it measures the extent to which two variables are related. There are three possible results of a correlational study: a positive correlation, a negative correlation, and no correlation.
  • A positive correlation is a relationship between two variables in which both variables move in the same direction. Therefore, one variable increases as the other variable increases, or one variable decreases while the other decreases. An example of a positive correlation would be height and weight. Taller people tend to be heavier.


  • A negative correlation is a relationship between two variables in which an increase in one variable is associated with a decrease in the other. An example of a negative correlation would be the height above sea level and temperature. As you climb the mountain (increase in height), it gets colder (decrease in temperature).


  • A zero correlation exists when there is no relationship between two variables. For example, there is no relationship between the amount of tea drunk and the level of intelligence.


Scatter Plots

A correlation can be expressed visually. This is done by drawing a scatter plot (also known as a scattergram, scatter graph, scatter chart, or scatter diagram).

A scatter plot is a graphical display that shows the relationships or associations between two numerical variables (or co-variables), which are represented as points (or dots) for each pair of scores.

A scatter plot indicates the strength and direction of the correlation between the co-variables.

Types of Correlations: Positive, Negative, and Zero

When you draw a scatter plot, it doesn’t matter which variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations, we always deal with paired scores, so the values of the two variables taken together will be used to make the diagram.

Decide which variable goes on each axis and then simply put a cross at the point where the two values coincide.
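As a stand-in for graph paper, the plotting recipe above can be mimicked in a few lines of Python. This toy sketch assumes integer scores from 0 to 9; each paired score becomes one cross on the grid:

```python
def ascii_scatter(pairs, size=10):
    """Render paired scores as a text scatter plot (origin at bottom-left)."""
    grid = [[" "] * size for _ in range(size)]
    for x, y in pairs:
        grid[size - 1 - y][x] = "x"  # flip rows so y increases upward, as on graph paper
    return "\n".join("".join(row) for row in grid)

# A rising cloud of crosses suggests a positive correlation
print(ascii_scatter([(1, 2), (3, 4), (5, 5), (7, 8), (8, 9)]))
```

Reading the output, the crosses climb from lower-left to upper-right, which is the visual signature of a positive relationship described earlier.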

Uses of Correlations

  • Prediction : If there is a relationship between two variables, we can make predictions about one from another.
  • Validity : Concurrent validity (correlation between a new measure and an established measure) and predictive validity.
  • Reliability : Test-retest reliability (are measures consistent?) and inter-rater reliability (are observers consistent?).
  • Theory verification .

Correlation Coefficients

Instead of drawing a scatter plot, a correlation can be expressed numerically as a coefficient, ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

Correlation Coefficient Interpretation

The correlation coefficient ( r ) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.

There is no rule for determining what correlation size is considered strong, moderate, or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, we should expect the correlation coefficients to be lower (e.g., above 0.4 to be relatively strong). When we are studying things that are easier to measure, such as socioeconomic status, we expect higher correlations (e.g., above 0.75 to be relatively strong).

In these kinds of studies, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

When we are studying things that are more easily countable, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.

Correlation vs. Causation

Causation means that one variable (often called the predictor variable or independent variable) causes the other (often called the outcome variable or dependent variable).

Experiments can be conducted to establish causation. An experiment isolates and manipulates the independent variable to observe its effect on the dependent variable and controls the environment in order that extraneous variables may be eliminated.

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.


While variables are sometimes correlated because one does cause the other, it could also be that some other factor, a confounding variable , is actually causing the systematic movement in our variables of interest.

Correlation does not always prove causation, as a third variable may be involved. For example, being a patient in a hospital is correlated with dying, but this does not mean that one event causes the other, as another third variable might be involved (such as diet and level of exercise).

“Correlation is not causation” means that just because two variables are related it does not necessarily mean that one causes the other.

A correlational study identifies variables and looks for a relationship between them, whereas an experiment tests the effect that an independent variable has upon a dependent variable.

This means that an experiment can establish cause and effect (causation), but a correlation can only indicate a relationship, as another extraneous variable that is not known about may be involved.

Strengths

1. Correlation allows the researcher to investigate naturally occurring variables that may be unethical or impractical to test experimentally. For example, it would be unethical to conduct an experiment on whether smoking causes lung cancer.

2. Correlation allows the researcher to clearly and easily see if there is a relationship between variables. This can then be displayed in a graphical form.

Limitations

1. Correlation is not, and cannot be taken to imply, causation. Even if there is a very strong association between two variables, we cannot assume that one causes the other.

For example, suppose we found a positive correlation between watching violence on T.V. and violent behavior in adolescence.

It could be that the cause of both these is a third (extraneous) variable – for example, growing up in a violent home – and that both the watching of T.V. and the violent behavior is the outcome of this.

2. Correlation does not allow us to go beyond the given data. For example, suppose it was found that there was an association between time spent on homework (1/2 hour to 3 hours) and the number of G.C.S.E. passes (1 to 6).

It would not be legitimate to infer from this that spending 6 hours on homework would likely generate 12 G.C.S.E. passes.

How do you know if a study is correlational?

A study is considered correlational if it examines the relationship between two or more variables without manipulating them. In other words, the study does not involve the manipulation of an independent variable to see how it affects a dependent variable.

One way to identify a correlational study is to look for language that suggests a relationship between variables rather than cause and effect.

For example, the study may use phrases like “associated with,” “related to,” or “predicts” when describing the variables being studied.

Another way to identify a correlational study is to look for information about how the variables were measured. Correlational studies typically involve measuring variables using self-report surveys, questionnaires, or other measures of naturally occurring behavior.

Finally, a correlational study may include statistical analyses such as correlation coefficients or regression analyses to examine the strength and direction of the relationship between variables.

Why is a correlational study used?

Correlational studies are particularly useful when it is not possible or ethical to manipulate one of the variables.

For example, it would not be ethical to manipulate someone’s age or gender. However, researchers may still want to understand how these variables relate to outcomes such as health or behavior.

Additionally, correlational studies can be used to generate hypotheses and guide further research.

If a correlational study finds a significant relationship between two variables, this can suggest a possible causal relationship that can be further explored in future research.

What is the goal of correlational research?

The ultimate goal of correlational research is to increase our understanding of how different variables are related and to identify patterns in those relationships.

This information can then be used to generate hypotheses and guide further research aimed at establishing causality.

Correlational Research: What it is with Examples

Use correlational research method to conduct a correlational study and measure the statistical relationship between two variables. Learn more.

Our minds can do some brilliant things. For example, they can memorize the jingle of a pizza truck. The louder the jingle, the closer the pizza truck is to us. Who taught us that? Nobody! We relied on our understanding and came to a conclusion. We don’t stop there, do we? If there are multiple pizza trucks in the area and each one has a different jingle, we would memorize them all and relate each jingle to its pizza truck.

This is what correlational research precisely is, establishing a relationship between two variables, “jingle” and “distance of the truck” in this particular example. The correlational study looks for variables that seem to interact with each other. When you see one variable changing, you have a fair idea of how the other variable will change.

What is Correlational research?

Correlational research is a type of non-experimental research method in which a researcher measures two variables and assesses the statistical relationship between them, with little or no attempt to control extraneous variables. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities.

Correlational Research Example

The correlation coefficient shows the correlation between two variables (a correlation coefficient is a statistical measure that calculates the strength of the relationship between two variables), a value measured between -1 and +1. When the correlation coefficient is close to +1, there is a positive correlation between the two variables. If the value is close to -1, there is a negative correlation between the two variables. When the value is close to zero, there is no relationship between the two variables.

Let us take an example to understand correlational research.

Consider, hypothetically, a researcher studying a correlation between cancer and marriage. In this study, there are two variables: disease and marriage. Let us say marriage has a negative association with cancer. This means that married people are less likely to develop cancer.

However, this doesn’t necessarily mean that marriage directly prevents cancer. In correlational research, it is not possible to establish which variable causes which. It is a misconception that a correlational study must involve two quantitative variables. In reality, two variables are measured, but neither is changed, and this is true whether the variables are quantitative or categorical.

Types of correlational research

Three main types of correlational research have been identified:

1. Positive correlation: A positive relationship between two variables is when an increase in one variable leads to a rise in the other variable. A decrease in one variable will see a reduction in the other variable. For example, the amount of money a person has might positively correlate with the number of cars the person owns.

2. Negative correlation: A negative correlation is quite literally the opposite of a positive relationship. If there is an increase in one variable, the second variable will show a decrease, and vice versa.

For example, education level might negatively correlate with the crime rate: an increase in one variable leads to a decrease in the other, and vice versa. If a country’s education level is improved, its crime rates may fall. Please note that this doesn’t mean that a lack of education leads to crime. It only means that a lack of education and crime are believed to have a common cause – poverty.

3. No correlation: In this third type, there is no correlation between the two variables. A change in one variable may not necessarily see a difference in the other variable. For example, being a millionaire and happiness are not correlated. An increase in money doesn’t necessarily lead to happiness.

Characteristics of correlational research

Correlational research has three main characteristics. They are: 

  • Non-experimental : The correlational study is non-experimental. It means that researchers need not manipulate variables with a scientific methodology to either agree or disagree with a hypothesis. The researcher only measures and observes the relationship between the variables without altering them or subjecting them to external conditioning.
  • Backward-looking : Correlational research only looks back at historical data and observes events in the past. Researchers use it to measure and spot historical patterns between two variables. A correlational study may show a positive relationship between two variables, but this can change in the future.
  • Dynamic : The patterns between two variables from correlational research are never constant and are always changing. Two variables that had a negative correlation in the past may have a positive correlation in the future due to various factors.

Data collection

The distinctive feature of correlational research is that the researcher can’t manipulate either of the variables involved. It doesn’t matter how or where the variables are measured. A researcher could observe participants in a closed environment or a public setting.

Correlational Research

Researchers use two data collection methods to collect information in correlational research.

01. Naturalistic observation

Naturalistic observation is a method of data collection in which people’s behavior is observed in their natural environment, in which they typically exist. This method is a type of field research. It could mean a researcher might be observing people in a grocery store, at the cinema, at a playground, or in similar places.

Researchers involved in this type of data collection make observations as unobtrusively as possible, so that the participants in the study are not aware that they are being observed; otherwise, they might deviate from their natural behavior.

Ethically, this method is acceptable if the participants remain anonymous and if the study is conducted in a public setting, a place where people would not normally expect complete privacy. As mentioned previously, in a grocery store, people can be observed while they collect items from the aisles and put them in their shopping bags. This is ethically acceptable, which is why most researchers choose public settings for recording their observations. This data collection method can be both qualitative and quantitative.

02. Archival data

Another approach to correlational data is the use of archival data. Archival information is data that has been previously collected through similar kinds of research, and it is usually made available through primary research.

In contrast to naturalistic observation, the information collected through archival data can be fairly straightforward. For example, counting the number of people named Richard in the various states of America based on social security records is relatively straightforward.

Use the correlational research method to conduct a correlational study and measure the statistical relationship between two variables. Uncover the insights that matter the most. Use QuestionPro’s research platform to uncover complex insights that can propel your business to the forefront of your industry.


Research Method


Correlation Analysis – Types, Methods and Examples

Correlation Analysis

Correlation analysis is a statistical method used to evaluate the strength and direction of the relationship between two or more variables. The correlation coefficient ranges from -1 to 1.

  • A correlation coefficient of 1 indicates a perfect positive correlation. This means that as one variable increases, the other variable also increases.
  • A correlation coefficient of -1 indicates a perfect negative correlation. This means that as one variable increases, the other variable decreases.
  • A correlation coefficient of 0 means that there’s no linear relationship between the two variables.

Correlation Analysis Methodology

Conducting a correlation analysis involves a series of steps, as described below:

  • Define the Problem : Identify the variables that you think might be related. The variables must be measurable on an interval or ratio scale. For example, if you’re interested in studying the relationship between the amount of time spent studying and exam scores, these would be your two variables.
  • Data Collection : Collect data on the variables of interest. The data could be collected through various means such as surveys, observations, or experiments. It’s crucial to ensure that the data collected is accurate and reliable.
  • Data Inspection : Check the data for any errors or anomalies such as outliers or missing values. Outliers can greatly affect the correlation coefficient, so it’s crucial to handle them appropriately.
  • Choose the Appropriate Correlation Method : Select the correlation method that’s most appropriate for your data. If your data meets the assumptions for Pearson’s correlation (interval or ratio level, linear relationship, variables are normally distributed), use that. If your data is ordinal or doesn’t meet the assumptions for Pearson’s correlation, consider using Spearman’s rank correlation or Kendall’s Tau.
  • Compute the Correlation Coefficient : Once you’ve selected the appropriate method, compute the correlation coefficient. This can be done using statistical software such as R, Python, or SPSS, or manually using the formulas.
  • Interpret the Results : Interpret the correlation coefficient you obtained. If the correlation is close to 1 or -1, the variables are strongly correlated. If the correlation is close to 0, the variables have little to no linear relationship. Also consider the sign of the correlation coefficient: a positive sign indicates a positive relationship (as one variable increases, so does the other), while a negative sign indicates a negative relationship (as one variable increases, the other decreases).
  • Check the Significance : It’s also important to test the statistical significance of the correlation. This typically involves performing a t-test. A small p-value (commonly less than 0.05) suggests that the observed correlation is statistically significant and not due to random chance.
  • Report the Results : The final step is to report your findings. This should include the correlation coefficient, the significance level, and a discussion of what these findings mean in the context of your research question.
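
The significance check described above is commonly done by converting r into a t statistic with n − 2 degrees of freedom. A minimal sketch (the helper name is illustrative; it assumes r has already been computed):

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Suppose a study of n = 27 paired observations yields r = 0.6
t = t_statistic(0.6, 27)
print(round(t, 2))  # 3.75
```

With df = 25, this t would be compared against a tabulated critical value (about 2.06 at α = 0.05, two-tailed), so an r of 0.6 from 27 pairs would be judged statistically significant.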

Types of Correlation Analysis

Types of Correlation Analysis are as follows:

Pearson Correlation

This is the most common type of correlation analysis. Pearson correlation measures the linear relationship between two continuous variables. It assumes that the variables are normally distributed and have equal variances. The correlation coefficient (r) ranges from -1 to +1, with -1 indicating a perfect negative linear relationship, +1 indicating a perfect positive linear relationship, and 0 indicating no linear relationship.

Spearman Rank Correlation

Spearman’s rank correlation is a non-parametric measure that assesses how well the relationship between two variables can be described using a monotonic function. In other words, it evaluates the degree to which, as one variable increases, the other variable tends to increase, without requiring that increase to be consistent.
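
The contrast with Pearson's r is easiest to see on data that is perfectly monotonic but curved. In the hypothetical sketch below (function names and data are illustrative), y rises consistently with x, so Spearman's coefficient is exactly 1 while Pearson's r is somewhat lower:

```python
import math

def pearson(xs, ys):
    """Pearson's r: measures the linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

def spearman(xs, ys):
    """Spearman's rs: rank each variable (no ties assumed),
    then take the Pearson correlation of the ranks."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # monotonic but strongly curved
print(round(pearson(x, y), 3))   # 0.943 (the linear fit is imperfect)
print(spearman(x, y))            # 1.0 (the rank orderings agree exactly)
```

This is why Spearman's coefficient is described as measuring a monotonic rather than a strictly linear relationship.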

Kendall’s Tau

Kendall’s Tau is another non-parametric correlation measure used to detect the strength of dependence between two variables. Kendall’s Tau is often used for variables measured on an ordinal scale (i.e., where values can be ranked).

Point-Biserial Correlation

This is used when you have one dichotomous and one continuous variable, and you want to test for correlations. It’s a special case of the Pearson correlation.

Phi Coefficient

This is used when both variables are dichotomous or binary (having two categories). It’s a measure of association for two binary variables.

Canonical Correlation

This measures the correlation between two sets of variables. The method finds the linear combinations of each set that maximize the correlation between them.

Partial and Semi-Partial (Part) Correlations

These are used when the researcher wants to understand the relationship between two variables while controlling for the effect of one or more additional variables.

Cross-Correlation

Used mostly in time series data to measure the similarity of two series as a function of the displacement of one relative to the other.

Autocorrelation

This is the correlation of a signal with a delayed copy of itself as a function of delay. This is often used in time series analysis to help understand the trend in the data over time.
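
A minimal sketch of lag-k autocorrelation (names and data are illustrative): the series is correlated with a copy of itself shifted by k steps, normalized by the full-series variance, which is a common time-series convention:

```python
def autocorr(series, lag):
    """Lag-k autocorrelation: covariance of the series with its
    shifted self, divided by the full-series variance term."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

# A strictly alternating series is negatively autocorrelated at
# lag 1 (each value opposes its neighbor) and positively at lag 2.
wave = [1, -1, 1, -1, 1, -1, 1, -1]
print(autocorr(wave, 1))  # -0.875
print(autocorr(wave, 2))  # 0.75
```

The signs at different lags reveal the repeating structure in the series, which is exactly what autocorrelation is used to detect in trend analysis.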

Correlation Analysis Formulas

There are several formulas for correlation analysis, each corresponding to a different type of correlation. Here are some of the most commonly used ones:

Pearson’s Correlation Coefficient (r)

Pearson’s correlation coefficient measures the linear relationship between two variables. The formula is:

   r = Σ[(xi – Xmean)(yi – Ymean)] / sqrt[(Σ(xi – Xmean)²)(Σ(yi – Ymean)²)]

  • xi and yi are the values of X and Y variables.
  • Xmean and Ymean are the mean values of X and Y.
  • Σ denotes the sum of the values.

Spearman’s Rank Correlation Coefficient (rs)

Spearman’s correlation coefficient measures the monotonic relationship between two variables. The formula is:

   rs = 1 – (6Σd² / n(n² – 1))

  • d is the difference between the ranks of corresponding variables.
  • n is the number of observations.
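
The d-squared formula above translates almost line by line into code. The sketch below assumes no tied ranks (ties require an adjusted formula); function names are illustrative:

```python
def spearman_rs(xs, ys):
    """Spearman's rs = 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties assumed."""
    def ranks(vals):
        # Rank positions 1..n by sorting indices on their values
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Perfectly reversed orderings give rs = -1
print(spearman_rs([1, 2, 3], [30, 20, 10]))  # -1.0
```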

Kendall’s Tau (τ)

Kendall’s Tau is a measure of rank correlation. The formula is:

   τ = (nc – nd) / [0.5 × n(n – 1)]

  • nc is the number of concordant pairs.
  • nd is the number of discordant pairs.
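
This formula can be implemented by counting concordant and discordant pairs directly. The sketch below assumes no ties; names are illustrative:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau = (nc - nd) / (0.5 * n * (n - 1)), no ties assumed."""
    nc = nd = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when both variables order i and j the same way
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            nc += 1
        elif sign < 0:
            nd += 1
    n = len(xs)
    return (nc - nd) / (0.5 * n * (n - 1))

# One out of six pairs is discordant: tau = (5 - 1) / 6
print(round(kendall_tau([1, 2, 3, 4], [2, 1, 3, 4]), 3))  # 0.667
```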

Point-Biserial Correlation

This correlation is a special case of Pearson’s correlation, and so it uses the same formula as Pearson’s correlation.

Phi Coefficient

The phi coefficient is a measure of association for two binary variables. It is equivalent to Pearson’s correlation in this specific case.

Partial Correlation

The formula for partial correlation is more complex and depends on the Pearson’s correlation coefficients between the variables.

For partial correlation between X and Y given Z:

  rp(xy.z) = (rxy – rxz * ryz) / sqrt[(1 – rxz^2)(1 – ryz^2)]

  • rxy, rxz, ryz are the Pearson’s correlation coefficients.
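
The formula maps directly to code. The sketch below takes the three pairwise Pearson coefficients as inputs; names are illustrative:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y, controlling for Z, from
    the three pairwise Pearson correlation coefficients."""
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz ** 2) * (1 - r_yz ** 2))

# If Z is uncorrelated with both X and Y, controlling for it changes nothing
print(partial_corr(0.5, 0.0, 0.0))  # 0.5
# A shared driver Z accounts for part of an apparent X-Y association
print(round(partial_corr(0.8, 0.7, 0.7), 3))  # 0.608
```

Note how the second case shrinks the raw r of 0.8 once the contribution of Z is removed, which is the point of partialling.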

Correlation Analysis Examples

Here are a few examples of how correlation analysis could be applied in different contexts:

  • Education : A researcher might want to determine if there’s a relationship between the amount of time students spend studying each week and their exam scores. The two variables would be “study time” and “exam scores”. If a positive correlation is found, it means that students who study more tend to score higher on exams.
  • Healthcare : A healthcare researcher might be interested in understanding the relationship between age and cholesterol levels. If a positive correlation is found, it could mean that as people age, their cholesterol levels tend to increase.
  • Economics : An economist may want to investigate if there’s a correlation between the unemployment rate and the rate of crime in a given city. If a positive correlation is found, it could suggest that as the unemployment rate increases, the crime rate also tends to increase.
  • Marketing : A marketing analyst might want to analyze the correlation between advertising expenditure and sales revenue. A positive correlation would suggest that higher advertising spending is associated with higher sales revenue.
  • Environmental Science : A scientist might be interested in whether there’s a relationship between the amount of CO2 emissions and average temperature increase. A positive correlation would indicate that higher CO2 emissions are associated with higher average temperatures.

Importance of Correlation Analysis

Correlation analysis plays a crucial role in many fields of study for several reasons:

  • Understanding Relationships : Correlation analysis provides a statistical measure of the relationship between two or more variables. It helps in understanding how one variable may change in relation to another.
  • Predicting Trends : When variables are correlated, changes in one can predict changes in another. This is particularly useful in fields like finance, weather forecasting, and technology, where forecasting trends is vital.
  • Data Reduction : If two variables are highly correlated, they are conveying similar information, and you may decide to use only one of them in your analysis, reducing the dimensionality of your data.
  • Testing Hypotheses : Correlation analysis can be used to test hypotheses about relationships between variables. For example, a researcher might want to test whether there’s a significant positive correlation between physical exercise and mental health.
  • Determining Factors : It can help identify factors that are associated with certain behaviors or outcomes. For example, public health researchers might analyze correlations to identify risk factors for diseases.
  • Model Building : Correlation is a fundamental concept in building multivariate statistical models, including regression models and structural equation models. These models often require an understanding of the inter-relationships (correlations) among multiple variables.
  • Validity and Reliability Analysis : In psychometrics, correlation analysis is used to assess the validity and reliability of measurement instruments such as tests or surveys.

Applications of Correlation Analysis

Correlation analysis is used in many fields to understand and quantify the relationship between variables. Here are some of its key applications:

  • Finance : In finance, correlation analysis is used to understand the relationship between different investment types or the risk and return of a portfolio. For example, if two stocks are positively correlated, they tend to move together; if they’re negatively correlated, they move in opposite directions.
  • Economics : Economists use correlation analysis to understand the relationship between various economic indicators, such as GDP and unemployment rate, inflation rate and interest rates, or income and consumption patterns.
  • Marketing : Correlation analysis can help marketers understand the relationship between advertising spend and sales, or the relationship between price changes and demand.
  • Psychology : In psychology, correlation analysis can be used to understand the relationship between different psychological variables, such as the correlation between stress levels and sleep quality, or between self-esteem and academic performance.
  • Medicine : In healthcare, correlation analysis can be used to understand the relationships between various health outcomes and potential predictors. For example, researchers might investigate the correlation between physical activity levels and heart disease, or between smoking and lung cancer.
  • Environmental Science : Correlation analysis can be used to investigate the relationships between different environmental factors, such as the correlation between CO2 levels and average global temperature, or between pesticide use and biodiversity.
  • Social Sciences : In fields like sociology and political science, correlation analysis can be used to investigate relationships between different social and political phenomena, such as the correlation between education levels and political participation, or between income inequality and social unrest.

Advantages and Disadvantages of Correlation Analysis

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer



Interpretation of correlations in clinical research

1 University of Utah Department of Orthopaedic Surgery Operations, Salt Lake City, Utah, USA

Jerry Bounsanga

Maren Wright Voss

Background:

Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence.

As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies.

We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias.

Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice.

Conclusion:

Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

Introduction

It is common for physicians to underestimate the relevance and importance of their training in statistics and probability, at least until its relevance becomes clear in later clinical practice. 1 Evidence-based practice is the standard of care, yet evaluating the quality of evidence can be a difficult process. The British Medical Journal established clear statistical guidelines for contributions to medical journals in the 1980s. 2 A survey of medical residents showed near perfect agreement (95%) that understanding statistics they encounter in medical journals is important, but 75% said they lacked knowledge. 3 The average score on the basic biostatistics exam for the residents surveyed was 41% correct, showing both the objective and subjective need for a stronger statistical training foundation. An international survey of practicing doctors similarly found that doctors averaged scores of 40% when tested on the basics of statistical methods and epidemiology. 4

The efforts to improve the methods behind clinical science are ongoing. The Consolidated Standards of Reporting Trials (CONSORT) have been established, 5 along with multiple other research and statistical method guidelines in recent years. 6 These guidelines are updated periodically 7 and endorsed or extended by specific clinical medical groups 8 , 9 or medical journals. 10 – 13 Key to many of these guidelines are attempts to ensure that bias and error are minimized in research, so that interpretations are meaningful and accurate. It would not be practical to detail an exhaustive list of guidance and recommendations; however, with new information and research approaches constantly coming forward, physicians must be able to critically evaluate the quality of evidence presented. Critically analyzing research is a key skill in evidence-based practice and requires knowledge of methods, results interpretation, and applicability, all three of which require an understanding of basic statistics. 14

The intention of this article is to serve as a basic primer regarding critical statistical concepts that appear in medical literature with a focus on the concept of correlation and how it is best utilized in clinical interpretation for understanding the relationships between health factors. At its foundation, correlational analysis quantifies the direction and strength of relationships between two variables. Understanding correlations can form the basis for interpreting applications of clinical research.

1. What Does Correlation Tell Us?

Correlation is concerned with association; it can look at any two measured concepts and compare their relationships. These measured concepts are often referred to as variables and are assigned letter labels (X, Y). Thus, the correlation is the measure of the relationship between X and Y, and it ranges from −1 to 1. Its value (or coefficient) is scaled within this range to assist in interpretation, with 0 indicating no relationship between variables X and Y, and −1 or 1 indicating the ability to perfectly predict X from Y or Y from X (see Figure 1 ). A correlation coefficient provides two pieces of information. First, it predicts where X (the measured value of interest) falls on a line given a known value of Y. Second, it expresses a reduction in variability associated with knowing Y, telling us something about the expected range of the X value. 15 Correlation takes into account the full range of scores, but as a statistical tool it is not very sensitive to scores on the very high or very low ends of our X value. The common Pearson correlation is best used to describe linear relationships. If the pattern of association between the two variables is, for example, a “U” shaped curve, the correlation results might be low, even though a defined relationship exists. 16
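The linearity limitation can be illustrated with a short Python sketch (all data here are invented for the example): Pearson's r fully captures a straight-line relationship but reports essentially no association for a perfect U-shaped one.

```python
import numpy as np

# Illustrative data: Pearson's r captures linear association only.
x = np.linspace(-3, 3, 61)
y_linear = 2 * x + 1   # perfect linear relationship
y_ushape = x ** 2      # perfect "U"-shaped (quadratic) relationship

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_ushape = np.corrcoef(x, y_ushape)[0, 1]

print(round(r_linear, 3))  # 1.0: the linear pattern is fully captured
print(round(r_ushape, 3))  # ~0: a well-defined relationship, but not a linear one
```

The second result is the "U"-shaped case described above: the relationship is deterministic, yet the correlation coefficient is near zero.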

Figure 1. 1) Perfect negative correlation between two variables; 2) no patterned relationship between two variables; 3) perfect positive correlation between two variables.

Correlation is not easily distorted by skewed or off-center data. The nature of the data can be described by parameters, such as measures of central tendency, which inform us about the distribution of the data. Parametric tests make assumptions, such as that the data are normally distributed, while non-parametric tests are called distribution-free tests because they make no assumptions about the distribution of the data. Yet even if the pattern of scores does not follow a normal bell-shaped curve, or does not form a direct linear relationship, correlations can still be reliable. A number of correlation measures have been developed to handle different types of data (non-parametric tests such as the Kendall rank, Spearman rank, phi, biserial, point-biserial, and gamma correlations). Even the simple Pearson correlation handles extreme violations of normality (no bell shape to the pattern of scores) and scale. In one evaluation, 5,000 small samples (n=5 to n=15, sizes that might challenge parametric assumptions) were randomly drawn from a population of 10,000 to calculate the distribution of r values yielded, and the Pearson correlation was still found to be a reliable indicator of the relationship between variables. 17 It is also possible to use transformations to normalize the distribution of the data. Correlation is a robust measure, in part because of its ability to tolerate these violations of normality while staying sensitive to the individual case. These properties make the technique useful in interpreting the meaning derived from clinical data.
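A brief sketch of the rank-based alternative, using simulated skewed data (the numbers are hypothetical): the Spearman coefficient, computed here as the Pearson correlation of the ranks, tracks a monotone but non-linear relationship at least as well as the raw Pearson r does.

```python
import numpy as np

# Simulated skewed data: a monotone but non-linear relationship.
rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=200)         # heavily skewed predictor
y = np.log1p(x) + rng.normal(0, 0.05, size=200)  # monotone, non-linear response

def rank(a):
    # Simple ranking; no tie handling is needed for continuous data.
    order = a.argsort()
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(a))
    return ranks

r_pearson = np.corrcoef(x, y)[0, 1]
rho_spearman = np.corrcoef(rank(x), rank(y))[0, 1]  # Spearman = Pearson on ranks

print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {rho_spearman:.2f}")
```

Because Spearman's rho depends only on the ordering of scores, it is unaffected by the skew in x, while Pearson's r is pulled down by the curvature.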

2. Use of Correlation

There are many concerns with the statistical techniques commonly utilized in the literature. Some concerns arise from misunderstanding of a statistical measure and others from its misapplication. We discuss four ways in which correlational analysis is misused: causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Importantly, correlation is a measure of association, which is insufficient to infer causation. Correlation can only measure whether a relationship exists between two variables; it does not indicate a causal relationship. There must be a convincing body of evidence to take the next step on the path to inferring that one variable causes the other. Randomized controlled trials, or more advanced statistical methods such as path analysis and structural equation modeling coupled with proper research design, are needed in order to take the next step of inference in the causal chain: testing a causal hypothesis. While correlational analyses are by definition from non-experimental research (research without carefully controlled experimental conditions), they are nevertheless relevant to evidence-based practice. 18 Observational studies can use a cross-sectional design (a snapshot of prevalence at one time point), a retrospective design (looking back to compare current with past attributes), or a prospective design (documenting current occurrences and following up at a future time point to make comparisons). Correlations drawn from cross-sectional studies cannot establish the temporal relationship that links cause with effect, yet adding a retrospective or prospective observational design strengthens the association and helps support hypothesis generation, so that causal assertions can later be tested with a different research design. 19

When non-experimental methods are used, the relationship seen between the two variables is vulnerable to bias from anything that was not measured (unobserved variables). Whether studying pain, function, or treatment response, there are a host of possible factors that might be important to the observed correlational relationship between X and Y, and any given study has measured and reported on only a fraction of the potentially influential variables. These unobserved variables could potentially explain the observed relationship, so it would be premature to assume a treatment effect based on correlational data. The unobserved variables might be affecting the study variables, changing the relationship in a way that alters the interpretation of the data. Thus, interpretation of correlational findings must be quite cautious until further research is completed.

Another concern is the use (or abuse) of the term 'statistically significant' in correlational analysis. This concern is not new: the abuse of significance testing was noted in a 1987 review published in the New England Journal of Medicine. That article found that a number of components of clinical trials, such as having several measures of the outcome (i.e., multiple tests of function, health, or pain), repeated measures over time, subgroup analyses, or multiple treatments in the same trial, can lead to a bias in reporting that exaggerates the size or importance of observed differences. 20 It is natural for researchers to want to thoroughly evaluate the potential difference between treatment conditions. This has sometimes been referred to as the kitchen-sink approach, and it presents a problem for significance testing. Significance as a statistical procedure addresses the probability of the hypothesized occurrence arising by chance. If the probability (p) is less than, say, 5% or 1%, the researcher might feel comfortable assuming that the observed event was not due to chance. The 0.05 significance level was originally proposed by Sir Ronald Fisher in 1925, but it was never intended as a hard and fast rule. 21 If researchers use a cut-off of p=0.05 to determine whether the effect they see has occurred by chance, running multiple tests can quickly move the needle from a rare to an expected event. Every analysis run with an alpha criterion of p=0.05 carries a 5% chance that the "significant" finding is actually a chance occurrence, called a Type 1 error.

From this we can see the difficulty that arises when running 20 different analyses on the same data: there is a 64% chance that a significant p-value will show up erroneously, when there is no systematic relationship and the result is really just a chance occurrence (see Figure 2). Such over-analysis of the data can create the interpretation that a test or treatment should be used when there was no actual treatment effect. This is the problem of alpha inflation, and it needs to be carefully considered both in conducting and in interpreting correlational research. It can be addressed by planning the analyses ahead of time and keeping them limited to key theoretical questions. 20 Additionally, alpha can be adjusted in the statistical calculation, for example with the Bonferroni correction, a procedure that reduces alpha to statistically correct for the inflation created by multiple comparisons. It is performed by simply dividing the alpha value (α) by the number of hypotheses or measures (m) tested. If a study wanted to evaluate 5 different surgical placements, the Bonferroni correction would adjust an original α=0.05 by applying the formula α/m, or 0.05/5, yielding α=0.01, a stricter standard before a finding would be considered significant. The Bonferroni correction is a conservative correction for multiple comparisons that reduces Type 1 error, 22 though less conservative alternatives exist, such as the Tukey and Holm-Šidák corrections. The main point is that clinicians should recognize the problem of multiple comparisons within a single study and address the concern so that spurious relationships are not erroneously reported as significant. A lack of awareness of this issue can lead to naïve interpretations of study findings.
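The arithmetic behind these two quantities is simple enough to sketch in a few lines of Python (the function names are ours, chosen for illustration):

```python
# Family-wise error rate: the chance of at least one false positive
# when running m independent tests, each at significance level alpha.
def familywise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# Bonferroni correction: a stricter per-test alpha, alpha / m.
def bonferroni(alpha: float, m: int) -> float:
    return alpha / m

print(round(familywise_error(0.05, 20), 2))  # 0.64, the 64% chance noted above
print(round(bonferroni(0.05, 5), 3))         # 0.01, the surgical-placement example
```

Note that the family-wise rate assumes the tests are independent; correlated tests inflate alpha somewhat less, which is one reason less conservative corrections exist.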

Figure 2. As the number of comparisons increases, the alpha error rate increases; running 20 comparisons with no correction factor and a significance level set at 0.05 results in a 64% chance of a relationship appearing to be significant (better than 50/50 odds).

Exaggeration of significance testing leads to a third point: are the findings clinically meaningful? A significant finding does not imply a meaningful finding, because factors other than the variance in scores influence the p-value in a correlational analysis. Sample size is an important element in whether a non-random effect will be found. Small samples can produce unstable, yet significant, correlation estimates, so sample sizes greater than 150 to 200 have been recommended. 23 Yet it is not uncommon for published papers to report significant effects from correlational analysis of samples of fewer than 150 patients. 24 – 26 While reporting and publishing both significant and non-significant results is important, given the instability that comes with a small sample size, interpretation should be cautious until replication studies can verify the findings.

Likewise, large samples can also be problematic. A large sample might reveal a statistically significant difference between groups even when the effect is minimal. In a classic example, a sample of 22,000 subjects showed a highly significant (p<.00001) reduction in myocardial infarctions that prompted a general recommendation to take aspirin for myocardial infarction prevention. 27 The effect size, however, was less than a 1% reduction in risk, such that the risk of taking aspirin exceeded the benefit. Effect size is the standardized mean difference between groups and is a measure of the magnitude of between-group differences. A significant p-value indicates only that a difference exists, with no indication of the size of the effect. Additionally, a confidence interval (CI) can be constructed for the effect size. CIs present a lower and upper range within which the true population value most likely lies. 28 If a zero value is not included within the CI of the effect size, we have added assurance that the effect exists, with the width of the CI helpful in judging how precisely the effect is estimated. 29 It has been recommended that effect sizes or confidence intervals be included in all reported medical research so that the clinical significance of findings can be assessed. 20 , 28 , 30
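The large-sample pitfall can be mimicked with simulated data (all numbers below are invented for illustration, not taken from the cited aspirin trial): with roughly 11,000 subjects per arm, a difference of under one point on a scale with SD 15 is highly "significant" yet corresponds to a negligible standardized effect.

```python
import math
import numpy as np

# Hypothetical simulation: a huge sample makes a clinically trivial
# group difference look highly significant.
rng = np.random.default_rng(0)
n = 11000
treated = rng.normal(100.0, 15.0, n)
control = rng.normal(100.9, 15.0, n)   # true standardized difference: 0.06

diff = control.mean() - treated.mean()
se = math.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
z = diff / se
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p, normal approximation

# Cohen's d: the standardized mean difference (effect size).
pooled_sd = math.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
d = diff / pooled_sd

print(f"p = {p:.1e}, Cohen's d = {d:.2f}")
```

The p-value alone would suggest an important finding; the effect size shows the difference is a small fraction of one standard deviation, which is exactly why both should be reported.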

3. Proper Interpretation of Correlation

Correlational analyses have been reported as among the most common analytic techniques in research at the beginning of the 21st century, particularly in health and epidemiological research. 15 Thus effective and proper interpretation is critical to understanding the literature. Cautious interpretation is particularly important, not only because of the interpretive concerns just detailed (causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias), but also because of the publication bias of journals toward accepting and publishing studies with positive findings. 31 If clinicians are less likely to be exposed to under-published contradictory reports, based on null findings that treatments actually had no effect, the interpretation of positive results must necessarily be cautious until confirmed by strong evidence.

One recent clinical example of correlational findings is the inference that, because Cobb angle and sagittal balance are related to symptom severity in back pain, treatments should aim to improve sagittal balance. The studies used to draw these conclusions made an important first step in identifying potential relationships, but they were not conclusive: they did not establish causal relationships, did not report effect sizes, and did not include control groups in the analyses. 32 – 34 The Pearson and Spearman correlations are tools that predict X from Y or Y from X. The correlation is symmetric, so if the variables are inferred in the reversed direction (pain predicting spine function rather than spine function predicting pain), the same prediction holds. 15 If one is looking for cause and effect, correlational statistics cannot help. The mathematics of correlation tells us that Y is just as likely to precede X as to follow it, because the prediction is the same regardless of which variable is entered first. Effects cannot be determined directly through correlational analysis, and perhaps the reverse relationship is the true relationship.
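The symmetry is easy to verify numerically. In this Python sketch the "pain" and "function" variables are simulated (hypothetical data), but the identity r(X, Y) = r(Y, X) holds for any dataset:

```python
import numpy as np

# Correlation is symmetric: r(X, Y) equals r(Y, X), so the coefficient
# alone cannot say which variable "drives" the other.
rng = np.random.default_rng(1)
pain = rng.normal(5, 2, 100)
function = 10 - 0.8 * pain + rng.normal(0, 1, 100)  # simulated inverse link

r_xy = np.corrcoef(pain, function)[0, 1]
r_yx = np.corrcoef(function, pain)[0, 1]

assert np.isclose(r_xy, r_yx)  # identical in either direction
print(round(r_xy, 2))
```

Whichever variable is treated as the predictor, the coefficient (and therefore the strength of the "prediction") is the same, which is why direction of causation cannot be read off a correlation.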

Because of the possibility of a bidirectional relationship, causal inference is premature if it relies purely on correlational statistics, no matter how many studies report the correlational finding. Correlation can be interpreted as the association between two variables; it cannot be used to indicate a causal relationship. In fact, statistical tests cannot prove causal relationships but can only be used to test causal hypotheses. Misinterpretation of correlation generally stems from a lack of understanding of what a statistical test can or cannot do, as well as from gaps in knowledge of proper research design. Rather than jumping to an assumption of causality, correlations should prompt the next stage of clinical research through randomized controlled trials or the application of more complex statistical methods such as causal and path analysis. Perhaps part of the tendency to jump too quickly to causal assertion arises from the nature of the questions asked in clinical research and the desire to move quickly to enhance patient care. New frameworks are emerging in the health sciences that challenge the appeal of a single cause by considering potential outcomes in more complex ways. 35 Until then, understanding the nature of correlational analysis allows clinicians to be more cautious in interpreting study results.

Advances in research have led to many significant findings that are shaping how we diagnose and treat patients. As these findings guide surgeons and clinicians toward new treatment directions, it is important to consider the strength and nature of the research. Critically analyzing new evidence requires an understanding of research methods and relevant statistical applications, all of which require an understanding of the analytic methodologies that lie behind study findings. 14 Evidence-based practice demands new skills of trained medical professionals as they are presented with an ever-expanding array of research evidence. This short primer on the assumptions and nature of correlational methods of analysis can assist emerging physicians in understanding and exercising appropriate caution as they critically analyze the evidence before them.

Acknowledgments

This investigation was supported by the University of Utah Department of Orthopaedics Quality Outcomes Research and Assessment, Study Design and Biostatistics Center, with funding in part from the National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant 5UL1TR001067–02.

Declaration of Interests

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Correlational Research


Correlational research is a type of research design used to examine the relationship between two or more variables. In correlational research, researchers measure the extent to which two or more variables are related, without manipulating or controlling any of the variables.

Whether you are a beginner or an experienced researcher, chances are you’ve heard something about correlational research. It’s time to learn about this type of study in more depth, since you will be using it a lot.

  • What is correlation?
  • When to use it?
  • How is it different from experimental studies?
  • What data collection method will work?

Grab your pen and get ready to jot down some notes as we cover the questions you may have about this type of study. Let’s get down to business!

What Is Correlational Research: Definition

Correlational research is a preliminary type of study used to explore the connection between two variables. In this type of research, you won’t interfere with the variables: instead of manipulating or adjusting them, researchers focus on observation. A correlational study is a perfect option if you want to figure out whether there is any link between variables. You will conduct one in two cases:

  • When you want to test a theory about a non-causal connection. For example, you may want to know whether drinking hot water boosts the immune system. You expect that vitamins, a healthy lifestyle, and regular exercise are the factors with a real positive impact, but this doesn’t mean that drinking hot water isn’t associated with the immune system, so measuring this relationship will be useful.
  • When you want to investigate a causal link without an experiment. Suppose you want to study whether using aerosol products leads to ozone depletion. You don’t have the budget to conduct complex research, and you can’t control how often people use aerosols. In this case, you will opt for a correlational study.

Correlational Study: Purpose

Correlational research is most useful for purposes of observation and prediction. The researcher’s goal is to observe and measure variables to determine whether any relationship exists and, if there is some association, to assess how strong it is. As an initial type of research, this method allows you to form and test hypotheses. A correlational study doesn’t require much time and is rather cheap.

Correlational Research Design

Correlational research designs are often used in psychology, epidemiology, medicine, and nursing. They show the strength of correlation that exists between the variables within a population. For this reason, these studies are also known as ecological studies. Correlational research design methods are characterized by the following traits:

  • Non-experimental method. No manipulation or exposure to extra conditions takes place. Researchers only examine how variables act in their natural environment without any interference.
  • Fluctuating patterns. Association is never the same and can change due to various factors.
  • Quantitative research. These studies require quantitative research methods. Researchers mostly run statistical analyses and work with numbers to get results.
  • Association-oriented study. A correlational study is aimed at finding an association between two or more phenomena or events. This has nothing to do with causal relationships between dependent and independent variables.

Correlational Research Questions

Correlational research questions usually focus on how one variable relates to another. If there is some connection, you can also examine how strong it is.

Correlational Research Types

Depending on the direction of association, there are three types of correlation:

  • Positive correlation: if one variable increases, the other grows accordingly; if one decreases, both decrease.
  • Negative correlation: changes happen in opposite directions; if one variable increases, the other decreases, and vice versa.
  • Zero correlation: no association between the two variables can be found.
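The three directions can be illustrated with a short simulation (all data below are invented for the example):

```python
import numpy as np

# Simulated examples of positive, negative, and zero correlation.
rng = np.random.default_rng(7)
x = rng.normal(size=500)

positive = 2 * x + rng.normal(size=500)    # moves together with x
negative = -2 * x + rng.normal(size=500)   # moves opposite to x
unrelated = rng.normal(size=500)           # no association with x

r_pos = np.corrcoef(x, positive)[0, 1]
r_neg = np.corrcoef(x, negative)[0, 1]
r_zero = np.corrcoef(x, unrelated)[0, 1]

print(f"positive: {r_pos:.2f}, negative: {r_neg:.2f}, zero: {r_zero:.2f}")
```

The coefficients land near +0.9, -0.9, and 0 respectively, matching the three patterns described above.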

Correlational Research: Data Collection Methods

There are three main methods used to collect data in correlational research:

  • Surveys and polls
  • Naturalistic observation
  • Secondary or archival data

It’s essential that you select the right study method. Otherwise, it won’t be possible to achieve accurate results and answer the research question correctly. Let’s have a closer look at each of these methods to make sure that you make the right choice.

Surveys in Correlational Study

A survey is an easy way to collect data about a population in a correlational study. Depending on the nature of the question, you can choose different survey variations: questionnaires, polls, and interviews are the three most popular formats used in survey research. To conduct an effective study, first identify the population, then choose whether to run the survey online, via email, or in person.

Naturalistic Observation: Correlational Research

Naturalistic observation is another data collection approach in correlational research methodology. This method allows us to observe behavioral patterns in a natural setting. Scientists often document, describe, or categorize data to get a clear picture of a group of people. During naturalistic observations, you may work with both qualitative and quantitative information; nevertheless, to measure the strength of an association, you should analyze numeric data. Members of a population shouldn’t know that they are being studied, so you should blend into the target group as naturally as possible. Otherwise, participants may behave differently, which may bias the results.

Correlational Study: Archival Data

Sometimes you may access ready-made data that suits your study. Archival data offers a quick correlational research method that allows you to obtain the necessary details from similar studies that have already been conducted. You won’t deal with data collection techniques, since most of the numbers will be served on a silver platter; all that is left to do is analyze them and draw a conclusion. Unfortunately, not all records are accurate, so you should rely only on credible sources.

Pros and Cons of Correlational Research

Choosing which study to run can be difficult, so in this article we take an in-depth look at the advantages and disadvantages of correlational research. This should help you decide whether this type of study is the best fit for you. Without further ado, let’s dive right in.

Advantages of Correlational Research

One of the main advantages of correlational research is that it can be conducted when an experiment is not an option. Sometimes it may be unethical to run an experimental study, or you may have limited resources; this is exactly when a correlational (ecological) study comes in handy. This type of study also has several benefits of irreplaceable value:

  • Works well as a preliminary study
  • Allows examining complex connection between multiple variables
  • Helps you study natural behavior
  • Can be generalized to other settings.

If you decide to run an archival study or conduct a survey, you will also save considerable time and expense.

Disadvantages of Correlational Research

There are several limitations of correlational research you should keep in mind while deciding on your main methodology. Here are the disadvantages to consider:

  • No causal relationships can be identified
  • No chance to manipulate extraneous variables
  • Biased results caused by unnatural behavior
  • Naturalistic studies require quite a lot of time.

As you can see, these types of studies aren’t the be-all and end-all. They may indicate a direction for further research, but correlational studies don’t show a cause-and-effect relationship, which is probably their biggest disadvantage.

Difference Between Correlational and Experimental Research

Now that you’ve come this far, let’s discuss correlational vs experimental research design. Both involve quantitative data, but the main difference lies in the aim of the research: correlational studies are used to identify an association, which is measured with a coefficient, while an experiment is aimed at determining a causal relationship. Due to their different purposes, the studies also differ in their control over variables. In the first case, scientists can’t control or otherwise manipulate the variables in question; meanwhile, experiments allow full control over the variables. There is a causation vs correlation blog on our website; their differences will be useful for your research.

Example of Correlational Research

Above, we have offered several correlational research examples. Let’s have a closer look at how things work using a more detailed example.

Example: You want to determine whether there is any connection between the time employees have worked at one company and their performance. An experiment would be rather time-consuming, so you can distribute a questionnaire to collect data and assess the association. After running the survey, you will be able to confirm or disprove your hypothesis.

Correlational Study: Final Thoughts

That’s pretty much everything you should know about correlational studies. The key takeaway is that this type of study is used to measure the connection between two or more variables. It’s a good choice if you have no chance to run an experiment; however, in this case you won’t be able to control for extraneous variables, so consider your options carefully before conducting your own research.


Frequently Asked Questions About Correlational Study

1. What is a correlation?

Correlation is a connection that shows the extent to which two or more variables are associated. It doesn’t show a causal link; it only helps to identify the direction (positive, negative, or zero) and the strength of an association.

2. How many variables are in a correlation?

There can be many different variables in a correlation, which makes this type of study very useful for exploring complex relationships. However, most researchers use it to measure the association between only two variables.

3. What is a correlation coefficient?

The correlation coefficient (ρ) is a statistical measure that indicates the extent to which two variables are related. An association can be strong, moderate, or weak, and the coefficient can be positive, negative, or zero.

4. What is a correlational study?

Correlational study is a type of statistical research that involves examining two variables in order to determine association between them. It’s a non-experimental type of study, meaning that researchers can’t change independent variables or control extraneous variables.


Joe Eckel is an expert on dissertation writing. He makes sure that each student gets valuable insights on composing A-grade academic writing.



Open Access

Peer-reviewed

Research Article

Correlation-based tests for the formal comparison of polygenic scores in multiple populations

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

¤ Current address: New York Genome Center, New York, New York, United States of America

Affiliation Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America


Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

  • Sophia Gunn, 
  • Kathryn L. Lunetta

PLOS

  • Published: April 26, 2024
  • https://doi.org/10.1371/journal.pgen.1011249

This is an uncorrected proof.


Polygenic scores (PGS) are measures of genetic risk derived from the results of genome-wide association studies (GWAS). Previous work has proposed the coefficient of determination (R²) as an appropriate measure by which to compare PGS performance in a validation dataset. Here we propose correlation-based methods for evaluating PGS performance by adapting previous work that produced a statistical framework and robust test statistics for the comparison of multiple correlation measures in multiple populations. This flexible framework can be extended to a wider variety of hypothesis tests than currently available methods. We assess our proposed method in simulation and demonstrate its utility with two examples, assessing previously developed PGS for low-density lipoprotein cholesterol and height in multiple populations in the All of Us cohort. Finally, we provide an R package, ‘coranova’, with both parametric and nonparametric implementations of the described methods.

Author summary

Polygenic scores (PGS) are measures of genetic risk of disease that have been widely embraced by the scientific community. While there are many methods available to develop PGS, we have limited tools by which to compare PGS performance. Previous work has proposed an R²-based approach which appropriately accounts for the correlation between PGS when comparing their performance. Here, we propose correlation-based tests which can assess multiple scores in multiple populations while accounting for the correlation between the scores. Our method is highly flexible and can be used by researchers to test any linear hypothesis of PGS performance, though we suggest three ANOVA-like tests as a starting point. We apply our method to PGS developed for LDL cholesterol and height in the All of Us cohort. In these examples, we demonstrate how our method can be used by researchers to compare and evaluate PGS in multiple populations. This approach will be particularly useful as we look to improve PGS performance in underrepresented populations in genetic research and need to evaluate PGS in multiple populations to appropriately assess PGS performance.

Citation: Gunn S, Lunetta KL (2024) Correlation-based tests for the formal comparison of polygenic scores in multiple populations. PLoS Genet 20(4): e1011249. https://doi.org/10.1371/journal.pgen.1011249

Editor: Xiang Zhou, University of Michigan, UNITED STATES

Received: October 24, 2023; Accepted: April 3, 2024; Published: April 26, 2024

Copyright: © 2024 Gunn, Lunetta. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The PGS for LDL cholesterol and height can be downloaded from the PGS catalog with accession numbers PGP000230 and PGP000382, respectively. The coranova R package can be downloaded from GitHub (https://github.com/gunns2/coranova). This study used data from the All of Us Research Program's Controlled Tier Dataset V6, available to authorized users on the Researcher Workbench. Instructions for access to the All of Us Researcher Workbench are available at https://www.researchallofus.org/register/. We have also included a Jupyter notebook script that we used to compute the PGS in the AoU population on the coranova GitHub page, within the folder "analysis".

Funding: SG and KLL were supported by funding from the National Heart Lung and Blood Institute (SG: F31HL163952; KLL: R01HL092577). SG was additionally supported by National Institute of General Medical Sciences (5T32GM074905-15). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The rise of large genome wide association studies (GWAS) has enabled researchers to build models for individual genetic risk prediction, called polygenic scores (PGS) [ 1 ]. Polygenic scores predict genetic risk for a given trait with a weighted sum of relevant risk alleles. The risk allele weights are derived from GWAS effect estimates. There are many different methods available for PGS development. At minimum, PGS methods require a GWAS from which to derive the weights, and most also require a linkage disequilibrium (LD) reference and training data to optimize parameters [ 2 ]. Thus, for any given trait, many different polygenic scores can be derived. Further, the PGS once developed can also be applied to different populations, with varying performance due to factors like differences in allele frequencies and LD patterns [ 3 ]. With many possible PGS to choose from, there is a need to develop methods to assess and compare the performance of polygenic scores.
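The weighted-sum construction can be written directly. The paper's software is in R, so the following Python sketch with toy dosages and hypothetical weights is purely illustrative:

```python
import numpy as np

def polygenic_score(dosages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Compute a PGS for each individual as the weighted sum of risk-allele
    dosages (rows: individuals, columns: variants); weights come from GWAS
    effect estimates."""
    return dosages @ weights

# Toy example: 3 individuals, 4 variants (dosages are 0/1/2 allele counts)
dosages = np.array([[0, 1, 2, 1],
                    [2, 0, 1, 0],
                    [1, 1, 1, 1]], dtype=float)
weights = np.array([0.10, -0.05, 0.20, 0.02])  # hypothetical effect sizes
print(polygenic_score(dosages, weights))       # one score per individual
```

Real pipelines additionally handle allele matching, missing genotypes, and strand issues, which this sketch ignores.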

A popular measure of assessment for polygenic scores is the R² from a linear regression model fit in a validation dataset with the PGS as the primary predictor and the trait of interest as the dependent variable, adjusting for relevant covariates. The R² of a linear regression model is the proportion of variance explained by the predictors in the model, also called the coefficient of determination [ 4 ]. Thus, the R² from such a model can be interpreted as the proportion of variance explained by the PGS. This approach is appealing because of the connection to heritability. The heritability of a trait is the proportion of phenotypic variation that is explained by additive genetic variation [ 5 ]. The R² of a polygenic score is limited by the SNP-based heritability of its associated trait, that is, the proportion of phenotypic variation that is explained by SNPs. The closer the R² of a proposed PGS is to the SNP-based heritability, the better the score [ 2 , 6 ].
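When the PGS is the only predictor (before covariate adjustment), this R² is exactly the squared Pearson correlation between score and outcome, which is the link between R²-based and correlation-based comparison. A quick simulated check in Python (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
pgs = rng.normal(size=n)
y = 0.5 * pgs + rng.normal(size=n)   # outcome partly explained by the PGS

# Fit y ~ intercept + pgs by least squares and compute R^2
X = np.column_stack([np.ones(n), pgs])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()       # proportion of variance explained

# With a single predictor, R^2 equals the squared Pearson correlation
r = np.corrcoef(pgs, y)[0, 1]
assert np.isclose(r2, r ** 2)
print(round(r2, 3))
```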

Momin et al. proposed a formal statistical framework for comparing polygenic scores with R², called R2Redux [ 7 ]. Using results from Olkin & Finn to generate asymptotic distributions of R², they devised methods to compute the variance of, and generate confidence intervals for, the difference between the R² of two polygenic scores [ 8 ]. They also proposed methods for determining the difference in R² for nested models, for independent groups, and for genomic partitioning analysis.

While Momin et al.'s R²-based approach [ 7 ] is ideal in the applications for which it was designed, R2Redux was not designed for testing multiple scores in multiple populations simultaneously. To address this gap, we propose using correlation-based methods to assess the performance of polygenic scores, adapting the work of Olkin & Finn [ 9 ] and Bilker et al. [ 10 ]. Olkin & Finn derived the asymptotic joint distribution of sample correlations between continuous predictors and continuous outcomes when the predictors themselves are correlated, as in the case of polygenic scores, and demonstrated how to derive linear hypothesis tests of the correlation measures. Bilker et al. adapted this work and proposed an ANOVA-like testing framework for assessing correlation, called Coranova. They proposed specific hypothesis tests researchers can perform on correlated predictors in multiple independent population samples, applying the method to neurological exams.

We can use the Coranova framework to compare multiple PGS in multiple populations with the three Coranova hypothesis tests. We can assess whether the scores have the same correlation with the outcome of interest within population samples, whether the mean score correlation with the outcome differs between population samples, and, finally, whether the pattern of score performance differs by population sample. The Coranova hypothesis tests are an ideal starting point for researchers analyzing the performance of multiple PGS in multiple populations. Crucially, however, researchers can also devise contrast matrices to implement correlation-based hypothesis tests specific to their research interests.


We have built an R package with both parametric and nonparametric implementations of the methods proposed by Olkin & Finn and Bilker et al. for polygenic score evaluation and comparison. With simulations we show that our correlation-based tests have well-controlled type 1 error rates and power greater than 80% to detect differences in polygenic score performance in multiple populations at typical sample sizes under reasonable assumptions of parameter values. Finally, we demonstrate our proposed methods with two real world applications to polygenic scores for low-density lipoprotein (LDL) cholesterol and height in the All of Us cohort (AoU) and provide examples for researchers interested in applying our methods.

Description of the method

Correlation-based tests.

To define our correlation-based tests, we first define our parameters. Let ρ_{i,j} be the population Pearson correlation of PGS i with outcome Y in population j. Let μ denote the vector of these population correlations of the P PGS with outcome Y in population j, μ = (ρ_{1,j}, ρ_{2,j}, …, ρ_{P,j}), and let u be the vector of sample estimates of μ, u = (r_{1,j}, r_{2,j}, …, r_{P,j}).

By the results of Olkin & Finn, u is asymptotically multivariate normal,

√n (u − μ) → N(0, Φ),    (1)

where Φ is the asymptotic covariance matrix of the sample correlations, which accounts for the correlation among the PGS. A linear hypothesis H0: Cμ = 0, specified by a contrast matrix C, can then be tested with the Wald-type statistic

n (Cu)ᵀ (C Φ̂ Cᵀ)⁻¹ (Cu),    (2)

which under H0 follows a χ² distribution with degrees of freedom equal to the rank of C.
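The vector u is computed directly from sample data; a minimal Python sketch (the released package is in R, so the function name here is illustrative):

```python
import numpy as np

def correlation_vector(pgs_matrix: np.ndarray, y: np.ndarray) -> np.ndarray:
    """u = (r_{1,j}, ..., r_{P,j}): sample Pearson correlation of each of
    the P scores (columns of pgs_matrix) with outcome y in one sample."""
    return np.array([np.corrcoef(pgs_matrix[:, i], y)[0, 1]
                     for i in range(pgs_matrix.shape[1])])

# Toy check: a score identical to y has r = 1, its negation r = -1
y = np.arange(10.0)
u = correlation_vector(np.column_stack([y, -y]), y)
print(u)  # [ 1. -1.]
```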

Coranova hypotheses

Bilker et al. introduced three types of hypothesis tests which enable ANOVA-like testing of correlated variables, like polygenic scores, in multiple population groups by specifying three contrast matrices that can be used in Eq (2) .

Suppose we have K population samples, and P PGS for a given continuous trait Y . We are interested in comparing the correlations between the P PGS and the trait Y among the K population samples. Specifically, we can 1) test for differences in the correlations between the trait Y and P PGS 2) test for differences in the associations between trait Y and the P PGS between the K population samples, and 3) test for an interaction effect between the scores and populations, or in other words, test for differences within the pattern of correlations in the K population samples.

Let ρ_{i,j} be the population correlation between PGS i and trait Y in population j. We have P PGS and K population samples. The three hypothesis tests can be written as follows:

Test for a within effect.

To test for differences in the correlations between trait Y and the P PGS within the K population samples, let

H0: mean(ρ_{i,1}, …, ρ_{i,K}) is equal for all i ∈ {1, …, P}

HA: mean(ρ_{i,1}, …, ρ_{i,K}) ≠ mean(ρ_{m,1}, …, ρ_{m,K}) for at least one (i, m) pair of measures.

Test for a between effect.

To test for differences in the correlations between the trait Y and P PGSs between the K population samples, let

H0: mean(ρ_{1,j}, …, ρ_{P,j}) is equal for all j ∈ {1, …, K}

HA: mean(ρ_{1,j}, …, ρ_{P,j}) ≠ mean(ρ_{1,l}, …, ρ_{P,l}) for at least one (j, l) pair of populations.

Test for an interaction effect.

To test for a difference in the pattern of correlations between the trait Y and P PGSs in the K population samples, let

H0: (ρ_{1,j} − ρ_{i,j}) − (ρ_{1,l} − ρ_{i,l}) = 0 for all i ∈ {2, …, P} and j, l ∈ {1, …, K}, j ≠ l

HA: at least one interaction is not equal to 0.

Using the contrast matrices provided by Bilker et al. (with a typo corrected; see Section B in S1 Appendix), we can conduct these tests by generating χ² test statistics following Eq (2).
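The χ²-statistic construction can be illustrated end to end. The sketch below is a Python illustration of the general Wald-type recipe, with the covariance of the sample correlations estimated by a nonparametric bootstrap on simulated data; it is not the coranova package's implementation:

```python
import numpy as np

def correlation_vector(pgs, y):
    """Sample Pearson correlation of each score (column of pgs) with y."""
    return np.array([np.corrcoef(pgs[:, i], y)[0, 1]
                     for i in range(pgs.shape[1])])

def wald_chi2(pgs, y, C, n_boot=500, seed=0):
    """Wald-type statistic for H0: C @ u = 0, where u is the vector of
    sample correlations and its covariance is estimated by nonparametric
    bootstrap. A sketch of the idea, not the coranova code."""
    rng = np.random.default_rng(seed)
    n = len(y)
    u = correlation_vector(pgs, y)
    boots = np.empty((n_boot, pgs.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample individuals
        boots[b] = correlation_vector(pgs[idx], y[idx])
    cov = np.cov(boots, rowvar=False)               # bootstrap covariance of u
    d = C @ u
    stat = float(d @ np.linalg.solve(C @ cov @ C.T, d))
    return stat, C.shape[0]                         # statistic, chi-square df

# Simulated example: one population, two scores with clearly different
# correlation with the outcome, so H0 (equal correlations) should be rejected
rng = np.random.default_rng(1)
n = 1500
g = rng.normal(size=n)
y = g + rng.normal(size=n)
scores = np.column_stack([g + 0.3 * rng.normal(size=n),  # informative score
                          rng.normal(size=n)])           # pure-noise score
C = np.array([[1.0, -1.0]])                              # H0: rho_1 = rho_2
stat, df = wald_chi2(scores, y, C)
print(stat > 3.84, df)   # compare to the 0.05 critical value of chi2(1)
```

Stacking one such contrast block per population sample, with a block-diagonal covariance across independent samples, gives the multi-population tests described above.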

Verification and comparison

Description of simulations.

To assess our implementation of these correlation-based hypothesis tests, we performed simulations with K = 2 independent population samples, P polygenic scores built with different methods (X_1, X_2, …, X_P), and a continuous outcome Y. We simulated the outcome and polygenic scores using a multivariate normal distribution for each population, with a specified covariance matrix and sample size n.


We used the following correlation matrices to generate simulated data to assess our methods with two populations and three PGS, where correlation matrix Corr ( Z A ) is used to generate the first population sample and correlation matrix Corr ( Z B ) is used to generate the second population sample.


To assess the performance of the methods with multiple population samples and polygenic scores, we simulated one thousand replicates for each hypothesis and combination of τ, ϕ, δ, and n. We simulated τ levels of 0.05, 0.1, 0.2, 0.4, and 0.6; δ levels of 0, 0.01, 0.03, 0.05, 0.075, and 0.1; and ϕ levels of 0.3, 0.5, 0.7, and 0.9. We simulated population samples of 500, 1000, and 5000, and assessed the performance of our methods under equal and unequal sample sizes.
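One population sample under this design can be generated as follows. This is a sketch of the described scheme, with three scores equicorrelated at ϕ and score-outcome correlations τ + δ_i, not the paper's exact simulation code (the exact correlation matrices are in its supplement):

```python
import numpy as np

def simulate_population(n, tau, phi, delta=(0.0, 0.0, 0.0), seed=0):
    """Draw an outcome Y and three scores from a multivariate normal whose
    correlation matrix has corr(X_i, Y) = tau + delta_i and
    corr(X_i, X_m) = phi for i != m."""
    taus = tau + np.asarray(delta, dtype=float)
    corr = np.eye(4)
    corr[0, 1:] = corr[1:, 0] = taus            # score-outcome correlations
    off = ~np.eye(3, dtype=bool)
    corr[1:, 1:][off] = phi                     # score-score correlations
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(4), corr, size=n)
    return z[:, 0], z[:, 1:]                    # outcome, three scores

y, x = simulate_population(5000, tau=0.4, phi=0.7, seed=2)
print(round(float(np.corrcoef(x[:, 0], y)[0, 1]), 2))  # close to tau = 0.4
```

Not every (τ, ϕ, δ) combination yields a positive-definite correlation matrix, so a real simulation would validate the matrix before sampling.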

We also simulated data with a single population and two PGS to compare our method to the R2Redux method proposed by Momin et al. [ 7 ]. Performing the Coranova "within" hypothesis test on one population with two scores is identical to testing the difference in correlation between two scores. We simulated τ levels of 0.05, 0.1, 0.2, 0.4, 0.6, and 0.8; δ values of 0, 0.01, 0.03, 0.05, 0.075, and 0.1; and ϕ levels of 0.5, 0.7, and 0.9. We simulated population samples of 500, 1000, and 10000.

Simulation results

When applied to two independent population samples and three polygenic scores, we find that the type I error rate of the proposed method is well controlled for all three Coranova hypotheses at alpha = 0.025 and 0.05 ( Fig 1 , Figs A-C in S3 Appendix ).


Each point represents the proportion of tests in 1000 simulations in which the null hypothesis was rejected at the specified alpha level with sample size 1000. Dashed lines indicate the 95% confidence interval for the specified alpha given the sample size. In all simulations, the three scores have the same correlation with the outcome ( τ ) in both population samples; ϕ is the correlation between the scores themselves within the population samples. In the "between" setting, the null hypothesis is that the correlation of the scores with the outcome is equal across population samples. In the "within" setting, the null hypothesis is that the scores have the same correlation with the outcome within the population samples. In the "interaction" setting, the null hypothesis is that the pattern of score performance is the same across the population samples.

https://doi.org/10.1371/journal.pgen.1011249.g001

As expected, the power of the three Coranova hypothesis tests increases when sample size increases and when the magnitude of the difference between the scores' correlation with the outcome ( δ ) increases ( Fig 2 , Figs D and E in S3 Appendix ). We also see an increase in power when the PGS are more correlated with the outcome (high τ ), due to the inverse relationship between the test statistics and the variance terms of the correlations (see section A in S2 Appendix for further details).


Each point represents the proportion of tests in 1000 simulations with parameters ϕ , τ , and δ in which the null hypothesis was rejected at significance level alpha = 0.05. A: "Between" setting: within each population, the three PGS have equal correlation with outcome Y, and this correlation differs between the two populations; τ : correlation of each PGS with Y in population 1; τ + δ : correlation of each PGS with Y in population 2. B: "Within" setting: each PGS has the same correlation with Y in the two populations, but the correlation of the third PGS with Y differs from the other two in the same way in both populations; τ : correlation of the first and second PGS with Y in both populations; τ + δ : correlation of the third PGS with Y in both populations. C: "Interaction" setting: one PGS differs from the other two in only one of the two populations; τ : correlation of each of the three PGS with Y in population 1, and of PGS 1 and 2 with Y in population 2; τ + δ : correlation of the third PGS with Y in population 2.

https://doi.org/10.1371/journal.pgen.1011249.g002


With a sample size of 1000 we find we have at least 80% power for the within hypothesis test when testing a difference in correlation of at least 0.075; for the between hypothesis test when testing a difference in correlation of at least 0.1 for scores with at least 0.4 correlation with the outcome; and for the interaction hypothesis when testing a difference in correlation of at least 0.1 for scores with inter-PGS correlation of 0.7 or higher. The power of the tests increases substantially with a sample size of 5000 (Fig E in S3 Appendix ).

Briefly, we also found that our correlation-based tests perform very similarly to R2Redux when assessing the difference in performance between two polygenic scores in a single population (see section B in S2 Appendix and Figs I–L in S3 Appendix ).

Applications

Examples using All of Us cohort data.

Description of the All of Us (AoU) cohort. All of Us is a diverse cohort of people living in the United States, established by the National Institutes of Health [ 11 ]. Here we use the whole genome sequencing sample contained in the All of Us Controlled Tier Dataset V6, released in June 2022. In addition to the quality control performed by the All of Us research team [ 12 ], we restricted the set of variants in this analysis to biallelic variants with a minor allele frequency greater than 0.001. We also restricted our sample to unrelated individuals.

The individuals in All of Us were grouped according to genetic similarity to the superpopulations in the Human Genome Diversity Project (HGDP) [ 13 ] and 1000G samples (1KG) [ 14 ]. The AoU research team trained a random forest model on chromosomes 20 and 21 from HGDP and 1KG, and this model was applied to the AoU cohort to generate what we will call the 1KG genetic-similarity groups, in line with recommendations by the National Academies of Sciences, Engineering, and Medicine [ 15 ].

Description of polygenic scores . To demonstrate the utility of our proposed methods, we considered previously defined polygenic scores for two outcomes, LDL cholesterol and height. For both outcomes, we will compare PGS developed with GWAS of varying populations, corresponding to genetic-ancestry groups defined in the original papers. Using the language of the original papers, we will describe scores built with GWAS of a population that is genetically-similar to a single 1KG-superpopulation as an ancestry-specific PGS , and scores built with the GWAS results of a meta-analysis of multiple 1KG-superpopulations as multi-ancestry PGS .

We considered 12 polygenic scores for LDL cholesterol developed and made publicly available by Graham et al. [ 16 ]. These PGS were developed by the Global Lipids Genetics Consortium with data from 201 studies. The twelve polygenic scores were optimized using two methods, PRS-CS [ 17 ] and pruning and thresholding (PT) [ 2 ], with six GWAS performed on samples of varying genetic ancestry, as defined in the original paper. For each method, one score is built using multi-ancestry meta-analysis GWAS results and the other five are built with ancestry-specific GWAS of samples from populations of African, East Asian, South Asian, European, and Hispanic ancestry. For the ancestry-specific pruning and thresholding scores, ancestry-matched UK Biobank (UKBB) samples were used to estimate LD, and for the multi-ancestry PT score, a mixed-ancestry UKBB sample was used. For the ancestry-specific PRS-CS scores, the LD reference panels were derived from ancestry-matched samples from 1000 Genomes [ 14 ], and a mixed-ancestry 1000 Genomes sample was used to estimate LD for the multi-ancestry PRS-CS score. The polygenic scores were downloaded from the PGS catalog (publication ID: PGP000230).

For height, we considered six polygenic scores developed by Yengo et al. on behalf of the GIANT consortium [ 18 ]. All of Us participants were not included in the discovery sample. The scores were developed with SBayesR [ 19 ]. Five scores are built with ancestry-specific GWAS and one with a multi-ancestry meta-analysis GWAS. Each ancestry-specific score was built with an ancestry-matched LD matrix, while the multi-ancestry score was built with LD estimated from a European population sample. The ancestry-specific scores correspond to European, African, Hispanic, South Asian, and East Asian populations as defined by the original paper. The scores were downloaded from the PGS catalog (publication ID: PGP000382).

Score calculation.

Polygenic scores were computed in the AoU cohort using the PLINK2 [ 20 ] --score function. The phenotypes for both height and LDL cholesterol were determined by first computing the mean of the available measurements for each individual and then transforming the values by inverse-rank normalization.
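The inverse-rank normalization step can be sketched as follows. The Blom-type rank offset is an illustrative choice, since the exact transform is not specified here, and ties are broken arbitrarily in this sketch:

```python
import numpy as np
from statistics import NormalDist

def inverse_rank_normal(values):
    """Map values to standard-normal quantiles of their ranks
    (Blom-style offset). Ties are broken arbitrarily; production code
    should average tied ranks."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = values.argsort().argsort() + 1       # 1-based ranks
    nd = NormalDist()
    return np.array([nd.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks])

x = np.array([3.1, 0.2, 5.9, 1.4, 2.7])          # e.g. mean LDL per person
z = inverse_rank_normal(x)
print(np.round(z, 3))                            # rank order is preserved
```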

We evaluated the polygenic scores in each AoU 1KG genetic-similarity group separately. Sample sizes of these genetic-similarity groups are included in Table 1 . We computed the LDL cholesterol scores among the AoU individuals classified as similar to the African, admixed American, and European 1KG populations. We computed the height PGS among the AoU individuals classified as similar to the African, admixed American, European, and East Asian 1KG populations. The phenotypes and PGS were adjusted for the first 10 ancestry principal components using linear regression within each genetic-similarity group prior to analysis with our correlation-based methods.
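The PC-adjustment step amounts to residualizing both phenotype and score on the PCs; a Python sketch on simulated data (not AoU data):

```python
import numpy as np

def residualize(values, pcs):
    """Return residuals from regressing `values` on an intercept plus the
    columns of `pcs` (e.g. the first 10 genetic PCs)."""
    X = np.column_stack([np.ones(len(values)), pcs])
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    return values - X @ beta

rng = np.random.default_rng(3)
n, k = 1000, 10
pcs = rng.normal(size=(n, k))
pheno = pcs[:, 0] * 0.5 + rng.normal(size=n)   # phenotype confounded by PC1
adj = residualize(pheno, pcs)
print(round(float(np.corrcoef(adj, pcs[:, 0])[0, 1]), 6))  # ~0: removed
```

Applying the same residualization to each PGS before computing correlations keeps population structure from inflating the score-outcome association.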


https://doi.org/10.1371/journal.pgen.1011249.t001

Application results

LDL cholesterol examples.

The correlations between the polygenic scores for LDL cholesterol and inverse-rank normalized mean LDL cholesterol in the African, admixed American, and European 1KG genetic-similarity groups in AoU are displayed in Fig 3 . To analyze the performance of the PGS using our proposed methods, we first applied the three Coranova hypotheses to the 12 polygenic scores for LDL cholesterol across the three 1KG genetic-similarity groups. We find significant evidence that at least one score has higher correlation with LDL cholesterol than the others (p_within = 1.9 × 10⁻⁷¹, Fig 3C ), and significant evidence that the pattern of score correlation with LDL cholesterol differs across the 1KG genetic-similarity groups (p_interaction = 1.5 × 10⁻³³). We do not find significant evidence that the mean PGS correlation with LDL cholesterol differs across the 1KG genetic-similarity groups (p_between = 0.4, Fig 3B ).


A: The correlation with LDL cholesterol for each PGS by 1KG genetic-similarity group, where each PGS is identified by the GWAS base and method with which it was derived. Sample size of each group specified. B: The average score performance in each AoU 1KG similarity group, corresponding to the “between hypothesis test” in which the null hypothesis is that the scores perform on average the same in the three groups. C: The performance of each PGS averaged across the AoU 1KG similarity groups, corresponding to the “within hypothesis test” in which the null hypothesis is that the scores have the same correlation with the outcome when averaged across populations.

https://doi.org/10.1371/journal.pgen.1011249.g003

In addition to the Coranova hypothesis tests, we can use the flexible framework to ask additional questions about the performance of the 12 PGS. One major question is how the scores built with pruning and thresholding compare to the scores built with PRS-CS. We find that the PT PGS have higher correlation with LDL cholesterol than the PRS-CS scores, with at least one of the pairwise differences between the PT and PRS-CS scores built with the same GWAS significantly different from 0 (p = 1 × 10⁻³⁰, see section E in S1 Appendix ). We can also compare the correlation of the multi-ancestry PT PGS to the ancestry-specific PT PGSs ( Fig 4 ). Using a contrast matrix designed for this comparison, we fail to reject the null hypothesis that the multi-ancestry PGS and ancestry-specific scores have equal correlation with LDL (p = 0.16, see section E in S1 Appendix ). Finally, we can assess whether the multi-ancestry PT PGS differs in correlation with inverse-rank normalized mean LDL cholesterol across the three 1KG genetic-similarity groups using the Coranova 'between' hypothesis test on just the multi-ancestry PT PGS in the three groups. With this test, we find that the performance of the multi-ancestry PT PGS varies across the genetic-similarity groups (p_between = 0.04).
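Researcher-defined comparisons like these reduce to choosing a contrast matrix. As a hypothetical illustration (the score ordering and correlation values are invented, not the paper's estimates), a single contrast testing a multi-ancestry score against the average of five ancestry-specific scores could be encoded as:

```python
import numpy as np

# Correlations ordered as (multi-ancestry, five ancestry-specific scores);
# H0: rho_multi - mean(rho_specific) = 0
C = np.array([[1.0, -0.2, -0.2, -0.2, -0.2, -0.2]])

u = np.array([0.45, 0.44, 0.43, 0.42, 0.41, 0.40])  # hypothetical correlations
print(float(C @ u))  # estimated contrast; its chi-square test follows Eq (2)
```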


A: Bars represent correlation between PGS and inverse-rank normalized mean LDL cholesterol in each 1KG genetic-similarity group. B: Dot points represent difference in correlation between multi-ancestry and ancestry-specific PGSs and error bars indicate 95% confidence interval of this difference. C: Dot points represent difference in correlation of multi-ancestry PGS between 1KG genetic-similarity groups indicated in x-axis and error bars indicate 95% confidence interval of this difference.

https://doi.org/10.1371/journal.pgen.1011249.g004

Height examples.

The correlations between the polygenic scores for height and inverse-rank normalized mean height in the African, admixed American, European, and East Asian 1KG genetic-similarity groups in AoU are displayed in Fig 5 . Considering the six scores in the four genetic-similarity groups, we find significant evidence that the average score correlation with height differs across the four genetic-similarity groups (p_between = 1.4 × 10⁻⁸¹, Fig 5B ), that at least one score has higher correlation with height than the others (p_within = 6.2 × 10⁻¹⁶⁶, Fig 5C ), and that the pattern of score correlation with height differs across the genetic-similarity groups (p_interaction < 1 × 10⁻³⁰⁰).


A: The correlation with height for each PGS by 1KG genetic-similarity group, where each PGS is identified by the GWAS base with which it was derived. Sample size of each group specified. B: The average score performance in each AoU 1KG similarity group, corresponding to the "between hypothesis test" in which the null hypothesis is that the scores perform on average the same in the four groups. C: The performance of each PGS averaged across the AoU 1KG similarity groups, corresponding to the "within hypothesis test" in which the null hypothesis is that the scores have the same correlation with the outcome when averaged across populations.

https://doi.org/10.1371/journal.pgen.1011249.g005

For height, the multi-ancestry score does not outperform the others in all of the 1KG genetic-similarity groups ( Fig 6 ). We can use the Coranova 'within' hypothesis test to compare the multi-ancestry PGS to the ancestry-specific PGS in each genetic-similarity group separately. Among individuals classified as African, the score built with the African GWAS has a 0.115 higher correlation with height than the multi-ancestry score (95% CI: (0.102, 0.127), p_within = 4.9 × 10⁻⁶⁹). Among individuals classified as European, the score built with the European GWAS has a 0.016 higher correlation with height than the multi-ancestry score (95% CI: (0.013, 0.018), p_within = 1.8 × 10⁻³⁶). In the other genetic-similarity groups, the multi-ancestry score outperformed the corresponding ancestry-specific PGSs. Among individuals classified as admixed American, the score built with the Hispanic GWAS has a 0.039 lower correlation with height than the multi-ancestry score (95% CI: (0.025, 0.053), p_within = 4.5 × 10⁻⁸). Among individuals classified as East Asian, the score built with the East Asian GWAS had a 0.032 lower correlation with height than the multi-ancestry score (95% CI: (0.001, 0.062), p_within = 0.04).


A: Bars represent correlation between PGS and inverse-rank normalized mean Height in each 1KG genetic-similarity group. B: Dot points represent difference in correlation between multi-ancestry and ancestry-specific PGSs and error bars indicate 95% confidence interval of this difference. C: Dot points represent difference in correlation of multi-ancestry PGS between 1KG genetic-similarity groups indicated in x-axis and error bars indicate 95% confidence interval of this difference.

https://doi.org/10.1371/journal.pgen.1011249.g006

Discussion

We propose a flexible formal statistical framework for assessing the performance of two or more PGS in one or more populations with correlation, a need previously unaddressed by available methods, and provide an R package to make it easy for users to implement our proposed methods. We use simulations to evaluate our methods when applied to three PGS in two population samples and find well-controlled type I error and power greater than 80% under reasonable parameter values to detect between-group and within-group differences in polygenic score performance, as well as differences in the pattern of score performance across groups. Finally, we highlight the ability of our methods to adapt to researcher interests with two examples applying PGS for height and LDL cholesterol to the All of Us cohort.

The methods we propose here are uniquely appropriate for the comparison of performance of polygenic scores. Researchers generally have one of two goals when comparing polygenic scores: 1) selecting an optimal polygenic score for a specific outcome, or 2) determining an optimal polygenic score derivation procedure, comparing scores built with different methods, inputs or both. Both goals often require the comparison of multiple polygenic scores in multiple population samples. Thus, while many methods are available to compare model performance, such as those for nested and non-nested models [ 21 ], these are not generally appropriate for the needs of researchers studying polygenic scores, as they implement pair-wise comparisons. Our proposed methods can be used to compare many scores at one time, as well as perform pair-wise comparisons as necessary.


In our analysis of the 12 PGS developed by Graham et al. for LDL cholesterol, we apply the three Coranova hypothesis tests and determine that the 12 PGS do not have the same correlation with LDL cholesterol when averaged across 1KG genetic-similarity groups (the "within" hypothesis), that the overall mean of the 12 PGS correlations with LDL does not differ by 1KG genetic-similarity group (the "between" hypothesis), and that the pattern of correlations between the 12 PGS and the LDL outcome differs across the 1KG genetic-similarity groups (the "interaction" hypothesis). In other words, we find that some PGS perform better than others, that on average the PGS perform the same across the genetic-similarity groups, and that the pattern of PGS performance differs by genetic-similarity group. We also use our flexible framework to identify whether we can recommend a single PGS from the 12. First, we conclude that the scores built with pruning and thresholding (PT) outperform the scores built with PRS-CS. We then compare the performance of the multi-ancestry PT PGS with the ancestry-specific PT scores in each genetic-similarity group and find that the multi-ancestry PGS does not have a significantly higher correlation with LDL cholesterol than the ancestry-specific scores. We suspect that with larger sample sizes we would be able to detect a significant difference; however, our results at least show that the multi-ancestry PGS performs as well as the ancestry-specific scores in each genetic-similarity group, making it an appropriate PGS to employ in all three groups. When we compare the correlation of the multi-ancestry PGS with LDL cholesterol in the three genetic-similarity groups, we find that the score performance varies significantly.
It is surprising that the score is more highly correlated with LDL in the African genetic-similarity group than in the European genetic-similarity group, considering individuals of African and admixed African ancestry made up only 6% of the original GWAS sample [ 16 ]. This finding may not replicate in other cohorts with different context characteristics [ 22 ]. Still, we conclude the multi-ancestry PT PGS is an optimal choice for an LDL PGS when working with a multi-ancestry population.

In contrast, in our analysis of the six PGS developed by Yengo et al. for height, we cannot make such a case for the multi-ancestry PGS, or any of the six PGS. When we apply the three Coranova hypothesis tests, we reject all three null hypotheses and find that the PGS do not have the same correlation with height, the mean correlation of the PGS with height is not consistent across the 1KG genetic-similarity groups, and that the pattern of score performance differs by genetic-similarity group. When we consider the pairwise differences between the multi-ancestry and ancestry-specific PGS in each genetic-similarity group, we find that in the AoU European and African genetic-similarity groups, the ancestry-specific PGS have a higher correlation with height than the multi-ancestry PGS, and in the AoU Admixed American and East Asian genetic-similarity groups, the reverse is true and the multi-ancestry PGS outperform the ancestry-specific scores. Thus, we cannot recommend a single score be used for all populations and recommend that the ancestry-specific PGS are used when working with individuals classified as similar to the 1KG European or African populations, and that the multi-ancestry PGS is used when working with individuals classified as similar to the 1KG admixed American or East Asian populations.

Our two examples comparing PGS performance in AoU highlight the strengths and weaknesses of our proposed approach. One of the main advantages of our proposed method is the ability to assess the performance of multiple PGS in multiple population samples simultaneously before testing more specific hypotheses. In each example, we were able to ask high-level questions about PGS performance in multiple populations, such as whether one score outperformed the others and whether the mean PGS correlation with the phenotype of interest differed by genetic-similarity group. Based on the results of these initial hypothesis tests, we then assessed more specific questions, such as whether the multi-ancestry scores outperformed the ancestry-specific scores. Our framework is highly flexible and can be employed to assess a variety of hypotheses; the hypotheses of interest will depend on the specifics of the analysis.

However, considering the LDL cholesterol results comparing the multi-ancestry PGS to the ancestry-specific PGS, we can see that our methods are sensitive to sample size. The confidence interval around the estimated difference between the multi-ancestry PGS and the European-specific PGS is much smaller than that around the estimated difference between the multi-ancestry PGS and the admixed American-specific PGS (Fig 4B), because AoU contains over seven times as many people classified as European with measured LDL cholesterol levels as people classified as admixed American. Thus, while our methods can be used to assess PGS performance in different populations, caution must be used when interpreting results if sample sizes differ.
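The sample-size dependence is easy to quantify: on Fisher's z scale the standard error of an estimated correlation is roughly 1/sqrt(n - 3), so confidence intervals shrink with the square root of the group size. A sketch with illustrative sample sizes (not the actual AoU group counts):

```python
import math

def fisher_ci_width(r, n, z_crit=1.96):
    """Width of the approximate 95% CI for a Pearson correlation,
    built on Fisher's z scale where se(z) ~ 1/sqrt(n - 3)."""
    z, half = math.atanh(r), z_crit / math.sqrt(n - 3)
    return math.tanh(z + half) - math.tanh(z - half)

# A group ~7x larger yields a CI roughly sqrt(7) ~ 2.6x narrower,
# the kind of asymmetry seen when group sizes differ sevenfold.
wide = fisher_ci_width(0.30, 1000)
narrow = fisher_ci_width(0.30, 7000)
print(round(wide / narrow, 2))
```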

We also advise caution when performing the between-group hypothesis test when comparing many polygenic scores at once, since this test is designed to detect a difference in the mean polygenic score correlation across population groups. Additionally, when applying these tests to multiple genetic-similarity populations, we recommend that both the outcome and the polygenic scores be adjusted for genetic principal components (PCs) with a linear model prior to analysis. This step is important to ensure that the relationship between the polygenic score and outcome is not confounded by population structure [23]. For this reason, we do not recommend utilizing this method to compare polygenic scores for disease traits, as it is not possible to adjust a binary outcome for covariates with a linear model. If researchers have reason to believe confounding by population structure is not a concern, we have implemented a nonparametric version of Coranova, available with the perform_coranova_nonparametric function, which does not assume the data are normally distributed (see section D of S1 Appendix and section C of S2 Appendix for more details). Finally, the correlation between a polygenic score and its intended outcome is typically positive; if it is not, this may indicate an error in the designated effect allele in the polygenic score computation. We provide recommendations for implementing these methods, along with substantive examples of their application, in our user manual available on GitHub.
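The PC adjustment described above amounts to residualizing both variables on the PCs before correlating them. Below is a minimal simulated sketch (not our pipeline; the data and helper function are illustrative) showing how a PC-driven correlation disappears after adjustment:

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_pcs = 500, 4

# Simulate confounding: the same PCs load on both the PGS and outcome,
# so they correlate even though their residual parts are independent.
pcs = rng.normal(size=(n, n_pcs))
pgs = pcs.sum(axis=1) + rng.normal(size=n)
outcome = pcs.sum(axis=1) + rng.normal(size=n)

def residualize(y, covars):
    """Residuals of y after an OLS fit on covars plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covars])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_raw = np.corrcoef(pgs, outcome)[0, 1]
r_adj = np.corrcoef(residualize(pgs, pcs), residualize(outcome, pcs))[0, 1]
print(round(r_raw, 2), round(r_adj, 2))  # raw is inflated; adjusted is ~0
```

The same residualization is applied to both the score and the outcome, so the subsequent correlation reflects only variation orthogonal to the PCs.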

If polygenic scores are ever to be employed in a clinical setting, it is imperative that we have models that perform well in all groups [3]. A necessity for the development of PGS for diverse populations is methodology to assess PGS performance in multiple populations. As more data become available and researchers gain greater access to GWAS results from varied populations, our proposed methods will be an important tool in the effort to ensure that PGS perform well in all populations.

Supporting information

S1 Appendix. Supplemental methods.

https://doi.org/10.1371/journal.pgen.1011249.s001

S2 Appendix. Supplemental results.

https://doi.org/10.1371/journal.pgen.1011249.s002

S3 Appendix. Supplemental figures.

https://doi.org/10.1371/journal.pgen.1011249.s003

S4 Appendix. R package vignettes.

https://doi.org/10.1371/journal.pgen.1011249.s004

Acknowledgments

We thank the Global Lipids Genetics Consortium and GIANT Consortium for making their polygenic scores publicly accessible. We also gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. Finally, we thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study.

