Statology

Understanding the Null Hypothesis for Linear Regression

Linear regression is a technique we can use to understand the relationship between one or more predictor variables and a response variable.

If we only have one predictor variable and one response variable, we can use simple linear regression, which uses the following formula to estimate the relationship between the variables:

ŷ = β₀ + β₁x

  • ŷ: The estimated response value.
  • β₀: The average value of y when x is zero.
  • β₁: The average change in y associated with a one unit increase in x.
  • x: The value of the predictor variable.

Simple linear regression uses the following null and alternative hypotheses:

  • H₀: β₁ = 0
  • Hₐ: β₁ ≠ 0

The null hypothesis states that the coefficient β₁ is equal to zero. In other words, there is no statistically significant relationship between the predictor variable, x, and the response variable, y.

The alternative hypothesis states that β₁ is not equal to zero. In other words, there is a statistically significant relationship between x and y.

If we have multiple predictor variables and one response variable, we can use multiple linear regression, which uses the following formula to estimate the relationship between the variables:

ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ

  • β₀: The average value of y when all predictor variables are equal to zero.
  • βᵢ: The average change in y associated with a one unit increase in xᵢ.
  • xᵢ: The value of the predictor variable xᵢ.

Multiple linear regression uses the following null and alternative hypotheses:

  • H₀: β₁ = β₂ = … = βₖ = 0
  • Hₐ: at least one βᵢ ≠ 0

The null hypothesis states that all coefficients in the model are equal to zero. In other words, none of the predictor variables have a statistically significant relationship with the response variable, y.

The alternative hypothesis states that not every coefficient is simultaneously equal to zero.

The following examples show how to decide whether to reject or fail to reject the null hypothesis in both simple linear regression and multiple linear regression models.

Example 1: Simple Linear Regression

Suppose a professor would like to use the number of hours studied to predict the exam score that students will receive in his class. He collects data for 20 students and fits a simple linear regression model.

The following screenshot shows the output of the regression model:

Output of simple linear regression in Excel

The fitted simple linear regression model is:

Exam Score = 67.1617 + 5.2503*(hours studied)

To determine if there is a statistically significant relationship between hours studied and exam score, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  47.9952
  • P-value:  0.000

Since this p-value is less than .05, we can reject the null hypothesis. In other words, there is a statistically significant relationship between hours studied and exam score received.
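Here is a minimal sketch of that decision rule in Python with statsmodels; the data are simulated stand-ins for the professor's 20 observations, not the values behind the screenshot:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=20)              # hours studied for 20 students
score = 67 + 5 * hours + rng.normal(0, 5, 20)    # simulated exam scores

X = sm.add_constant(hours)                       # adds the intercept column
model = sm.OLS(score, X).fit()

print(model.fvalue, model.f_pvalue)              # overall F-value and its p-value
# Reject H0: β₁ = 0 when the p-value is below .05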

Example 2: Multiple Linear Regression

Suppose a professor would like to use the number of hours studied and the number of prep exams taken to predict the exam score that students will receive in his class. He collects data for 20 students and fits a multiple linear regression model.

Multiple linear regression output in Excel

The fitted multiple linear regression model is:

Exam Score = 67.67 + 5.56*(hours studied) – 0.60*(prep exams taken)

To determine if there is a jointly statistically significant relationship between the two predictor variables and the response variable, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  23.46
  • P-value:  0.00

Since this p-value is less than .05, we can reject the null hypothesis. In other words, hours studied and prep exams taken have a jointly statistically significant relationship with exam score.

Note: Although the p-value for prep exams taken (p = 0.52) is not significant on its own, prep exams taken and hours studied together have a jointly significant relationship with exam score.
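As a sketch of the same idea in code (again with simulated data and statsmodels), the joint F-test and the individual t-tests can both be read off the fitted model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
hours = rng.uniform(1, 10, size=20)
prep = rng.integers(0, 5, size=20).astype(float)    # prep exams taken
score = 67 + 5.5 * hours - 0.6 * prep + rng.normal(0, 5, 20)

X = sm.add_constant(np.column_stack([hours, prep]))
model = sm.OLS(score, X).fit()

print(model.f_pvalue)   # joint test of H0: all slopes are zero
print(model.pvalues)    # individual t-test p-values: intercept, hours, prep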

Additional Resources

  • Understanding the F-Test of Overall Significance in Regression
  • How to Read and Interpret a Regression Table
  • How to Report Regression Results
  • How to Perform Simple Linear Regression in Excel
  • How to Perform Multiple Linear Regression in Excel

Linear regression hypothesis testing: Concepts, Examples


In machine learning, linear regression is a predictive modeling technique for building models that predict a continuous response variable as a function of a linear combination of explanatory or predictor variables. While training linear regression models, we rely on hypothesis testing to determine the relationship between the response and predictor variables. Two types of hypothesis tests are done for a linear regression model: T-tests and F-tests. In other words, two types of statistics are used to assess whether a linear regression model relating the response and predictor variables exists: t-statistics and f-statistics. As data scientists, it is of utmost importance to determine whether linear regression is the correct choice of model for a particular problem, and this can be done by performing hypothesis testing on the response and predictor variables. These concepts are often unclear even to experienced data scientists. In this blog post, we will discuss linear regression and hypothesis testing related to t-statistics and f-statistics, and provide an example to illustrate how these concepts work.


What are linear regression models?

A linear regression model can be defined as the function approximation that represents a continuous response variable as a function of one or more predictor variables. While building a linear regression model, the goal is to identify a linear equation that best predicts or models the relationship between the response or dependent variable and one or more predictor or independent variables.

There are two different kinds of linear regression models. They are as follows:

  • Simple or univariate linear regression models: These are linear regression models used to build a linear relationship between one response or dependent variable and one predictor or independent variable. The form of the equation that represents a simple linear regression model is Y = mX + b, where m is the coefficient of the predictor variable and b is the bias. When considering the linear regression line, m represents the slope and b represents the intercept.
  • Multiple or multivariate linear regression models: These are linear regression models used to build a linear relationship between one response or dependent variable and more than one predictor or independent variable. The form of the equation that represents a multiple linear regression model is Y = b0 + b1X1 + b2X2 + … + bnXn, where bi represents the coefficient of the ith predictor variable. In this type of linear regression model, each predictor variable has its own coefficient that is used to calculate the predicted value of the response variable.

While training linear regression models, the requirement is to determine the coefficients which can result in the best-fitted linear regression line. The learning algorithm used to find the most appropriate coefficients is known as least squares regression. In the least-squares regression method, the coefficients are calculated using the least-squares error function. The main objective of this method is to minimize or reduce the sum of squared residuals between actual and predicted response values. The sum of squared residuals is also called the residual sum of squares (RSS). The outcome of executing the least-squares regression method is coefficients that minimize the linear regression cost function.

The residual \(e_i\) of the ith observation is the difference between the observed value \(Y_i\) and the predicted value \(\hat{Y}_i\) of the response variable for that observation:

\(e_i = Y_i - \hat{Y}_i\)

The residual sum of squares can be represented as the following:

\(RSS = e_1^2 + e_2^2 + e_3^2 + \dots + e_n^2\)

The least-squares method represents the algorithm that minimizes the above term, RSS.
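As a small illustration with made-up numbers, the least-squares coefficients and the resulting RSS can be computed directly in Python:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])      # intercept column plus predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes the sum of squared residuals

residuals = y - X @ beta
rss = np.sum(residuals ** 2)                   # residual sum of squares (RSS)
print(beta, rss)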

Once the coefficients are determined, can it be claimed that these coefficients are the most appropriate ones for linear regression? The answer is no. After all, the coefficients are only estimates, and thus there will be standard errors associated with each of them. Recall that the standard error is used to calculate the confidence interval within which the population parameter is expected to lie. In other words, it represents the error of estimating a population parameter based on the sample data. The value of the standard error is calculated as the standard deviation of the sample divided by the square root of the sample size. The formula below represents the standard error of a mean.

\(SE(\mu) = \frac{\sigma}{\sqrt{N}}\)

Thus, without analyzing aspects such as the standard error associated with each coefficient, it cannot be claimed that the linear regression coefficients are the most suitable ones. This is where hypothesis testing is needed. Before we get into why we need hypothesis testing with the linear regression model, let’s briefly cover what hypothesis testing is.

Train a Multiple Linear Regression Model using R

Before getting into the hypothesis testing concepts in relation to the linear regression model, let’s train a multivariate or multiple linear regression model and print the summary output of the model, which will be referred to in the next section.

The data used for creating the multiple linear regression model is BostonHousing, which can be loaded in RStudio by installing the mlbench package. The code is shown below:

install.packages("mlbench")
library(mlbench)
data("BostonHousing")

Once the data is loaded, the code shown below can be used to create the linear regression model.

attach(BostonHousing)
BostonHousing.lm <- lm(log(medv) ~ crim + chas + rad + lstat)
summary(BostonHousing.lm)

Executing the above command will result in the creation of a linear regression model with the response variable as log(medv) and predictor variables as crim, chas, rad, and lstat. The following represents the details related to the response and predictor variables:

  • log(medv) : Log of the median value of owner-occupied homes in USD 1000’s
  • crim : Per capita crime rate by town
  • chas : Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • rad : Index of accessibility to radial highways
  • lstat : Percentage of the lower status of the population

The following is the output of the summary command, which prints the details relating to the model, including hypothesis testing details for the coefficients (t-statistics) and the model as a whole (f-statistics).

Summary output of the linear regression model in R

Hypothesis tests & Linear Regression Models

Hypothesis tests are the statistical procedure used to test a claim or assumption about the underlying distribution of a population based on sample data. Here are the key steps of doing hypothesis tests with linear regression models:

  • Hypothesis formulation for T-tests: In the case of linear regression, the claim is that there exists a relationship between the response and a predictor variable, and the claim is represented by a non-zero value of that predictor's coefficient in the linear equation or regression model. This is formulated as the alternate hypothesis. Thus, the null hypothesis is that there is no relationship between the response and the predictor variable; that is, the coefficient of that predictor variable is equal to zero. So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypothesis for each test states that a1 = 0, a2 = 0, a3 = 0, etc. For each predictor variable, an individual hypothesis test is done to determine whether the relationship between the response and that particular predictor variable is statistically significant based on the sample data used for training the model. Thus, if there are, say, 5 features, there will be five hypothesis tests, each with its own null and alternate hypothesis.
  • Hypothesis formulation for the F-test: In addition, there is a hypothesis test done around the claim that there is a linear regression model representing the response variable and all the predictor variables. The null hypothesis is that the linear regression model does not exist. This essentially means that the value of all the coefficients is equal to zero. So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypothesis states that a1 = a2 = a3 = 0.
  • F-statistics for testing the hypothesis for the linear regression model: The F-test is used to test the null hypothesis that a linear regression model does not exist, representing the relationship between the response variable y and the predictor variables x1, x2, x3, x4, and x5; that is, that all their coefficients are zero (a1 = a2 = a3 = a4 = a5 = 0). The F-statistic is calculated as a function of the sum of squared residuals for the restricted regression (a model with only the intercept or bias, all coefficients set to zero) and the sum of squared residuals for the unrestricted regression (the full linear regression model). In the above diagram, note the value of the f-statistic as 15.66 with degrees of freedom 5 and 194.
  • Evaluate t-statistics against the critical value/region: After calculating the t-statistic for each coefficient, it is time to decide whether to accept or reject the null hypothesis. To make this decision, one needs to set a significance level, also known as the alpha level; a significance level of 0.05 is usually used. If the value of the t-statistic falls in the critical region, the null hypothesis is rejected. Equivalently, if the p-value comes out to be less than 0.05, the null hypothesis is rejected.
  • Evaluate the f-statistic against the critical value/region: The value of the F-statistic and its p-value are evaluated to test the null hypothesis that the linear regression model representing the response and predictor variables does not exist. If the value of the f-statistic is more than the critical value at the significance level of 0.05, the null hypothesis is rejected. This means that a linear model exists with at least one non-zero coefficient.
  • Draw conclusions: The final step of hypothesis testing is to draw a conclusion by interpreting the results in terms of the original claim or hypothesis. If the null hypothesis for a predictor variable is rejected, the relationship between the response and that predictor variable is statistically significant based on the evidence, i.e., the sample data used for training the model. Similarly, if the f-statistic lies in the critical region and its p-value is less than the alpha value (usually set at 0.05), one can say that there exists a linear regression model. A sketch of this decision rule in code follows this list.
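The t-statistic and degrees of freedom below are illustrative numbers, not values taken from the model above:

from scipy import stats

t_stat = 2.8   # hypothetical t-statistic for one coefficient
df = 501       # illustrative residual degrees of freedom
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)            # two-tailed critical value
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))   # two-tailed p-value
print(abs(t_stat) > t_crit, p_value < alpha)       # True, True -> reject H0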

Why hypothesis tests for linear regression models?

The reasons why we need to do hypothesis tests in the case of a linear regression model are the following:

  • By creating the model, we are making new claims about the relationship between the response or dependent variable and one or more predictor or independent variables. To justify those claims, one or more tests are needed. These tests can be termed acts of testing the claims, in other words, hypothesis tests.
  • One kind of test is required to test the relationship between the response and each of the predictor variables (hence, T-tests).
  • Another kind of test is required to test the linear regression model representation as a whole. This is called the F-test.

While training linear regression models, hypothesis testing is done to determine whether the relationship between the response and each of the predictor variables is statistically significant or otherwise. First, the coefficients related to each of the predictor variables are determined. Then, individual hypothesis tests are done to determine whether the relationship between the response and that particular predictor variable is statistically significant based on the sample data used for training the model. If a null hypothesis is rejected, there is evidence of a relationship between the response and that particular predictor variable. The t-statistic is used for performing the hypothesis testing because the standard deviation of the sampling distribution is unknown. The value of the t-statistic is compared with the critical value from the t-distribution table in order to decide whether to accept or reject the null hypothesis regarding the relationship between the response and the predictor variable. If the value falls in the critical region, the null hypothesis is rejected, which means that the relationship between the response and that predictor variable is statistically significant. In addition to the T-tests, an F-test is performed to test the null hypothesis that the linear regression model does not exist and that the value of all the coefficients is zero. Learn more about linear regression and the t-test in this blog – Linear regression t-test: formula, example.


Statistics By Jim

Making statistics intuitive

Linear Regression Explained with Examples

By Jim Frost

What is Linear Regression?

Linear regression models the relationships between at least one explanatory variable and an outcome variable. This flexible analysis lets you untangle the effects in complicated research questions by isolating each variable’s role. Additionally, linear models can fit curvature and interaction effects.

Statisticians refer to the explanatory variables in linear regression as independent variables (IVs) and the outcome as the dependent variable (DV). When a linear model has one IV, the procedure is known as simple linear regression. When there is more than one IV, statisticians refer to it as multiple regression. These models assume that the average value of the dependent variable depends on a linear function of the independent variables.

Linear regression has two primary purposes—understanding the relationships between variables and prediction.

  • The coefficients represent the estimated magnitude and direction (positive/negative) of the relationship between each independent variable and the dependent variable.
  • The equation allows you to predict the mean value of the dependent variable given the values of the independent variables that you specify.

Linear regression finds the constant and coefficient values for the IVs for a line that best fits your sample data. The graph below shows the best linear fit for the height and weight data points, revealing the mathematical relationship between them. Additionally, you can use the line’s equation to predict future values of the weight given a person’s height.

Linear regression was one of the earliest types of regression analysis to be rigorously studied and widely applied in real-world scenarios. This popularity stems from the relative ease of fitting linear models to data and the straightforward nature of analyzing the statistical properties of these models. Unlike more complex models that relate to their parameters in a non-linear way, linear models simplify both the estimation and the interpretation of data.

In this post, you’ll learn how to interpret linear regression with an example, about the linear formula, how it finds the coefficient estimates, and its assumptions.

Learn more about when you should use regression analysis  and independent and dependent variables .

Linear Regression Example

Suppose we use linear regression to model how the outside temperature in Celsius and insulation thickness in centimeters, our two independent variables, relate to air conditioning costs in dollars (the dependent variable).

Let’s interpret the results for the following multiple linear regression equation:

Air Conditioning Costs ($) = 2 * Temperature (C) – 1.5 * Insulation (cm)

The coefficient sign for Temperature is positive (+2), which indicates a positive relationship between Temperature and Costs. As the temperature increases, so do air conditioning costs. More specifically, the coefficient value of 2 indicates that for every 1 C increase, the average air conditioning cost increases by two dollars.

On the other hand, the negative coefficient for insulation (–1.5) represents a negative relationship between insulation and air conditioning costs. As insulation thickness increases, air conditioning costs decrease. For every 1 CM increase, the average air conditioning cost drops by $1.50.

We can also enter values for temperature and insulation into this linear regression equation to predict the mean air conditioning cost.
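For example, plugging a (hypothetical) temperature of 30 C and insulation of 10 cm into the equation gives a predicted mean cost of 2 * 30 – 1.5 * 10 = 60 – 15 = $45.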

Learn more about interpreting regression coefficients and using regression to make predictions .

Linear Regression Formula

Linear regression refers to the form of the regression equations these models use. These models follow a particular formula arrangement that requires all terms to be one of the following:

  • The constant
  • A parameter multiplied by an independent variable (IV)

Then, you build the linear regression formula by adding the terms together. These rules limit the form to just one type:

Dependent variable = constant + parameter * IV + … + parameter * IV

Linear model equation.

This formula is linear in the parameters. However, despite the name linear regression, it can model curvature. While the formula must be linear in the parameters, you can raise an independent variable by an exponent to model curvature . For example, if you square an independent variable, linear regression can fit a U-shaped curve.
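For example, the model Y = constant + b₁X + b₂X² is still linear in the parameters, even though the squared term lets it fit a curve in X.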

Specifying the correct linear model requires balancing subject-area knowledge, statistical results, and satisfying the assumptions.

Learn more about the difference between linear and nonlinear models and specifying the correct regression model .

How to Find the Linear Regression Line

Linear regression can use various estimation methods to find the best-fitting line. However, analysts use least squares most frequently because, when all its assumptions are satisfied, it is the most precise prediction method and doesn’t systematically overestimate or underestimate the correct values.

The beauty of the least squares method is its simplicity and efficiency. The calculations required to find the best-fitting line are straightforward, making it accessible even for beginners and widely used in various statistical applications. Here’s how it works:

  • Objective: Minimize the differences between the observed values and the linear regression model’s predicted values. These differences are known as “residuals” and represent the errors in the model values.
  • Minimizing Errors: This method focuses on making the sum of these squared differences as small as possible.
  • Best-Fitting Line: By finding the values of the model parameters that achieve this minimum sum, the least squares method effectively determines the best-fitting line through the data points.

By employing the least squares method in linear regression and checking the assumptions in the next section, you can ensure that your model is as precise and unbiased as possible. This method’s ability to minimize errors and find the best-fitting line is a valuable asset in statistical analysis.

Assumptions

Linear regression using the least squares method has the following assumptions:

  • A linear model satisfactorily fits the relationship.
  • The residuals follow a normal distribution.
  • The residuals have a constant scatter.
  • Independent observations.
  • The IVs are not perfectly correlated.

Residuals are the difference between the observed value and the mean value that the model predicts for that observation. If you fail to satisfy the assumptions, the results might not be valid.

Learn more about the assumptions for ordinary least squares and How to Assess Residual Plots .



May 9, 2024 at 9:10 am

Why not perform centering or standardization with all linear regression to arrive at a better estimate of the y-intercept?


May 9, 2024 at 4:48 pm

I talk about centering elsewhere. This article just covers the basics of what linear regression does.

A little statistical niggle on centering creating a “better estimate” of the y-intercept. In statistics, there’s a specific meaning to “better estimate,” relating to precision and a lack of bias. Centering (or standardizing) doesn’t create a better estimate in that sense. It can create a more interpretable value in some situations, which is better in common usage.


August 16, 2023 at 5:10 pm

Hi Jim, I’m trying to understand why the Beta and significance changes in a linear regression, when I add another independent variable to the model. I am currently working on a mediation analysis, and as you know the linear regression is part of that. A simple linear regression between the IV (X) and the DV (Y) returns a statistically significant result. But when I add another IV (M), X becomes insignificant. Can you explain this? Seeking some clarity, Peta.

August 16, 2023 at 11:12 pm

This is a common occurrence in linear regression and is crucial for mediation analysis.

By adding M (mediator), it might be capturing some of the variance that was initially attributed to X. If M is a mediator, it means the effect of X on Y is being channeled through M. So when M is included in the model, it’s possible that the direct effect of X on Y becomes weaker or even insignificant, while the indirect effect (through M) becomes significant.

If X and M share variance in predicting Y, when both are in the model, they might “compete” for explaining the variance in Y. This can lead to a situation where the significance of X drops when M is added.

I hope that helps!


July 30, 2022 at 2:49 pm

Jim, Hi! I am working on an interpretation of multiple linear regression. I am having a bit of trouble getting help. is there a way to post the table so that I may initiate a coherent discussion on my interpretation?


April 28, 2022 at 3:24 pm

Is it possible that we get significant correlations but no significant prediction in a multiple regression analysis? I am seeing that with my data and I am so confused. Could mediation be a factor (i.e IVs are not predicting the outcome variables because the relationship is made possible through mediators)?

April 29, 2022 at 4:37 pm

I’m not sure what you mean by “significant prediction.” Typically, the predictions you obtain from regression analysis will be a fitted value (the prediction) and a prediction interval that indicates the precision of the prediction (how close is it likely to be to the correct value). We don’t usually refer to “significance” when talking about predictions. Can you explain what you mean? Thanks!


March 25, 2022 at 7:19 am

I want to do a multiple regression analysis in SPSS (creating a predictive model), where IQ is my dependent variable and my independent variables consist of different cognitive domains. The IQ scores are already scaled for age. How can I control my independent variables for age, without doing it again for the IQ scores? I can’t add age as an independent variable in the model.

I hope that you can give me some advice, thank you so much!

March 28, 2022 at 9:27 pm

If you include age as an independent variable, the model controls for it while calculating the effects of the other IVs. And don’t worry, including age as an IV won’t double count it for IQ because that is your DV.


March 2, 2022 at 8:23 am

Hi Jim, Is there a reason you would want your covariates to be associated with your independent variable before including them in the model? So in deciding which covariates to include in the model, it was specified that covariates associated with both the dependent variable and independent variable at p<0.10 will be included in the model.

My question is why would you want the covariates to be associated with the independent variable?

March 2, 2022 at 4:38 pm

In some cases, it’s absolutely crucial to include covariates that correlate with other independent variables, although it’s not a sufficient reason by itself. When you have a potential independent variable that correlates with other IVs and it also correlates with the dependent variable, it becomes a confounding variable and omitting it from the model can cause a bias in the variables that you do include. In this scenario, the degree of bias depends on the strengths of the correlations involved. Observational studies are particularly susceptible to this type of omitted variable bias. However, when you’re performing a true, randomized experiment, this type of bias becomes a non-issue.

I’ve never heard of a formalized rule such as the one that you mention. Personally, I wouldn’t use p-values to make this determination. You can have low p-values for weak correlations in some cases. Instead, I’d look at the strength of the correlations between IVs. However, it’s not as simple as a single criterion like that. The strength of the correlation between the potential IV and the DV also plays a role.

I’ve written an article that discusses these issues in more detail; read Confounding Variables Can Bias Your Results.


February 28, 2022 at 8:19 am

Jim, as if by serendipity: having been on your mailing list for years, I looked up your information on multiple regression this weekend for a grad school advanced statistics case study. I’m a fan of your admirable gift to make complicated topics approachable and digestible. Specifically, I was looking for information on how pronounced the triangular/funnel shape must be–and in what directions it may point–to suggest heteroscedasticity in a regression scatterplot of standardized residuals vs standardized predicted values. It seemed to me that my resulting plot of a 5 predictor variable regression model featured an obtuse triangular left point that violated homoscedasticity; my professors disagreed, stating the triangular “funnel” aspect would be more prominent and overt. Thus, should you be looking for a new future discussion point, my query to you then might be some pearls on the nature of a qualifying heteroscedastic funnel shape: How severe must it be? Is there a quantifiable magnitude to said severity, and if so, how would one quantify this and/or what numeric outputs in common statistical software would best support or deny a suspicion based on graphical interpretation? What directions can the funnel point; are only some directions suggestive, whereby others are not? Thanks for entertaining my comment, and, as always, thanks for doing what you do.


Linear regression - Hypothesis testing

by Marco Taboga, PhD

This lecture discusses how to perform tests of hypotheses about the coefficients of a linear regression model estimated by ordinary least squares (OLS).

Table of contents

  • Normal vs non-normal model
  • The linear regression model
  • Matrix notation
  • Tests of hypothesis in the normal linear regression model
  • Test of a restriction on a single coefficient (t test)
  • Test of a set of linear restrictions (F test)
  • Tests based on maximum likelihood procedures (Wald, Lagrange multiplier, likelihood ratio)
  • Tests of hypothesis when the OLS estimator is asymptotically normal
  • Test of a restriction on a single coefficient (z test)
  • Test of a set of linear restrictions (Chi-square test)
  • Learn more about regression analysis

The lecture is divided into two parts:

in the first part, we discuss hypothesis testing in the normal linear regression model, in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors;

in the second part, we show how to carry out hypothesis tests in linear regression analyses where the hypothesis of normality holds only in large samples (i.e., the OLS estimator can be proved to be asymptotically normal).

We also explain how to choose which test to carry out after estimating a linear regression model.


We now explain how to derive tests about the coefficients of the normal linear regression model.

It can be proved (see the lecture about the normal linear regression model) that the assumption of conditional normality implies that the OLS estimator of the coefficients is, conditional on the matrix of regressors, normally distributed around the true coefficient vector.

How the acceptance region is determined depends not only on the desired size of the test, but also on whether the test is:

two-tailed (both smaller and larger values are possible under the alternative), or

one-tailed (only one of the two things, i.e., either smaller or larger, is possible).

For more details on how to determine the acceptance region, see the glossary entry on critical values .


The F test is one-tailed.

A critical value in the right tail of the F distribution is chosen so as to achieve the desired size of the test.

Then, the null hypothesis is rejected if the F statistic is larger than the critical value.

In this section we explain how to perform hypothesis tests about the coefficients of a linear regression model when the OLS estimator is asymptotically normal.

As we have shown in the lecture on the properties of the OLS estimator , in several cases (i.e., under different sets of assumptions) it can be proved that:

These two properties are used to derive the asymptotic distribution of the test statistics used in hypothesis testing.

The test can be either one-tailed or two-tailed. The same comments made for the t-test apply here.


Like the F test, the Chi-square test is usually one-tailed.

The desired size of the test is achieved by appropriately choosing a critical value in the right tail of the Chi-square distribution.

The null is rejected if the Chi-square statistic is larger than the critical value.

Want to learn more about regression analysis? Here are some suggestions:

  • R squared of a linear regression
  • Gauss-Markov theorem
  • Generalized Least Squares
  • Multicollinearity
  • Dummy variables
  • Selection of linear regression models
  • Partitioned regression
  • Ridge regression

How to cite

Please cite as:

Taboga, Marco (2021). "Linear regression - Hypothesis testing", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/linear-regression-hypothesis-testing.


Simple linear regression

Fig. 9 Simple linear regression

Errors: \(\varepsilon_i \sim N(0,\sigma^2)\quad \text{i.i.d.}\)

Fit: the estimates \(\hat\beta_0\) and \(\hat\beta_1\) are chosen to minimize the (training) residual sum of squares (RSS):

\(RSS = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2\)

Sample code: advertising data
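A minimal sketch of such code in Python, assuming the classic Advertising dataset with TV and sales columns:

import pandas as pd
import statsmodels.formula.api as smf

adv = pd.read_csv("Advertising.csv")           # assumed file name for the advertising data
fit = smf.ols("sales ~ TV", data=adv).fit()    # simple linear regression of sales on TV
print(fit.summary())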

Estimates \(\hat\beta_0\) and \(\hat\beta_1\)

A little calculus shows that the minimizers of the RSS are:

\(\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x\)

Assessing the accuracy of \(\hat\beta_0\) and \(\hat\beta_1\)

Fig. 10 How variable is the regression line?

Based on our model

The standard errors for the parameters are:

\(SE(\hat\beta_1)^2 = \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad SE(\hat\beta_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^n (x_i - \bar x)^2} \right]\)

95% confidence intervals: approximately \(\hat\beta_i \pm 2 \cdot SE(\hat\beta_i)\) for \(i = 0, 1\).

Hypothesis test

Null hypothesis \(H_0\) : There is no relationship between \(X\) and \(Y\) .

Alternative hypothesis \(H_a\) : There is some relationship between \(X\) and \(Y\) .

Based on our model: this translates to

\(H_0\) : \(\beta_1=0\) .

\(H_a\) : \(\beta_1\neq 0\) .

Test statistic: \(t = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)}\)

Under the null hypothesis, this has a \(t\) -distribution with \(n-2\) degrees of freedom.

Sample output: advertising data

Interpreting the hypothesis test

If we reject the null hypothesis, can we assume there is an exact linear relationship?

No. A quadratic relationship may be a better fit, for example. This test assumes the simple linear regression model is correct, which precludes a quadratic relationship.

If we don’t reject the null hypothesis, can we assume there is no relationship between \(X\) and \(Y\) ?

No. This test is based on the model we posited above and is only powerful against certain monotone alternatives. There could be more complex non-linear relationships.


Lesson 1: Simple Linear Regression

Overview

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. This lesson introduces the concept and basic procedures of simple linear regression. Upon completion of this lesson, you should be able to:

  • Distinguish between a deterministic relationship and a statistical relationship.
  • Understand the concept of the least squares criterion.
  • Interpret the intercept \(b_{0}\) and slope \(b_{1}\) of an estimated regression equation.
  • Know how to obtain the estimates \(b_{0}\) and \(b_{1}\) from Minitab's fitted line plot and regression analysis output.
  • Recognize the distinction between a population regression line and the estimated regression line.
  • Summarize the four conditions that comprise the simple linear regression model.
  • Know what the unknown population variance \(\sigma^{2}\) quantifies in the regression setting.
  • Know how to obtain the estimated MSE of the unknown population variance \(\sigma^{2 }\) from Minitab's fitted line plot and regression analysis output.
  • Know that the coefficient of determination (\(R^2\)) and the correlation coefficient (r) are measures of linear association. That is, they can be 0 even if there is a perfect nonlinear association.
  • Know how to interpret the \(R^2\) value.
  • Understand the cautions necessary in using the \(R^2\) value as a way of assessing the strength of the linear association.
  • Know how to calculate the correlation coefficient r from the \(R^2\) value.
  • Know what various correlation coefficient values mean. There is no meaningful interpretation for the correlation coefficient as there is for the \(R^2\) value.

Lesson 1 Code Files

STAT501_Lesson01.zip

  • bldgstories.txt
  • carstopping.txt
  • drugdea.txt
  • fev_dat.txt
  • heightgpa.txt
  • husbandwife.txt
  • oldfaithful.txt
  • poverty.txt
  • practical.txt
  • signdist.txt
  • skincancer.txt
  • student_height_weight.txt

How To Perform A Linear Regression In Python (With Examples!)


If you want to become a better statistician, a data scientist, or a machine learning engineer, going over several linear regression examples is inevitable.

They will help you to wrap your head around the whole subject of regression analysis.


Regression analysis is one of the most widely used methods for prediction.

It is applied whenever we have a causal relationship between variables.


A large portion of the predictive modeling that occurs in practice is carried out through regression analysis. There are also many academic papers based on it. And it becomes extremely powerful when combined with techniques like factor analysis. Moreover, the fundamentals of regression analysis are used in machine learning.

Therefore, it is easy to see why regressions are a must for data science. The general point is the following.

 “The amount of money you spend depends on the amount of money you earn.”

In the same way, the amount of time you spend reading our tutorials is affected by your motivation to learn additional statistical methods.

You can quantify these relationships and many others using regression analysis.

Regression Analysis

We will use our typical step-by-step approach. We’ll start with the simple linear regression model, and not long after, we’ll be dealing with the multiple regression model. Along the way, we will learn how to build a regression, how to interpret it, and how to compare different models.

We will also develop a deep understanding of the fundamentals by going over some linear regression examples.

A quick side note: You can learn more about the geometrical representation of the simple linear regression model in the linked tutorial.

What is a Linear Regression

Let’s start with some dry theory. A linear regression is a linear approximation of a causal relationship between two or more variables.


Regression models are highly valuable, as they are one of the most common ways to make inferences and predictions.

The Process of Creating a Linear Regression

The process goes like this.

  • First, you get sample data;
  • Then, you can design a model that explains the data;
  • Finally, you use the model you’ve developed to make a prediction for the whole population.

There is a dependent variable, labeled Y, being predicted, and independent variables, labeled x1, x2, and so forth. These are the predictors. Y is a function of the X variables, and the regression model is a linear approximation of this function.

The Simple Linear Regression

The easiest regression model is the simple linear regression:

Y = β₀ + β₁ * x₁ + ε

Let’s see what these values mean. Y is the variable we are trying to predict and is called the dependent variable. X is an independent variable.

When using regression analysis, we want to predict the value of Y, provided we have the value of X.

But to have a regression, Y must depend on X in some way. Whenever there is a change in X, such change must translate to a change in Y.

Providing a Linear Regression Example

Think about the following equation: the income a person receives depends on the number of years of education that person has received. The dependent variable is income, while the independent variable is years of education.


There is a causal relationship between the two. The more education you get, the higher the income you are likely to receive. This relationship is so trivial that it is probably the reason you are reading this tutorial, right now. You want to get a higher income, so you are increasing your education.


Is the Reverse Relationship Possible?

Now, let’s pause for a second and think about the reverse relationship. What if education depends on income?


This would mean the higher your income, the more years you spend educating yourself.


Putting high tuition fees aside, wealthier individuals don’t spend more years in school. Moreover, high school and college take the same number of years, no matter your tax bracket. Therefore, a causal relationship like this one is faulty, if not plain wrong. Hence, it is unfit for regression analysis.


Let’s go back to the original linear regression example. Income is a function of education. The more years you study, the higher the income you will receive. This sounds about right.


The Coefficients

What we haven’t mentioned, so far, is that, in our model, there are coefficients. β₁ is the coefficient that stands before the independent variable. It quantifies the effect of education on income.


If β₁ is 50, then for each additional year of education, your income would grow by $50. In the USA, the number is much bigger, somewhere around 3 to 5 thousand dollars.


The Constant

The other two components are the constant β₀ and the error – epsilon (ε).

In this linear regression example, you can think of the constant β₀ as the minimum wage. No matter your education, if you have a job, you will get the minimum wage. This is a guaranteed amount.

So, if you never went to school and plug an education value of 0 years in the formula, what could possibly happen? Logically, the regression will predict that your income will be the minimum wage.


The last term is the epsilon (ε). This represents the error of estimation. The error is the actual difference between the observed income and the income the regression predicted. On average, across all observations, the error is 0.


If you earn more than what the regression has predicted, then someone earns less than what the regression predicted. Everything evens out.

The Linear Regression Equation

The original formula was written with Greek letters. This tells us that it was the population formula. But don’t forget that statistics (and data science) is all about sample data. In practice, we tend to use the linear regression equation.

It is simply ŷ = b₀ + b₁ * x.


The ŷ here is referred to as y hat. Whenever we have a hat symbol, it is an estimated or predicted value.

b₀ is the estimate of the regression constant β₀, whereas b₁ is the estimate of β₁, and x is the sample data for the independent variable.

The Regression Line

You may have heard about the regression line , too. When we plot the data points on an x-y plane, the regression line is the best-fitting line through the data points.


You can take a look at a plot with some data points in the picture above. We plot the line based on the regression equation .

The grey points that are scattered are the observed values. b₀, as we said earlier, is a constant and is the intercept of the regression line with the y-axis.


The Estimator of the Error

The distance between the observed values and the regression line is the estimator of the error term epsilon. Its point estimate is called the residual.


Now, suppose we draw a perpendicular from an observed point to the regression line. The intersection of that perpendicular with the regression line is a point with a y value equal to ŷ.

As we said earlier, given an x, ŷ is the value predicted by the regression line.

Linear Regression in Python Example

We believe it is high time that we actually got down to it and wrote some code! So, let’s get our hands dirty with our first linear regression example in Python. If this is your first time hearing about Python, don’t worry. We have plenty of tutorials that will give you the base you need to use it for data science and machine learning.

Now, how about we write some code? First off, we will need to use a few libraries.  

Importing the Relevant Libraries

Let’s import the following libraries:


The first three are pretty conventional. We won’t even need numpy, but it’s always good to have it there – ready to lend a helping hand for some operations. In addition, the machine learning library we will employ for this linear regression example is statsmodels. So, we can basically write the following code:
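# a minimal version of the imports (the original code screenshot is not shown here)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm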

Loading the Data

The data which we will be using for our linear regression example is in a .csv file called ‘1.01. Simple linear regression.csv’. You can download it from here. Make sure that you save it in the folder of the user.

Now, let’s load it in a new variable called: data using the pandas method: ‘read_csv’. We can write the following code:
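data = pd.read_csv('1.01. Simple linear regression.csv')   # load the .csv file into the data variable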

After running it, the data from the .csv file will be loaded in the data variable. As we are using pandas, the data variable will be automatically converted into a data frame.

Visualizing the Data Frame

Let’s see if that’s true. We can write data and run the line. As you can see below, we have indeed displayed the data frame.


There are two columns – SAT and GPA. And that’s what our linear regression example will be all about. Let’s further check with data.describe():

This is a pandas method which will give us the most useful descriptive statistics for each column in the data frame – number of observations, mean , standard deviation , and so on.


In this linear regression example we won’t put that to work just yet. However, it’s good practice to use it.

The Problem

Let’s explore the problem with our linear regression example.

So, we have a sample of 84 students, who have studied in college.


Their total SAT scores include critical reading, mathematics, and writing. Whereas, the GPA is their Grade Point Average they had at graduation.


That’s a very famous relationship. We will create a linear regression which predicts the GPA of a student based on their SAT score.

When you think about it, it totally makes sense.

  • You sit the SAT and get a score.
  • With this score, you apply to college.
  • The next 4 years, you attend college and graduate receiving many grades, forming your GPA.


Meaningful Regressions

Before we finish this introduction, we want to get this out of the way. Each time we create a regression , it should be meaningful. Why would we predict GPA with SAT? Well, the SAT is considered one of the best estimators of intellectual capacity and capability.

On average, if you did well on your SAT, you will do well in college and at the workplace. Furthermore, almost all colleges across the USA are using the SAT as a proxy for admission.

And last but not least, the SAT stood the test of time and established itself as the leading exam for college admission.


It is safe to say our regression makes sense.

Creating our First Regression in Python

After we’ve cleared things up, we can start creating our first regression in Python. We will go through the code and in subsequent tutorials, we will clarify each point.

Important: Remember, the equation is ŷ = b₀ + b₁ * x.

Our dependent variable is GPA, so let’s create a variable called y which will contain GPA.

Just a reminder - the pandas’ syntax is quite simple.

This is all we need to code:

  • First, we write the name of the data frame, in this case data
  • Then, we add in square brackets the relevant column name, which is GPA in our case.

Similarly, our independent variable is SAT, and we can load it in a variable x1.
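y = data['GPA']    # dependent variable
x1 = data['SAT']   # independent variable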


Exploring the Data

It’s always useful to plot our data in order to understand it better and see if there is a relationship to be found.

We will use some conventional matplotlib code.
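A sketch of that conventional code (the exact styling choices are our own):

    import matplotlib.pyplot as plt

    plt.scatter(x1, y)              # one point per student
    plt.xlabel('SAT', fontsize=20)
    plt.ylabel('GPA', fontsize=20)
    plt.show()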

You can see the result we receive after running it, in the picture below.


Each point on the graph represents a different student. For instance, the highlighted point below is a student who scored around 1900 on the SAT and graduated with a 3.4 GPA.


Observing all data points, we can see that there is a strong relationship between SAT and GPA. In general, the higher the SAT of a student, the higher their GPA.


Adding a Constant

Next, we need to create a new variable, which we’ll call x.

We have our x 1 , but we don’t have an x 0 . In fact, in the regression equation there is no explicit x 0 . The coefficient b 0 is alone.


That can be represented as: b 0 * 1. So, if there was an x 0 , it would always be 1.


It is really practical for computational purposes to incorporate this notion into the equation. And that’s how we estimate the intercept b 0 . In terms of code, statsmodels uses the method: .add_constant().

So, let’s declare a new variable:
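Assuming statsmodels is imported in the usual way:

    import statsmodels.api as sm

    x = sm.add_constant(x1)   # prepends a column of 1s, playing the role of x0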

The Results Variable

Next, we declare a variable called results, apply the OLS (ordinary least squares) method from statsmodels to y and x, and chain the .fit() method at the end.

That itself is enough to perform the regression .
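In code, under the same assumptions:

    results = sm.OLS(y, x).fit()   # fit GPA on the constant and SAT by ordinary least squares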

Displaying the Regression Results

In any case, results.summary() will display the regression results and organize them into three tables.

So, this is all the code we need to run:
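Picking up the results variable from the step above, the line to run is simply:

    results.summary()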

And this is what we get after running it:


As you can see, we have a lot of statistics in front of us! We will examine them in more detail in subsequent tutorials.

Plotting the Regression line

Let’s plot the regression line on the same scatter plot.  We can achieve that by writing the following:
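Here is one way to write that, using the two coefficients we are about to read off the summary (0.275 and 0.0017) and seaborn purely as styling; both choices match the recap later in this tutorial:

    import seaborn as sns
    sns.set()   # seaborn as a 'skin' for matplotlib

    plt.scatter(x1, y)
    yhat = 0.275 + 0.0017 * x1   # predicted GPA for every observed SAT score
    plt.plot(x1, yhat, lw=4, c='orange', label='regression line')
    plt.xlabel('SAT', fontsize=20)
    plt.ylabel('GPA', fontsize=20)
    plt.show()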


So that’s how you create a simple linear regression in Python!

How to Interpret the Regression Table

Now, let’s figure out how to interpret the regression table we saw earlier in our linear regression example.

While the graphs we have seen so far are nice and easy to understand, when you perform regression analysis you’ll find something different than a scatter plot with a regression line . The graph is a visual representation; what we really want is the equation of the model, and a measure of its significance and explanatory power. This is why the regression summary consists of a few tables, instead of a graph.


Let’s find out how to read and understand these tables.

The 3 Main Tables

Typically, when using statsmodels , we’ll have three main tables – a model summary, a coefficients table, and some additional tests.

Certainly, these tables contain a lot of information, but we will focus on the most important parts.

We will start with the coefficients table .

The Coefficients Table

We can see the coefficient of the intercept, or the constant as they’ve named it in our case.


Both terms are used interchangeably. In any case, it is 0.275, which means b 0 is 0.275.


Looking below it, we notice the other coefficient is 0.0017. This is our b 1 . These are the only two numbers we need to define the regression equation .


ŷ = 0.275 + 0.0017 * x1.

Or GPA equals 0.275 plus 0.0017 times SAT score.


So, this is how we obtain the regression equation .

A Quick Recap

Let’s take a step back and look at the code where we plotted the regression line . We have plotted the scatter plot of SAT and GPA. That’s clear. After that, we created a variable called yhat (ŷ). Moreover, we imported the seaborn library as a ‘skin’ for matplotlib . We did that in order to display the regression in a prettier way.


That’s the regression line – the predicted values based on the data.


Finally, we plot that line using the plot method.


Naturally, we picked the coefficients from the coefficients table – we didn’t make them up.

The Predictive Power of Linear Regressions

You might be wondering if that prediction is useful. Well, knowing that a person has scored 1700 on the SAT, we can substitute in the equation and obtain the following:

0.275 + 0.0017 * 1700, which equals 3.165. So, the expected GPA for this student, according to our model is 3.165.
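As a quick check in code (the manual substitution is exact; the predict call is a sketch reusing the fitted results object from earlier):

    print(0.275 + 0.0017 * 1700)          # 3.165
    print(results.predict([[1, 1700]]))   # same prediction via statsmodels: constant first, then SAT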


And that’s the predictive power of linear regressions in a nutshell!

The Standard Errors

What about the other cells in the table?

The standard errors show the accuracy of prediction for each variable.

The lower the standard error , the better the estimate!


The T-Statistic

The next two values are a T-statistic and its P-value.


If you have gone over our other tutorials, you may know that there is a hypothesis involved here . The null hypothesis of this test is: β = 0. In other words, is the coefficient equal to zero?


The Null Hypothesis

If the intercept coefficient (b 0 ) is zero, then the line crosses the y-axis at the origin. You can get a better understanding of what we are talking about from the picture below.


If β 1 is zero, then 0 * x will always be 0 for any x, so this variable will not be considered for the model. Graphically, that would mean that the regression line is horizontal – always going through the intercept value.


The P-Value

Let’s paraphrase this test. Essentially, it asks, is this a useful variable? Does it help us explain the variability we have in this case? The answer is contained in the P-value column.


As you may know, a P-value below 0.05 means that the variable is significant. Therefore, the coefficient is most probably different from 0. Moreover, we are longing to see those three zeroes.


What does this mean for our linear regression example?

Well, it simply tells us that SAT score is a significant variable when predicting college GPA.

What you may notice is that the intercept p-value is not zero.


Let’s think about this. Does it matter that much? This test asks whether the intercept is equal to zero. Graphically, that would mean that the regression line passes through the origin of the graph.


Usually, this is not essential, as it is the causal relationship of the Xs that we are interested in.

The F-statistic

The last measure we will discuss is the F-statistic. We will explain its essence and see how it can be useful to us.


Much like the Z-statistic which follows a normal distribution   and the T-statistic that follows a Student’s T distribution , the F-statistic follows an F distribution .


We are calling it a statistic, which means that it is used for tests. The test is known as the test for overall significance of the model.

The Null Hypothesis and the Alternative Hypothesis

The null hypothesis is: all the β s are equal to zero simultaneously.

The alternative hypothesis is: at least one β differs from zero.


This is the interpretation: if all β s are zero, then none of the independent variables matter. Therefore, our model has no merit.

In our case, the F-statistic is 56.05.


The cell below is its P-value .


As you can see, the number is really low – it is virtually 0.000. We say the overall model is significant.

Important: Notice how the P-value is a universal measure for all tests. There is an F-table used for the F-statistic, but we don’t need it, because the P-value notion is so powerful.

The F-test is important for regressions , as it gives us some important insights. Remember, the lower the F-statistic, the closer to a non-significant model.

Moreover, don’t forget to look for the three zeroes after the dot!

Create Your Own Linear Regressions

Well, that was a long journey, wasn’t it? We embarked on it by first learning about what a linear regression is. Then, we went over the process of creating one. We also went over a linear regression example. Afterwards, we talked about the simple linear regression where we introduced the linear regression equation . By then, we were done with the theory and got our hands on the keyboard and explored another linear regression example in Python! We imported the relevant libraries and loaded the data. We cleared up when exactly we need to create regressions and started creating our own. The process consisted of several steps which, now, you should be able to perform with ease. Afterwards, we began interpreting the regression table . We mainly discussed the coefficients table. Lastly, we explained why the F-statistic is so important for regressions .

Next Step: Correlation

You thought that was all you need to know about regressions ? Well, seeing a few linear regression examples is not enough. There are many more skills you need to acquire in order to truly understand how to work with linear regressions . The first thing which you can clear up is the misconception that regression and correlation are referring to the same concept.


Next Tutorial:  The Differences between Correlation and Regression


Iliya Valchanov

Co-founder of 365 Data Science

Iliya is a finance graduate with a strong quantitative background who chose the exciting path of a startup entrepreneur. He demonstrated a formidable affinity for numbers during his childhood, winning more than 90 national and international awards and competitions through the years. Iliya started teaching at university, helping other students learn statistics and econometrics. Inspired by his first happy students, he co-founded 365 Data Science to continue spreading knowledge. He authored several of the program’s online courses in mathematics, statistics, machine learning, and deep learning.


Teach yourself statistics

Linear Regression Example

In this lesson, we apply regression analysis to some fictitious data, and we show how to interpret the results of our analysis.


Note: Regression computations are usually handled by a software package or a graphing calculator. For this example, however, we will do the computations "manually", since the gory details have educational value.

Problem Statement

Last year, five randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has three questions.

  • What linear regression equation best predicts statistics performance, based on math aptitude scores?
  • If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
  • How well does the regression equation fit the data?

How to Find the Regression Equation

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two columns show deviation scores – the difference between the student's score and the mean score on each measurement. The last two rows show sums and mean scores that we will use to conduct the regression analysis.

Student   x     y     (x − x̄)   (y − ȳ)
1         95    85    17        8
2         85    95    7         18
3         80    70    2         -7
4         70    65    -8        -12
5         60    70    -18       -7
Sum       390   385
Mean      78    77

And for each student, we also need to compute the squares of the deviation scores (the last two columns in the table below).

Student   x     y     (x − x̄)²   (y − ȳ)²
1         95    85    289        64
2         85    95    49         324
3         80    70    4          49
4         70    65    64         144
5         60    70    324        49
Sum       390   385   730        630
Mean      78    77

And finally, for each student, we need to compute the product of the deviation scores (the last column in the table below).

Student   x     y     (x − x̄)(y − ȳ)
1         95    85    136
2         85    95    126
3         80    70    -14
4         70    65    96
5         60    70    126
Sum       390   385   470
Mean      78    77

The regression equation is a linear equation of the form: ŷ = b0 + b1x . To conduct a regression analysis, we need to solve for b0 and b1. Computations are shown below. Notice that all of our inputs for the regression analysis come from the above three tables.

First, we solve for the regression coefficient (b1):

b1 = Σ [ (xi − x̄)(yi − ȳ) ] / Σ [ (xi − x̄)² ]

b1 = 470/730

b1 = 0.644

Once we know the value of the slope coefficient (b1), we can solve for the intercept (b0):

b0 = ȳ − b1 * x̄

b0 = 77 − (0.644)(78)

b0 = 26.768

Therefore, the regression equation is: ŷ = 26.768 + 0.644x .
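As a cross-check of these hand computations, here is a short sketch in plain Python that uses nothing beyond the table above:

    x = [95, 85, 80, 70, 60]
    y = [85, 95, 70, 65, 70]
    x_bar = sum(x) / len(x)   # 78
    y_bar = sum(y) / len(y)   # 77

    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # 470
    sxx = sum((xi - x_bar) ** 2 for xi in x)                         # 730
    b1 = sxy / sxx            # 0.644
    b0 = y_bar - b1 * x_bar   # 26.768
    print(b0, b1)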

How to Use the Regression Equation

Once you have the regression equation, using it is a snap. Choose a value for the independent variable ( x ), perform the computation, and you have an estimated value (ŷ) for the dependent variable.

In our example, the independent variable is the student's score on the aptitude test. The dependent variable is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade (ŷ) would be:

ŷ = b0 + b1x

ŷ = 26.768 + 0.644x = 26.768 + 0.644 * 80

ŷ = 26.768 + 51.52 = 78.288

Warning: When you use a regression equation, do not use values for the independent variable that are outside the range of values used to create the equation. That is called extrapolation , and it can produce unreasonable estimates.

In this example, the aptitude test scores used to create the regression equation ranged from 60 to 95. Therefore, only use values inside that range to estimate statistics grades. Using values outside that range (less than 60 or greater than 95) is problematic.

How to Find the Coefficient of Determination

Whenever you use a regression equation, you should ask how well the equation fits the data. One way to assess fit is to check the coefficient of determination , which can be computed from the following formula.

R² = { (1/N) * Σ [ (xi − x̄)(yi − ȳ) ] / (σx * σy) }²

where N is the number of observations used to fit the model, Σ is the summation symbol, xi is the x value for observation i, x̄ is the mean x value, yi is the y value for observation i, ȳ is the mean y value, σx is the standard deviation of x, and σy is the standard deviation of y.

Computations for the sample problem of this lesson are shown below. We begin by computing the standard deviation of x (σx):

σx = sqrt [ Σ (xi − x̄)² / N ]

σx = sqrt( 730/5 ) = sqrt(146) = 12.083

Next, we find the standard deviation of y (σy):

σy = sqrt [ Σ (yi − ȳ)² / N ]

σy = sqrt( 630/5 ) = sqrt(126) = 11.225

R² = [ (1/5) * 470 / (12.083 * 11.225) ]²

R² = ( 94 / 135.632 )² = ( 0.693 )² = 0.48
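Continuing the sketch from the previous section (reusing x, y, y_bar, sxy and sxx):

    import math

    n = len(x)
    sigma_x = math.sqrt(sxx / n)                # 12.083
    syy = sum((yi - y_bar) ** 2 for yi in y)    # 630
    sigma_y = math.sqrt(syy / n)                # 11.225
    r_squared = ((sxy / n) / (sigma_x * sigma_y)) ** 2
    print(r_squared)                            # about 0.48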

A coefficient of determination equal to 0.48 indicates that about 48% of the variation in statistics grades (the dependent variable ) can be explained by the relationship to math aptitude scores (the independent variable ). This would be considered a good fit to the data, in the sense that it would substantially improve an educator's ability to predict student performance in statistics class.

Simple Linear Regression Examples

Many simple linear regression examples (problems and solutions) from real life can be given to help you understand the core meaning.

On this page:

  • Simple linear regression examples: problems with solutions .
  • Infographic in PDF

In our previous post on linear regression models , we explained in detail what simple and multiple linear regression are. Here, we concentrate on examples of linear regression from real life.

Simple Linear Regression Examples, Problems, and Solutions

Simple linear regression allows us to study the correlation between only two variables:

  • One variable (X) is called independent variable or predictor.
  • The other variable (Y), is known as dependent variable or outcome.

and the simple linear regression equation is:

Y = β0 + β1X

where:

  • X – the value of the independent variable,
  • Y – the value of the dependent variable,
  • β0 – a constant (the value of Y when X = 0),
  • β1 – the regression coefficient (how much Y changes for each unit change in X).

You have to study the relationship between the monthly e-commerce sales and the online advertising costs. You have the survey results for 7 online stores for the last year.

Your task is to find the equation of the straight line that fits the data best.

The following table represents the survey results from the 7 online stores.

Online Store   Monthly E-commerce Sales (Y)   Online Advertising Costs (X)
1              368                            1.7
2              340                            1.5
3              665                            2.8
4              954                            5.0
5              331                            1.3
6              556                            2.2
7              376                            1.3

We can see that there is a positive relationship between the monthly e-commerce sales (Y) and online advertising costs (X).

The positive correlation means that the values of the dependent variable (y) increase when the values of the independent variable (x) rise.

So, if we want to predict the monthly e-commerce sales from the online advertising costs, the higher the value of advertising costs, the higher our prediction of sales.

We will use the above data to build our Scatter diagram.

Now, let’s see what the scatter diagram looks like:

The scatter plot shows how much one variable affects another. In our example, the scatter plot above shows how much online advertising costs affect monthly e-commerce sales. It shows their correlation.

Let’s see the simple linear regression equation.

Y = 125.8 + 171.5*X

Note: You can easily find the values for β0 and β1 with the help of paid or free statistical software, online linear regression calculators or Excel. All you need are the values for the independent (x) and dependent (y) variables (such as those in the above table).
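For instance, here is a quick sketch with numpy, using the survey data from the table above:

    import numpy as np

    x = np.array([1.7, 1.5, 2.8, 5.0, 1.3, 2.2, 1.3])   # advertising costs
    y = np.array([368, 340, 665, 954, 331, 556, 376])   # e-commerce sales
    b1, b0 = np.polyfit(x, y, 1)   # slope and intercept of the least-squares line
    print(b0, b1)                  # roughly 125.8 and 171.5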

Now, we have to see our regression line:

Graph of the Regression Line:

Linear regression aims to find the best-fitting straight line through the points. The best-fitting line is known as the regression line.

If the data points lie closer to a straight line when plotted, the correlation between the two variables is stronger. In our example, the relationship is strong.

The orange diagonal line in diagram 2 is the regression line and shows the predicted score on e-commerce sales for each possible value of the online advertising costs.

Interpretation of the results:

The slope of 171.5 shows that for each increase of one unit in X, we predict the average of Y to increase by an estimated 171.5 units.

The formula estimates that for each increase of 1 dollar in online advertising costs, the expected monthly e-commerce sales are predicted to increase by $171.5.

This was a simple linear regression example for a positive relationship in business. Let’s see an example of the negative relationship.

You have to examine the relationship between the age and price for used cars sold in the last year by a car dealership company.

Here is the table of the data:

Car Age (X, in years)   Car Price (Y, in dollars)
4                       6300
4                       5800
5                       5700
5                       4500
7                       4500
7                       4200
8                       4100
9                       3100
10                      2100
11                      2500
12                      2200

Now, we see that we have a negative relationship between the car price (Y) and car age(X) – as car age increases, price decreases.

When we use the simple linear regression equation, we have the following results:

Y = 7836 – 502.4*X
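The same numpy sketch reproduces this line from the car data:

    import numpy as np

    age = np.array([4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12])
    price = np.array([6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200])
    b1, b0 = np.polyfit(age, price, 1)
    print(b0, b1)   # roughly 7836 and -502.4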

Let’s use the data from the table and create our Scatter plot and linear regression line:

The above 3 diagrams are made with  Meta Chart .

Result Interpretation:

With an estimated slope of –502.4, we can conclude that the average car price decreases by $502.4 for each year a car increases in age.

If you need more examples in the field of statistics and data analysis or more data visualization types , our posts “ descriptive statistics examples ” and “ binomial distribution examples ” might be useful to you.

Download the following infographic in PDF with the simple linear regression examples:

About The Author


Silvia Valcheva

Silvia Valcheva is a digital marketer with over a decade of experience creating content for the tech industry. She has a strong passion for writing about emerging software and technologies such as big data, AI (Artificial Intelligence), IoT (Internet of Things), process automation, etc.



Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis ( H 0 ): There’s no effect in the population .
  • Alternative hypothesis ( H a or H 1 ) : There’s an effect in the population.

Table of contents

  • Answering your research question with hypotheses
  • What is a null hypothesis?
  • What is an alternative hypothesis?
  • Similarities and differences between null and alternative hypotheses
  • How to write null and alternative hypotheses
  • Frequently asked questions

The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis ( H 0 ) answers “No, there’s no effect in the population.”
  • The alternative hypothesis ( H a ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.


The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept . Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

  • Research question: Does tooth flossing affect the number of cavities?
    Null hypothesis (H0): Tooth flossing has no effect on the number of cavities.
    Test (two-sample t test): The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2.

  • Research question: Does the amount of text highlighted in the textbook affect exam scores?
    Null hypothesis (H0): The amount of text highlighted in the textbook has no effect on exam scores.
    Test (linear regression): There is no relationship between the amount of text highlighted and exam scores in the population; β1 = 0.

  • Research question: Does daily meditation decrease the incidence of depression?
    Null hypothesis (H0): Daily meditation does not decrease the incidence of depression.*
    Test (two-proportions z test): The proportion of people with depression in the daily-meditation group (p1) is greater than or equal to the no-meditation group (p2) in the population; p1 ≥ p2.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p1 = p2.

The alternative hypothesis ( H a ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

  • Research question: Does tooth flossing affect the number of cavities?
    Alternative hypothesis (Ha): Tooth flossing has an effect on the number of cavities.
    Test (two-sample t test): The mean number of cavities per person differs between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 ≠ µ2.

  • Research question: Does the amount of text highlighted in a textbook affect exam scores?
    Alternative hypothesis (Ha): The amount of text highlighted in the textbook has an effect on exam scores.
    Test (linear regression): There is a relationship between the amount of text highlighted and exam scores in the population; β1 ≠ 0.

  • Research question: Does daily meditation decrease the incidence of depression?
    Alternative hypothesis (Ha): Daily meditation decreases the incidence of depression.
    Test (two-proportions z test): The proportion of people with depression in the daily-meditation group (p1) is less than the no-meditation group (p2) in the population; p1 < p2.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

                      Null hypothesis (H0)                                 Alternative hypothesis (Ha)
Definition            A claim that there is no effect in the population.   A claim that there is an effect in the population.
Symbols               Equality symbol (=, ≥, or ≤)                         Inequality symbol (≠, <, or >)
Result when p ≤ α     Rejected                                             Supported
Result when p > α     Failed to reject                                     Not supported


To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only thing you need to know to use these general template sentences is your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable ?

  • Null hypothesis ( H 0 ): Independent variable does not affect dependent variable.
  • Alternative hypothesis ( H a ): Independent variable affects dependent variable.

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

  • t test (two groups):
    Null (H0): The mean dependent variable does not differ between group 1 (µ1) and group 2 (µ2) in the population; µ1 = µ2.
    Alternative (Ha): The mean dependent variable differs between group 1 (µ1) and group 2 (µ2) in the population; µ1 ≠ µ2.

  • ANOVA (three groups):
    Null (H0): The mean dependent variable does not differ between group 1 (µ1), group 2 (µ2), and group 3 (µ3) in the population; µ1 = µ2 = µ3.
    Alternative (Ha): The mean dependent variables of group 1 (µ1), group 2 (µ2), and group 3 (µ3) are not all equal in the population.

  • Correlation:
    Null (H0): There is no correlation between independent variable and dependent variable in the population; ρ = 0.
    Alternative (Ha): There is a correlation between independent variable and dependent variable in the population; ρ ≠ 0.

  • Simple linear regression:
    Null (H0): There is no relationship between independent variable and dependent variable in the population; β1 = 0.
    Alternative (Ha): There is a relationship between independent variable and dependent variable in the population; β1 ≠ 0.

  • Two-proportions test:
    Null (H0): The dependent variable expressed as a proportion does not differ between group 1 (p1) and group 2 (p2) in the population; p1 = p2.
    Alternative (Ha): The dependent variable expressed as a proportion differs between group 1 (p1) and group 2 (p2) in the population; p1 ≠ p2.

Note: The template sentences above assume that you’re performing two-tailed tests . Two-tailed tests are appropriate for most studies.


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H 0 . When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as H a or H 1 . When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.


Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. Retrieved July 22, 2024, from https://www.scribbr.com/statistics/null-and-alternative-hypotheses/



Writing hypothesis for linear multiple regression models

I struggle with writing hypotheses because I get very confused by reference groups in the context of regression models.

For my example I'm using the mtcars dataset. The predictors are wt (weight), cyl (number of cylinders), and gear (number of gears), and the outcome variable is mpg (miles per gallon).

Say all your friends think you should buy a 6 cylinder car, but before you make up your mind you want to know how 6 cylinder cars perform miles-per-gallon-wise compared to 4 cylinder cars because you think there might be a difference.

Would this be a fair null hypothesis (since 4 cylinder cars is the reference group)?: There is no difference between 6 cylinder car miles-per-gallon performance and 4 cylinder car miles-per-gallon performance.

Would this be a fair model interpretation?: 6 cylinder vehicles travel fewer miles per gallon (β = -4.00, p = 0.010, CI: -6.95 to -1.04) as compared to 4 cylinder vehicles when adjusting for all other predictors, thus rejecting the null hypothesis.

Sorry for troubling, and thanks in advance for any feedback!


  • multiple-regression
  • linear-model
  • interpretation


Yes, you already got the right answer to both of your questions.

  • Your null hypothesis is completely fair. You did it the right way. When you have a factor variable as a predictor, you omit one of the levels as a reference category (the default is usually the first one, but you can also change that). Then all your other levels’ coefficients are tested for a significant difference compared to the omitted category. Just like you did.

If you would like to compare 6-cylinder cars with 8-cylinder cars, then you would have to change the reference category. In your hypothesis you could just have added at the end (or as a footnote): "when adjusting for weight and gear", but it is fine the way you did it.

  • Your model interpretation is correct: it is perfect the way you did it. You could even have said: "the best estimate is that 6 cylinder vehicles travel 4 miles per gallon less than 4 cylinder vehicles (p-value: 0.010; CI: -6.95, -1.04), when adjusting for weight and gear, thus rejecting the null hypothesis".

Let's assume that your hypothesis was related to gears, and you were comparing 4-gear vehicles with 3-gear vehicles. Then your result would be β: 0.65; p-value: 0.67; CI: -2.5, 3.8. You would say: "There is no statistically significant difference between three and four gear cars in fuel consumption, when adjusting for weight and number of cylinders, thus failing to reject the null hypothesis".
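For readers who work in Python rather than R, here is a sketch of the same reference-group coding with statsmodels formulas (get_rdataset downloads mtcars, so it needs an internet connection):

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    mtcars = sm.datasets.get_rdataset("mtcars").data
    # C(cyl) treats cylinders as a factor; the lowest level (4) is the reference category
    fit = smf.ols("mpg ~ wt + C(cyl) + C(gear)", data=mtcars).fit()
    print(fit.params)   # C(cyl)[T.6] is the 6- vs 4-cylinder difference, adjusted for the rest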


Unlocking the Secrets: What Is a Linear Regression Model and How It Can Predict Your Future

  • July 18, 2024


One of the key challenges in the rapidly evolving world of Machine Learning (ML) is ensuring interpretability. As ML models become more complex, their decision-making processes often turn into 'black boxes'. This can make it difficult for even experts to understand how predictions are made, posing a significant challenge to trust and widespread adoption, particularly in fields requiring high transparency, such as healthcare, finance, and legal systems.

Fortunately, not all ML models are enigmatic. Transparent models like decision trees and linear regression offer a clearer picture of how predictive analytics work. These models are not only simpler to understand but also provide clear insights into how various input factors influence the output. In this blog, we will demystify one of the most foundational and interpretable models in the ML toolkit: the linear regression model.

This blog will explore what is a linear regression model , how it works, and why it remains a cornerstone of predictive analytics. Additionally, we will delve into practical applications of linear regression, showcasing how it can be used to predict future trends and outcomes in various domains. Learn how to harness the power of linear regression to forecast your future with confidence with this detailed guide.

What is a linear regression model ?

Linear regression models are essential statistical tools employed in predictive analytics to assess the connection between a dependent variable (typically represented as y) and one or multiple independent variables (represented as X). The primary goal of linear regression is to predict the dependent variable's value based on the independent variables' values.

The model assumes a linear relationship between the variables, which can be expressed with the equation:

y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ϵ

where:

  • y – dependent variable
  • X₁, X₂, …, Xₙ – independent variables
  • β₀ – intercept
  • β₁, β₂, …, βₙ – coefficients
  • ϵ – error term

The intercept and coefficients are derived from the data, and they define the regression line that best fits the data points.

The simplest form, called simple linear regression, involves one dependent and one independent variable, while multiple linear regression involves multiple independent variables.

Visualisation of linear regression

Visualisation is a powerful tool in linear regression, helping to illustrate the relationship between variables. A scatter plot is often used to display the data points. Each point represents an observation with values for the independent and dependent variables. The regression line is then plotted, showing the best fit through these points. 

This line minimises the sum of the squared differences between the observed and predicted values. Thus, it provides a clear visual representation of the relationship and allows analysts to identify trends and patterns easily.

Importance and relevance of linear regression models in business analytics

Linear regression is a widely popular data science tool due to its simplicity and interpretability. It helps understand how the dependent variable changes with a unit change in the independent variable(s) and is applicable in various fields such as economics, biology, engineering, and social sciences for tasks like forecasting, risk management, and trend analysis. 

In businesses, it helps analysts understand the impact of one or more independent variables on a dependent variable, making it essential for forecasting and decision-making. For instance, a company might use linear regression analysis to predict sales based on advertising spend or understand how economic indicators like GDP influence market performance. 

This predictive capability allows businesses to: 

  • Strategise effectively, 
  • Allocate resources optimally, 
  • Make data-driven decisions, enhancing operational efficiency and profitability.

A business analytics course delves deeper into the models (linear, multiple) and their objectives. It offers an in-depth understanding of how these models are used in various scenarios to predict the future and make better decisions.  

How Linear Regression Analysis Works

Now that we have covered the basics of linear regression let’s take a look at how the analysis actually works. 

Steps involved in linear regression analysis

Linear regression analysis involves several key steps, as mentioned below:

  • Start by clearly defining the problem and formulating a hypothesis.
  • Specify the linear regression model to estimate the relationship between the dependent and independent variables .
  • Estimate the coefficients that represent the relationship between the variables.
  • Evaluate and validate the model to ensure its reliability and accuracy.
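To make these steps concrete, here is a minimal sketch in Python with statsmodels; the data and variable names are made up purely for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    ad_spend = rng.uniform(1, 10, size=100)              # hypothetical advertising spend
    sales = 50 + 12 * ad_spend + rng.normal(0, 8, 100)   # hypothetical sales with noise

    X = sm.add_constant(ad_spend)    # specify the model: intercept plus one predictor
    model = sm.OLS(sales, X).fit()   # estimate the coefficients by least squares
    print(model.params)              # intercept and slope
    print(model.summary())           # evaluate and validate the fit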

Data collection and preparation

Data collection is the foundation of any regression analysis. The quality and relevance of the data significantly impact the model's effectiveness. Business analysts gather data from various sources, ensuring it is accurate and comprehensive. Data preparation involves cleaning the data, handling missing values, and transforming variables if necessary. This step ensures that the dataset is ready for analysis and free from any biases or inconsistencies.

Model estimation and interpretation of coefficients

Once the data is prepared, the next step is model estimation. This involves fitting the linear regression model to the data, typically using methods like least squares to estimate the coefficients. These coefficients represent the relationship between the independent variables and the dependent variable. 

Interpreting these coefficients helps analysts understand how changes in the predictors influence the outcome. For instance, a positive coefficient indicates a direct relationship, whereas a negative one signifies an inverse relationship.

Model validation techniques (R-squared, residual analysis)

Model validation is crucial to ensure the regression model's reliability. One of the key metrics used is R-squared, which measures the proportion of variability in the dependent variable explained by the independent variables. A higher R-squared value indicates a better fit. 

Also, residual analysis involves examining the differences between observed and predicted values to detect patterns or inconsistencies. This helps identify model deficiencies and improves predictive accuracy.

Understanding Linear Regression Statistics

Aspiring business analysts must grasp key statistics to evaluate linear regression models effectively. Here are the essential statistics and how they aid in assessing model performance.

Key statistics: R-squared, p-values, standard error

  • R-squared: This statistic measures the proportion of variance in the dependent variable that is predictable from the independent variables. An R-squared value closer to 1 indicates a strong model fit, meaning the model explains a significant portion of the variability in the response variable.
  • P-values: P-values indicate the significance of each coefficient in the model. A low p-value (typically < 0.05) suggests that the corresponding independent variable has a statistically significant relationship with the dependent variable.
  • Standard Error: This metric measures the average distance that the observed values fall from the regression line. A lower standard error indicates that the model’s predictions are more precise.

How these statistics help in evaluating the model's performance

  • R-squared : Helps determine the model’s explanatory power. A higher R-squared value means better predictive accuracy.
  • P-values : Help identify which variables contribute meaningfully to the model, ensuring the model is robust and reliable.
  • Standard Error : Assists in assessing the precision of predictions. A model with a lower standard error provides more accurate and reliable predictions.
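Staying with the hypothetical model fitted in the earlier sketch, all three statistics can be read directly off the fitted statsmodels object:

    print(model.rsquared)   # R-squared: proportion of variance explained
    print(model.pvalues)    # p-value for the intercept and each coefficient
    print(model.bse)        # standard error of each estimate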

Applications of Linear Regression in Business

Linear regression models help businesses make data-driven decisions. Aspiring business analysts can benefit from understanding its applications across various domains.

Examples of linear regression applications in various business domains (finance, marketing, operations)

  • Finance : Linear regression is used to predict stock prices, assess investment risks, and forecast financial performance. For example, it helps in estimating future sales and revenue by analysing past trends and market conditions.
  • Marketing : Businesses apply linear regression to understand customer behaviour, optimise pricing strategies, and improve marketing campaigns. It helps determine the effectiveness of different marketing channels and predict customer demand.
  • Operations : In operations, linear regression assists in inventory management, demand forecasting, and improving supply chain efficiency. Companies use it to predict product performance and optimise production schedules.

Advantages and Limitations of Linear Regression

Linear regression offers both advantages and limitations that are crucial for making informed decisions in data-driven environments.

Benefits of using linear regression in predictive modelling

  • Interpretability : Linear regression provides a straightforward explanation of coefficients, thus simplifying the illustration of relationships between variables.
  • Simplicity : Its implementation and comprehension are direct, ensuring accessibility even for individuals with minimal statistical expertise.
  • Efficiency : Training and prediction times typically outpace those of more intricate models, rendering it well-suited for extensive datasets.

Common pitfalls and how to address them

  • Assumption of Linearity : Linear regression typically assumes a linear association between variables, though this assumption may not universally apply across all datasets and scenarios. Techniques like polynomial regression or transformations can help address this.
  • Overfitting : Using too many variables can lead to overfitting, where the model performs well on training data but poorly on new data. Regularisation methods like Ridge or Lasso regression can mitigate overfitting.
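As a sketch of the regularisation remedy with scikit-learn (assuming scikit-learn is installed, and reusing the hypothetical ad_spend and sales arrays from the earlier sketch):

    from sklearn.linear_model import Ridge

    # the penalty term (alpha) shrinks coefficients, damping overfitting
    ridge = Ridge(alpha=1.0).fit(ad_spend.reshape(-1, 1), sales)
    print(ridge.coef_, ridge.intercept_)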

Comparison with other predictive modeling techniques

  • Versus Non-linear Models : Linear regression is less flexible in capturing complex relationships compared to non-linear models like decision trees or neural networks.
  • Versus Ensemble Methods : While ensemble methods like Random Forests may provide higher accuracy in some cases, linear regression remains valuable for its simplicity and interpretability.

Future Trends and Innovations in Linear Regression

Business analysts exploring the landscape of data science must stay abreast of evolving trends in linear regression. This foundational statistical technique continues to evolve with advancements in machine learning and big data analytics, offering new possibilities and integration pathways.

Advances in linear regression methods and tools

  • Innovations in regularisation techniques like Ridge and Lasso regression improve model performance and robustness.
  • Bayesian linear regression offers probabilistic modelling benefits, enhancing uncertainty quantification in predictions.
  • Non-linear regression methods, such as polynomial regression, are being integrated to capture complex relationships in data.

Integration with other machine learning techniques

  • Ensemble Methods : Hybrid models combining linear regression with ensemble techniques like Random Forests are enhancing prediction accuracy.
  • Deep Learning : Integration of linear regression with neural networks for feature extraction and predictive modelling in complex datasets.

Impact of big data and AI on linear regression analysis

Scalability : Linear regression models are now capable of handling vast amounts of data, leveraging distributed computing frameworks.

Automation : AI-driven tools automate model selection, feature engineering, and hyperparameter tuning, streamlining the linear regression workflow.

Understanding the linear regression meaning and its application is fundamental for anyone involved in data analysis and predictive modeling. By leveraging linear regression statistics , analysts can make accurate predictions and gain valuable insights into their data. Whether you're forecasting sales, analysing economic trends, or exploring scientific phenomena, linear regression provides a powerful and intuitive tool for unlocking the secrets hidden within your data.

The Postgraduate Certificate in Business Analytics offered by XLRI and Imarticus can help professionals acquire industry-relevant knowledge and hands-on skills, helping them hone their data-driven decision-making approach.

  • How is linear regression used to predict future values?

Linear regression is employed to predict future values by establishing a relationship between a dependent variable and one or more independent variables from past data. This statistical method fits a straight line to the data points, enabling predictions of future outcomes based on the established pattern.

  • What does a regression model aim to predict?

Regression models are used to analyse and predict continuous variables, helping businesses and researchers make informed decisions based on data patterns.

  • Is the goal of linear regression for prediction or forecasting?

The primary goal of linear regression is prediction rather than forecasting. It aims to predict the value of a dependent variable based on the values of independent variables, establishing a linear relationship between them. While it can be used for forecasting in some contexts, such as predicting future sales based on historical data, its core purpose is to make predictions about continuous outcomes rather than projecting future trends over time.

  • How is linear regression used in real life?

Some common real-life applications of linear regression include predicting stock prices based on historical data, estimating the impact of advertising spending on sales, predicting patient outcomes based on clinical variables, etc.


A Multiple Linear Regression Model to Estimate Global, Direct and Diffuse Irradiance in Gurugram, India, Using Python

  • Conference paper
  • First Online: 24 July 2024
  • Cite this conference paper


  • Subhayan Das   ORCID: orcid.org/0000-0002-0659-938X 13 &
  • Subhra Das   ORCID: orcid.org/0000-0003-2805-7458 14  

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 970))

Included in the following conference series:

  • International Conference on Soft Computing: Theories and Applications

Estimating solar radiation is important for designing a solar photovoltaic or thermal system. In the present work, a multiple linear regression model is used to estimate global horizontal irradiance, direct normal irradiance and diffuse horizontal irradiance using the solar resource assessment data collected from the National Institute of Solar Energy located in Gurugram (28.42° N, 77.15° E), which records direct, diffuse and global radiation at an interval of 1 min along with humidity, wind speed, wind direction, temperature and precipitation. Principal component analysis is used to select the dominating variables, and a multiple regression model is then fitted to estimate the components of solar radiation. The model performance is tested by computing the coefficient of determination, root mean square error, mean bias error, mean absolute error and model efficiency for each of the models. Model efficiencies of 0.93, 0.88 and 0.93, respectively, are obtained for the multiple regression models estimating global, direct and diffuse irradiance, which suggests that the models fit the observed data well.
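A rough sketch of that kind of pipeline in Python (entirely illustrative: the random arrays stand in for the SRRA measurements, which are not reproduced here):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score, mean_absolute_error

    rng = np.random.default_rng(0)
    X = rng.random((1000, 5))   # stand-ins for humidity, wind speed, wind direction, temperature, precipitation
    ghi = X @ np.array([0.5, 0.2, 0.05, 0.9, 0.1]) + rng.normal(0, 0.05, 1000)

    pca = PCA(n_components=0.95).fit(X)   # keep the components explaining 95% of the variance
    Z = pca.transform(X)
    model = LinearRegression().fit(Z, ghi)
    pred = model.predict(Z)
    print(r2_score(ghi, pred), mean_absolute_error(ghi, pred))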



Acknowledgements

The authors thank Dr. Nikhil P G, Assistant Director (Technical), for providing the SRRA data for the two days used in the study.


About this paper

Cite this paper: Das, S., Das, S. (2024). A Multiple Linear Regression Model to Estimate Global, Direct and Diffuse Irradiance in Gurugram, India, Using Python. In: Kumar, R., Verma, A.K., Verma, O.P., Wadehra, T. (eds) Soft Computing: Theories and Applications. SoCTA 2023. Lecture Notes in Networks and Systems, vol 970. Springer, Singapore. https://doi.org/10.1007/978-981-97-2031-6_18


COMMENTS

  1. 12.2.1: Hypothesis Test for Linear Regression

    The formula for the t-test statistic is t = b1 / √(MSE/SSxx). Use the t-distribution with degrees of freedom equal to n − p − 1. The t-test for slope has the same hypotheses as the F-test. Use a t-test to see if there is a significant relationship between hours studied and grade on the exam, with α = 0.05 (a worked Python sketch of this statistic appears after this list).


  3. Linear regression hypothesis testing: Concepts, Examples

    The F-test is used to test the null hypothesis that no linear relationship exists between the response variable y and the predictor variables x1, x2, x3, x4 and x5. The null hypothesis can also be written as β1 = β2 = β3 = β4 = β5 = 0 (see the F-test sketch after this list).

  4. 3.3.4: Hypothesis Test for Simple Linear Regression

    In simple linear regression, this is equivalent to asking "Are X and Y correlated?" In reviewing the model, Y = β0 + β1X + ε, as long as the slope (β1) has any non-zero value, X will add value in helping predict the expected value of Y. However, if there is no correlation between X and Y, the value of ...

  5. Simple Linear Regression

    Simple linear regression example. You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose incomes range from 15k to 75k and ask them to rank their happiness on a scale from 1 to 10. Your independent variable (income) and dependent variable (happiness) are both quantitative, so you can ...

  6. PDF Chapter 9 Simple Linear Regression

    For simple linear regression, the chief null hypothesis is H0: β1 = 0, and the corresponding alternative hypothesis is H1: β1 ≠ 0. If this null hypothesis is true, then, from E(Y) = β0 + β1x, we can see that the population mean of Y is β0 for every x value, which ...

  7. Linear Regression Explained with Examples

    This formula is linear in the parameters. However, despite the name linear regression, it can model curvature. While the formula must be linear in the parameters, you can raise an independent variable to an exponent to model curvature. For example, if you square an independent variable, linear regression can fit a U-shaped curve (see the curvature sketch after this list).

  8. Linear regression

    The lecture is divided into two parts: in the first part, we discuss hypothesis testing in the normal linear regression model, in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors; in the second part, we show how to carry out hypothesis tests in linear regression analyses where the ...

  9. PDF Lecture 5 Hypothesis Testing in Multiple Linear Regression

    As in simple linear regression, under the null hypothesis t0 = β̂j / se(β̂j) ∼ t(n−p−1). We reject H0 if |t0| > t(n−p−1, 1−α/2). This is a partial test because β̂j depends on all of the other predictors xi, i ≠ j, that are in the model. Thus, this is a test of the contribution of xj given the other predictors in the model (see the partial t-test sketch after this list).

  10. PDF Lecture 9: Linear Regression

    Regression. Technique used for the modeling and analysis of numerical data. Exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other. Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

  11. Simple linear regression

    Hypothesis test. Null hypothesis H0: there is no relationship between X and Y. Alternative hypothesis Ha: there is some relationship between X and Y. Based on our model, this translates to H0: β1 = 0 and Ha: β1 ≠ 0. Test statistic: t = (β̂1 − 0) / SE(β̂1). Under the null hypothesis, this has a t-distribution with n − 2 degrees ... (the slope sketch after this list computes this statistic by hand).

  12. 6.4

    For the simple linear regression model, there is only one slope parameter about which one can perform hypothesis tests. For the multiple linear regression model, there are three different hypothesis tests for slopes that one could conduct. They are: Hypothesis test for testing that all of the slope parameters are 0.

  13. Lesson 1: Simple Linear Regression

    Objectives. Upon completion of this lesson, you should be able to: distinguish between a deterministic relationship and a statistical relationship; understand the concept of the least squares criterion; interpret the intercept b0 and slope b1 of an estimated regression equation; know how to obtain the estimates b0 and b1 from Minitab's ...

  14. 14.4: Hypothesis Test for Simple Linear Regression

    We will also run this test using the p-value method with statistical software, such as Minitab. Data/Results: F = 341.422 / 12.859 = 26.551, which is more than the critical value of 10.13, so reject H0. Also, the p-value = 0.0142 < 0.05, which also supports rejecting H0 (this arithmetic is verified in a sketch after this list).

  15. Linear Regression In Python (With Examples!)

    Make sure that you save it in the folder of the user. Now, let's load it into a new variable called data using the pandas method read_csv. After importing pandas as pd, we can write: data = pd.read_csv('1.01. Simple linear regression.csv'). After running it, the data from the .csv file will be loaded into the data variable.

  16. Linear Regression Example

    The regression equation is a linear equation of the form ŷ = b0 + b1x. To conduct a regression analysis, we need to solve for b0 and b1. Computations are shown below. Notice that all of our inputs for the regression analysis come from the above three tables. First, we solve for the regression coefficient b1 (the slope sketch after this list carries out the same computation in code):

  17. 15.5: Hypothesis Tests for Regression Models

    15.5: Hypothesis Tests for Regression Models. So far we've talked about what a regression model is, how the coefficients of a regression model are estimated, and how we quantify the performance of the model (the last of these, incidentally, is basically our measure of effect size). The next thing we need to talk about is hypothesis tests.

  18. Simple Linear Regression Examples

    ... and the simple linear regression equation is Y = β0 + β1X, where X is the value of the independent variable, Y is the value of the dependent variable, β0 is a constant (the value of Y when X = 0), and β1 is the regression coefficient (how much Y changes for each unit change in X). Example 1: You have to study the ...

  19. Null & Alternative Hypotheses

    The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test: Null hypothesis (H0): there's no effect in the population. Alternative hypothesis (Ha or H1): there's an effect in the population. The effect is usually the effect of the ...

  20. Null hypothesis for linear regression

    I am confused about the null hypothesis for linear regression. If a variable in a linear model has p < 0.05 (when R prints out stars), I would say the variable is a statistically significant part of the model. What does that translate to in terms of the null hypothesis?

  21. Writing hypothesis for linear multiple regression models

    In your hypothesis you could just have added at the end (or as a footnote) "when adjusting for weight and gear", but it is fine the way you did it. Your model interpretation is correct; it is perfect the way you did it. You could even have said: "the best estimate is that 6 cylinder vehicles travel 4 miles per gallon less than 4 cylinder ...

  22. Unlocking the Secrets: What Is a Linear Regression Model and How It Can

    Linear regression analysis involves several key steps, as mentioned below: Start by clearly defining the problem and formulating a hypothesis. Specify the linear regression model to estimate the relationship between the dependent and independent variables. Estimate the coefficients that represent the relationship between the variables.

  23. A Multiple Linear Regression Model to Estimate Global ...

    4.2 Multiple Linear Regression Model for Direct Normal Irradiance (DNI). The summary of the multiple linear regression models for DNI as a function of time, relative humidity, air temperature and maximum horizontal wind speed is presented in Table 2. Based on the analysis of variance, the F value obtained for the multiple regression model suggests that we can reject the null hypothesis and ...
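Worked sketches in Python

Several of the snippets above state formulas without showing the computation. The sketches below, all in Python on made-up data (except where a snippet's own numbers are reused), work a few of them out; they are minimal illustrations under stated assumptions, not any source's reference implementation.

First, the slope t-test of items 1 and 11: t = b1 / √(MSE/SSxx) and t = β̂1 / SE(β̂1) are the same statistic, with n − 2 degrees of freedom in simple regression (n − p − 1 with p = 1 predictor).

```python
# Slope t-test for simple linear regression, computed by hand on made-up data.
# t = b1 / sqrt(MSE / SSxx) = b1 / SE(b1), with df = n - 2.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 8, 10], dtype=float)          # made-up predictor
y = np.array([70, 72, 78, 79, 85, 86, 92, 98], dtype=float)   # made-up response
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx                        # estimated slope
b0 = y.mean() - b1 * x.mean()         # estimated intercept

resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)    # residual mean square
se_b1 = np.sqrt(mse / sxx)            # standard error of the slope

t = b1 / se_b1
p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value
print(f"b1={b1:.4f}  SE(b1)={se_b1:.4f}  t={t:.3f}  p={p:.4g}")
```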
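Next, the overall F-test of item 3, which tests H0: β1 = … = β5 = 0 jointly. statsmodels reports the F statistic and its p-value directly; the five predictors here are synthetic.

```python
# Overall F-test (H0: all slope coefficients are zero) via statsmodels, synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))   # five made-up predictors x1..x5
y = 2.0 + X @ np.array([1.5, 0.0, -0.8, 0.0, 0.3]) + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(f"F = {fit.fvalue:.3f}, p = {fit.f_pvalue:.4g}")
# Reject H0: beta1 = ... = beta5 = 0 when this p-value falls below alpha (e.g. 0.05).
```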
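Item 7's point that linear regression is linear in the parameters, not in x, shows up when a squared column is added to the design matrix: the same least-squares machinery then fits a U-shaped curve.

```python
# Linear in the parameters, not in x: an x**2 column lets OLS fit a U-shaped curve.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 50)
y = 1.0 - 0.5 * x + 2.0 * x ** 2 + rng.normal(scale=1.0, size=x.size)  # made-up data

X = np.column_stack([np.ones_like(x), x, x ** 2])  # design matrix: 1, x, x^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)       # ordinary least squares
print("fitted (b0, b1, b2):", beta.round(3))       # recovers roughly (1.0, -0.5, 2.0)
```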
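The partial t-tests of item 9 test each coefficient given the other predictors; any multiple-regression summary reports them. A statsmodels sketch on synthetic data with one deliberately irrelevant predictor:

```python
# Partial t-tests: each coefficient is tested given the other predictors in the model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 3))   # three made-up predictors; x3 is deliberately irrelevant
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=60)

fit = sm.OLS(y, sm.add_constant(X)).fit()
# t0 = beta_hat_j / se(beta_hat_j), df = n - p - 1; small p-values reject H0: beta_j = 0.
print("t:", fit.tvalues.round(3))
print("p:", fit.pvalues.round(4))
```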
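Finally, item 14's arithmetic can be checked directly. The degrees of freedom (1, 3) are inferred here from the quoted critical value 10.13; they are not stated in the snippet.

```python
# Checking item 14's numbers: F = MSR / MSE against the critical value and p-value.
from scipy import stats

F = 341.422 / 12.859
print(round(F, 3))                                # 26.551
print(round(stats.f.ppf(0.95, dfn=1, dfd=3), 2))  # 10.13, the quoted critical value
print(round(stats.f.sf(F, dfn=1, dfd=3), 4))      # ~0.0142, the quoted p-value
```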