R: Linear Regression – Predicting Future Data

This article shows how to predict new values with linear regression using the R programming language.

Regression analysis builds a model of the relationship between two variables: a predictor and a response.

The general mathematical equation for linear regression is:

y = ax + b

where,

y = response variable (value we need to find out using predictor variable)
x = predictor variable (value that we know/gather through experiments)
a = slope (regression coefficient: the change in y for a one-unit change in x)
b = intercept (regression coefficient: the value of y when x = 0)

For this example, we have two variables: height (in cm) and weight (in kg). The values are paired by position. In the example data below, a height of 150 cm goes with a weight of 60 kg, a height of 160 cm goes with a weight of 70 kg, and so on. We predict weight (the response, y) from height (the predictor, x).

For linear regression analysis, we first create a relationship model between the two variables using the lm() function. We can then use the predict() function to predict new values. A summary of the model can be obtained with the summary() function.


> height <- c (150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
> weight <- c (60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

> relation <- lm(weight ~ height)
> print (relation)

Call:
lm(formula = weight ~ height)

Coefficients:
(Intercept)       height  
   -50.9324       0.7328  

> print (summary(relation))

Call:
lm(formula = weight ~ height)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.2474 -0.6721  1.1277  3.5091  3.6923 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -50.9324    15.3672  -3.314   0.0106 *  
height        0.7328     0.1018   7.201 9.23e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.755 on 8 degrees of freedom
Multiple R-squared:  0.8664,    Adjusted R-squared:  0.8496 
F-statistic: 51.86 on 1 and 8 DF,  p-value: 9.233e-05
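
To connect the fitted coefficients back to the equation y = ax + b, here is a minimal sketch that recomputes the slope and intercept with the standard closed-form least-squares formulas and compares them with what lm() reports. The variable names a, b and fitted_coefs are only illustrative; the data is the same as above.

```r
# Same data as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

relation <- lm(weight ~ height)

# Closed-form least-squares estimates for y = ax + b
# (x = height, y = weight)
a <- sum((height - mean(height)) * (weight - mean(weight))) /
  sum((height - mean(height))^2)       # slope
b <- mean(weight) - a * mean(height)   # intercept

# Coefficients estimated by lm(), returned as a named numeric vector
fitted_coefs <- coef(relation)

print(c(intercept = b, slope = a))
print(fitted_coefs)
```

Both print calls should show the same intercept and slope as the Coefficients section of the lm() output above.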

Next, we supply a new height value. Below, we create a data frame containing a height of 170 and predict the weight for it using the model created with lm() in the code above. The column must be named height so that it matches the predictor name in the model formula.


> newData <- data.frame(height = 170)
> print (newData)
  height
1    170
> result <- predict(relation, newData) # predict weight when height = 170
> print (result)
       1 
73.63518 

The predicted weight for a height of 170 cm is about 73.6 kg (73.63518).
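
As a rough check, the same prediction can be reproduced by plugging 170 into y = ax + b with the fitted coefficients. The sketch below also asks predict() for a 95% prediction interval, which gives a sense of how uncertain a single predicted weight is with only ten data points. The names coefs, manual and with_interval are illustrative, and the 95% level is an assumption chosen for this example.

```r
# Same data and model as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
relation <- lm(weight ~ height)

newData <- data.frame(height = 170)

# Manual prediction: intercept + slope * new height
coefs <- coef(relation)
manual <- unname(coefs["(Intercept)"] + coefs["height"] * 170)

# Prediction with a 95% prediction interval
# (matrix with columns fit, lwr, upr)
with_interval <- predict(relation, newData,
                         interval = "prediction", level = 0.95)

print(manual)
print(with_interval)
```

The fit column should match both the manual value and the result of the plain predict() call above; lwr and upr are the lower and upper bounds of the interval.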

Finally, we plot the height and weight data as a scatter graph:


> plot(weight, height) # scatter graph with weight on the x-axis
> plot(height, weight) # scatter graph with height (the predictor) on the x-axis
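
To see the fitted model on top of the data, one possible sketch is to draw the regression line with abline() and mark the predicted point for a height of 170. The axis labels, title and colours here are illustrative choices, not part of the original example.

```r
# Same data and model as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
relation <- lm(weight ~ height)

# Scatter plot with the predictor (height) on the x-axis
plot(height, weight,
     xlab = "Height (cm)", ylab = "Weight (kg)",
     main = "Weight vs. height")

# Overlay the fitted regression line from the lm object
abline(relation, col = "blue")

# Mark the predicted weight for a height of 170
predicted <- predict(relation, data.frame(height = 170))
points(170, predicted, pch = 19, col = "red")
```

When run non-interactively (for example with Rscript), R writes the plot to a file instead of opening a window.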

[Figure: linear regression]

Hope this helps. Thanks.