## R: Linear Regression – Predicting Future Data

This article shows how to predict future data with **Linear Regression** using the **R** programming language.

**Regression Analysis** builds a relationship model between two variables.

The general mathematical equation for linear regression is:

y = ax + b

where,

y = response variable (value we need to find out using predictor variable)

x = predictor variable (value that we know/gather through experiments)

a = constant (regression coefficient; the slope of the line)

b = constant (regression coefficient; the intercept)
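For reference, the two constants are normally estimated by ordinary least squares, which is what R's `lm()` does for a model of this form. The sample means $\bar{x}$ and $\bar{y}$ and the index $i$ over the $n$ observations are notation introduced here for illustration; they do not appear in the equation above. Under those assumptions, the standard closed-form estimates are:

$$
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
$$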

For this example, we have two variables: **height** and **weight**. Each height value is paired with a weight value. In the example data below, a weight of 60 kg corresponds to a height of 150 cm, a weight of 70 kg to a height of 160 cm, and so on.

To run a linear regression analysis, we first create a relationship model between the two variables using the **lm()** function. After that, we can use the **predict()** function to predict values. A summary of the relationship can be obtained using the **summary()** function.

```r
> height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
> weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
> relation <- lm(weight ~ height)
> print(relation)

Call:
lm(formula = weight ~ height)

Coefficients:
(Intercept)       height
   -50.9324       0.7328

> print(summary(relation))

Call:
lm(formula = weight ~ height)

Residuals:
    Min      1Q  Median      3Q     Max
-8.2474 -0.6721  1.1277  3.5091  3.6923

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -50.9324    15.3672  -3.314   0.0106 *
height        0.7328     0.1018   7.201 9.23e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.755 on 8 degrees of freedom
Multiple R-squared:  0.8664,	Adjusted R-squared:  0.8496
F-statistic: 51.86 on 1 and 8 DF,  p-value: 9.233e-05
```
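As a sanity check on the output above, here is a minimal sketch that recomputes the two coefficients by hand and pulls the R-squared value out of the fitted object. It assumes the usual closed-form least-squares formulas for a single predictor. The names `slope`, `intercept`, `manual`, `from_lm`, and `r2` are illustrative and not part of the original example.

```r
# Same sample data as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

relation <- lm(weight ~ height)

# Closed-form least-squares estimates for a one-predictor model
slope     <- cov(height, weight) / var(height)
intercept <- mean(weight) - slope * mean(height)

# Compare the hand-computed values with the coefficients stored by lm()
manual  <- c(intercept, slope)
from_lm <- unname(coef(relation))
print(round(manual, 4))             # intercept and slope, as in the printed coefficients
print(all.equal(manual, from_lm))   # TRUE when both approaches agree

# Individual numbers can also be extracted from the summary object
r2 <- summary(relation)$r.squared
print(round(r2, 4))                 # the "Multiple R-squared" value from the summary
```

Here `coef()` returns the fitted coefficients as a named vector, and `summary(relation)$r.squared` gives the R-squared value without having to read it off the printed summary.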

Next, we supply a new height value. As shown below, we create a data frame, newData, that contains a height of 170, and then predict the weight for this new height. For this, we use the **relation** model created by the **lm()** function in the code above.

```r
> newData <- data.frame(height = 170)
> print(newData)
  height
1    170
> result <- predict(relation, newData) # predict weight when height = 170
> print(result)
       1
73.63518
```

The predicted weight for a height of 170 cm is 73.63518 kg.
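The prediction is simply the fitted line evaluated at the new height. The sketch below illustrates this by predicting several heights at once and comparing the result with a manual calculation from the coefficients. The extra heights 160 and 180 and the names `predicted`, `coefs`, and `by_hand` are assumptions added for illustration.

```r
# Same sample data and model as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
relation <- lm(weight ~ height)

# Several new heights at once; the column name must match the predictor name
newData <- data.frame(height = c(160, 170, 180))
predicted <- predict(relation, newData)

# The same values computed directly as intercept + slope * height
coefs <- coef(relation)
by_hand <- coefs[["(Intercept)"]] + coefs[["height"]] * newData$height

# Show both columns side by side for comparison
print(data.frame(newData, predicted, by_hand))
```

The row for a height of 170 should reproduce the 73.63518 value shown above. Note that `predict()` looks up the predictor by name, so the data frame column has to be called `height`.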

Finally, we can plot the height and weight data as a scatter plot:

```r
> plot(weight, height) # scatter plot with weight on the x-axis
> plot(height, weight) # scatter plot with height (the predictor) on the x-axis
```
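To see how well the model fits, it can help to draw the regression line over the scatter plot. This is a minimal sketch using base R graphics. The title, axis labels, colour, and point style are arbitrary choices made here, not part of the original example.

```r
# Same sample data and model as in the article
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
relation <- lm(weight ~ height)

# Predictor on the x-axis, response on the y-axis
plot(height, weight,
     main = "Weight vs. height",
     xlab = "Height (cm)", ylab = "Weight (kg)",
     pch = 16)

# abline() accepts a fitted lm object and draws its regression line
abline(relation, col = "blue", lwd = 2)
```

Points close to the line are well explained by the model. Points far from it correspond to the larger residuals in the summary output.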

Hope this helps. Thanks.