## R: Correlation – Finding Out the Relation Between Data

This article shows how to calculate the correlation between two variables using the R programming language.

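**Correlation** measures how strongly two variables are related and in which direction. The correlation coefficient ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation); a value near 0 indicates little or no linear relationship.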

**Positive Correlation:** If the value of one variable increases as the value of the other variable increases, the two variables are said to be positively correlated.

**Negative Correlation:** If the value of one variable decreases as the value of the other variable increases, the two variables are said to be negatively correlated.
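To make the two definitions concrete, here is a minimal sketch using **cor()**, the function used throughout this article. The vectors `x`, `y.up`, and `y.down` are made-up toy data, not part of the article's data set.

```r
# Hypothetical toy data, chosen only to illustrate the two definitions
x      <- c(1, 2, 3, 4, 5)
y.up   <- 2 * x + 1    # increases as x increases
y.down <- 10 - 3 * x   # decreases as x increases

r.up   <- cor(x, y.up)    # perfectly linear and increasing, so r is 1 (up to rounding)
r.down <- cor(x, y.down)  # perfectly linear and decreasing, so r is -1 (up to rounding)

print(r.up)
print(r.down)
```

Real data are rarely perfectly linear, so the coefficient usually falls somewhere between these two extremes.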

For this example, we have two variables: **height** and **weight**. Their values are paired, one pair per person. In the example data below, the person weighing 60 kg is 150 cm tall, the person weighing 70 kg is 160 cm tall, and so on.

To calculate the correlation between two variables, we use the **cor()** function. It shows how positively or negatively the two variables are correlated.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
cor.result <- cor(height, weight)  # calculate correlation between two variables
print(cor.result)  # height and weight are positively correlated
#> [1] 0.9307801
```
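As a rough illustration of what **cor()** computes by default, the sketch below reproduces the Pearson coefficient by hand from the deviations of each value from its mean. The names `dh`, `dw`, and `r.manual` are introduced here for illustration only.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

# Deviation of each value from the mean of its variable
dh <- height - mean(height)
dw <- weight - mean(weight)

# Pearson's r: sum of cross-products divided by the square root of
# the product of the two sums of squared deviations
r.manual <- sum(dh * dw) / sqrt(sum(dh^2) * sum(dw^2))

print(r.manual)
# Compare with the built-in result; the two should agree up to rounding
print(all.equal(r.manual, cor(height, weight)))
```

In practice there is no reason to compute this by hand; the point is only to show that a large positive sum of cross-products drives the coefficient towards 1.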

We can also specify the correlation method, which can be pearson, spearman, or kendall. If no method is given, pearson is used.

```r
cor.result <- cor(height, weight, method = 'pearson')  # pearson, spearman, kendall
print(cor.result)
#> [1] 0.9307801
```
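To compare the three methods side by side, one option is to loop over the method names. This is a small sketch; `methods`, `results`, and `rank.based` are names chosen here for illustration. It also shows that the Spearman coefficient is the Pearson coefficient computed on the ranks of the data.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

# Compute the coefficient once per method; sapply keeps the method names
methods <- c("pearson", "spearman", "kendall")
results <- sapply(methods, function(m) cor(height, weight, method = m))
print(results)

# Spearman's coefficient is Pearson's coefficient applied to the ranks
rank.based <- cor(rank(height), rank(weight))
print(all.equal(results[["spearman"]], rank.based))
```

The rank-based methods (spearman and kendall) depend only on the ordering of the values, so they are less sensitive to outliers than pearson.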

Below, we create a data frame of height and weight and pass it to **cor()**. The result is a correlation matrix, which shows in a single result how height is correlated with weight and how weight is correlated with height. The diagonal is always 1 because each variable is perfectly correlated with itself.

```r
df <- data.frame(ht = height, wt = weight)  # create data frame of height and weight
cor.result <- cor(df)  # calculate correlation
print(cor.result)
#>           ht        wt
#> ht 1.0000000 0.9307801
#> wt 0.9307801 1.0000000
```
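Because the result is an ordinary matrix with row and column names, a single coefficient can be pulled out by name. The following is a minimal sketch; `cor.matrix` and `ht.wt` are illustrative names.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

df <- data.frame(ht = height, wt = weight)
cor.matrix <- cor(df)

# Extract one coefficient by row name and column name
ht.wt <- cor.matrix["ht", "wt"]
print(ht.wt)

# The matrix is symmetric: cor(ht, wt) is the same as cor(wt, ht)
print(isSymmetric(cor.matrix))
```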

Below, we create a new variable named **exercise**. It contains the number of minutes of exercise each person does. Taking all three variables together, the person who weighs 60 kg and is 150 cm tall does 10 minutes of exercise, the person who weighs 70 kg and is 160 cm tall does 7 minutes, and so on.

Now, we calculate the correlation between the three variables: height, weight, and exercise. We again use a data frame, so the result shows the correlation for every pair of variables: height with weight and exercise, weight with height and exercise, and exercise with height and weight.

```r
exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)
df <- data.frame(height = height, weight = weight, exercise = exercise)  # create data frame with height, weight, and exercise
cor.result <- cor(df)
print(cor.result)  # result shows height and weight are negatively correlated with exercise
#>              height     weight   exercise
#> height    1.0000000  0.9307801 -0.8539712
#> weight    0.9307801  1.0000000 -0.8779157
#> exercise -0.8539712 -0.8779157  1.0000000
```
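With more than two variables, the matrix gets harder to scan. The sketch below rounds it for readability and picks out only the row for exercise. The names `with.exercise` and `strongest` are illustrative and not part of the original example.

```r
height   <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight   <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)

df <- data.frame(height = height, weight = weight, exercise = exercise)
cor.matrix <- cor(df)

# Round to two decimals for easier reading
print(round(cor.matrix, 2))

# Correlation of exercise with each of the other two variables
with.exercise <- cor.matrix["exercise", c("height", "weight")]
print(with.exercise)

# Variable with the strongest correlation to exercise, by absolute value
strongest <- names(which.max(abs(with.exercise)))
print(strongest)
```

Comparing absolute values matters here because a coefficient of -0.9 indicates a stronger relationship than one of 0.5, even though it is the smaller number.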

Finally, we plot two scatter plots: one of exercise against height, and one of exercise against weight.

```r
plot(height, exercise)  # scatter plot of exercise against height
plot(weight, exercise)  # scatter plot of exercise against weight
```
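As one possible variation, the two plots can be drawn side by side with a fitted straight line on each and saved to a file. This is only a sketch: the output file name, the axis labels, and the use of `lm()` for the trend line are assumptions made here, not part of the original example.

```r
height   <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight   <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)

# Straight-line fits, used only to draw a trend line on each plot
fit.height <- lm(exercise ~ height)
fit.weight <- lm(exercise ~ weight)

# Hypothetical output location: a PDF in R's temporary directory
plot.file <- file.path(tempdir(), "correlation-plots.pdf")
pdf(plot.file, width = 8, height = 4)
par(mfrow = c(1, 2))  # one row, two columns of plots

plot(height, exercise, xlab = "Height (cm)", ylab = "Exercise (minutes)")
abline(fit.height)  # downward slope reflects the negative correlation

plot(weight, exercise, xlab = "Weight (kg)", ylab = "Exercise (minutes)")
abline(fit.weight)

dev.off()  # close the PDF device so the file is written
print(plot.file)
```

To see the plots on screen instead, drop the `pdf()` and `dev.off()` calls.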

Hope this helps. Thanks.