R: Correlation – Finding out the relation between variables

This article shows how to calculate the correlation between two variables using the R programming language.

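Correlation measures how strongly two variables are related and in which direction. The correlation coefficient ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation).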

Positive Correlation: If the value of one variable increases as the value of the other variable increases, the two variables are said to be positively correlated.

Negative Correlation: If the value of one variable decreases as the value of the other variable increases, the two variables are said to be negatively correlated.
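
To make the two cases concrete, here is a minimal sketch with made-up numbers. The vectors x, y.up, and y.down are illustrative names and are not part of the example data used in this article. R's built-in cor() function returns the correlation coefficient.

```r
# Toy data for illustration only (not the article's data)
x <- 1:10
y.up <- 2 * x + 1      # rises as x rises
y.down <- 50 - 3 * x   # falls as x rises

r.up <- cor(x, y.up)
r.down <- cor(x, y.down)

print(r.up)    # positive; at the upper limit because the relation is exactly linear
print(r.down)  # negative; at the lower limit for the same reason
```

Real data rarely falls on a perfect line, so in practice the coefficient lies somewhere between these two extremes.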

For this example, we have two variables: height and weight. Each position in the two vectors describes one person. In the example data below, the first person is 150cm tall and weighs 60kg, the second is 160cm tall and weighs 70kg, and so on.

To calculate the correlation between two variables, we use the cor() function. It returns a value between -1 and 1 that shows how strongly, and in which direction, the two variables are correlated.


> height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
> weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

> cor.result <- cor(height, weight) # calculate correlation between two variables
> print(cor.result) # height and weight are positively correlated
[1] 0.9307801
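
To see what cor() computes by default, the sketch below rebuilds the Pearson coefficient by hand from the same data. The names dh, dw, and manual.r are introduced here for illustration only.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

# Deviation of each value from the mean of its variable
dh <- height - mean(height)
dw <- weight - mean(weight)

# Pearson coefficient: shared variation scaled by the spread of each variable
manual.r <- sum(dh * dw) / sqrt(sum(dh^2) * sum(dw^2))

print(manual.r)
print(all.equal(manual.r, cor(height, weight)))  # TRUE when both agree
```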

We can also specify the correlation method, which can be 'pearson' (the default), 'spearman', or 'kendall'.


> cor.result <- cor(height, weight, method = 'pearson') # pearson, spearman, kendall
> print(cor.result)
[1] 0.9307801
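
As a rough sketch, the three methods can be compared side by side on the same data. The names methods, results, and rank.r are illustrative. The last step relies on the fact that the Spearman coefficient is the Pearson coefficient computed on the ranks of the values.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

# Run cor() once per method; sapply() names each result after its method
methods <- c("pearson", "spearman", "kendall")
results <- sapply(methods, function(m) cor(height, weight, method = m))
print(results)

# Spearman is Pearson applied to ranks, so these two values should agree
rank.r <- cor(rank(height), rank(weight))
print(all.equal(unname(results["spearman"]), rank.r))
```

Spearman and Kendall work on the ordering of the values rather than the values themselves, so they are less sensitive to outliers than Pearson.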

Below, we create a data frame of height and weight and then calculate the correlation of the data frame. The result is a correlation matrix, which shows in a single result how each variable is correlated with every other variable.


> df <- data.frame(ht = height, wt = weight) # create data frame of height and weight
> cor.result <- cor(df) # calculate correlation
> print(cor.result)
          ht        wt
ht 1.0000000 0.9307801
wt 0.9307801 1.0000000
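
Because the result is an ordinary matrix with row and column names, a single coefficient can be picked out by name. A minimal sketch follows; cor.mat and ht.wt are illustrative names.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)

df <- data.frame(ht = height, wt = weight)
cor.mat <- cor(df)

# Pick one coefficient by row and column name
ht.wt <- cor.mat["ht", "wt"]
print(ht.wt)

# The matrix is symmetric, and each variable correlates perfectly with itself
print(isSymmetric(cor.mat))
print(diag(cor.mat))
```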

Below, we create a new variable named exercise. It contains the number of minutes of exercise done by each person. So, taking all three variables together, the first person weighs 60kg, is 150cm tall, and does 10 minutes of exercise; the second weighs 70kg, is 160cm tall, and does 7 minutes of exercise; and so on.

Now, we calculate the correlation between three variables: height, weight, and exercise. We use a data frame so that a single result shows how each variable is correlated with the other two.


> exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)
> df <- data.frame(height = height, weight = weight, exercise = exercise) # create data frame with height, weight, and exercise
> cor.result <- cor(df)
> print(cor.result) # result shows height and weight are negatively correlated with exercise
             height     weight   exercise
height    1.0000000  0.9307801 -0.8539712
weight    0.9307801  1.0000000 -0.8779157
exercise -0.8539712 -0.8779157  1.0000000
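
With more than two variables, it is often easier to read one column of the matrix at a time. The sketch below rounds the matrix for display and pulls out only the coefficients that involve exercise. The names cor.mat, ex.cor, and strongest are illustrative.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)

df <- data.frame(height = height, weight = weight, exercise = exercise)
cor.mat <- cor(df)

# Round to two decimals to make the matrix easier to scan
print(round(cor.mat, 2))

# Keep only the correlations of height and weight with exercise
ex.cor <- cor.mat[c("height", "weight"), "exercise"]
print(ex.cor)

# Name of the variable most strongly related to exercise (largest absolute value)
strongest <- names(which.max(abs(ex.cor)))
print(strongest)
```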

Finally, we plot two graphs: one showing the relation between height and exercise, and the other showing the relation between weight and exercise.


> plot(height, exercise) # scatter plot of exercise against height
> plot(weight, exercise) # scatter plot of exercise against weight

[Figure: scatter plot of height vs. exercise]

[Figure: scatter plot of weight vs. exercise]
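
As an optional variation, both plots can be drawn side by side with a fitted straight line to make the downward trend easier to see. This is only a sketch: the trend line from lm() is an addition that is not part of the example above, and fit.h, fit.w, and op are illustrative names.

```r
height <- c(150, 160, 140, 155, 148, 177, 167, 126, 149, 131)
weight <- c(60, 70, 55, 55, 58, 80, 75, 45, 50, 44)
exercise <- c(10, 7, 6, 8, 9, 2, 4, 20, 15, 18)

# Put two plots in one row; keep the old settings so they can be restored
op <- par(mfrow = c(1, 2))

plot(height, exercise, xlab = "Height (cm)", ylab = "Exercise (minutes)",
     main = "Height vs. exercise")
fit.h <- lm(exercise ~ height)  # straight-line fit (illustrative addition)
abline(fit.h)

plot(weight, exercise, xlab = "Weight (kg)", ylab = "Exercise (minutes)",
     main = "Weight vs. exercise")
fit.w <- lm(exercise ~ weight)
abline(fit.w)

par(op)  # restore the previous plotting layout
```

A line sloping downward from left to right matches the negative coefficients in the correlation matrix.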

Hope this helps. Thanks.