Basic Introduction to R Programming [Beginner Tutorial]

This article introduces the very basics of R programming: what R is, installation, basic syntax, running code, data types, operators, conditional statements, loops and functions.

1. Introduction

1.1 What is it?

– A programming language and software environment
– Used for Statistical Analysis & Graphical Representation
– Free and open source

1.2 What can it do?

– Data Analysis / Data Mining
– Data Visualization
– Data Reporting

2. Basic Coding

2.1 Installation

R installers for Windows, Linux and macOS are available from CRAN: https://cran.r-project.org/bin/

On Linux, you can also install R from the command line. For example, on Ubuntu:

sudo apt-get update && sudo apt-get install r-base

Once R is installed, you can start an interactive session from a terminal with the R command.
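A quick way to confirm that the installation works is to print the version information from inside R. This is a minimal sketch; the exact text printed depends on the R version and platform you installed.

```r
# Print the installed R version (a string that starts with "R version")
print(R.version.string)

# Print the platform R was built for
print(R.version$platform)
```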

R works entirely from the terminal. If you prefer a graphical environment, you can install RStudio, a popular integrated development environment (IDE) for R, from https://www.rstudio.com/products/rstudio/download

2.2 Basic Syntax


> myVar <- 'Hello World!'
> print (myVar)
[1] "Hello World!"

> myVar = "Hello World!"
> print (myVar)
[1] "Hello World!"

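Both <- and = assign a value to a variable. According to the R documentation, <- can be used anywhere, whereas = is only allowed at the top level (for example, a complete expression typed at the prompt) or inside a braced block. Inside a function call, = names an argument instead of creating a variable.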
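The difference matters mostly inside function calls. The sketch below uses a small helper, double_it(), which is a hypothetical name made up for this demo. It shows that = only names the argument, while <- also creates a variable in the workspace.

```r
# Hypothetical helper used only for this demo
double_it <- function(inputNumber) inputNumber * 2

# '=' inside the call names the argument; no variable 'inputNumber' is created
r1 <- double_it(inputNumber = 3)
print(exists("inputNumber"))   # [1] FALSE

# '<-' inside the call creates 'savedNumber', then its value is passed in
r2 <- double_it(savedNumber <- 4)
print(exists("savedNumber"))   # [1] TRUE
print(r2)                      # [1] 8
```

In practice, most R style guides recommend <- for assignment and = only for naming arguments.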

To run an R script file from the terminal, use Rscript:

Rscript myScript.R
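For illustration, here is what a small myScript.R could contain. The file content and the "World" fallback are assumptions for this sketch; commandArgs(trailingOnly = TRUE) is the base R function that returns the arguments typed after the script name.

```r
# myScript.R (hypothetical script), run with: Rscript myScript.R Ram
# commandArgs(trailingOnly = TRUE) returns only the arguments after the file name
args <- commandArgs(trailingOnly = TRUE)

# Fall back to "World" when no argument is given (an assumption for this demo)
name <- if (length(args) > 0) args[1] else "World"

cat(sprintf("Hello %s!\n", name))
```

Running Rscript myScript.R Ram would print Hello Ram!, and running it without an argument would print Hello World!.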

3. Data Types (R-Objects)

In R, variables are not declared with a data type. Instead, an R object is assigned to a variable, and the variable takes on the type of that object. Below are the most frequently used R objects:
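To see this in action, the sketch below assigns different kinds of values to the same variable and checks its type with the built-in class() function after each assignment. The variable name v is arbitrary.

```r
v <- 42
print(class(v))    # [1] "numeric"

v <- "forty-two"   # the same variable now holds a character string
print(class(v))    # [1] "character"

v <- 42L           # the L suffix creates an integer
print(class(v))    # [1] "integer"

v <- TRUE
print(class(v))    # [1] "logical"
```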

3.1 Vectors

A vector holds elements of the same type. The c() function combines elements into a vector. Indexing starts at 1, so x[1] is the first element.


> color <- c('red', 'green', 'blue')
> print (color)
[1] "red"   "green" "blue" 

> x <- c('a','b','c','d')
> print (x[1])
[1] "a"

> x <- c(2,4,6,1,8,3)
> sortResult <- sort(x)
> print (sortResult)
[1] 1 2 3 4 6 8
> sortResult <- sort(x, decreasing = TRUE)
> print (sortResult)
[1] 8 6 4 3 2 1
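Vectors can also be indexed with ranges, negative positions and logical conditions. The following minimal sketch reuses the numbers from the sort example. It also shows that mixing types in c() converts every element to a single common type.

```r
x <- c(2, 4, 6, 1, 8, 3)
print(length(x))   # [1] 6
print(x[2:4])      # [1] 4 6 1        elements 2 to 4
print(x[-1])       # [1] 4 6 1 8 3    a negative index drops that element
print(x[x > 3])    # [1] 4 6 8        a logical index keeps matching elements

# A vector holds one type only: mixing types converts everything to character
mixed <- c(1, 'a', TRUE)
print(class(mixed))  # [1] "character"
```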

3.2 Lists

A list can contain elements of different types. For example, a single list can hold vectors, other lists, numbers and strings.


> myList <- list(c(1,3,4), 22, 'Kathmandu')
> print (myList)
[[1]]
[1] 1 3 4

[[2]]
[1] 22

[[3]]
[1] "Kathmandu"

> print (myList[1])
[[1]]
[1] 1 3 4

> print (myList[2])
[[1]]
[1] 22

> yourList <- list(TRUE, 5.5)
> mergedList <- c(myList, yourList) # merge lists
> print (mergedList)
[[1]]
[1] 1 3 4

[[2]]
[1] 22

[[3]]
[1] "Kathmandu"

[[4]]
[1] TRUE

[[5]]
[1] 5.5
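In the output above, myList[1] prints with [[1]] because single brackets return a smaller list. The sketch below contrasts single and double brackets. It also shows named list elements accessed with $. The student list is an illustrative example reusing names and marks from this article.

```r
myList <- list(c(1, 3, 4), 22, 'Kathmandu')

print(class(myList[1]))    # [1] "list"     single brackets return a sub-list
print(class(myList[[1]]))  # [1] "numeric"  double brackets return the element itself
print(myList[[1]][2])      # [1] 3

# Elements can be named and then accessed with $
student <- list(name = 'Ram', age = 22, marks = c(44, 89, 88))
print(student$name)        # [1] "Ram"
print(sum(student$marks))  # [1] 221
```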

3.3 Matrix

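A matrix is a two-dimensional rectangular data set in which all elements have the same type. The number of rows and columns is set with the nrow and ncol arguments. With byrow = TRUE the values fill the matrix row by row. With byrow = FALSE (the default) they fill it column by column. Because all elements must share one type, the numbers 1, 2 and 3 below are converted to the strings "1", "2" and "3".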


> myMatrix = matrix( c('a','b','c',1,2,3), nrow = 2, ncol = 3, byrow = TRUE)
> print (myMatrix)
     [,1] [,2] [,3]
[1,] "a"  "b"  "c" 
[2,] "1"  "2"  "3" 

> myMatrix = matrix( c('a','b','c',1,2,3), nrow = 2, ncol = 3, byrow = FALSE)
> print (myMatrix)
     [,1] [,2] [,3]
[1,] "a"  "c"  "2" 
[2,] "b"  "1"  "3" 
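With numeric data, a matrix supports indexing by row and column as well as matrix algebra. The following is a minimal sketch using the base functions dim(), t() (transpose) and the %*% matrix multiplication operator.

```r
m <- matrix(1:6, nrow = 2)   # ncol is inferred as 3; filled column by column
print(m)
print(dim(m))      # [1] 2 3
print(m[2, 3])     # [1] 6       row 2, column 3
print(m[1, ])      # [1] 1 3 5   the whole first row
print(t(m))        # the transpose, a 3 x 2 matrix
print(m %*% t(m))  # (2 x 3) times (3 x 2) gives a 2 x 2 matrix
```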

3.4 Arrays

Arrays are similar to matrices. A matrix is always two-dimensional, but an array can have any number of dimensions, which you set with the dim argument. When the data vector is shorter than the array, its values are recycled to fill the remaining cells. In the second example below, we create a 3x3x2 array.


> myArray <- array(data =  c('a', 'b', 'c', 1, 2, 3), dim = c(3,3))
> print (myArray)
     [,1] [,2] [,3]
[1,] "a"  "1"  "a" 
[2,] "b"  "2"  "b" 
[3,] "c"  "3"  "c" 

> myArray <- array(data =  c('a', 'b', 'c', 1, 2, 3), dim = c(3,3,2))
> print (myArray)
, , 1

     [,1] [,2] [,3]
[1,] "a"  "1"  "a" 
[2,] "b"  "2"  "b" 
[3,] "c"  "3"  "c" 

, , 2

     [,1] [,2] [,3]
[1,] "1"  "a"  "1" 
[2,] "2"  "b"  "2" 
[3,] "3"  "c"  "3" 

> marks <- c(44, 54, 67, 89, 92, 55, 88, 77, 79) # 9 values for 18 cells, so they are recycled for Semester II
> subjects <- c('Math', 'English', 'Science') # column names/heading
> students <- c('Ram', 'Sita', 'Gita') # row names/heading
> semester <- c('Semester I', 'Semester II') # matrix names
> result <- array( data = marks, dim = c(3,3,2), dimnames = list(students, subjects, semester) )
> print (result)
, , Semester I

     Math English Science
Ram    44      89      88
Sita   54      92      77
Gita   67      55      79

, , Semester II

     Math English Science
Ram    44      89      88
Sita   54      92      77
Gita   67      55      79
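Because the array has dimension names, its cells can be selected by name, and apply() can summarize along a dimension. The sketch below rebuilds the same result array and computes each student's total for Semester I. The choice of totals is just an illustrative calculation.

```r
marks <- c(44, 54, 67, 89, 92, 55, 88, 77, 79)
subjects <- c('Math', 'English', 'Science')
students <- c('Ram', 'Sita', 'Gita')
semester <- c('Semester I', 'Semester II')
result <- array(data = marks, dim = c(3, 3, 2),
                dimnames = list(students, subjects, semester))

print(result['Sita', 'English', 'Semester I'])  # a single cell: 92
print(result[, 'Math', 'Semester I'])           # Math marks of every student

# apply(..., 1, sum) sums each row, i.e. total marks per student
totals <- apply(result[, , 'Semester I'], 1, sum)
print(totals)   # Ram 221, Sita 223, Gita 201
```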

3.5 Data Frame

Data frames are tabular data objects. In a matrix or array, all elements must have the same type. In a data frame, each column can hold a different type of data. For example, the first column can contain numbers and the second column character strings.


> students <- data.frame(
+                 rollno = c(7,9,14,22),
+                 name = c('Ram', 'Sita', 'Hari', 'Radha'),
+                 gender = c('M', 'F', 'M', 'F')
+             )
> print (students)
  rollno  name gender
1      7   Ram      M
2      9  Sita      F
3     14  Hari      M
4     22 Radha      F

> students$age <- c(22,23,24,25) # add a new column to the data frame
> print (students)
  rollno  name gender age
1      7   Ram      M  22
2      9  Sita      F  23
3     14  Hari      M  24
4     22 Radha      F  25

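> print (students$name) # print name column
[1] "Ram"   "Sita"  "Hari"  "Radha"

> print (students$gender) # print gender column
[1] "M" "F" "M" "F"

In R versions before 4.0.0, data.frame() converted strings to factors by default (stringsAsFactors = TRUE), so these columns printed without quotes and with an extra Levels: line.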

> studentsNew <- data.frame(
+     rollno = c(3,11),
+     name = c('John', 'Jia'),
+     gender = c('M', 'F'),
+     age = c(23, 21)
+ )
> studentsFinal <- rbind(students, studentsNew) # add new rows to the data frame
> print (studentsFinal)
  rollno  name gender age
1      7   Ram      M  22
2      9  Sita      F  23
3     14  Hari      M  24
4     22 Radha      F  25
5      3  John      M  23
6     11   Jia      F  21
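Rows and columns of a data frame can be selected with conditions, in the form df[rows, columns]. This minimal sketch rebuilds the four-student data frame and shows a few common queries.

```r
students <- data.frame(
  rollno = c(7, 9, 14, 22),
  name = c('Ram', 'Sita', 'Hari', 'Radha'),
  gender = c('M', 'F', 'M', 'F'),
  age = c(22, 23, 24, 25)
)

print(nrow(students))                                  # [1] 4
print(students[students$gender == 'F', ])              # only the rows where gender is 'F'
print(students[students$age > 22, c('name', 'age')])   # selected rows and columns
print(mean(students$age))                              # [1] 23.5
str(students)                                          # column names and their types
```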

4. Operators

Like other programming languages, R has arithmetic, relational and logical operators.

4.1 Arithmetic Operators

Arithmetic operators add, subtract, multiply, divide and so on, element by element: the first element of one vector is combined with the first element of the other, and so on.


> a <- c(4,5,6)
> b <- c(1,2,3)
> print (a+b)
[1] 5 7 9
> print (a-b)
[1] 3 3 3
> print (a*b)
[1]  4 10 18
> print (a/b)
[1] 4.0 2.5 2.0
> print (a%b)
Error: unexpected input in "print (a%b)"
> print (a%%b) # remainder (modulus) uses %%, not %
[1] 0 1 0
> print (a^b) # exponent
[1]   4  25 216
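Two related behaviours are worth knowing: %/% performs integer division, and when the vectors differ in length, R recycles the shorter one. The sketch below shows both with small vectors.

```r
a <- c(4, 5, 6)
b <- c(1, 2, 3)
print(a %/% b)                      # [1] 4 2 2     integer division
print(a * 2)                        # [1]  8 10 12  the single number 2 is recycled
print(c(1, 2, 3, 4) + c(10, 20))    # [1] 11 22 13 24  c(10, 20) is recycled
```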

4.2 Relational Operators

Relational operators compare two vectors element by element: each element of the first vector is compared with the corresponding element of the second. The result is a logical vector of TRUE and FALSE values.


> a <- c(3,5,7)
> b <- c(2,5,9)
> print (a > b) # greater than
[1]  TRUE FALSE FALSE
> print (a < b) # less than
[1] FALSE FALSE  TRUE
> print (a == b) # equal to
[1] FALSE  TRUE FALSE
> print (a <= b) # less than or equal to
[1] FALSE  TRUE  TRUE
> print (a >= b) # greater than or equal to
[1]  TRUE  TRUE FALSE
> print (a != b) # not equal to
[1]  TRUE FALSE  TRUE
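The logical vectors produced by comparisons are often summarized further. In this sketch, sum() counts TRUE values (TRUE counts as 1), any() and all() test whether some or every element is TRUE, and which() returns the positions of the TRUE elements.

```r
a <- c(3, 5, 7)
b <- c(2, 5, 9)
comparison <- a > b
print(comparison)        # [1]  TRUE FALSE FALSE
print(sum(comparison))   # [1] 1
print(any(a == b))       # [1] TRUE   at least one pair is equal
print(all(a >= b))       # [1] FALSE  7 >= 9 fails
print(which(a != b))     # [1] 1 3    positions where the values differ
```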

4.3 Logical Operators

Logical operators combine logical values. The & and | operators work element by element on whole vectors, while && and || work on single values. When numbers are used as logical values, 0 is treated as FALSE and any non-zero number as TRUE.


> a <- c(5, 1, TRUE, FALSE, 0)
> b <- c(9, 1, FALSE, TRUE, 0)
> print (a & b) # TRUE if both elements are TRUE
[1]  TRUE  TRUE FALSE FALSE FALSE
> print (a | b) # TRUE if at least one element is TRUE
[1]  TRUE  TRUE  TRUE  TRUE FALSE
> print (a[1] && b[1]) # TRUE if both single values are TRUE
[1] TRUE
> print (a[1] || b[1]) # TRUE if at least one single value is TRUE
[1] TRUE
> b <- c(0, 1, FALSE, TRUE, 0)
> print (a[1] && b[1])
[1] FALSE
> print (a[1] || b[1])
[1] TRUE

Since R 4.3.0, using && or || with vectors longer than one element is an error, which is why these examples compare only the first elements, a[1] and b[1].
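Two more logical tools complete the picture: ! negates each element, and xor() is TRUE when exactly one of its inputs is TRUE. The sketch below also uses as.logical() to show how numbers are converted to logical values.

```r
a <- c(5, 1, TRUE, FALSE, 0)
print(!a)                        # [1] FALSE FALSE FALSE  TRUE  TRUE
print(xor(TRUE, FALSE))          # [1] TRUE
print(as.logical(c(0, 2, -3)))   # [1] FALSE  TRUE  TRUE  zero is FALSE, non-zero is TRUE
```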

5. Conditional Statements
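
The if, else if and else statements run a block of code only when its condition is TRUE. Here, the %in% operator checks whether a value is present in a vector.
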
> passedStudents <- c('Ram', 'Sita')
> if ('Ram' %in% passedStudents) {
+     print ('Ram has passed.')
+ } else if ('Hari' %in% passedStudents) {
+     print ('Hari has passed.')
+ } else {
+     print ('Ram and Hari have both failed.')
+ }
[1] "Ram has passed."
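An if statement tests a single condition. To make a decision for every element of a vector, base R provides the vectorized ifelse(). In this sketch, the pass mark of 50 is an assumption chosen only for illustration.

```r
marks <- c(44, 54, 67, 89, 92)

# ifelse(condition, value_if_TRUE, value_if_FALSE) is applied element by element
# (the pass mark of 50 is a hypothetical threshold)
status <- ifelse(marks >= 50, 'Pass', 'Fail')
print(status)   # "Fail" "Pass" "Pass" "Pass" "Pass"
```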

6. Loops
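
A for loop runs its body once for each element of a vector. In the example below, next skips the value 3 and break stops the loop at 7, so only 1, 2, 4, 5 and 6 are printed.
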
> num = c(1,2,3,4,5,6,7,8,9)
> for (i in num) {
+     if (i == 3) {
+         next
+     }
+     if (i == 7) {
+         break
+     }
+     print (i)
+ }
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6
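R also has a while loop, which repeats as long as its condition is TRUE. When a for loop needs element positions, seq_along() is a safe way to generate them. The following is a minimal sketch of both; the variable names are arbitrary.

```r
# while loop: repeats as long as the condition is TRUE
count <- 1
while (count <= 3) {
  print(count)
  count <- count + 1
}

# seq_along() returns the positions 1, 2, 3, ... of a vector's elements
colors <- c('red', 'green', 'blue')
for (i in seq_along(colors)) {
  cat(sprintf("%d: %s\n", i, colors[i]))
}
```

The while loop prints 1, 2 and 3. Each pass of the for loop prints a position and its color, such as 1: red.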

7. Functions

R has many built-in functions, and you can also write your own.

Some of the built-in functions are seq(x), mean(x), median(x), max(x), sum(x), paste(x,y), nchar(x), substring(string, first, last), etc. Here’s a quick reference to different R functions: https://cran.r-project.org/doc/contrib/Short-refcard.pdf


> print (seq(1, 9))
[1] 1 2 3 4 5 6 7 8 9
> print (mean(1:5))
[1] 3
> print (median(1:5))
[1] 3
> print (sum(1:5))
[1] 15
> print (max(1,4,6,7,8,88,99,45))
[1] 99
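Many of these functions take extra arguments. For example, seq() accepts a step size (by) or a number of values (length.out). The sketch below shows these alongside round() and min().

```r
print(seq(1, 10, by = 2))            # [1] 1 3 5 7 9
print(seq(0, 1, length.out = 5))     # [1] 0.00 0.25 0.50 0.75 1.00
print(round(mean(c(44, 54, 67)), 2)) # [1] 55
print(min(c(44, 54, 67)))            # [1] 44
```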

> firstName <- 'Mukesh'
> lastName <- 'Chapagain'
> print (paste(firstName, lastName)) # concat strings
[1] "Mukesh Chapagain"
> print (paste(firstName,lastName, sep='-')) # concat strings using custom separator
[1] "Mukesh-Chapagain"
> print (nchar(firstName)) # number of characters in the string
[1] 6
> print (substring(lastName, 1, 4)) # extract part of a string
[1] "Chap"
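A few more base string functions are often useful alongside paste(), nchar() and substring(). The sketch below uses toupper(), sprintf() for formatted text, and strsplit(), which returns a list, so [[1]] extracts the resulting vector.

```r
firstName <- 'Mukesh'
print(toupper(firstName))    # [1] "MUKESH"

# %s inserts a string, %d an integer (nchar() returns an integer)
print(sprintf('%s has %d letters', firstName, nchar(firstName)))  # "Mukesh has 6 letters"

# strsplit() splits on a separator and returns a list
print(strsplit('Ram,Sita,Hari', ',')[[1]])   # "Ram" "Sita" "Hari"
```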

To create your own function, assign function(arguments) { body } to a name. return() sends a value back to the caller.


> myFunction <- function(a, b) {
+     if (a > b) {
+         return (a*b)
+     } else {
+         return (a+b)
+     }
+ }
> print (myFunction(2,3))
[1] 5
> print (myFunction(3,2))
[1] 6
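Function arguments can have default values, and a function without an explicit return() returns the value of its last expression. In the sketch below, raise() is a hypothetical name chosen for illustration.

```r
# Hypothetical function: 'power' defaults to 2 when it is not supplied
raise <- function(base, power = 2) {
  base ^ power   # the last expression's value is returned automatically
}

print(raise(3))             # [1] 9   uses the default power = 2
print(raise(2, power = 5))  # [1] 32
```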

Hope this helps. Thanks.