This article introduces the very basics of R programming. It covers an introduction to R, installation, basic syntax, running code, data types, operators, conditional statements, loops and functions.
1. Introduction
1.1 What is it?
– A programming language and software environment
– Used for Statistical Analysis & Graphical Representation
– Freely available
1.2 What can it do?
– Data Analysis / Data Mining
– Data Visualization
– Data Reporting
2. Basic Coding
2.1 Installation
Here’s the list of R download packages for Windows, Linux & Mac OS: https://cran.r-project.org/bin/
On Linux, you can also install R from the command line. Here’s an example of installing R on Ubuntu Linux:
sudo apt-get update && sudo apt-get install r-base
Once R is installed, you can start it from the terminal by running the “R” command.
To use R through a graphical interface, download and install RStudio from https://www.rstudio.com/products/rstudio/download
2.2 Basic Syntax
> myVar <- 'Hello World!'
> print (myVar)
[1] "Hello World!"
> myVar = "Hello World!"
> print (myVar)
[1] "Hello World!"
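The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (for example, in a complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.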
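The difference is easiest to see inside a function call. The following is a minimal sketch; assignDemo is a made-up helper for this illustration and is not part of the original examples.

```r
# assignDemo is a hypothetical helper used only for this illustration.
assignDemo <- function() {
  # Here "=" only matches the value to mean()'s argument named x.
  m1 <- mean(x = c(2, 4, 6))
  existsAfterEquals <- exists("x", envir = environment(), inherits = FALSE)

  # Here "<-" creates the variable x inside this function, then passes it on.
  m2 <- mean(x <- c(2, 4, 6))
  existsAfterArrow <- exists("x", envir = environment(), inherits = FALSE)

  list(m1 = m1, m2 = m2,
       existsAfterEquals = existsAfterEquals,
       existsAfterArrow = existsAfterArrow)
}

res <- assignDemo()
print(res$existsAfterEquals)  # FALSE: no variable x was created
print(res$existsAfterArrow)   # TRUE: x now exists inside the function
```

Both calls compute the same mean; only the second one leaves a variable behind.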
To run an R file from the terminal:
Rscript myScript.R
3. Data Types (R-Objects)
In R, a variable is not declared with a data type. Instead, an R-Object is assigned to the variable, and the data type of the R-Object becomes the data type of the variable. Below are the frequently used R-Objects:
3.1 Vectors
We use the c() function, which combines elements into a vector.
> color <- c('red', 'green', 'blue');
> print (color)
[1] "red" "green" "blue"
> x <- c('a','b','c','d')
> print (x[1])
[1] "a"
> x <- c(2,4,6,1,8,3)
> sortResult <- sort(x)
> print (sortResult)
[1] 1 2 3 4 6 8
> sortResult <- sort(x, decreasing = TRUE)
> print (sortResult)
[1] 8 6 4 3 2 1
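A vector holds values of one type only. As a small sketch (the variable names mixed and nums are chosen just for this example), here is what c() does with mixed input, along with a few common ways to select elements:

```r
# c() converts mixed input to a single common type.
mixed <- c(1, 'a', TRUE)
print(class(mixed))   # "character"
print(mixed)          # "1" "a" "TRUE"

nums <- c(2, 4, 6, 1, 8, 3)
print(class(nums))    # "numeric"
print(length(nums))   # 6
print(nums[2:4])      # elements 2 to 4: 4 6 1
print(nums[-1])       # everything except the first element: 4 6 1 8 3
```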
3.2 Lists
A list can contain different types of elements. A single list can hold vectors, other lists, numbers, strings, etc.
> myList <- list(c(1,3,4), 22, 'Kathmandu')
> print (myList)
[[1]]
[1] 1 3 4
[[2]]
[1] 22
[[3]]
[1] "Kathmandu"
> print (myList[1])
[[1]]
[1] 1 3 4
> print (myList[2])
[[1]]
[1] 22
> yourList <- list(TRUE, 5.5)
> mergedList <- c(myList, yourList) # merge lists
> print (mergedList)
[[1]]
[1] 1 3 4
[[2]]
[1] 22
[[3]]
[1] "Kathmandu"
[[4]]
[1] TRUE
[[5]]
[1] 5.5
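Single brackets return a smaller list, while double brackets return the element itself. The sketch below shows the difference; the named list student is made up for this example.

```r
myList <- list(c(1, 3, 4), 22, 'Kathmandu')

firstAsList <- myList[1]      # single brackets keep the list wrapper
firstAsVector <- myList[[1]]  # double brackets extract the element itself

print(class(firstAsList))     # "list"
print(class(firstAsVector))   # "numeric"
print(sum(firstAsVector))     # 8

# Elements can also be given names (a made-up example).
student <- list(name = 'Ram', marks = c(44, 54, 67))
print(student$name)           # "Ram"
print(student[['marks']])     # 44 54 67
```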
3.3 Matrix
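A matrix is a two-dimensional rectangular dataset. The number of rows and columns is specified when creating the matrix. All elements of a matrix have the same type, which is why the numbers in the example below are printed as strings.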
> myMatrix = matrix( c('a','b','c',1,2,3), nrow = 2, ncol = 3, byrow = TRUE)
> print (myMatrix)
     [,1] [,2] [,3]
[1,] "a"  "b"  "c"
[2,] "1"  "2"  "3"
> myMatrix = matrix( c('a','b','c',1,2,3), nrow = 2, ncol = 3, byrow = FALSE)
> print (myMatrix)
     [,1] [,2] [,3]
[1,] "a"  "c"  "2"
[2,] "b"  "1"  "3"
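Elements of a matrix are selected with [row, column]. Here is a short sketch with a numeric matrix; the name m is used only for this example.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(dim(m))     # number of rows and columns: 2 3
print(m[1, 2])    # row 1, column 2: 2
print(m[2, ])     # whole second row: 4 5 6
print(m[, 3])     # whole third column: 3 6
print(t(m))       # transpose: a matrix with 3 rows and 2 columns
print(m * 2)      # arithmetic is applied to every element
```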
3.4 Arrays
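Arrays are like matrices, but you can specify the number of dimensions. A matrix is always two-dimensional, whereas an array can be multi-dimensional. In the second example below, we create a 3x3x2 array. When the data has fewer elements than the array needs, R reuses the data from the beginning to fill the remaining cells.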
> myArray <- array(data = c('a', 'b', 'c', 1, 2, 3), dim = c(3,3))
> print (myArray)
     [,1] [,2] [,3]
[1,] "a"  "1"  "a"
[2,] "b"  "2"  "b"
[3,] "c"  "3"  "c"
> myArray <- array(data = c('a', 'b', 'c', 1, 2, 3), dim = c(3,3,2))
> print (myArray)
, , 1
     [,1] [,2] [,3]
[1,] "a"  "1"  "a"
[2,] "b"  "2"  "b"
[3,] "c"  "3"  "c"
, , 2
     [,1] [,2] [,3]
[1,] "1"  "a"  "1"
[2,] "2"  "b"  "2"
[3,] "3"  "c"  "3"
> marks <- c(44, 54, 67, 89, 92, 55, 88, 77, 79)
> subjects <- c('Math', 'English', 'Science') # column names/heading
> students <- c('Ram', 'Sita', 'Gita') # row names/heading
> semester <- c('Semester I', 'Semester II') # matrix names
> result <- array( data = marks, dim = c(3,3,2), dimnames = list(students, subjects, semester) )
> print (result)
, , Semester I
     Math English Science
Ram    44      89      88
Sita   54      92      77
Gita   67      55      79
, , Semester II
     Math English Science
Ram    44      89      88
Sita   54      92      77
Gita   67      55      79
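Once the dimensions have names, single cells can be read by name. The sketch below rebuilds the result array with 18 marks so that each semester holds its own values; the extra nine marks are made up for this example.

```r
# 18 marks: the first nine fill Semester I, the last nine (made up) fill Semester II.
marks <- c(44, 54, 67, 89, 92, 55, 88, 77, 79,
           50, 60, 70, 80, 90, 65, 75, 85, 95)
students <- c('Ram', 'Sita', 'Gita')
subjects <- c('Math', 'English', 'Science')
semester <- c('Semester I', 'Semester II')
result <- array(data = marks, dim = c(3, 3, 2),
                dimnames = list(students, subjects, semester))

print(result['Ram', 'Math', 'Semester I'])    # 44
print(result['Ram', 'Math', 'Semester II'])   # 50
print(result['Sita', , 'Semester II'])        # Sita's marks in Semester II: 60 90 85
print(dim(result))                            # 3 3 2
```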
3.5 Data Frame
Data frames are tabular data objects. Unlike an array, each column of a data frame can contain a different type of data. For example, the first column can hold character data and the second column numeric data.
> students <- data.frame(
+ rollno = c(7,9,14,22),
+ name = c('Ram', 'Sita', 'Hari', 'Radha'),
+ gender = c('M', 'F', 'M', 'F')
+ )
> print (students)
  rollno  name gender
1      7   Ram      M
2      9  Sita      F
3     14  Hari      M
4     22 Radha      F
> students$age <- c(22,23,24,25) # add a new column to the data frame
> print (students)
  rollno  name gender age
1      7   Ram      M  22
2      9  Sita      F  23
3     14  Hari      M  24
4     22 Radha      F  25
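> print (students$name) # print name column
[1] Ram Sita Hari Radha
Levels: Hari Radha Ram Sita
> print (students$gender) # print gender column
[1] M F M F
Levels: F M
The “Levels” lines appear because versions of R before 4.0.0 convert character columns of a data frame to factors by default. In R 4.0.0 and later, these columns stay as character vectors, so the values are printed in double quotes and no “Levels” line is shown.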
> studentsNew <- data.frame(
+ rollno = c(3,11),
+ name = c('John', 'Jia'),
+ gender = c('M', 'F'),
+ age = c(23, 21)
+ )
> studentsFinal <- rbind(students, studentsNew) # add new rows to the data frame
> print (studentsFinal)
  rollno  name gender age
1      7   Ram      M  22
2      9  Sita      F  23
3     14  Hari      M  24
4     22 Radha      F  25
5      3  John      M  23
6     11   Jia      F  21
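Rows and columns of a data frame can be selected with conditions. The sketch below reuses the students data and sets stringsAsFactors = FALSE explicitly, so that the text columns behave the same way in old and new versions of R.

```r
students <- data.frame(
  rollno = c(7, 9, 14, 22),
  name = c('Ram', 'Sita', 'Hari', 'Radha'),
  gender = c('M', 'F', 'M', 'F'),
  age = c(22, 23, 24, 25),
  stringsAsFactors = FALSE   # keep text columns as plain character vectors
)

print(nrow(students))                             # number of rows: 4
print(ncol(students))                             # number of columns: 4
print(students[students$age > 22, ])              # rows where age is above 22
print(students[students$gender == 'F', 'name'])   # "Sita" "Radha"
print(mean(students$age))                         # 23.5
str(students)                                     # compact summary of the column types
```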
4. Operators
Like other programming languages, R has arithmetic, relational and logical operators.
4.1 Arithmetic Operators
Used for adding, subtracting, multiplying, dividing, etc. The operation is applied to the corresponding elements of the two vectors.
> a <- c(4,5,6)
> b <- c(1,2,3)
> print (a+b)
[1] 5 7 9
> print (a-b)
[1] 3 3 3
> print (a*b)
[1] 4 10 18
> print (a/b)
[1] 4.0 2.5 2.0
> print (a%b)
Error: unexpected input in "print (a%b)"
> print (a%%b)
[1] 0 1 0
> print (a^b)
[1] 4 25 216
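Two more details are worth a quick sketch: the integer division operator %/%, and what happens when the two vectors have different lengths.

```r
a <- c(4, 5, 6)
b <- c(1, 2, 3)

print(a %/% b)                     # integer division: 4 2 2
print(a * 2)                       # the single value 2 is reused for every element: 8 10 12
print(c(1, 2, 3, 4) + c(10, 20))   # the shorter vector is repeated: 11 22 13 24
```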
4.2 Relational Operators
Used for comparing two vectors. Each element of the first vector is compared with the corresponding element of the second vector. The result is a vector of boolean (logical) values.
> a <- c(3,5,7)
> b <- c(2,5,9)
> print (a > b) # greater than
[1] TRUE FALSE FALSE
> print (a < b) # less than
[1] FALSE FALSE TRUE
> print (a == b) # equal to
[1] FALSE TRUE FALSE
> print (a <= b) # less than or equal to
[1] FALSE TRUE TRUE
> print (a >= b) # greater than or equal to
[1] TRUE TRUE FALSE
> print (a != b) # not equal to
[1] TRUE FALSE TRUE
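The boolean result of a comparison is often used to pick or count elements. A small sketch with the same two vectors:

```r
a <- c(3, 5, 7)
b <- c(2, 5, 9)

print(a[a > 4])        # keep only the elements greater than 4: 5 7
print(which(a == b))   # positions where the two vectors are equal: 2
print(sum(a >= b))     # TRUE counts as 1, so this counts the TRUE values: 2
```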
4.3 Logical Operators
Used for combining logical values. With & and |, each element of the first vector is compared with the corresponding element of the second vector, and the result is a vector of boolean values. Zero is treated as FALSE and every non-zero number is treated as TRUE.
> a <- c(5, 1, TRUE, FALSE, 0)
> b <- c(9, 1, FALSE, TRUE, 0)
> print (a & b) # TRUE if both elements are TRUE
[1] TRUE TRUE FALSE FALSE FALSE
> print (a | b) # TRUE if at least one element is TRUE
[1] TRUE TRUE TRUE TRUE FALSE
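The operators && and || work on single values only. Older versions of R silently used just the first element of longer vectors, but R 4.3.0 and later report an error, so the first elements are selected explicitly here:
> print (a[1] && b[1]) # TRUE if the first elements of both vectors are TRUE
[1] TRUE
> print (a[1] || b[1]) # TRUE if the first element of either vector is TRUE
[1] TRUE
> a <- c(5, 1, TRUE, FALSE, 0)
> b <- c(0, 1, FALSE, TRUE, 0)
> print (a[1] && b[1]) # TRUE if the first elements of both vectors are TRUE
[1] FALSE
> print (a[1] || b[1]) # TRUE if the first element of either vector is TRUE
[1] TRUE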
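A few related helpers are sketched below: any() and all() reduce a logical vector to a single value, and ! negates. The variable names scores and value are made up for this example.

```r
scores <- c(44, 54, 67)

print(any(scores > 60))    # TRUE: at least one score is above 60
print(all(scores > 40))    # TRUE: every score is above 40
print(!(scores > 50))      # negation, element by element: TRUE FALSE FALSE

# && stops as soon as the result is known, so the right-hand side
# is skipped when the left-hand side is already FALSE.
value <- NULL
print(!is.null(value) && value > 0)   # FALSE: value > 0 is never evaluated
```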
5. Conditional Statements
> passedStudents <- c('Ram', 'Sita')
> if ('Ram' %in% passedStudents) {
+ print ('Ram has passed.')
+ } else if ('Hari' %in% passedStudents) {
+ print ('Hari has passed.')
+ } else {
+ print ('Ram and Hari have both failed.')
+ }
[1] "Ram has passed."
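An if statement tests a single condition. To apply a test to every element of a vector, R also has ifelse(). A short sketch, with made-up marks and a made-up pass mark:

```r
marks <- c(44, 28, 67)   # made-up marks for three students
passMark <- 40           # made-up pass mark

# if () tests one condition at a time
if (marks[1] >= passMark) {
  print('First student has passed.')
}

# ifelse() applies the test to every element of the vector
status <- ifelse(marks >= passMark, 'pass', 'fail')
print(status)   # "pass" "fail" "pass"
```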
6. Loops
> num = c(1,2,3,4,5,6,7,8,9)
> for (i in num) {
+ if (i == 3) {
+ next
+ }
+ if (i == 7) {
+ break
+ }
+ print (i)
+ }
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6
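Besides for, R has a while loop, and seq_along() is handy when the position of each element is needed. A minimal sketch:

```r
# while loop: repeat as long as the condition holds
i <- 1
total <- 0
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
print(total)   # 1 + 2 + 3 + 4 + 5 = 15

# seq_along() gives the positions 1, 2, 3 of the vector
subjects <- c('Math', 'English', 'Science')
for (k in seq_along(subjects)) {
  print(paste(k, subjects[k]))
}
```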
7. Functions
R has many built-in functions, and you can create your own functions as well.
Some of the built-in functions are seq(from, to), mean(x), median(x), max(x), sum(x), paste(x,y), nchar(x), substring(string, first, last), etc. Here’s a quick reference to different R functions: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
> print (seq(1, 9))
[1] 1 2 3 4 5 6 7 8 9
> print (mean(1:5))
[1] 3
> print (median(1:5))
[1] 3
> print (sum(1:5))
[1] 15
> print (max(1,4,6,7,8,88,99,45))
[1] 99
> firstName <- 'Mukesh'
> lastName <- 'Chapagain'
> print (paste(firstName, lastName)) # concat strings
[1] "Mukesh Chapagain"
> print (paste(firstName,lastName, sep='-')) # concat strings using custom separator
[1] "Mukesh-Chapagain"
> print (nchar(firstName)) # number of characters in the string
[1] 6
> print (substring(lastName, 1, 4)) # extract part of a string
[1] "Chap"
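A few more built-in functions, shown as a quick sketch in the same spirit:

```r
print(seq(1, 9, by = 2))    # sequence with a step of 2: 1 3 5 7 9
print(min(4, 8, 2))         # smallest value: 2
print(length(c(5, 6, 7)))   # number of elements: 3
print(round(3.14159, 2))    # round to 2 decimal places: 3.14
print(toupper('nepal'))     # upper case: "NEPAL"
print(rev(c(1, 2, 3)))      # reverse the order: 3 2 1
```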
Here is an example of a custom function:
> myFunction <- function(a, b) {
+ if (a > b) {
+ return (a*b)
+ } else {
+ return (a+b)
+ }
+ }
> print (myFunction(2,3))
[1] 5
> print (myFunction(3,2))
[1] 6
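Function arguments can have default values and can be passed by name. The sketch below uses a made-up function, areaOfRectangle, to show both, and then uses sapply() to call a function for each element of a vector.

```r
# areaOfRectangle is a made-up function for this example.
areaOfRectangle <- function(width, height = 1) {
  width * height   # the value of the last expression is returned
}

print(areaOfRectangle(3))                       # height defaults to 1: 3
print(areaOfRectangle(3, 4))                    # 12
print(areaOfRectangle(height = 2, width = 5))   # arguments matched by name: 10

# sapply() calls the function once for each element of the vector
squares <- sapply(c(1, 2, 3), function(n) n^2)
print(squares)   # 1 4 9
```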
Hope this helps. Thanks.