“If” and list structures are sometimes useful, but not always necessary. If you’re short on time, you can skip this part.
In part 5, we learned about missing values. R also has a special value, Inf, that represents an infinitely big number. This can sometimes appear, for example, if we try to divide a number by zero.
R includes some basic programming tools like if and for statements, and another data structure called a “list”. These aren’t always necessary, but sometimes come in handy.
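As a quick illustration of Inf, the sketch below uses only base R. It also shows NaN, which is an addition not covered in the text above: dividing zero by zero gives NaN rather than Inf.

```r
# Dividing a positive number by zero gives Inf
1 / 0
## [1] Inf

# Dividing a negative number by zero gives -Inf
-1 / 0
## [1] -Inf

# is.infinite() tests for either of these
is.infinite(c(1, 1 / 0, -1 / 0))
## [1] FALSE  TRUE  TRUE

# Dividing zero by zero is undefined and gives NaN instead
0 / 0
## [1] NaN
```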
If statements run a piece of code only if a condition evaluates to TRUE.
In the code below, we pick 5 random numbers (from the normal distribution, with a mean of 0 and a standard deviation of 1), then multiply the first number by 1000 if it is bigger than zero.
# First, we set a random seed, so that my results are the same as yours.
set.seed(992)
# choose 5 random numbers
x <- rnorm(5)
# take a look
x
## [1] 0.23211938 0.99640012 0.03064485 -0.32369705 -1.99207361
# is the first bigger than 0?
x[1] > 0
## [1] TRUE
if(x[1] > 0){
  # multiply only the first number by 1000
  x[1] <- x[1] * 1000
}
# take a look at x again
x
## [1] 232.11937848 0.99640012 0.03064485 -0.32369705 -1.99207361
We can also optionally include a bit of code to run if the condition is FALSE. Here we test the 5th number:
if(x[5] > 0){
  print("Fifth number is bigger than 0")
} else {
  print("Fifth number is not bigger than 0")
}
## [1] "Fifth number is not bigger than 0"
A for loop takes a vector, then runs a bit of code once for each item in it.
# Make a vector of numbers from 1 to 10
myVector <- 1:10
for(i in myVector){
  # double the number
  doublei <- i * 2
  # print the doubled number
  print(doublei)
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20
We read this as “for each i in myVector, do the following: double i, then print that value.”
The variable i is not defined before the for loop. Instead, i is set to the first item in the vector, then the code within the for loop is run with i having that value. Then i is given the next value in the vector, and the code is run again.
While i is still accessible outside the for loop, it is best practice to pretend that i only exists inside the loop.
Note that you can name the variable anything, not just i.
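A small sketch of both points: the loop variable can have any name, and it still exists after the loop finishes. The names steps, step and total are made up for this example.

```r
# 'steps', 'step' and 'total' are names made up for this example
steps <- c(3, 7, 5)
total <- 0
for(step in steps){
  # add the current item to the running total
  total <- total + step
}
total
## [1] 15

# the loop variable still exists after the loop, holding the last item
step
## [1] 5
```

Relying on step after the loop works, but it is easy to misread, which is why it is safer to treat the loop variable as if it only existed inside the loop.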
A for loop is sometimes useful when we want to build a new vector. For example, we might want to copy numbers from one vector into another, but only if they are greater than zero.
set.seed(992)
# choose 5 random numbers
x <- rnorm(5)
# create empty vector for positive numbers
posNums <- c()
# for each item in x
for(i in x){
  # if the item is more than zero,
  if(i > 0){
    # add the item to the vector posNums
    # (add a number to the vector by using 'c')
    posNums <- c(posNums, i)
  }
}
# Look at x
x
## [1] 0.23211938 0.99640012 0.03064485 -0.32369705 -1.99207361
# Look at posNums
posNums
## [1] 0.23211938 0.99640012 0.03064485
However, in R there is almost always an easier and faster way to do things than using a for loop. For example, the code below does the same thing:
set.seed(992)
# choose 5 random numbers
x <- rnorm(5)
# use indexing to get positive numbers
posNums <- x[x > 0]
x
## [1] 0.23211938 0.99640012 0.03064485 -0.32369705 -1.99207361
posNums
## [1] 0.23211938 0.99640012 0.03064485
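The same idea applies to the doubling loop from earlier in this part. The sketch below builds the doubled numbers both ways and checks that they agree. The names doubledLoop and doubledVec are made up for this example.

```r
# Doubling with a for loop, collecting the results in a vector
doubledLoop <- c()
for(i in 1:10){
  doubledLoop <- c(doubledLoop, i * 2)
}

# The same result without a loop: arithmetic applies to every item at once
doubledVec <- (1:10) * 2
doubledVec
## [1]  2  4  6  8 10 12 14 16 18 20

# check that both approaches agree
all(doubledLoop == doubledVec)
## [1] TRUE
```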
Previously, we’ve been using vectors of numbers (1 dimensional) and data frames (2 dimensional). This is enough for most needs, but R also has a data structure called a list. Lists work in a similar way to vectors, but the items inside them can be vectors themselves, and of varying lengths.
Imagine we have 3 students who have sat various exams. In the code below, we define an empty list, then add three vectors of scores to it (one for each student):
scores <- list()
scores[[1]] <- c(76, 86, 90)
scores[[2]] <- c(70, 88)
scores[[3]] <- c(50, 59, 72, 66)
We can now access members of the list with double square brackets:
scores[[2]]
## [1] 70 88
Or use them in a function like lapply, which applies a function to each member of the list, then returns the results as a new list.
lapply(scores, mean)
## [[1]]
## [1] 84
##
## [[2]]
## [1] 79
##
## [[3]]
## [1] 61.75
This returns a list where each member of the list contains the mean score for the student.
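If a plain vector of means is more convenient than a list, base R's sapply is one option. The sketch below rebuilds the same scores list in one step so that it runs on its own. The name meanScores is made up for this example.

```r
# Rebuild the scores list from above so this block runs on its own
scores <- list(c(76, 86, 90), c(70, 88), c(50, 59, 72, 66))

# sapply works like lapply, but simplifies the result to a vector when it can
meanScores <- sapply(scores, mean)
meanScores
## [1] 84.00 79.00 61.75

# lengths() reports how many exam scores each student has
lengths(scores)
## [1] 3 2 4
```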