“If” and list structures are sometimes useful, but not always necessary. If you’re short on time, you can skip this part.

1 Recap

In part 5, we learned about missing values:

  • The “NA” value which represents values that are ‘not available’.
  • Applying functions to vectors with NA values causes some problems.
  • How to filter vectors for NA values.
  • The value Inf, which represents an infinitely big number. This can appear, for example, if we try to divide a non-zero number by zero.
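As a quick refresher, the points above can be sketched in a few lines. The vector name heights and its values are made up for illustration, and na.rm = TRUE is just one common way of ignoring missing values; it may not be the exact approach used in part 5.

```r
# a made-up vector with one missing value
heights <- c(150, NA, 172)
# many functions return NA if any value is missing
mean(heights)
## [1] NA
# na.rm = TRUE tells mean to ignore the missing values
mean(heights, na.rm = TRUE)
## [1] 161
# keep only the values that are not NA
heights[!is.na(heights)]
## [1] 150 172
# dividing a non-zero number by zero gives Inf
1 / 0
## [1] Inf
```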

2 If

R includes some basic programming tools like if and for statements, and another data structure called a “list”. These aren’t always necessary, but sometimes come in handy.

An if statement runs a piece of code only if a condition evaluates to TRUE.

In the code below, we pick 5 random numbers (from the normal distribution, with a mean of 0 and a standard deviation of 1), then multiply the first number by 1000 if it is bigger than zero.

# First, we set a random seed, so that my results are the same as yours.
set.seed(992)
# choose 5 random numbers
x <- rnorm(5)
# take a look
x
## [1]  0.23211938  0.99640012  0.03064485 -0.32369705 -1.99207361
# is the first bigger than 0?
x[1] > 0
## [1] TRUE
if(x[1] > 0){
  # multiply only the first number by 1000
  x[1] <- x[1] * 1000
}
# take a look at x again
x
## [1] 232.11937848   0.99640012   0.03064485  -0.32369705  -1.99207361

We can also optionally include a bit of code to run if the statement is FALSE. Here we test the 5th number:

if(x[5] > 0){
  print("Fifth number is bigger than 0")
} else{
  print("Fifth number is not bigger than 0")
}
## [1] "Fifth number is not bigger than 0"
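The same if/else pattern can also be used to store a result in a variable instead of printing it. The following is a minimal sketch; the names temperature and status, and the value -3, are hypothetical and not part of the tutorial's data.

```r
# a made-up value to test
temperature <- -3
if(temperature > 0){
  # this branch runs only when the condition is TRUE
  status <- "above freezing"
} else{
  # this branch runs when the condition is FALSE
  status <- "freezing or below"
}
# take a look
status
## [1] "freezing or below"
```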

3 For

A for loop is a statement which takes a vector of values, then runs a bit of code once for each value in the vector.

# Make a vector of numbers from 1 to 10
myVector <- 1:10
for(i in myVector){
  # double the number
  doublei <- i * 2
  # print the number
  print(doublei)
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20

We read this as “for each i in myVector, do the following: double i, then print that value.”

The variable i is not defined before the for loop. Instead, i is set to the first item in the vector, then the code within the for loop is run with i having that value. Then i is given the next value in the vector, and the code is run again.

While i is still accessible outside the for loop, it is best practice to pretend that i only exists inside the loop.

Note that you can name the variable anything, not just i.
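To illustrate both points, here is a small sketch that uses a loop variable called score (a made-up name, with made-up values) and then checks what it holds after the loop has finished.

```r
# 'score' is the loop variable here, instead of 'i'
for(score in c(10, 20, 30)){
  # add one to each value and print it
  print(score + 1)
}
## [1] 11
## [1] 21
## [1] 31
# the loop variable still exists afterwards, holding the last value
score
## [1] 30
```

Relying on the leftover value of score after the loop makes code harder to follow, which is why it is safer to treat it as if it only exists inside the loop.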

A for loop is sometimes useful when we want to build a new vector. For example, we might want to copy numbers from one vector into another, but only if they are greater than zero.

set.seed(992)
# choose 5 random numbers
x <- rnorm(5)

# create empty vector for positive numbers
posNums <- c()

# for each item in x
for(i in x){
  # if the item is more than zero,
  if(i > 0){
    # add the item to the vector posNums
    # (add a number to the vector by using 'c')
    posNums <- c(posNums,i)
  }
}

# Look at x
x
## [1]  0.23211938  0.99640012  0.03064485 -0.32369705 -1.99207361
# Look at posNums
posNums
## [1] 0.23211938 0.99640012 0.03064485

However, in R there is almost always an easier and faster way to do things than using a for loop. For example, the code below does the same thing:

set.seed(992)
# choose 5 random numbers
x <- rnorm(5)
# use indexing to get positive numbers
posNums <- x[x > 0]

x
## [1]  0.23211938  0.99640012  0.03064485 -0.32369705 -1.99207361
posNums
## [1] 0.23211938 0.99640012 0.03064485
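The same idea applies to the doubling loop from earlier: arithmetic on a vector works on every item at once. The sketch below compares the two approaches; the names doubledLoop and doubledVec are made up for illustration.

```r
myVector <- 1:10
# loop version: build up the doubled values one at a time
doubledLoop <- c()
for(i in myVector){
  doubledLoop <- c(doubledLoop, i * 2)
}
# vectorised version: multiply the whole vector at once
doubledVec <- myVector * 2
doubledVec
##  [1]  2  4  6  8 10 12 14 16 18 20
# check that both approaches give the same values
all(doubledLoop == doubledVec)
## [1] TRUE
```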

4 Lists

Previously, we’ve been using vectors of numbers (1 dimensional) and data frames (2 dimensional). This is enough for most needs, but R also has a data structure called a list. Lists work in a similar way to vectors, but the items inside them can themselves be vectors, and these can be of varying lengths.

Imagine we have 3 students who have sat various exams. In the code below, we define an empty list, then add three vectors of scores to it (one for each student):

scores <- list()
scores[[1]] <- c(76, 86, 90)
scores[[2]] <- c(70, 88)
scores[[3]] <- c(50, 59, 72, 66)

We can now access members of the list with double square brackets:

scores[[2]]
## [1] 70 88
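Because each member of the list is an ordinary vector, the usual vector tools work on it. As a small sketch (the list is rebuilt here so the example runs on its own):

```r
scores <- list()
scores[[1]] <- c(76, 86, 90)
scores[[2]] <- c(70, 88)
scores[[3]] <- c(50, 59, 72, 66)
# how many students are in the list?
length(scores)
## [1] 3
# how many exams did the third student sit?
length(scores[[3]])
## [1] 4
# the second score of the third student
scores[[3]][2]
## [1] 59
```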

We can also pass the list to a function like lapply, which applies a function to each member of the list, then returns the results.

lapply(scores, mean)
## [[1]]
## [1] 84
## 
## [[2]]
## [1] 79
## 
## [[3]]
## [1] 61.75

This returns a list where each member of the list contains the mean score for the student.
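The result can be stored and accessed like any other list. The sketch below also uses unlist, a base R function not covered in this tutorial, as one possible way to flatten the result into a plain vector; the name meanScores is made up for illustration.

```r
scores <- list()
scores[[1]] <- c(76, 86, 90)
scores[[2]] <- c(70, 88)
scores[[3]] <- c(50, 59, 72, 66)
# store the list of mean scores
meanScores <- lapply(scores, mean)
# the mean score for the third student
meanScores[[3]]
## [1] 61.75
# flatten the list into an ordinary vector
unlist(meanScores)
## [1] 84.00 79.00 61.75
```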

