This is a short tutorial on how to use R.

This format indicates that the text is R code:

2 + 16

This format indicates that the text is the output of running some code:

## [1] 18

There are some tasks below. If you get stuck, try looking up the answers.

1 Recap

In part zero, we installed R and RStudio, and learned the difference between the console window (the ‘stage’ where commands are ‘performed’) and the script window (where we can edit scripts).

There’s a “cheatsheet” of all the commands we’ll use here.

Start RStudio now.

2 Basic assignment

You can run commands in R in two ways.

First, type the following into the console window.

3 + 4

Then press the “Enter” key.

You should see a response like this:

> 3 + 4
[1] 7
>

The “>” is the console prompt. The first line shows the code you typed. The second shows the answer: 7! The third line is a new console line, with a blinking cursor. Each command you send to the console, and any output it produces, is listed in the console window. You can see the history of what you did, but this is only a temporary store (it won’t be there when you return to RStudio).

The second way to run a line of code is to write it in a script file, and then send lines from there to the console window. I recommend the second option, because it makes editing previous lines easier, and you have a record of what you did.

There’s an example of a finished R script in this file.

Let’s make a new script. Click File > New File > R script. A blank script should appear in the script window.

The first thing we can do is to add a note at the start of the file to say what it’s about. Any text after a hash symbol (#) will be ignored by R - these notes are called comments, and are useful for explaining our code to others (and ourselves!).

So we can add the following line at the top of the file (including the hash symbol at the start):

# My first R script

You can assign values to variables with the assignment operator <-. Type this into the script window on a new line (each new line is a new command):

x <- 4

Then place your cursor on the line you just typed, and click the Run button in the script window (or press Control + Enter, or Command + Enter on a Mac).

The line of code should appear in the console window. If you have the ‘Environment’ window open, you should see a listing for x appear.

x is now a variable which stores the number 4. Programmers call this “assignment”, as in “I assigned the value 4 to the variable x”.

We can look at the value that is stored in x by running:

x
## [1] 4
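
Assigning to a name that already exists replaces the old value. Here is a small sketch of that. The variable names score and doubled are made up for illustration, so that x keeps its value for the rest of the tutorial:

```r
# Assign a value, then overwrite it with a new one
score <- 10
score <- score + 1   # the right-hand side is worked out first, then stored back in score
score
## [1] 11

# Store the result of a calculation in a second variable
doubled <- score * 2
doubled
## [1] 22
```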

Variables can have any name (e.g. “y”, “myVariable”, “numberOfParticipants”, “experiment556”), though there are a few restrictions:

  • They can’t include special characters (parentheses, +, -, /, *, etc.)
  • They can’t include spaces (you can use . or _, e.g. my.variable or my_variable)
  • They are case sensitive (so x would be different to X)

Ideally, the name of the variable should make it obvious what it stores.
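
To see the naming rules in action, here is a short sketch. All of the names and values are invented examples:

```r
# Valid names: letters, digits, dots and underscores
numberOfParticipants <- 24
my.variable <- 1
my_variable <- 2

# Names are case sensitive, so these are two separate variables
total <- 10
Total <- 99
total
## [1] 10
Total
## [1] 99
```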

You can run operations on this variable, like adding, dividing or multiplying. Try typing these into the console:

x + 1
## [1] 5
x - 3
## [1] 1
x / 2
## [1] 2
x * 3
## [1] 12

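Note: Some programming languages make a sharp distinction between integers and continuous (decimal) numbers. R does have a separate integer type, but numbers you type, like 4, are stored as continuous values by default, so for this tutorial you can treat all numbers as continuous.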
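
As a quick check of how R treats the numbers you type, you could try the following. The specific values are just examples:

```r
# Dividing two whole numbers gives a fractional result, not a truncated one
7 / 2
## [1] 3.5

# class() reports how R is storing a value
class(4)
## [1] "numeric"
class(3.5)
## [1] "numeric"
```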

Note that the output shows a [1]. You can mostly ignore this - it’s there because R likes to do things with sequences. The code below creates a sequence of numbers from 1 to 100. If we run it, we see that the number in brackets at the start of each line is the position of the first item on that line, which helps us keep track of where we are in the sequence:

1:100
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
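
One way to get a feel for positions in a sequence is to store one and ask for items by position. This sketch uses a made-up variable s that starts at 201, so that positions and values are easy to tell apart:

```r
# A sequence of 100 numbers, starting at 201
s <- 201:300
length(s)
## [1] 100

# The item in position 1 and the item in position 19
s[1]
## [1] 201
s[19]
## [1] 219
```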

Let’s make a list of numbers. To do that, we use the combine function c:

nums <- c(1,4,5,5,3)

This is called a numeric vector. A vector is just a fancy word for a list of things. In the example above, we’ve created an ordered sequence of numbers. We can also run operations on this variable:

nums * 2
## [1]  2  8 10 10  6

Note that the operation applies to each item in the vector.
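
The same item-by-item behaviour holds for the other arithmetic operators. A minimal sketch, using the same values as nums above:

```r
nums <- c(1, 4, 5, 5, 3)

# Add 10 to every item
nums + 10
## [1] 11 14 15 15 13

# Halve every item
nums / 2
## [1] 0.5 2.0 2.5 2.5 1.5

# Two vectors of the same length are combined item by item
nums + nums
## [1]  2  8 10 10  6
```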

Often, we want to do particular things with the numbers in the list, such as work out the sum or the mean. To do this, we can use functions. These are little programs that take some input (called arguments) and produce some output.

For example, sum is a function which takes one argument (a numeric vector), and returns the sum of all the items in that argument. You use the function by writing the name of the function, then adding the arguments in round brackets:

sum(nums)
## [1] 18

The function length takes one argument (a vector), and returns the number of items in it:

length(nums)
## [1] 5
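
The mean mentioned above can be worked out in the same way. Here is a short sketch using the base R functions mean, max and min on the same values as nums:

```r
nums <- c(1, 4, 5, 5, 3)

# The mean is the sum divided by the number of items
sum(nums) / length(nums)
## [1] 3.6

# mean() does the same thing in one step
mean(nums)
## [1] 3.6

# Largest and smallest items
max(nums)
## [1] 5
min(nums)
## [1] 1
```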

We can also make a character vector, which is just an ordered sequence of character strings (typed words):

number.names <- c("one",'two','three','four','five')

Individual strings can be surrounded by single or double quotes, as long as they match up.

You can’t use numeric operators with character vectors, but you can use other functions. nchar is a function that takes a single argument (a character vector) and returns the number of characters in each item:

nchar(number.names)
## [1] 3 3 5 4 4
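
Other functions work on character vectors too. As a small sketch, toupper (a base R function not used elsewhere in this tutorial) converts each item to upper case, and length counts the items just as it does for numbers:

```r
number.names <- c("one", "two", "three", "four", "five")

# Convert every item to upper case
toupper(number.names)
## [1] "ONE"   "TWO"   "THREE" "FOUR"  "FIVE"

# length counts items, not characters
length(number.names)
## [1] 5
```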

3 Indexing

Indexing is how we find specific data points in our data in order to do something with them. There are three ways to index items in vectors:

  • by index number
  • with a boolean vector
  • by name

3.1 Indexing by index number

What is the first item in the vector nums? We can access it by indexing the vector:

nums[1]
## [1] 1
nums[3]
## [1] 5

Note that indexing in R starts from 1 (asking for item 0 will return an empty vector).

We can also extract several items at a time. Here, we ask for the first three items of nums:

nums[c(1,2,3)]
## [1] 1 4 5

Note that, in R, sequentially ordered numbers can be quickly made using the colon operator:

nums[1:3]
## [1] 1 4 5
nums[2:4]
## [1] 4 5 5
nums[3:1]
## [1] 5 4 1
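
A few more index-number patterns that may come in handy, sketched with the same values as nums:

```r
nums <- c(1, 4, 5, 5, 3)

# The last item: use the length of the vector as the index
nums[length(nums)]
## [1] 3

# Items do not have to be next to each other, or in order
nums[c(5, 1)]
## [1] 3 1

# Asking for item 0 gives back an empty numeric vector
nums[0]
## numeric(0)
```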

3.2 Indexing with a boolean vector

A boolean vector is a sequence of TRUE and FALSE values (R calls these logical values). For example:

y <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

You can pass boolean vectors to some functions. For example, sum will return the number of TRUE values:

sum(y)
## [1] 3

Task: can you apply the functions sum and length to boolean vectors?

You can use a boolean vector to index another vector:

nums[y]
## [1] 1 4 3

This returns each item in nums, as long as the corresponding item in y is TRUE.

This is useful because boolean values can be generated by applying logical operators to vectors.

nums > 3
## [1] FALSE  TRUE  TRUE  TRUE FALSE

This returned a boolean vector where each item was TRUE if the corresponding item in nums was greater than 3.

Here are some logical operators:

  • > : greater than
  • < : less than
  • >= : greater than or equal to
  • <= : less than or equal to
  • == : is equal to (note two equals signs)
  • != : is not equal to
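
Here is a sketch of a few of these operators in use. The vector scores is a made-up example, chosen so that the tasks on nums are left for you to do:

```r
scores <- c(2, 7, 7, 10)

# Which items are exactly 7?
scores == 7
## [1] FALSE  TRUE  TRUE FALSE

# Which items are anything other than 7?
scores != 7
## [1]  TRUE FALSE FALSE  TRUE

# Which items are 7 or more?
scores >= 7
## [1] FALSE  TRUE  TRUE  TRUE

# Which items are below 7?
scores < 7
## [1]  TRUE FALSE FALSE FALSE
```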

Task: make a list of booleans where the items are TRUE if each corresponding item of nums is less than or equal to 3.

Now we can use a boolean vector to index another vector. For example, this code returns the items of nums where nums is greater than 3.

nums[nums > 3]
## [1] 4 5 5

It’s worth spending some time understanding how indexing works, because it’s used often.
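
If the one-line version feels dense, it can be broken into steps. This is just one way to lay it out. The name is_big is invented for the example, and which is a base R function that reports the positions of the TRUE values:

```r
nums <- c(1, 4, 5, 5, 3)

# Step 1: build the boolean vector and store it
is_big <- nums > 3
is_big
## [1] FALSE  TRUE  TRUE  TRUE FALSE

# Step 2: use it as an index
nums[is_big]
## [1] 4 5 5

# How many items passed the test, and where are they?
sum(is_big)
## [1] 3
which(is_big)
## [1] 2 3 4
```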

Another useful operator is %in%. It tests whether items in the vector before it exist in the vector after it:

AshleyFaveNums <- c(1,3,5,7,9)
BrettFaveNums <- c(4,5,2,3)
BrettFaveNums %in% AshleyFaveNums
## [1] FALSE  TRUE FALSE  TRUE
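
Because %in% returns a boolean vector, its result can be used as an index like any other. A short sketch with the same two vectors (the name shared is made up):

```r
AshleyFaveNums <- c(1, 3, 5, 7, 9)
BrettFaveNums <- c(4, 5, 2, 3)

# Which of Brett's numbers does Ashley also like?
shared <- BrettFaveNums[BrettFaveNums %in% AshleyFaveNums]
shared
## [1] 5 3

# The order matters: this asks the question about Ashley's numbers instead
AshleyFaveNums %in% BrettFaveNums
## [1] FALSE  TRUE  TRUE FALSE FALSE
```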

Task: make a list of booleans where the items are TRUE if each corresponding item of nums is either 5 or 1. Remember, if you get stuck, you can look up the answers.

Task: Using y to index nums worked because the lengths of the two vectors are the same. What happens if you make another variable y2 which only has 3 boolean items, and try to index nums using that?

This task can be done with code like this:

y2 <- c(TRUE, FALSE, TRUE)
nums[y2]
## [1] 1 5 5

You might have expected an error, but R “recycles” the shorter vector: it repeats y2 until it is as long as nums, so the index actually used is TRUE, FALSE, TRUE, TRUE, FALSE, and items 1, 3 and 4 are returned. This is sometimes helpful, but sometimes causes a problem that you don’t spot till later. That’s why it’s always a good idea to check often that your variables contain the data you expect when you’re building your script.
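
To see exactly which positions a short index ends up selecting, it can help to write out the full-length version by hand. The sketch below uses the base R function rep_len, which repeats a vector until it reaches a given length, and adds a simple length check of the kind you might run while building a script:

```r
nums <- c(1, 4, 5, 5, 3)
y2 <- c(TRUE, FALSE, TRUE)

# Repeat y2 until it is as long as nums
rep_len(y2, length(nums))
## [1]  TRUE FALSE  TRUE  TRUE FALSE

# Indexing with the stretched version gives the same result as nums[y2]
nums[rep_len(y2, length(nums))]
## [1] 1 5 5

# A quick check that an index is the length you expect
length(y2) == length(nums)
## [1] FALSE
```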

3.3 Indexing by name

Data structures can be given names. For example, our nums variable could be the favourite numbers of our friends. We can create a character vector of our friends’ names, then set this as the names of the items in nums.

friends <- c("Ashley","Brett","Casey","Drew","Emery")
names(nums) <- friends
nums
## Ashley  Brett  Casey   Drew  Emery 
##      1      4      5      5      3

Now that each item in nums has a name, we can index nums by name:

nums["Brett"]
## Brett 
##     4
nums[c("Brett","Casey")]
## Brett Casey 
##     4     5
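
Names can also be read back out and combined with the other kinds of indexing. A small sketch, assuming nums and its names are set up as above:

```r
nums <- c(1, 4, 5, 5, 3)
names(nums) <- c("Ashley", "Brett", "Casey", "Drew", "Emery")

# Get the names back as a character vector
names(nums)
## [1] "Ashley" "Brett"  "Casey"  "Drew"   "Emery"

# Whose favourite number is 5? Index the names with a boolean vector
names(nums)[nums == 5]
## [1] "Casey" "Drew"

# Double square brackets return just the value, without the name
nums[["Brett"]]
## [1] 4
```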

Task: Find the favourite numbers of all my friends who have more than 4 characters in their name.

Hint: build this up step by step. First of all, get the number of characters in each name, then test whether this number is greater than 4. This should result in a vector of booleans. Then index nums using this vector.

There’s a “cheatsheet” of all the commands we use in this course here.

