This is a short tutorial on how to use R.
This format indicates that the text is R code:
2 + 16
This format indicates that the text is the output of running some code:
## [1] 18
There are some tasks below. If you get stuck, try looking up the answers.
In part zero, we installed R and RStudio, and learned the difference between the console window (the ‘stage’ where commands are ‘performed’) and the script window (where we can edit scripts).
Start RStudio now.
You can run commands in R in two ways.
First, type the following into the console window.
3 + 4
Then press the “Enter” key.
You should see a response like this:
## [1] 7
The answer is 7!
The second way is to write code in a script file, and then send lines from there to the console window. I recommend the second option, because it makes editing previous lines easier, and you have a record of what you did.
There’s an example of an R script in this file.
Let’s make a new script. Click File > New File > R script. A blank script should appear in the script window.
The first thing we can do is to add a note at the start of the file to say what it’s about. Any text after a hashtag will be ignored by R - these are called comments, and are useful for explaining our code to others (and ourselves!).
So we can add the following line at the top of the file (including the hashtag at the start):
# My first R script
You can assign values to variables with the assignment operator <-. Type this into the script window on a new line (each new line is a new command):
x <- 4
Then place your cursor on the line you just typed, and click the Run button in the script window (or press Control + Enter, or Command + Enter on a Mac).
The line of code should appear in the console window. If you have the ‘Environment’ window open, you should see a listing for x appear.
x is now a variable which stores the number 4. Programmers call this “assignment”, as in “I assigned the value 4 to the variable x”.
We can look at the value that is stored in x by running:
x
## [1] 4
Variables can have almost any name (e.g. “y”, “myVariable”, “numberOfParticipants”, “experiment556”), though there are a few restrictions:
Names cannot contain spaces or most special characters (the exceptions are . and _, e.g. my.variable or my_variable).
Names are case sensitive (x would be different to X).
Ideally, the name of the variable should make it obvious what it stores.
You can run operations on this variable, like adding, dividing or multiplying. Try typing these into the console:
x + 1
## [1] 5
x - 3
## [1] 1
x / 2
## [1] 2
x * 3
## [1] 12
Note: Some programming languages make you choose between integer and continuous (decimal) numbers. In R you rarely need to: numbers that you type are stored as continuous values by default, so x / 2 gives a sensible answer whether or not x divides evenly.
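If you are curious about how R stores a number, the base R functions class and is.integer will tell you. This is only a quick illustration; in practice you can usually leave the types to R.

```r
class(4)         # "numeric": a number you type is stored as a continuous value
class(4L)        # "integer": the L suffix asks for a whole number explicitly
5 / 2            # 2.5: division is not cut down to a whole number
is.integer(1:3)  # TRUE: the colon operator produces whole numbers
```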
Note that the output shows a [1]. You can mostly ignore this - it’s there because R likes to do things with sequences. The code below creates a sequence of numbers from 1 to 100. If we run it, we see that the number in brackets at the start of each line is the position in the sequence of the first item on that line, which helps us keep track of where we are:
1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
Let’s make a list of numbers. To do that, we use the combine function c:
nums <- c(1,4,5,5,3)
This is called a numeric vector. It’s an ordered sequence of numbers. We can also run operations on this variable:
nums * 2
## [1] 2 8 10 10 6
Note that the operation applies to each item in the list.
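The same item-by-item behaviour applies when both sides of an operator are vectors of the same length. Here is a small sketch; the vector bonus is invented for the example.

```r
nums <- c(1, 4, 5, 5, 3)
bonus <- c(10, 20, 30, 40, 50)  # hypothetical second vector of the same length

nums + bonus  # 11 24 35 45 53: items are added pair by pair
nums * nums   # 1 16 25 25 9: each item multiplied by itself
nums + 1      # 2 5 6 6 4: the single number is applied to every item
```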
Some functions can also be used with numeric vectors, for example getting the sum of the items. sum is a function which takes one argument (a numeric vector), and returns the sum of all the items in that argument.
sum(nums)
## [1] 18
The function length takes one argument (a vector), and returns the number of items in it:
length(nums)
## [1] 5
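Functions like these can be combined. As a rough illustration, dividing the sum by the length gives the average, which is what the base R function mean computes:

```r
nums <- c(1, 4, 5, 5, 3)
sum(nums) / length(nums)  # 3.6: the average, worked out by hand
mean(nums)                # 3.6: mean() gives the same result
```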
We can also make a character vector which is just an ordered sequence of strings:
number.names <- c("one",'two','three','four','five')
Individual strings can be surrounded by single or double quotes, as long as they match up.
You can’t use numeric operators with character vectors, but you can use other functions. nchar is a function that takes a single argument (a character vector) and returns the number of characters in each item:
nchar(number.names)
## [1] 3 3 5 4 4
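A few other base R functions also accept character vectors. The sketch below is just a sample; note in particular that length counts the strings, while nchar counts the characters inside each one.

```r
number.names <- c("one", "two", "three", "four", "five")
toupper(number.names)        # "ONE" "TWO" "THREE" "FOUR" "FIVE"
paste(number.names, "item")  # "one item" "two item" "three item" "four item" "five item"
length(number.names)         # 5: the number of strings, not the number of characters
```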
Indexing is how we find specific data points in our data in order to do something with them. There are three ways to index items in vectors: by position, with a boolean vector, and by name.
What is the first item in the vector nums? We can access it by indexing the vector:
nums[1]
## [1] 1
nums[3]
## [1] 5
Note that indexing in R starts from 1 (asking for item 0 returns an empty vector rather than an item).
We can also extract several items at a time. Here, we ask for the first three items of nums:
nums[c(1,2,3)]
## [1] 1 4 5
Note that, in R, sequentially ordered numbers can be quickly made using the colon operator:
nums[1:3]
## [1] 1 4 5
nums[2:4]
## [1] 4 5 5
nums[3:1]
## [1] 5 4 1
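It can help to see what R does with positions that don’t exist. The following sketch checks the note about item 0 and also asks for an item past the end of the vector:

```r
nums <- c(1, 4, 5, 5, 3)
nums[0]             # numeric(0): an empty vector, not an error
length(nums[0])     # 0
nums[6]             # NA: there is no sixth item, so R returns a missing value
nums[length(nums)]  # 3: one way to get the last item
```

Neither case produces an error, so a mistyped position can go unnoticed unless you look at the result.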
A boolean vector is a sequence of TRUE and FALSE values. For example:
y <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
You can pass boolean vectors to some functions. For example, sum will return the number of TRUE values:
sum(y)
## [1] 3
Task: can you apply the functions sum and length to boolean vectors?
You can use a boolean vector to index another vector:
nums[y]
## [1] 1 4 3
This returns each item in nums, as long as the corresponding item in y is TRUE.
This is useful because boolean values can be generated by applying logical operators to vectors.
nums > 3
## [1] FALSE TRUE TRUE TRUE FALSE
This returned a boolean vector where each item was TRUE if the corresponding item in nums was greater than 3.
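Because the result of a comparison is itself a boolean vector, it can be stored and passed to functions. A minimal sketch, using the made-up name over3 and the base R functions any and all:

```r
nums <- c(1, 4, 5, 5, 3)
over3 <- nums > 3  # hypothetical name for the boolean vector
sum(over3)         # 3: how many items are greater than 3
any(over3)         # TRUE: at least one item is greater than 3
all(over3)         # FALSE: not every item is greater than 3
```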
Here are some logical operators:
> : greater than
< : less than
>= : greater than or equal to
<= : less than or equal to
== : is equal to (note two equals signs)
!= : is not equal to
Task: make a list of booleans where the items are TRUE if each corresponding item of nums is less than or equal to 3.
Now we can use a boolean vector to index another vector. For example, this code returns the items of nums where nums is greater than 3.
nums[nums > 3]
## [1] 4 5 5
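If the one-line version feels dense, it can be split into two steps. The sketch below uses invented data (heights and is.tall are hypothetical names) and also shows the base R function which, which reports the positions of the TRUE values:

```r
heights <- c(150, 172, 181, 165)  # hypothetical data, in centimetres
is.tall <- heights >= 170         # step 1: FALSE TRUE TRUE FALSE
heights[is.tall]                  # step 2: 172 181
heights[heights >= 170]           # the same result in a single step
which(is.tall)                    # 2 3: the positions of the TRUE values
```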
It’s worth spending some time understanding how indexing works, because it’s used often.
Another useful operator is %in%. It tests whether items in the vector before it exist in the vector after it:
AshleyFaveNums <- c(1,3,5,7,9)
BrettFaveNums <- c(4,5,2,3)
BrettFaveNums %in% AshleyFaveNums
## [1] FALSE TRUE FALSE TRUE
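Since %in% returns a boolean vector, it can be used for indexing as well. This sketch (the name shared is made up) pulls out the numbers both friends like, and shows that swapping the two sides gives a different answer:

```r
AshleyFaveNums <- c(1, 3, 5, 7, 9)
BrettFaveNums <- c(4, 5, 2, 3)

# Keep only Brett's numbers that also appear in Ashley's list
shared <- BrettFaveNums[BrettFaveNums %in% AshleyFaveNums]
shared                             # 5 3

# The result always has one item per item on the left-hand side
AshleyFaveNums %in% BrettFaveNums  # FALSE TRUE TRUE FALSE FALSE
```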
Task: make a list of booleans where the items are TRUE if each corresponding item of nums is either 5 or 1. Remember, if you get stuck, you can look up the answers.
Task: Using y to index nums worked because the lengths of the two vectors are the same. What happens if you make another variable y2 which only has 3 boolean items, and try to index nums using that?
This task can be done with code like this:
y2 <- c(TRUE, FALSE, TRUE)
nums[y2]
## [1] 1 5 5
You might have expected an error, but R recycles the shorter vector: y2 is repeated until it is as long as nums, so the index actually used is TRUE, FALSE, TRUE, TRUE, FALSE. That is why three items come back rather than two. This is sometimes helpful, but sometimes causes a problem that you don’t spot till later. That’s why it’s always a good idea to check often that your variables contain the data you expect when you’re building your script.
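One way to see exactly what R does with the shorter vector is to stretch it by hand with the base R function rep_len, which repeats a vector until it reaches a given length. The sketch below also shows a simple length comparison that would have flagged the mismatch:

```r
nums <- c(1, 4, 5, 5, 3)
y2 <- c(TRUE, FALSE, TRUE)

rep_len(y2, length(nums))        # TRUE FALSE TRUE TRUE FALSE: y2 stretched to five items
nums[rep_len(y2, length(nums))]  # 1 5 5: the same result as nums[y2]
length(y2) == length(nums)       # FALSE: the two vectors are not the same length
```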
Data structures can receive names. For example, our nums variable could be the favourite numbers of our friends. We can create a character vector of our friends’ names, then set this as the names of the items in nums.
friends <- c("Ashley","Brett","Casey","Drew","Emery")
names(nums) <- friends
nums
## Ashley  Brett  Casey   Drew  Emery
##      1      4      5      5      3
Now that each item in nums has a name, we can index nums by name:
nums["Brett"]
## Brett
##     4
nums[c("Brett","Casey")]
## Brett Casey
##     4     5
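Names can also be given at the moment a vector is created, and the names themselves can be used in comparisons. The following is a sketch with invented data (scores is a hypothetical vector); unname is a base R function that drops the names from a result.

```r
scores <- c(maths = 70, art = 85, music = 60)  # hypothetical named vector

names(scores)                   # "maths" "art" "music"
scores["art"]                   # the item named art, shown with its name
unname(scores["art"])           # 85, with the name removed
scores[names(scores) != "art"]  # every item except art
```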
Task: Find the favourite numbers of all my friends who have more than 4 characters in their name.
Hint: build this up step by step. First of all, get the number of characters in each name, then test whether this number is greater than 4. This should result in a vector of booleans. Then index nums using this vector.