Using square bracket notation [] or [[]], we can retrieve specific values from an object (e.g. vector, list, data.frame, tibble or matrix). Within the brackets we can put the index position, the element's name or a logical vector of appropriate length. Suppose we have a vector x and we want to retrieve the 5th value; we can use:
x <- 1:10
x[5]
## [1] 5
which returns the value 5. Using negative index position value(s), we can drop elements from a vector. For example, we could drop the first element:
x[-1]
## [1] 2 3 4 5 6 7 8 9 10
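Besides index positions, the text above mentions indexing by name and by logical vector. A minimal sketch of both, assuming a small named vector v that is made up purely for illustration:

```r
# Hypothetical named vector, used only for illustration
v <- c(a = 10, b = 20, c = 30)

by_name    <- v["b"]                   # index by name
by_logical <- v[c(TRUE, FALSE, TRUE)]  # logical vector of the same length as v
by_test    <- v[v > 15]                # logical vector produced by a comparison

by_name
by_logical
by_test
```

The comparison v > 15 itself returns a logical vector, which is why it can be used directly inside the brackets.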
Lists are also 1-dimensional vectors, and they can be indexed in the same way. However, compare the following code:
x <- list("a", "b", "c")
x[2]
## [[1]]
## [1] "b"
x[[2]]
## [1] "b"
When using single square brackets to retrieve the second element in the list x, we actually get back a list (here of length 1) containing the selected element (the 2nd). If the list elements have names, this slice of the list would also still carry the name of the second element. However, using double square brackets, we do not get back a list but only the value that was stored in the second element of the list!
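To illustrate the remark about names, here is a small sketch with a named list; the list lst and its element names are assumptions chosen for the example:

```r
# Hypothetical named list, used only for illustration
lst <- list(first = "a", second = "b", third = "c")

sub <- lst["second"]    # single brackets: a list of length 1 that keeps the name
val <- lst[["second"]]  # double brackets: the stored value itself

class(sub)
names(sub)
class(val)
```

Single brackets preserve the container (and the name), while double brackets extract the content.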
1-dimensional objects like vectors and lists are indexed by 1 index position. 2-dimensional objects like data.frames, tibbles and matrices are typically indexed by 2 positions: [row-index, col-index]. In week 4 we will see objects (arrays) of more than 2 dimensions, which have to be indexed by more than 2 index positions.
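The [row-index, col-index] pattern can be sketched as follows. The matrix m and the data.frame df (with its columns id and score) are hypothetical examples, not objects from this tutorial:

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

cell    <- m[2, 3]  # row 2, column 3
row_one <- m[1, ]   # leaving the column index empty selects the whole row
col_two <- m[, 2]   # leaving the row index empty selects the whole column

# Hypothetical data.frame; columns can also be selected by name
df <- data.frame(id = 1:3, score = c(2.5, 4.0, 3.5))
second_score <- df[2, "score"]

cell
row_one
col_two
second_score
```

Leaving one of the two positions empty selects everything along that dimension.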
Control structures allow you to control the flow of execution of a script. They are extremely useful if you want to run a piece of code multiple times, or if you want to run a piece of code only if a certain condition is met. Helpful control structures are conditional statements such as the if, if-else, and ifelse statements, and loops such as the for, while and repeat loops.
For the R help documentation on these constructs, see ?Control (ifelse is a function and is documented separately under ?ifelse).
Quite often, we want to execute some commands only if a specific condition is met. For this, we can use the if, if-else or ifelse statements. When the condition is not met, the if statement does nothing and the script moves on to the next commands, whereas the if-else and ifelse statements explicitly specify a (set of) command(s) to run instead.
The if, if-else and ifelse statements have the following syntax:
if(condition) {
  # do something
}
if(condition) {
  # do something
} else {
  # do something else
}
ifelse(condition, statement1, statement2)
Here, condition should be an expression that evaluates to TRUE or FALSE (i.e., a logical expression using relational and logical operators). If the expression returns TRUE, then the code between the first set of curly brackets { } will be executed for if and if-else, or statement1 will be returned for ifelse. If the expression returns FALSE, then either no code will be executed (for if), or the else code between the second set of curly brackets will be executed (for if-else), or statement2 will be returned (for ifelse).
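As a small sketch of the behaviour described above, the following uses a plain if statement twice: once with a condition that is TRUE and once with a condition that is FALSE. The variable names x and message_shown are assumptions made for illustration:

```r
x <- 7
message_shown <- FALSE

# The condition combines relational (>, <) and logical (&&) operators
if (x > 5 && x < 10) {
  message_shown <- TRUE   # executed, because the condition is TRUE
}

# This condition is FALSE, so the block is skipped and nothing happens
if (x > 10) {
  message_shown <- FALSE
}

message_shown
```

Because the second condition is FALSE and there is no else part, message_shown keeps the value set by the first block.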
Multiple lines of code can be enclosed by the curly brackets, which are all executed as one block. Thus, the if and if-else statements are well suited for executing different (possibly complex) blocks of code given a condition, whereas the ifelse statement is better suited for simpler expressions and, being vectorized, works element-wise on vectors.
For example, an if-else statement could be:
x <- 3
if(x < 5) {
  print("x < 5")
} else {
  print("x >= 5")
}
## [1] "x < 5"
In this very simple example, when the variable x has a value lower than 5, the conditional statement prints "x < 5" to the screen, whereas when x has a value that is not lower than 5, it prints "x >= 5".
An example of vectorization with the ifelse statement is:
x <- 1:4
ifelse(x <= 2, "x < 3", "x > 2")
## [1] "x < 3" "x < 3" "x > 2" "x > 2"
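The ifelse statement is not limited to returning text. As a further sketch, the following replaces negative values by zero, element by element; the vector y is a made-up example:

```r
# Hypothetical input vector, used only for illustration
y <- c(-1, 2, -3, 4)

# For each element: return 0 where y < 0, otherwise the element itself
clamped <- ifelse(y < 0, 0, y)
clamped
```

By contrast, a plain if statement expects a single TRUE or FALSE; in recent versions of R, passing it a condition of length greater than one is an error, which is why ifelse is the natural choice for whole vectors.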
A loop is a sequence of instructions that is repeated until a certain condition has been reached. for, while and repeat, with the additional clauses break and next, are used to construct loops.
In earlier tutorials, we used functions from the tidyverse packages that allow us to easily iterate some commands over different sets of the data, e.g. using group_by in combination with mutate or summarise, or using the map functions from the purrr package. Under the hood, such functions iterate over a set of values in much the same way as a for-loop does (see the section on for-loops in R4DS).
The following example demonstrates the very basic use of a for-loop: it specifies that an iterator that we call i (but it can have any other name) will get the values 1, 2, …, 10, and for each iteration (thus each value that object i takes) some commands are executed (all commands between the curly brackets { }). Here, only its value is printed to the console:
for(i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
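In practice, a for-loop usually stores its results rather than only printing them. The sketch below fills a pre-allocated vector and compares the result with base R's vapply(), which stands in here for the purrr map functions mentioned earlier so that the example runs without extra packages. The names values, squares_loop and squares_apply are assumptions made for illustration:

```r
# Hypothetical input values
values <- c(2, 4, 6)

# Pre-allocate the output vector, then fill it one element per iteration
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {   # seq_along(values) gives 1, 2, 3
  squares_loop[i] <- values[i]^2
}

# The same computation expressed with a functional (no explicit loop)
squares_apply <- vapply(values, function(v) v^2, numeric(1))

squares_loop
identical(squares_loop, squares_apply)
```

Both approaches give the same result; the functional version simply hides the iteration.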
Note that a for loop will always execute the code at least once when you use the sequence operator :! Namely, any sequence will then have a length of at least 1, e.g. 1:1 or 0:0. This can even lead to unwanted behaviour if you are not careful, as is the case in the following code:
nrRepetitions <- 0
for(i in 1:nrRepetitions) {
  # some code, here just printing the value i to the console
  print(i)
}
## [1] 1
## [1] 0
Instead of repeating the code 0 times, this code actually executes the code twice, since the : operator can and will count backwards!
Thus, always check whether the number of iterations in a for loop is zero before you run your loop if you use the : operator. If you're iterating over the indices of a vector, always use the function seq_along() in preference to the : operator. For example, for a vector x the function call seq_along(x) generates a regular sequence of the same length as x (including length 0!):
x <- 1:4
seq_along(x)
## [1] 1 2 3 4
seq_along(x) == c(1:4)
## [1] TRUE TRUE TRUE TRUE
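The difference matters most for empty vectors. The following sketch counts how often the loop body runs for a zero-length vector, once with the : operator and once with seq_along(); the names empty, n_colon and n_seq are assumptions made for illustration:

```r
empty <- numeric(0)   # a vector of length 0

# 1:length(empty) is 1:0, i.e. the sequence 1, 0, so the body runs twice
n_colon <- 0
for (i in 1:length(empty)) {
  n_colon <- n_colon + 1
}

# seq_along(empty) has length 0, so the body is never executed
n_seq <- 0
for (i in seq_along(empty)) {
  n_seq <- n_seq + 1
}

n_colon
n_seq
```

When looping a given number of times rather than over a vector, seq_len(n) plays the same role: seq_len(0) is an empty sequence, so the loop body is skipped.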
A while loop executes some code as long as a condition is met, with the syntax:
while(condition) {
  expression
}
where condition is a logical expression evaluating to TRUE or FALSE, and expression can be multiple lines of code. The while loop will continue to execute as long as the condition evaluates as TRUE, for example:
x <- 1
while(x < 5) {
  print(x)
  x <- x + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
Be careful with while loops: always make sure that the loop will end at some point!
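One common way to guard against a loop that never ends is to add an iteration cap to the condition. This is a minimal sketch of that idea, not the only approach; the names value, iterations and max_iterations are assumptions made for illustration:

```r
value <- 100
iterations <- 0
max_iterations <- 1000   # hypothetical safety cap

# Halve value until it is no longer above 1, but never loop more than the cap
while (value > 1 && iterations < max_iterations) {
  value <- value / 2
  iterations <- iterations + 1
}

iterations
value
```

Here the loop stops on its own after a handful of iterations; the cap only matters if the main condition would, by mistake, never become FALSE.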
A repeat loop is used to iterate over a block of code multiple times. A repeat loop has no built-in condition check to exit the loop. We thus have to put a condition check explicitly inside the body of the loop, and can use the break statement to exit the loop. Failing to do so will result in an infinite loop! An example of a repeat loop:
x <- 1
repeat {
  if(x < 5) {
    print(x)
    x <- x + 1
  } else {
    break
  }
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
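The same loop can be written more compactly by placing the exit check at the end of the body. This is a sketch of an alternative layout under the same starting value, not a replacement for the example above:

```r
x <- 1
repeat {
  print(x)
  x <- x + 1
  # Exit check at the end of the body: the body always runs at least once
  if (x >= 5) break
}
```

This prints the values 1 to 4, just like the version above. The difference is that the body is executed before the first check, so it would run once even if x started at 5 or higher.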
Another statement that allows for the control of execution of code is the next statement, which makes it possible to skip the current iteration of a loop without terminating the loop. Note that loops can be nested within loops, which can be very helpful if some iteration code has to be iterated itself, for example:
for(i in 1:4) {
  # print i to screen
  cat("i =", i, ">> ") # the function cat() can print multiple items
  for(j in 1:3) {
    # print j to screen
    cat(j, " ")
  }
  # print newline
  cat("\n")
}
## i = 1 >> 1 2 3
## i = 2 >> 1 2 3
## i = 3 >> 1 2 3
## i = 4 >> 1 2 3
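The next and break statements described above can be sketched together in one loop. The vector odd_values and the cut-off at 7 are assumptions chosen for illustration:

```r
odd_values <- c()
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers, continue with the next iteration
  if (i > 7) break        # leave the loop entirely once i exceeds 7
  odd_values <- c(odd_values, i)
}
odd_values
```

The loop collects 1, 3, 5 and 7: even values are skipped by next, and when i reaches 9 the break statement ends the loop before 9 is stored.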
Whereas a for loop runs over a predefined set of values, while and repeat loops are governed by a logical condition. A while loop tests this condition at the start of each iteration, whereas a repeat loop tests it inside the body, for example at the end.