Indexing

Using square bracket notation [] or [[]], we can retrieve specific values from an object (e.g. a vector, list, data.frame, tibble or matrix). Within the brackets we can put an index position, a name, or a logical vector of appropriate length. Suppose we have a vector x and want to retrieve its 5th value. We can use:

x <- 1:10
x[5]
## [1] 5

which returns the value 5. Using negative index positions, we can drop elements from a vector. For example, we could drop the first element:

x[-1]
## [1]  2  3  4  5  6  7  8  9 10
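
Indexing by name or by a logical vector works in the same way as indexing by position. The following is a minimal sketch; the vector y and its element names are made up for illustration:

```r
# A named numeric vector (the names a, b and c are hypothetical)
y <- c(a = 10, b = 20, c = 30)

# Index by name: selects the element named "b" (value 20)
by_name <- y["b"]

# Index by a logical vector of the same length as y:
# keeps the elements where the index is TRUE (here a and c)
by_logical <- y[c(TRUE, FALSE, TRUE)]

# A condition evaluates to a logical vector first, which is then used as index:
# keeps the elements larger than 15 (here b and c)
by_condition <- y[y > 15]
names(by_condition)
## [1] "b" "c"
```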

Lists are also 1-dimensional vectors, and they can be indexed in the same way. However, compare the following code:

x <- list("a", "b", "c")
x[2]
## [[1]]
## [1] "b"
x[[2]]
## [1] "b"

When using single square brackets to retrieve the second element of the list x, we actually get back a list (here of length 1) containing the selected element. If the list elements have names, this slice of the list also keeps the name of the second element. Using double square brackets, however, we do not get back a list but only the value that was stored in the second element of the list!
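
The difference is easiest to see with a named list. A small sketch, in which the list p and its element names are hypothetical:

```r
# A list with named elements (names chosen for illustration only)
p <- list(first = "a", second = "b", third = "c")

# Single brackets: a list of length 1 that keeps the name "second"
s <- p[2]
class(s)
## [1] "list"
names(s)
## [1] "second"

# Double brackets: the stored value itself, without the list wrapper
v <- p[[2]]
class(v)
## [1] "character"

# Double brackets also accept a name instead of a position
p[["second"]]
## [1] "b"
```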

1-dimensional objects like vectors and lists are indexed by 1 index position. 2-dimensional objects like data.frames, tibbles and matrices are indexed by 2 positions: [row-index, col-index]. In week 4 we will see objects (arrays) with more than 2 dimensions, which are indexed by more than 2 index positions.
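
As a brief illustration of the [row-index, col-index] notation, consider the sketch below. The matrix m and the data.frame d (including its column names) are made up for this example:

```r
# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)

# The element in row 2, column 3
m[2, 3]
## [1] 6

# Leaving an index empty selects everything along that dimension:
# here the whole first row
m[1, ]
## [1] 1 3 5

# A small data.frame (the column names id and value are hypothetical)
d <- data.frame(id = 1:3, value = c(2.5, 3.5, 4.5))

# Row 2 of the column named "value"
d[2, "value"]
## [1] 3.5
```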

Control structures

Control structures allow you to control the flow of execution of a script. They are extremely useful if you want to run a piece of code multiple times, or if you want to run a piece of code only if a certain condition is met. Helpful control structures are conditional statements such as the if, if-else, and ifelse statements, and loops such as the for, while and repeat loops. For the R help documentation on these control structures, see ?Control.

Conditional statements

Quite often, we want to execute some commands only if a specific condition is met. For this, we can use the if, if-else or ifelse statements. All three execute the specified commands when the condition is met. They differ in what happens when the condition is not met: the if statement does nothing and the script simply moves on to the next commands, whereas the if-else and ifelse statements explicitly specify an alternative (set of) command(s) for that case.

The if, if-else and ifelse statements have the following syntax:

if(condition) {
  # do something
}

if(condition) {
  # do something
} else {
  # do something else
}

ifelse(condition, statement1, statement2)

Here, condition should be an expression that evaluates to TRUE or FALSE (i.e., a logical expression using relational and logical operators). If the expression returns TRUE, then the code between the first set of curly brackets { } will be executed for if and if-else, or statement1 will be returned for ifelse. If the expression returns FALSE, then either no code will be executed (for if), or the code between the second set of curly brackets will be executed (for if-else), or statement2 will be returned (for ifelse). Multiple lines of code can be enclosed by the curly brackets, and they are all executed as one block. Thus, the if and if-else statements are well suited for executing different (possibly complex) blocks of code given a single condition, whereas ifelse is better suited for simple statements and is vectorized: it evaluates the condition for every element of a vector.

For example, an if-else statement could be:

x <- 3
if(x < 5) {
  print("x < 5")
} else {
  print("x >= 5")
}
## [1] "x < 5"

In this very simple example, the conditional statement prints "x < 5" to the screen when the variable x has a value lower than 5, and prints "x >= 5" when x has a value that is not lower than 5.
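
For comparison, an if statement without an else part simply skips its block when the condition is FALSE. A minimal sketch, in which the variable result is a hypothetical helper used only to show whether the block was executed:

```r
x <- 7
result <- "not set"

if(x < 5) {
  # this block is skipped, because the condition evaluates to FALSE
  result <- "x < 5"
}

# execution continues here, and result still has its initial value
result
## [1] "not set"
```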

An example of vectorization with the ifelse statement is:

x <- 1:4
ifelse(x <= 2, "x < 3", "x > 2")
## [1] "x < 3" "x < 3" "x > 2" "x > 2"
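
The two return values of ifelse do not have to be single values: they can also be vectors, in which case the result is picked element by element. A small sketch under that assumption, where the vector v and the name clipped are made up for illustration:

```r
v <- c(-2, 5, -1, 3)

# Where v is negative return 0, otherwise return the corresponding element of v
clipped <- ifelse(v < 0, 0, v)
clipped
## [1] 0 5 0 3
```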

Loops

A loop is a sequence of instructions that is repeated until a certain condition has been reached. The statements for, while and repeat, with the additional clauses break and next, are used to construct loops.

For-loops

In earlier tutorials, we used functions from the tidyverse packages that allow us to easily iterate some commands over different subsets of the data, e.g. using group_by in combination with mutate or summarise, or using the map functions from the purrr package. Under the hood, such functions use for-loops to iterate over a set of values (see the section on for-loops in R4DS).

The following example demonstrates the very basic use of a for-loop: it specifies that an iterator that we call i (but it can have any other name) will take the values 1, 2, …, 10 in turn, and that for each iteration (thus for each value of i) all commands between the curly brackets { } are executed. Here, only the value of i is printed to the console:

for(i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

Note that a for loop will always execute the code at least once when you use the sequence operator :! This is because any such sequence has a length of at least 1, e.g. 1:1 or 0:0. This can even lead to unwanted behaviour if you are not careful, as is the case in the following code:

nrRepetitions <- 0
for(i in 1:nrRepetitions) {
  # some code, here just printing the value i to the console
  print(i)
}
## [1] 1
## [1] 0

Instead of repeating the code 0 times, this code actually executes the code twice, since the : operator can and will count backwards!

Thus, if you use the : operator, always check whether the number of iterations is zero before you run your for loop. If you’re iterating over the indices of a vector, always use the function seq_along() in preference to the : operator. For example, for a vector x the function call seq_along(x) generates a regular sequence of the same length as x (including length 0!):

x <- 1:4
seq_along(x)
## [1] 1 2 3 4
seq_along(x) == c(1:4)
## [1] TRUE TRUE TRUE TRUE
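
The zero-length case can be checked directly. The sketch below counts how often the loop body runs for an empty vector; the names empty, n_colon and n_safe are hypothetical, and seq_len() is a related base R function that is not discussed above:

```r
# An empty numeric vector: there is nothing to iterate over
empty <- numeric(0)

# 1:length(empty) is 1:0, which counts backwards and gives two iterations
n_colon <- 0
for(i in 1:length(empty)) {
  n_colon <- n_colon + 1
}
n_colon
## [1] 2

# seq_along(empty) has length zero, so the body is never executed
n_safe <- 0
for(i in seq_along(empty)) {
  n_safe <- n_safe + 1
}
n_safe
## [1] 0

# seq_len() behaves the same way for a given number of repetitions
length(seq_len(0))
## [1] 0
```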

While-loops

A while loop executes some code as long as a condition is met, with the syntax:

while(condition) {
  expression
}

where condition is a logical expression evaluating to TRUE or FALSE, and expression can be multiple lines of code. The while loop will continue to execute as long as the condition evaluates to TRUE, for example:

x <- 1
while(x < 5) {
  print(x)
  x <- x + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4

Be careful with while loops: always make sure that the loop will end at some point!
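
One possible safeguard, sketched here as an assumption rather than a general rule, is to add an upper limit on the number of iterations to the condition. The names value, iteration and max_iterations are made up for illustration:

```r
value <- 100
iteration <- 0
max_iterations <- 1000   # hypothetical safety limit

# Halve value until it drops below 1, but never run more than max_iterations times
while(value >= 1 && iteration < max_iterations) {
  value <- value / 2
  iteration <- iteration + 1
}

iteration
## [1] 7
value
## [1] 0.78125
```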

Repeat-loops

A repeat loop is used to execute a block of code multiple times. A repeat loop has no built-in condition check to exit the loop. We thus have to put a condition check explicitly inside the body of the loop, and can use the break statement to exit the loop. Failing to do so will result in an infinite loop! An example of a repeat loop:

x <- 1
repeat {
  if(x < 5) {
    print(x)
    x <- x + 1
  } else {
    break
  }
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4

Another statement that allows for the control of execution of code is the next statement, which skips the remainder of the current iteration of a loop without terminating the loop. Note that loops can be nested within loops, which can be very helpful if some iteration code has to be iterated itself, for example:

for(i in 1:4) {
  # print i to screen
  cat("i =", i, ">> ") # the function cat() can print multiple items
  
  for(j in 1:3) {
    # print j to screen
    cat(j, " ")
  }
  
  # print newline
  cat("\n")
}
## i = 1 >> 1  2  3  
## i = 2 >> 1  2  3  
## i = 3 >> 1  2  3  
## i = 4 >> 1  2  3
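
The next and break statements mentioned above can be combined in a single loop. A minimal sketch, in which the vector kept is a hypothetical helper that records which values were printed:

```r
kept <- integer(0)

for(i in 1:10) {
  if(i %% 2 == 0) {
    next    # skip even numbers and continue with the next iteration
  }
  if(i > 7) {
    break   # stop the loop entirely
  }
  print(i)
  kept <- c(kept, i)
}
## [1] 1
## [1] 3
## [1] 5
## [1] 7
```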

Unlike for-loops, which iterate over a given set of values, while and repeat loops are based on the verification of a logical condition. In a while loop the condition is tested at the start of each iteration, whereas in a repeat loop the test is placed inside the body of the loop, for example at its end.

Although loops can be very helpful in iteration and controlling the flow of code, in many cases there are alternatives that may be more efficient and practical, e.g. the apply family of functions: apply, sapply, mapply, lapply, rapply, tapply, vapply. See the R help pages of these functions (e.g. ?lapply) for more information.
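
As a rough comparison, the sketch below computes the same result with a for-loop, with sapply and with vapply. The vector values and the result names are made up for illustration:

```r
values <- 1:5

# With a for-loop: pre-allocate the result, then fill it element by element
squares_loop <- numeric(length(values))
for(i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# With sapply(): apply the function to each element and simplify to a vector
squares_sapply <- sapply(values, function(v) v^2)

# With vapply(): the same, but the type and length of each result are declared
squares_vapply <- vapply(values, function(v) v^2, numeric(1))

squares_vapply
## [1]  1  4  9 16 25
all.equal(squares_loop, squares_sapply)
## [1] TRUE
```

For a simple calculation like this one, plain vectorized arithmetic (values^2) gives the same result without any explicit iteration.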