Chapter 1 R and RStudio
The free and open-source software R is widely used in many fields of science and beyond. It is an extremely versatile programming language, as well as an interactive environment for data exploration and statistical computing, especially when combined with the functionality that RStudio provides.
If you have not installed R and RStudio yet, you can download the latest versions from the R and RStudio websites.
R is essentially a scripting language: it provides a means to write and run scripts. Scripting is essential for quality control and transparency of data processing, and it is increasingly required in science to ensure that data processing is transparent and repeatable. Our end goal should not just be to “do stuff”, but to do it in a way that anyone can easily and exactly replicate our workflow and results. The best way to achieve this is to write scripts.
R is a dynamic, or interpreted, programming language. This means that, unlike compiled languages such as C++, you don’t need a compiler to first turn your code into a program before you can use it. R interprets your code directly, so you can simply write code and run it. This makes the development cycle fast and easy.
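As a minimal illustration of this write-and-run cycle (the function name `square` is just a hypothetical example), you can define a function and call it straight away, with no separate compile step:

```r
# Define a small function; R stores it immediately, no compilation needed
square <- function(x) {
  x^2
}

# Call it right away; R evaluates the expression and prints the result
square(4)
#> [1] 16
```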
RStudio is more than simply a graphical user interface (GUI) for R; it is an open-source integrated development environment (IDE) that includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, workspace management and version control.
Always load R through the RStudio IDE!
1.1 Pane overview
RStudio displays 4 panels (or panes, windows), each showing a different type of content. In the default setting, the top-left panel contains the script editor, the console is at the bottom-left, the environment panel (showing what is stored in memory) is at the top-right, and the plotting panel is at the bottom-right. Some panels have multiple tabs that give access to other useful features, such as help pages and information on available and loaded packages.
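The environment panel lists the objects stored in the current session. A rough console equivalent is `ls()`, sketched below; the object names `heights` and `n_people` are hypothetical:

```r
# Create two objects; in RStudio they now appear in the environment panel
heights <- c(1.72, 1.65, 1.80)
n_people <- length(heights)

# ls() returns the names of the objects in the global environment
ls()
```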
The console is the heart of R: this is where R actually evaluates code. If the last character you see is > (a prompt), R is waiting for new input (and has thus finished any prior task). You can type code directly into the console after the prompt and get an immediate response. For example, if you type 1+1 into the console and press Enter, you’ll see that R immediately gives an output of 2.
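A few more lines you could type at the > prompt; each is evaluated as soon as you press Enter, and any result is printed below it:

```r
1 + 1        # arithmetic: prints [1] 2
sqrt(16)     # a built-in function: prints [1] 4
x <- 5 * 3   # assignment prints nothing, but stores x in memory
x            # typing an object's name prints its value: [1] 15
```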
In the console, if you see the symbol + instead of R’s prompt symbol >, R expects you to complete the current command. If you want to abort the command (e.g., when the code is wrong, for example because a closing bracket is missing), press the Esc key while your cursor is in the console.
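For example, if you press Enter before closing a bracket, the console shows + on the next line until the expression is complete. The call below is deliberately spread over two lines; typed into the console, the second line would be preceded by +:

```r
# After the first line R has not seen the closing bracket yet,
# so in the console it waits for more input with the + prompt
total <- sum(1, 2,
             3)
total
```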
The script editor lets you work with script files. Here, you can enter multiple lines of code and save your script file to disk (R scripts are just text files; save them with the .R extension). The RStudio script editor recognizes and highlights the various elements of your code, for example by using different colours for different elements, and it also helps you find matching brackets in your scripts.
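Because a script is just a text file, you can also create and run one from R itself. The sketch below assumes a throw-away file name generated by `tempfile()`; `writeLines()` saves the code, and `source()` runs every line of the file, which is what RStudio’s Source button does for the open script:

```r
# Path for a throw-away script; a real project would use a fixed file name
script_path <- tempfile(fileext = ".R")

# Write two lines of R code to the file; a script is plain text
writeLines(c(
  "# compute the mean of a few numbers",
  "avg <- mean(c(2, 4, 6))"
), script_path)

# Run the whole script; objects it creates end up in the global environment
source(script_path)
avg
```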
You can change the location of panels and which tabs are shown under View > Panes > Pane Layout. Under Tools > Global Options > Appearance, you can change the look of the GUI, e.g., the colours used.
Instead of typing directly into the console, it is better to enter your commands in the script editor, so that they are recorded for future reference. To execute code, either select the code you wish to execute and click the Run button at the top right of the script panel, or press a hotkey such as “Ctrl + Enter” or “Ctrl + r” on a Windows PC (“Command + Return” on a Mac; below we assume you work on a Windows machine). To see all shortcuts in RStudio, press “Alt + Shift + k” (or use Tools > Keyboard Shortcuts Help). To facilitate the reproducibility of your project, write most of your code in scripts, and only type directly into the console to debug or do quick analyses (i.e., small tasks that do not need to be saved for future reference).
Most of the time when working on a project, you will be editing scripts in the script editor and checking output in the console or plot panel. It is always good practice to make ample use of the Help tab, which shows the documentation of R functions. To quickly access the help file associated with a function, use the help() function or the ? help operator. For example, to retrieve the documentation of the function lm, you could enter the command help(lm) or help("lm"), or ?lm or ?"lm" (i.e., the quotes are optional).
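The four forms are listed as comments below, since each one opens the help page for lm() in the Help tab. As a small extra, `help()` also returns an object listing the matching help files, which you can store and inspect, for instance to check that a topic is documented:

```r
# Four equivalent ways to open the documentation of lm() (quotes optional):
#   help(lm)
#   help("lm")
#   ?lm
#   ?"lm"

# Store the result of help() instead of printing it;
# its length is the number of help files found for the topic
h <- help("lm")
length(h) > 0
```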
1.2 Configuring RStudio
It is advisable to configure the settings of RStudio before you start working on a project. By default, RStudio re-uses and restores projects, saves the history, and asks on exit whether or not to save the workspace to file. You can configure this via Tools > Global Options > General (see here for an explanation of the options). If you keep objects around in your workspace between sessions, things get messy: your code may silently depend on objects that your scripts no longer create, which leads to unexpected results. It is therefore good practice to uncheck all restore checkboxes and to set Save workspace to .RData on exit to Never, so that you start every RStudio session with a clean slate.
This forces you to work in a clean and structured way, which increases the transparency and reproducibility of your project. This not only benefits reproducible science, it also helps your future self: when you resume a project after a break, you can easily pick up where you left off if you have worked in a transparent and reproducible way.
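A minimal way to check from the console that you really start with a clean slate, assuming the default workspace file name .RData mentioned above, is sketched here:

```r
# TRUE if a saved workspace image exists in the working directory;
# RStudio could restore it at start-up if the restore option were still checked
file.exists(".RData")

# Names of the objects in the global environment;
# right after a clean start this is an empty character vector
ls()
```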