Introduction to R and RStudio

Chapter 16 Reading/writing data/scripts

Objects do not have to exist only in R's memory: there are often reasons to read data or objects from a file, or to write them to one. Files can be referenced using either absolute or relative paths. An absolute path contains the complete directory list required to locate a file. For example, "C:/users/clarkson/myproject/data/samples.csv" is an absolute path to the file “samples.csv” in folder “C:/users/clarkson/myproject/data/”. A relative path, on the other hand, points to a location relative to the current working directory (see section 13.3). For example, if the working directory is set to folder “C:/users/clarkson/myproject”, then the same file can be referenced using the relative path "data/samples.csv".
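The difference between the two kinds of path can be tried out in a small sketch. The folder layout below is hypothetical and is created inside R's temporary directory so that the code runs on any machine; the names project, rel_path and abs_path are chosen only for illustration.

```r
# Hypothetical project layout, created inside a temporary directory
project <- file.path(tempdir(), "myproject")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("id,value", "1,10"), file.path(project, "data", "samples.csv"))

# Make the project folder the working directory (setwd returns the old one)
old_wd <- setwd(project)

rel_path <- "data/samples.csv"                       # relative to the working directory
abs_path <- normalizePath(rel_path, winslash = "/")  # complete path to the same file

rel_found <- file.exists(rel_path)  # TRUE while the working directory is `project`
abs_found <- file.exists(abs_path)  # TRUE whatever the working directory is

# Restore the previous working directory
setwd(old_wd)
```

After the working directory is restored, abs_path still points to the file, whereas rel_path is now resolved against a different folder.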

Although absolute paths are sometimes needed, relative file paths are usually preferred, because absolute file paths have several disadvantages:

  • when you share files, another user won’t have the same directory structure as you, so they will need to recreate the file paths;
  • if you alter your directory structure, you’ll need to rewrite the paths;
  • an absolute file path will likely be longer than a relative path, so there is more scope for error.

Note that on Windows systems, a file path that is copied and pasted ‘as is’ into an R script causes a problem, because Windows uses backslashes as file path separators. In R strings the backslash is the ‘escape’ character, so R reads it as the start of an escape sequence and not as a literal backslash. File paths in R should therefore use forward slashes (doubled backslashes also work).
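The following sketch shows three ways of writing the same (hypothetical) Windows path so that R accepts it. The raw-string syntax r"(...)" is only available from R 4.0.0 onwards, so that line is an assumption about the R version in use.

```r
# Pasting the path unchanged, i.e. "C:\users\clarkson\...", is a syntax error,
# because R tries to read \u as the start of an escape sequence.

# Option 1: double every backslash inside an ordinary string
p_escaped <- "C:\\users\\clarkson\\myproject\\data\\samples.csv"

# Option 2 (R >= 4.0.0): a raw string keeps backslashes literally
p_raw <- r"(C:\users\clarkson\myproject\data\samples.csv)"

# Option 3: use forward slashes
p_forward <- "C:/users/clarkson/myproject/data/samples.csv"

identical(p_escaped, p_raw)   # TRUE

# Backslashes can also be converted to forward slashes programmatically
p_converted <- gsub("\\", "/", p_escaped, fixed = TRUE)
identical(p_converted, p_forward)   # TRUE
```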

16.1 Reading and writing data

Data from external files can be loaded into an R session, and conversely objects can be written to a file, in a variety of ways. Base R has the read.table and write.table functions to read tabular data from a file into an R object or to write such an object to a file, along with shorthand versions for comma-separated files: read.csv and write.csv. For example, if there is comma-delimited data stored in file “samples.csv” in folder “data” in the working directory, then this data can be read into an R data.frame using:

dat <- read.table("data/samples.csv", sep = ",", header = TRUE)

or, shorter:

dat <- read.csv("data/samples.csv")
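To try both calls without an existing “data/samples.csv”, a small round trip can be sketched as follows. The data frame samples and the temporary file csv_file are made up for illustration.

```r
# A small, made-up data frame written to a temporary CSV file
samples <- data.frame(id    = 1:3,
                      group = c("a", "b", "a"),
                      value = c(2.5, 3.1, 4.0))
csv_file <- tempfile(fileext = ".csv")
write.csv(samples, csv_file, row.names = FALSE)  # do not write row names as a column

# Read the file back with the general function and with the shorthand
dat1 <- read.table(csv_file, sep = ",", header = TRUE)
dat2 <- read.csv(csv_file)

str(dat2)              # inspect column names and types
all.equal(dat1, dat2)  # both calls give the same data.frame for this file
```

Setting row.names = FALSE in write.csv keeps the row names out of the file, so that the columns read back match the columns written.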

Sometimes it is convenient to save a single R object to a file, and later to restore it into your R session. The saveRDS and readRDS functions are then very helpful: saveRDS writes one object to a file, and readRDS reads it back.
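As a minimal sketch, the example below saves a fitted model and restores it under a new name. The model (a regression on the built-in cars dataset) and the temporary file rds_file are chosen only for illustration.

```r
# Fit a simple model on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

# Save the single object to a file ...
rds_file <- tempfile(fileext = ".rds")
saveRDS(fit, rds_file)

# ... and restore it later, possibly in a new session and under another name
fit_restored <- readRDS(rds_file)

all.equal(coef(fit), coef(fit_restored))   # TRUE
```

Because readRDS returns the object instead of recreating it under its original name, the restored copy can be assigned to any name.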

When reading tabular data into R (or writing tabular R objects, e.g. data.frames and matrices, to files), the fread and fwrite functions in the data.table package and the functions in the readr package (e.g. read_delim, write_delim, read_csv, write_csv) offer optimized alternatives to read.table and write.table (and their csv versions), which are especially helpful when the files are very large. To read data directly from MS Excel files, use the readxl package. The foreign package can be used to read/write data stored by SPSS, SAS etc.
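A possible round trip with data.table is sketched below. It assumes that the data.table package is installed, so the code is wrapped in a check and does nothing otherwise; the data frame big and the file big_file are made up for illustration.

```r
# Only run the example if the data.table package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  # A made-up table with 100,000 rows
  big <- data.frame(id = seq_len(1e5), value = rnorm(1e5))
  big_file <- tempfile(fileext = ".csv")

  data.table::fwrite(big, big_file)      # fast CSV writer
  big_dt <- data.table::fread(big_file)  # fast reader; separator is detected

  dim(big_dt)
}
```

Note that fread returns a data.table, which is also a data.frame, so most code written for data.frames continues to work.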

16.2 Sourcing scripts

Until now, you have worked either directly in the R console, or executed specific commands from the script editor. In other words, you issue a command, R responds, you issue the next command, R responds, and so on. However, we can also instruct R to perform all commands in a script one after the other without waiting for additional instructions. One way to do this is to select the entire script (e.g. using ctrl + a) and execute the code (e.g. ctrl + r or ctrl + enter, depending on your RStudio short-cut settings). Another way is to source the entire script. To execute all code in the file “filename.r”, we can use the source function:

source('filename.r')
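As a self-contained sketch, the example below first writes a two-line script to a temporary file and then sources it. The script contents and the names script_file and script_env are hypothetical; the local argument is used here only so that the objects created by the script end up in a separate environment and not in the global workspace.

```r
# Write a small, made-up script to a temporary file
script_file <- tempfile(fileext = ".r")
writeLines(c(
  "x <- 1:10",
  "total <- sum(x)"
), script_file)

# Run every line of the script, evaluating it in its own environment
script_env <- new.env()
source(script_file, local = script_env)

ls(script_env)     # objects created by the script: "total" "x"
script_env$total   # 55
```

Calling source(script_file) without the local argument would create x and total in the workspace from which source is called, which is usually what is wanted in interactive work.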