Reading data into R

Andrew Irwin, a.irwin@dal.ca

2024-02-06

Reading data

  • Most data originate outside of R

  • To use the data with R, you must

    • Obtain a copy of the data in some format

    • Get R to “read” those data

    • Check that the data were interpreted correctly
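The last step, checking how the data were interpreted, might look something like the sketch below. It is an illustration only: the inline text `csv_text` and the table name `pets` are made up so the code runs without a file, and `I()` tells readr to treat the string as literal data (this assumes readr 2.0 or later).

```r
library(readr)
library(dplyr)

# Made-up inline data standing in for a file obtained elsewhere
csv_text <- I("name,mass_kg,birthdate\nFrank,4.1,2008-09-01\nBoojum,7.2,1982-07-11\n")

pets <- read_csv(csv_text, show_col_types = FALSE)

spec(pets)      # the column types readr guessed
glimpse(pets)   # one line per column: name, type, first few values
problems(pets)  # values readr could not parse as expected (no rows if all went well)
summary(pets)   # ranges and missing-value counts, useful for spotting odd codes
```

With a real file, the same four checks apply after replacing `csv_text` with the file path.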

What formats do data come in?

  • Spreadsheet (Excel, Google Sheets, Numbers, LibreOffice Calc)

  • Text files (csv, tab separated, delimited)

  • Binary formats (various)

  • Self-documenting (NetCDF for geophysical data, specialized for astronomy)

  • An almost infinite variety of custom formats

Examples

Spreadsheet

library(readxl)
library(knitr)  # provides kable()
dt1 <- read_excel("static/L11/test-data.xlsx", sheet = 1)  # read the first sheet
dt1 |> kable()
Name         Birthdate   Mass_kg  Species  Friendly
A nonymouse  2001-01-01     14.5  NA       FALSE
Frank        2008-09-01      4.1  Cat      TRUE
Boojum       1982-07-11      7.2  Dog      TRUE
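The spreadsheet example reads sheet 1. When a workbook has several sheets, one possible approach is to list them first and then read one by name. The sketch below uses `datasets.xlsx`, an example workbook bundled with the readxl package, rather than the course file; the name `quakes` for the result is chosen here for illustration.

```r
library(readxl)

# Path to an example workbook that ships with readxl (not the course data)
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding what to read
excel_sheets(path)

# Read one sheet by name instead of by position
quakes <- read_excel(path, sheet = "quakes")
head(quakes)
```

Reading by name tends to be more robust than reading by position if someone later reorders the sheets.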

Comma separated values

library(tidyverse)
library(rmarkdown)  # provides paged_table()
dt2 <- read_csv("static/L11/en_climate_daily_NS_8202251_2020_P1D.csv")
dt2 |> paged_table()

Tab separated values

With a header (comment lines starting with #)

dt3 <- read_table("static/L11/co2_mm_mlo.txt", 
                  comment="#", col_names = FALSE)
dt3 |> paged_table()

Clean-up

names(dt3) <- c("year", "month", "decimal_year", "co2_monthly", 
                "co2_deseasoned", "n_days", "sdev_days", "uncertainty_mean")
dt3 <- dt3 |> 
  mutate(n_days = dplyr::na_if(n_days, -1), 
         sdev_days = dplyr::na_if(sdev_days, -9.99),
         uncertainty_mean = dplyr::na_if(uncertainty_mean, -0.99))

Another way to convert columns

library(hablar)
dt3 <- dt3 |> convert(int(year), 
                      int(month), 
                      num(decimal_year:uncertainty_mean) ) 

Warning: hablar has some functions with the same names as functions in dplyr. For example, once hablar is loaded you need to write dplyr::na_if to use dplyr's version of na_if, which is otherwise masked.
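A small sketch of the namespace-qualified call follows. The vector `x` is made up for illustration, with -1 standing in for a missing-value code. Writing `dplyr::na_if` gives dplyr's version whichever other packages are attached.

```r
library(dplyr)

# Made-up values; -1 is used as a missing-value code
x <- c(12, -1, 30)

# The package prefix selects dplyr's na_if even if another package masks it
cleaned <- dplyr::na_if(x, -1)
cleaned

# List any objects that are defined in more than one attached package
conflicts()
```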

RStudio data import helper

File > Import Dataset > From Text (readr) …

Things to watch out for

  • Multi-row headings, missing headings

  • Extra rows at the top of the file

  • Missing data coded in an “interesting” way

  • Multiple tabs (sheets) in a spreadsheet or workbook

  • Date formats

  • Numeric data interpreted as text
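Several of these problems can be handled with arguments to `read_csv`. The sketch below is one way to do it, not the only one. The file contents, the -999 missing-value code, and the column names are all invented for illustration, and `I()` marks the string as literal data (readr 2.0 or later).

```r
library(readr)

# Made-up file contents: two extra lines above the header, missing values
# coded as -999, and dates written as day/month/year
raw_text <- I("Station: Example
Exported by a hypothetical instrument
date,temp_c,site_id
01/02/2020,3.5,007
02/02/2020,-999,008
")

obs <- read_csv(
  raw_text,
  skip = 2,                                # ignore the extra rows at the top
  na = c("", "NA", "-999"),                # treat -999 as missing
  col_types = cols(
    date = col_date(format = "%d/%m/%Y"),  # state the date format explicitly
    temp_c = col_double(),                 # this column should be numeric
    site_id = col_character()              # read as text to keep leading zeros
  )
)

obs
problems(obs)  # has rows if any value did not match the declared type
```

Declaring `col_types` means that a wrong assumption shows up in `problems()` rather than silently producing a text column.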

Summary

  • read_excel to read from spreadsheets

  • read_csv, read_tsv, and read_delim to read many text files

  • Try the RStudio data import tool. Notice the R code generated for you.

  • Always check your data table in R

  • Lots of ways for errors to arise

  • Practice, practice, practice and ask for help!
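As a brief illustration of the text-file readers named in the summary, the sketch below reads the same made-up table twice, once as tab-separated text with `read_tsv` and once as semicolon-delimited text with `read_delim`. The data and object names are invented, and `I()` marks each string as literal data (readr 2.0 or later).

```r
library(readr)

# Made-up tab-separated text; "\t" is a tab character
tsv_text <- I("id\tvalue\n1\t2.5\n2\t3.5\n")
from_tsv <- read_tsv(tsv_text, show_col_types = FALSE)

# The same made-up table with semicolons as the delimiter
delim_text <- I("id;value\n1;2.5\n2;3.5\n")
from_delim <- read_delim(delim_text, delim = ";", show_col_types = FALSE)

# Check that both readers produced the same values
identical(from_tsv$value, from_delim$value)
```

`read_csv` and `read_tsv` behave like `read_delim` with the delimiter already set to a comma or a tab.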

Further reading

  • Lots more examples with different data in the course notes

Task

Practice these skills by performing the tasks in this lesson’s Task.