Name | Birthdate | Mass_kg | Species | Friendly |
---|---|---|---|---|
A nonymouse | 2001-01-01 | 14.5 | NA | FALSE |
Frank | 2008-09-01 | 4.1 | Cat | TRUE |
Boojum | 1982-07-11 | 7.2 | Dog | TRUE |
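Here is a minimal sketch of reading a table like this one. It assumes the same rows have been saved as a comma-separated file; the table above uses | separators, so that file layout is an assumption. The code reads the file with readr's read_csv and shows which type readr guesses for each column. The literal NA in Species becomes a missing value because "NA" is one of read_csv's default na strings.

```r
library(readr)

# Assumption: the table above, saved as comma-separated text in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Birthdate,Mass_kg,Species,Friendly",
  "A nonymouse,2001-01-01,14.5,NA,FALSE",
  "Frank,2008-09-01,4.1,Cat,TRUE",
  "Boojum,1982-07-11,7.2,Dog,TRUE"
), path)

# read_csv guesses a type for each column from its contents
pets <- read_csv(path, show_col_types = FALSE)

# One class per column, e.g. to confirm Birthdate was read as a Date
sapply(pets, function(col) class(col)[1])
```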
2024-02-06
Most data originate outside of R
To use the data in R, you must:
Obtain a copy of the data in some format
Get R to “read” those data
Check that the data were interpreted correctly
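The three steps can be sketched with readr as follows. The file is a hypothetical stand-in written to a temporary path, and the "unknown" mass is invented to show what a failed parse looks like. Declaring col_types up front makes readr report values that do not fit, instead of quietly reading the whole column as text. problems() lists those values.

```r
library(readr)

# Step 1 (obtain): a hypothetical file; the "unknown" mass is invented for illustration
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Mass_kg",
  "Frank,4.1",
  "Boojum,7.2",
  "A nonymouse,unknown"
), path)

# Step 2 (read): declare the column types you expect
pets <- read_csv(
  path,
  col_types = cols(Name = col_character(), Mass_kg = col_double())
)

# Step 3 (check): values that failed to parse become NA and are recorded
summary(pets$Mass_kg)
problems(pets)
```

read_csv also prints a warning when parsing issues occur, so reading the console output is part of the check.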
Spreadsheets (Excel, Google Sheets, Numbers, LibreOffice Calc)
Text files (CSV, tab-separated, other delimited)
Binary formats (various)
Self-documenting formats (NetCDF for geophysical data, specialized formats for astronomy)
An almost infinite variety of custom formats
Create your own Excel worksheet
Environment Canada weather: https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50620
CO2: https://www.esrl.noaa.gov/gmd/webdata/ccgg/trends/co2/co2_mm_mlo.txt
With a header
Warning: hablar has some functions with the same names as functions in dplyr. For example, you need to write dplyr::na_if to call dplyr's na_if function when hablar masks it.
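Here is a minimal sketch of the fix, assuming dplyr is installed. The pkg::fun form calls dplyr's na_if() directly, whatever other packages are attached, and it does not even require library(dplyr). The vector and its -999 missing-value code are hypothetical.

```r
# A hypothetical vector where -999 stands for a missing value
mass <- c(14.5, -999, 7.2)

# dplyr::na_if() replaces values equal to -999 with NA; the dplyr:: prefix
# selects dplyr's version even if hablar (or another package) masks na_if
mass_clean <- dplyr::na_if(mass, -999)
mass_clean
```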
File > Import Dataset > From Text (readr)…
Multi-row headings, missing headings
Extra rows at the top of the file
Missing data coded in an “interesting” way
Multiple tabs (sheets) in a spreadsheet or workbook
Date formats
Numeric data interpreted as text
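Several of these problems can be handled with arguments to read_csv. The sketch below uses a hypothetical file with these features:

- two note lines above the header
- missing values coded as -999 or --
- dates written day/month/year
- masses typed with a unit

The fixes are as follows. skip drops the extra rows, na lists the missing-value codes, and col_date() is given the date format. parse_number() then pulls the number out of the text.

```r
library(readr)

# A hypothetical messy file: two note lines above the header row,
# day/month/year dates, units typed into the mass cells, and
# missing values coded as -999 or --
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported by a hypothetical lab database",
  "Missing values coded as -999 or --",
  "Name,Birthdate,Mass_kg,Species",
  "Frank,01/09/2008,4.1 kg,Cat",
  "Boojum,11/07/1982,7.2 kg,Dog",
  "A nonymouse,01/01/2001,-999,--"
), path)

pets <- read_csv(
  path,
  skip = 2,                         # drop the extra rows above the header
  na = c("", "NA", "-999", "--"),   # every code that means "missing"
  col_types = cols(
    Name = col_character(),
    Birthdate = col_date(format = "%d/%m/%Y"),  # day/month/year
    Mass_kg = col_character(),      # read as text first ...
    Species = col_character()
  )
)

# ... then extract the numeric part, dropping the " kg" unit text
pets$Mass_kg <- parse_number(pets$Mass_kg)
pets
```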
read_excel (readxl package) to read from spreadsheets
read_csv, read_tsv, and read_delim (readr package) to read many kinds of text files
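Here is a short sketch of these readers, assuming the readxl and readr packages are installed. readxl_example() returns the path to an example workbook that ships with readxl. excel_sheets() lists the workbook's tabs, and the sheet argument picks one. The two small text files are hypothetical and written to temporary paths.

```r
library(readxl)
library(readr)

# Spreadsheet: an example workbook bundled with readxl, which has several sheets
xlsx_path <- readxl_example("datasets.xlsx")
excel_sheets(xlsx_path)                       # names of the sheets (tabs)
mtcars_xl <- read_excel(xlsx_path, sheet = "mtcars")

# Tab-separated text (hypothetical file)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("Name\tMass_kg", "Frank\t4.1", "Boojum\t7.2"), tsv_path)
pets_tsv <- read_tsv(tsv_path, show_col_types = FALSE)

# Any other single-character delimiter, here a semicolon (hypothetical file)
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Name;Mass_kg", "Frank;4.1", "Boojum;7.2"), txt_path)
pets_semi <- read_delim(txt_path, delim = ";", show_col_types = FALSE)
```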
Try the RStudio data import tool. Notice the R code generated for you.
Always check your data table in R
Lots of ways for errors to arise
Practice, practice, practice and ask for help!
Practice these skills by performing the tasks in this lesson’s Task.