| Name | Birthdate | Mass_kg | Species | Friendly |
|---|---|---|---|---|
| A nonymouse | 2001-01-01 | 14.5 | NA | FALSE |
| Frank | 2008-09-01 | 4.1 | Cat | TRUE |
| Boojum | 1982-07-11 | 7.2 | Dog | TRUE |
2026-02-03
Most data originates outside of R
To use data with R, you must:
Obtain a copy of the data in some format
Get R to “read” those data
Check that the data were interpreted correctly
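As a minimal sketch of these three steps, the small table at the top of these notes can be typed in as comma-separated text, read with readr, and then checked. The inline text stands in for a downloaded file, and the comma-separated layout is an assumption made for illustration.

```r
library(readr)

# Step 1: obtain the data. Here an inline string stands in for a file on disk
# (assumption: comma-separated values with one header row).
pets_csv <- "Name,Birthdate,Mass_kg,Species,Friendly
A nonymouse,2001-01-01,14.5,NA,FALSE
Frank,2008-09-01,4.1,Cat,TRUE
Boojum,1982-07-11,7.2,Dog,TRUE
"

# Step 2: get R to read the data. I() marks the string as literal data,
# not a file path.
pets <- read_csv(I(pets_csv), show_col_types = FALSE)

# Step 3: check how each column was interpreted.
sapply(pets, class)        # one class per column
sum(is.na(pets$Species))   # count of missing species values
```

With readr's default settings the text `NA` is read as a missing value, so the check on `Species` is a quick way to confirm that missing data were recognized.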
Spreadsheet (Excel, Google Sheets, Numbers, LibreOffice Calc)
Text files (CSV, tab-separated, other delimited formats)
Binary formats (various)
Self-documenting formats (NetCDF for geophysical data; specialized formats for astronomy, oceanography, and computer model output)
An almost infinite variety of custom formats
Create your own Excel worksheet
Environment Canada weather: https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50620
CO2: https://www.esrl.noaa.gov/gmd/webdata/ccgg/trends/co2/co2_mm_mlo.txt
With a header
Warning: hablar has some functions with the same names as functions in dplyr. For example, you need to write dplyr::na_if to use dplyr's masked na_if function.
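A short sketch of the namespace-qualified call follows. It loads only dplyr, and the value -999 is a hypothetical missing-value code chosen for illustration. Writing the `dplyr::` prefix selects dplyr's version regardless of which other packages are attached.

```r
library(dplyr)

# Hypothetical measurements in which -999 was used to mark a missing value.
mass_kg <- c(14.5, -999, 7.2)

# The explicit dplyr:: prefix picks dplyr's na_if even if another attached
# package defines a function with the same name.
cleaned <- dplyr::na_if(mass_kg, -999)
cleaned   # the -999 entry is now NA
```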
File > Import dataset > From text (readr) …
Variable (column) names that contain spaces, brackets, arithmetic symbols, and other symbols
These variable names must be enclosed in backticks (e.g., `Max Temp (°C)`)
clean_names in the janitor package transforms the names to make them easier to type
Original names: Longitude (x), Date/Time, Max Temp (°C)
Cleaned names: longitude_x, date_time, max_temp_c
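The effect of awkward names can be sketched with a one-row data frame. The column names are the ones listed above; the values are made up for illustration.

```r
library(janitor)

# A one-row data frame with the original, hard-to-type column names.
# The values are invented for illustration only.
weather <- data.frame(-63.5, "2026-01-01", -2.1)
names(weather) <- c("Longitude (x)", "Date/Time", "Max Temp (°C)")

# Before cleaning, a column with spaces or symbols needs backticks.
weather$`Max Temp (°C)`

# clean_names() rewrites the names into lower-case snake_case.
cleaned <- clean_names(weather)
names(cleaned)
```

After cleaning, columns can be referenced without backticks, for example `cleaned$date_time`.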
Multi-row headings, missing headings
Extra rows at the top of the file
Missing data coded in an “interesting” way
Multiple tabs (sheets) in a spreadsheet or workbook
Date formats
Numeric data interpreted as text
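Several of these problems can be handled with arguments to `read_csv`. The sketch below uses an invented file layout: two extra rows at the top, and missing values coded as `M` and `-999`. Those codes and the column names are assumptions made for illustration.

```r
library(readr)

# Invented example: two extra rows above the header, and missing values
# coded as "M" and "-999".
messy_csv <- "Station report from a hypothetical export tool
Units: degrees C and mm
date,max_temp,precip
2026-01-01,-2.1,0.4
2026-01-02,M,-999
2026-01-03,1.3,0.0
"

# A naive read that only skips the extra rows: the "M" forces max_temp
# to be read as text, and -999 is kept as a real number.
raw <- read_csv(I(messy_csv), skip = 2, show_col_types = FALSE)
class(raw$max_temp)

# Declare the missing-value codes and the intended column types.
fixed <- read_csv(
  I(messy_csv),
  skip = 2,
  na = c("", "NA", "M", "-999"),
  col_types = cols(
    date = col_date(),
    max_temp = col_double(),
    precip = col_double()
  )
)
summary(fixed)
```

Stating `col_types` explicitly means a column that cannot be parsed as declared produces a parsing warning instead of silently becoming text.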
read_excel to read from spreadsheets
read_csv, read_tsv, and read_delim to read many text files
Try the RStudio data import tool. Notice the R code generated for you.
Always check your data table in R
Lots of ways for errors to arise
Practice and ask for help if you have trouble
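The reading functions summarized above can be tried without any files of your own. This sketch assumes the readxl and readr packages are installed; it uses an example workbook that ships with readxl, plus small inline strings whose contents are made up for illustration.

```r
library(readxl)
library(readr)

# readxl_example() returns the path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# List the sheets first, then read one of them by name.
excel_sheets(path)
cars <- read_excel(path, sheet = "mtcars")

# Tab-separated text: "\t" is a tab character.
tsv_text <- "name\tmass_kg\nFrank\t4.1\nBoojum\t7.2\n"
from_tsv <- read_tsv(I(tsv_text), show_col_types = FALSE)

# Any other single-character delimiter: here a semicolon.
delim_text <- "name;mass_kg\nFrank;4.1\nBoojum;7.2\n"
from_delim <- read_delim(I(delim_text), delim = ";", show_col_types = FALSE)

# Always check the result: dimensions and column types.
dim(cars)
str(from_tsv)
```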