25 Checking your work
25.1 An introduction to testing
You’ve made a data visualization. It looks great. All the calculations necessary are in a single Rmd file – reading the data, organizing the data, and creating the figure. You can revise the data and reproduce the plot any time you want.
But then a thought occurs to you: how do I know the visualization is correct? The computer saves us time by doing calculations for us, but the price is that we can't keep track of everything it does.
What if there is a problem with the data? Maybe the data you analyzed has a lot of observations – too many to check by hand. Or maybe the data are fine, but something went wrong when you read them into R. There are lots of ways for errors to creep in. Missing values when you thought there were none. Unexpected levels of factors (too many or too few). Detectable errors in the data such as impossible values.
The best way to counteract all these problems is testing. The secret is to get the computer to perform the tests for you. In this lesson we will discuss two kinds of testing: checking your data and checking your calculations.
A lot of testing is done for you before you even start R: most (we hope all) the packages you use are carefully tested to be sure they work as intended. Still – you might misunderstand what the packages are supposed to do. Or you might make a typo and use the wrong variable name somewhere. Or you might get the logic of your calculation wrong. So you should test your work in as many ways as you can.
The most important reason to test is that you will catch mistakes. Possibly the second most important reason for doing testing – and being explicit about it – is that it can help the people who use your analysis have more confidence in it. This includes you in the future!
25.2 Checking analysis and visualizations
Once you have a preliminary analysis, develop a testing plan. Methods that can help:
- Check for errors in your data
- Perform your analysis on a small subset of your data that you can understand without computer help. This is the “sample calculation” method of testing
- Perform your analysis on simulated or fake “data” designed to test certain cases (independent variables, strong dependence, etc.) that will allow you to check your methods and interpretation
- Perform your calculations or visualizations two different ways, to check your understanding. This is especially useful if you have a fast and a slow way of doing a calculation. Use the slow way as a check on the fast method.
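The last point can be sketched in a few lines of base R. This is a toy example with made-up numbers, not part of the jelly bean analysis:

```r
# Toy "two ways" check: compute a sum the fast (vectorized) way
# and the slow (explicit loop) way, then confirm they agree.
x <- c(2.5, 3.1, 4.4)

fast <- sum(x)            # the fast, built-in method

slow <- 0                 # the slow method we fully understand
for (xi in x) slow <- slow + xi

all.equal(fast, slow)     # TRUE if the two methods agree
```

If the two results ever disagree, either one of the methods or your understanding of it is wrong, which is exactly what the check is designed to reveal.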
25.3 Data errors
What kinds of errors can appear in data?
- Misspellings, upper/lower case inconsistency, extra spaces
- Duplicated observations
- Mis-coded missing data (e.g., -999 or 0 used to mean missing)
- Inconsistently formatted dates and times
- Impossible values arising from typographical errors
- Data in wrong columns
- All the data can look correct, but the methods may have changed, creating “silent” errors
Why do data sometimes have errors?
- To answer this, you need to know about the process used to create the data. Were some data output by a particular machine or software package that has errors? Were the data typed in by a single person? Were many different people, who may have had different understandings of the data collection goals, involved? Were the data collected over a long period of time, when machines, people, and goals may have changed?
How can you find errors in data?
- Make summary tables and cross-check the summaries
- Look for missing data
- Check to be sure the correct number of variables (columns) and rows (observations) are present
- Plot histograms and densities
- Write tests to check that data belong to a legitimate set of values
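As a small illustration of the last item, here is a sketch in base R with a made-up accuracy vector, where the legitimate values are 0, 1, or NA:

```r
# Hypothetical accuracy data; 2 is an impossible value
accuracy <- c(1, 0, 0, 2, 1, NA)

# TRUE wherever a value is outside the legitimate set
bad <- !(accuracy %in% c(0, 1, NA))
which(bad)    # returns 4, the position of the impossible value
```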
What do you do with errors?
- Keep the original data, so you can revert a change in case of a misunderstanding
- Log changes and describe each error
- Write tests to check for future errors
- Communicate with the people who collected the data and the people who will receive the analysis
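For the logging step, even a tiny data frame kept next to your cleaning code helps. This is just a sketch; the column names are made up:

```r
# Hypothetical change log; add one row for every edit you make
change_log <- data.frame(
  variable = "flavour",
  change   = "converted all values to lower case",
  reason   = "capitalization differences duplicated flavours"
)
change_log
```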
25.3.1 Sample data to test
The following data were collected by a class of students evaluating their ability to identify a jelly bean flavour by blindfolded tasting. Much of our sense of taste comes from smell, so there were two treatments: a control where the participant was blindfolded, and a treatment where the participant was blindfolded and blocked their nose to reduce the sense of smell. Groups of students entered their data into a single Google Sheets spreadsheet, which was exported to the csv file below.
This dataset has a lot of problems, but they are very typical for data entered by a group of people who are not directly involved in the systematic analysis of the data with a software package like R. (They are not attuned to the problems of data errors.) Take a look at the file and see how many problems you can find before continuing.
jelly <- read_csv(here("static/jelly-bean-data.csv")) %>% clean_names()
## Warning: Missing column names filled in: 'X7' [7], 'X8' [8], 'X9' [9],
## 'X10' [10], 'X11' [11], 'X12' [12], 'X13' [13], 'X14' [14], 'X15' [15],
## 'X16' [16], 'X17' [17], 'X18' [18], 'X19' [19], 'X20' [20], 'X21' [21],
## 'X22' [22], 'X23' [23], 'X24' [24], 'X25' [25], 'X26' [26], 'X27' [27]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_logical(),
## Initials = col_character(),
## `Group (choose from drop down)` = col_character(),
## `Trial #` = col_double(),
## Flavour = col_character(),
## `Reaction time (in sec)` = col_character(),
## `Accuracy (0=incorrect, 1=correct)` = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# jelly %>% kable() %>% scroll_box()
You should always look at your data. For a first look, the View, skim (in the skimr package), and glimpse functions are useful and will identify some problems.
glimpse(jelly)
## Rows: 261
## Columns: 27
## $ initials <chr> "PKL", "PKL", "PKL", "LAG", "LAG", "LAG…
## $ group_choose_from_drop_down <chr> "control", "control", "control", "exper…
## $ trial_number <dbl> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, NA,…
## $ flavour <chr> "orange", "strawberry", "cherry", "oran…
## $ reaction_time_in_sec <chr> "7.2", "7.6", "10.1", "10.15", "12.24",…
## $ accuracy_0_incorrect_1_correct <dbl> 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, NA,…
## $ x7 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x8 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x9 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x10 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x11 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x12 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x13 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x14 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x15 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x16 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x17 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x18 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x19 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x20 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x21 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x22 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x23 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x24 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x25 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x26 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ x27 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
# skimr::skim(jelly)
There are many columns that have no heading and are purely missing data. Before getting rid of those columns, let's be sure they are all missing using the miss_var_summary function from the naniar package.
jelly %>% miss_var_summary() %>% kable() %>% kable_styling(full_width = FALSE)
variable | n_miss | pct_miss |
---|---|---|
x7 | 261 | 100.00000 |
x8 | 261 | 100.00000 |
x9 | 261 | 100.00000 |
x10 | 261 | 100.00000 |
x11 | 261 | 100.00000 |
x12 | 261 | 100.00000 |
x13 | 261 | 100.00000 |
x14 | 261 | 100.00000 |
x15 | 261 | 100.00000 |
x16 | 261 | 100.00000 |
x17 | 261 | 100.00000 |
x18 | 261 | 100.00000 |
x19 | 261 | 100.00000 |
x20 | 261 | 100.00000 |
x21 | 261 | 100.00000 |
x22 | 261 | 100.00000 |
x23 | 261 | 100.00000 |
x24 | 261 | 100.00000 |
x25 | 261 | 100.00000 |
x26 | 261 | 100.00000 |
x27 | 261 | 100.00000 |
initials | 88 | 33.71648 |
group_choose_from_drop_down | 34 | 13.02682 |
reaction_time_in_sec | 34 | 13.02682 |
accuracy_0_incorrect_1_correct | 34 | 13.02682 |
trial_number | 33 | 12.64368 |
flavour | 33 | 12.64368 |
Now we will get rid of those data.
jelly <- jelly %>% select(-(x7:x27))
The dlookr package provides a similar table, plus it adds a count of the number of distinct values for each variable.
diagnose(jelly) %>% kable() %>% kable_styling(full_width = FALSE)
variables | types | missing_count | missing_percent | unique_count | unique_rate |
---|---|---|---|---|---|
initials | character | 88 | 33.71648 | 75 | 0.2873563 |
group_choose_from_drop_down | character | 34 | 13.02682 | 4 | 0.0153257 |
trial_number | numeric | 33 | 12.64368 | 4 | 0.0153257 |
flavour | character | 33 | 12.64368 | 44 | 0.1685824 |
reaction_time_in_sec | character | 34 | 13.02682 | 201 | 0.7701149 |
accuracy_0_incorrect_1_correct | numeric | 34 | 13.02682 | 3 | 0.0114943 |
There are at least 12% missing values in each variable, so there may be some rows that are completely missing. If you take another look at the data you will see that a lot of rows are all NA. Let's get rid of them. First we count the number of missing values in each row (mutate), then we remove the rows where every variable is missing (filter). Whenever I clean data, I like to preview what will happen before modifying the dataset.
jelly %>% rowwise() %>%
mutate(n_na = sum(is.na(across()))) %>%
filter(n_na == ncol(jelly)) %>%
head() %>% kable()
initials | group_choose_from_drop_down | trial_number | flavour | reaction_time_in_sec | accuracy_0_incorrect_1_correct | n_na |
---|---|---|---|---|---|---|
NA | NA | NA | NA | NA | NA | 6 |
NA | NA | NA | NA | NA | NA | 6 |
NA | NA | NA | NA | NA | NA | 6 |
NA | NA | NA | NA | NA | NA | 6 |
NA | NA | NA | NA | NA | NA | 6 |
NA | NA | NA | NA | NA | NA | 6 |
Now remove them:
jelly <- jelly %>% rowwise() %>%
mutate(n_na = sum(is.na(across()))) %>%
ungroup() %>%
filter(n_na < ncol(jelly)) %>%
select(-n_na)
We still don't know what most of these variables mean, but already one thing should stand out. The variable accuracy_0_incorrect_1_correct presumably should only be 0 or 1, but it has three distinct values. We may not care about the initials of the investigator, but we might be concerned that there are 88 missing values (about 50 more than the number of all-blank rows we removed).
25.4 Data quality assurance
You can get a report on your data using the pointblank package. (See the documentation linked in further reading for examples.) Let's test a few things:
- are any values of the accuracy variable not 0 or 1?
- are reaction_time_in_sec and trial_number numeric? (We know the answer already, but I'll show how to test this condition.)
- are there any negative reaction times?
- are all the rows distinct? (It’s unlikely, but not impossible to get the same result twice.)
First we will write the tests.
my_test <- jelly %>%
create_agent(actions = action_levels(warn_at = 1)) %>%
col_vals_in_set(accuracy_0_incorrect_1_correct, c(0,1)) %>%
col_is_numeric(vars(trial_number, reaction_time_in_sec)) %>%
col_vals_gt(reaction_time_in_sec, 0) %>%
rows_distinct()
Now you can use the tests to get a report. (The report is not shown. Experiment with these functions on your own.)
my_test %>% interrogate()
Reaction time (in seconds) is text, not a number. Let's fix that. There are probably some non-numeric values in the column, because otherwise the data would have been read as numeric, so let's look for those rows first. If you try to convert text to a number and the text doesn't look like a number, you get an NA instead. That means we can find the problem rows by filtering for the NAs created during the conversion. R will warn us that some NAs were created.
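The coercion behaviour is easy to see in isolation with a few made-up strings:

```r
# Text that doesn't look like a number is coerced to NA
suppressWarnings(as.numeric(c("7.2", "Lemon", "10.15")))
# returns 7.2 NA 10.15
```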
jelly %>% filter(is.na(as.numeric(reaction_time_in_sec))) %>%
kable() %>% kable_styling(full_width = FALSE)
## Warning in mask$eval_all_filter(dots, env_filter): NAs introduced by coercion
initials | group_choose_from_drop_down | trial_number | flavour | reaction_time_in_sec | accuracy_0_incorrect_1_correct |
---|---|---|---|---|---|
KS | experimental | 3 | Orange | Lemon | 0 |
BP | NA | NA | NA | NA | NA |
NA | control | 3 | yellow and brown | NA | 1 |
One of those is obviously an error. (Lemon is not a number!) In a careful analysis, you would contact the person who recorded the data (that's what the initials are for) and find out what went wrong. The same goes for the NAs, where the time didn't get recorded. For now, we'll just discard these rows.
jelly <- jelly %>%
mutate(reaction_time_in_sec = as.numeric(reaction_time_in_sec)) %>%
filter(!is.na(reaction_time_in_sec))
The pointblank package also has a helpful function to produce a structured report on your whole dataset. (Again the results are not shown here.)
jelly %>% scan_data()
25.5 Data cleaning
Freshly collected and tabulated data are often “dirty”: there are errors in the data such as typographical errors, impossible values, or simply more detail than is appropriate for your analysis. Preparing a dataset for analysis is called data cleaning, and it can be a complex and lengthy task, often at least as involved as the analysis itself. Although there are common tasks performed in data cleaning, every dataset presents its own set of challenges. That's the reason we usually use “clean” data in courses.
Data cleaning tasks are very individual and can take considerable creativity. Here we’ll just try a few.
Let's look at the values of flavour. It seems like there are a lot of them.
jelly %>% count(flavour) %>% arrange(-n) %>%
kable() %>% kable_styling(full_width = FALSE)
flavour | n |
---|---|
orange | 32 |
lemon | 23 |
cherry | 19 |
Orange | 19 |
grape | 18 |
coconut | 14 |
Cherry | 10 |
Banana | 9 |
Grape | 9 |
Lemon | 9 |
banana | 7 |
Coconut | 5 |
lime | 4 |
purple | 4 |
apple | 3 |
strawberry | 3 |
Yellow White | 3 |
blueberry | 2 |
bubblegum | 2 |
Purple | 2 |
red | 2 |
white | 2 |
White | 2 |
yellow | 2 |
yellow and white | 2 |
Yellow Brown | 2 |
Apple | 1 |
Bubblegum | 1 |
cinnamon | 1 |
cocnut | 1 |
Coffee | 1 |
lemon/lime | 1 |
licorice | 1 |
Lime | 1 |
marshmallow | 1 |
Pinapple | 1 |
Plum | 1 |
raspberry | 1 |
Red | 1 |
watermelon | 1 |
Watermelon | 1 |
yellow + white | 1 |
yellow and brown | 1 |
We definitely have some duplication. I’ll do one correction – changing all letters to lower case so differences in capitalization don’t duplicate flavours. (This simple change eliminates 13 different values of flavour.) If you look closely at the data, you will find spelling errors, different codings (and, +, or neither), and one missing flavour.
jelly %>% mutate(flavour = tolower(flavour)) %>%
count(flavour) %>% arrange(-n) %>%
kable() %>% kable_styling(full_width = FALSE)
flavour | n |
---|---|
orange | 51 |
lemon | 32 |
cherry | 29 |
grape | 27 |
coconut | 19 |
banana | 16 |
purple | 6 |
lime | 5 |
apple | 4 |
white | 4 |
bubblegum | 3 |
red | 3 |
strawberry | 3 |
yellow white | 3 |
blueberry | 2 |
watermelon | 2 |
yellow | 2 |
yellow and white | 2 |
yellow brown | 2 |
cinnamon | 1 |
cocnut | 1 |
coffee | 1 |
lemon/lime | 1 |
licorice | 1 |
marshmallow | 1 |
pinapple | 1 |
plum | 1 |
raspberry | 1 |
yellow + white | 1 |
yellow and brown | 1 |
Use what you learned to tidy the data. Here are some changes I’ll make before continuing the analysis.
jelly <- jelly %>%
mutate(flavour = tolower(flavour),
flavour = case_when(flavour == "cocnut" ~ "coconut",
TRUE ~ flavour),
flavour = str_replace(flavour, " and ", " "),
flavour = str_replace(flavour, "[/+]", " "),
flavour = str_squish(flavour))
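To see what those stringr calls do, here is the same pipeline applied to two made-up dirty values (this assumes the stringr package is installed):

```r
library(stringr)

dirty <- c("Yellow + White", "yellow and brown")
clean <- tolower(dirty)
clean <- str_replace(clean, " and ", " ")   # drop the word "and"
clean <- str_replace(clean, "[/+]", " ")    # replace "/" or "+" separators
clean <- str_squish(clean)                  # collapse repeated spaces
clean    # "yellow white" "yellow brown"
```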
25.6 Analyze a subset of your data
Let’s compute the average reaction time for correct and incorrect answers for each flavour and each treatment. We will count the number of observations for each grouping too.
jelly %>% group_by(flavour, group_choose_from_drop_down, accuracy_0_incorrect_1_correct) %>%
summarize(count = n(),
mean_reaction_time = mean(reaction_time_in_sec),
.groups = "drop") %>%
kable() %>% kable_styling(full_width = FALSE)
flavour | group_choose_from_drop_down | accuracy_0_incorrect_1_correct | count | mean_reaction_time |
---|---|---|---|---|
apple | experimental | 0 | 4 | 14.660000 |
banana | control | 1 | 9 | 12.616667 |
banana | experimental | 0 | 4 | 24.222500 |
banana | experimental | 1 | 3 | 17.166667 |
blueberry | control | 0 | 1 | 5.980000 |
blueberry | experimental | 0 | 1 | 20.730000 |
bubblegum | control | 0 | 1 | 7.300000 |
bubblegum | experimental | 0 | 2 | 17.620000 |
cherry | control | 0 | 4 | 17.487500 |
cherry | control | 1 | 12 | 6.785000 |
cherry | experimental | 0 | 11 | 15.835455 |
cherry | experimental | 1 | 1 | 12.320000 |
cherry | experimental | NA | 1 | 11.000000 |
cinnamon | control | 0 | 1 | 5.920000 |
coconut | control | 0 | 3 | 22.596667 |
coconut | control | 1 | 6 | 6.821667 |
coconut | experimental | 0 | 9 | 18.861111 |
coconut | experimental | 1 | 2 | 15.040000 |
coffee | experimental | 0 | 1 | 7.650000 |
grape | control | 0 | 6 | 23.903333 |
grape | control | 1 | 5 | 5.926000 |
grape | Control | 1 | 1 | 13.250000 |
grape | experimental | 0 | 12 | 17.655833 |
grape | experimental | 1 | 3 | 16.473333 |
lemon | control | 0 | 10 | 10.756000 |
lemon | control | 1 | 8 | 7.138750 |
lemon | experimental | 0 | 10 | 17.110000 |
lemon | experimental | 1 | 3 | 15.613333 |
lemon | NA | 1 | 1 | 5.460000 |
lemon lime | experimental | 1 | 1 | 10.800000 |
licorice | experimental | 0 | 1 | 8.460000 |
lime | control | 0 | 3 | 10.640000 |
lime | experimental | 0 | 2 | 12.760000 |
marshmallow | experimental | 0 | 1 | 12.050000 |
orange | control | 0 | 9 | 14.307778 |
orange | control | 1 | 16 | 6.546250 |
orange | experimental | 0 | 16 | 18.295000 |
orange | experimental | 1 | 10 | 10.086000 |
pinapple | experimental | 0 | 1 | 11.770000 |
plum | control | 0 | 1 | 15.200000 |
purple | control | 1 | 2 | 14.445000 |
purple | experimental | 0 | 3 | 14.756667 |
purple | experimental | 1 | 1 | 36.000000 |
raspberry | experimental | 0 | 1 | 16.150000 |
red | control | 1 | 1 | 12.120000 |
red | experimental | 0 | 2 | 24.000000 |
strawberry | control | 0 | 3 | 23.716667 |
watermelon | control | 0 | 2 | 23.800000 |
white | control | 1 | 2 | 3.300000 |
white | experimental | 0 | 1 | 21.500000 |
white | experimental | 1 | 1 | 7.480000 |
yellow | control | 1 | 2 | 9.265000 |
yellow brown | control | 1 | 2 | 15.800000 |
yellow brown | experimental | 0 | 1 | 9.340000 |
yellow white | control | 0 | 1 | 13.060000 |
yellow white | control | 1 | 1 | 15.500000 |
yellow white | experimental | 0 | 3 | 20.200000 |
yellow white | experimental | 1 | 1 | 11.000000 |
This is a fairly simple calculation, but it serves to demonstrate the “manual check” method. Filter to just the incorrect, experimental results for the apple flavour, then check that the number of rows and the average match the summary table.
jelly %>% filter(flavour == "apple",
accuracy_0_incorrect_1_correct == 0,
group_choose_from_drop_down == "experimental") %>%
kable() %>% kable_styling(full_width = FALSE)
initials | group_choose_from_drop_down | trial_number | flavour | reaction_time_in_sec | accuracy_0_incorrect_1_correct |
---|---|---|---|---|---|
FACP | experimental | 1 | apple | 14.30 | 0 |
VP | experimental | 1 | apple | 21.35 | 0 |
NA | experimental | 2 | apple | 9.99 | 0 |
Tw | experimental | 1 | apple | 13.00 | 0 |
Looks good!
25.7 Use fake data, where you already know the answer
This example will be a bit contrived, but it's a good illustration of the idea. I'll create some simple data and then compute the averages using exactly the same code I wrote above.
my_jelly <- tribble(
~group_choose_from_drop_down, ~ flavour, ~ reaction_time_in_sec, ~ accuracy_0_incorrect_1_correct,
"experimental", "lime", 1, 1,
"experimental", "lime", 2, 1,
"experimental", "lemon", 3, 1,
"control", "lemon", 4, 1,
)
my_jelly %>% kable() %>% kable_styling(full_width = FALSE)
group_choose_from_drop_down | flavour | reaction_time_in_sec | accuracy_0_incorrect_1_correct |
---|---|---|---|
experimental | lime | 1 | 1 |
experimental | lime | 2 | 1 |
experimental | lemon | 3 | 1 |
control | lemon | 4 | 1 |
I know exactly what I’m expecting. Stop and decide for yourself before reading on.
my_jelly %>% group_by(flavour, group_choose_from_drop_down, accuracy_0_incorrect_1_correct) %>%
summarize(count = n(),
mean_reaction_time = mean(reaction_time_in_sec),
.groups = "drop") %>%
kable() %>% kable_styling(full_width = FALSE)
flavour | group_choose_from_drop_down | accuracy_0_incorrect_1_correct | count | mean_reaction_time |
---|---|---|---|---|
lemon | control | 1 | 1 | 4.0 |
lemon | experimental | 1 | 1 | 3.0 |
lime | experimental | 1 | 2 | 1.5 |
25.8 Do your calculations two different ways
Suppose we weren't really confident that we knew what mean does. Even simple functions like sd (for standard deviation) and median might not do what you think they should. How can we check? Let's write our own mean function based on sum, length, and division.
my_mean <- function(x) sum(x) / length(x)
my_jelly %>% group_by(flavour,
group_choose_from_drop_down,
accuracy_0_incorrect_1_correct) %>%
summarize(count = length(flavour),
mean_reaction_time = my_mean(reaction_time_in_sec),
.groups = "drop") %>%
kable() %>% kable_styling(full_width = FALSE)
flavour | group_choose_from_drop_down | accuracy_0_incorrect_1_correct | count | mean_reaction_time |
---|---|---|---|---|
lemon | control | 1 | 1 | 4.0 |
lemon | experimental | 1 | 1 | 3.0 |
lime | experimental | 1 | 2 | 1.5 |
Excellent! We got the same answers.
That's an introduction to testing calculations. These are good checks to make as you develop and test your analysis. The next step is to write computer code that checks your answers automatically, by giving sample data to your calculation code along with the known answers. There are packages (of course!) to help with that: testthat and tinytest (links below). We won't be using these in the course, but you should take time to learn about them after the course.
25.9 At the end of your analysis
- Provide a relatively simple dataset together with analysis results that can be used to verify your code is working the same way in the future, or that someone who develops a new way of analysing the data can use for comparison
- Document any testing processes you used for your data and for your computer code so that later users will know what problems you looked for
- Document any problems you found in the data and what steps you took to fix the problems
- Describe any weaknesses in your method which you anticipate might cause problems for future users
25.10 Lessons for this course
We are not integrating testing into our work in this course. This lesson is here to alert you to these issues, make sure you know that many people think carefully about them, and show you the first steps in developing a good quality assurance plan. Be aware that testing takes time – possibly as much time as “the rest” of the work you do for a data analysis.
25.11 Further reading
- Data cleaning overview
- pointblank documentation and examples
- dlookr documentation and examples
- dataMaid documentation for data cleaning
- For testing R code, look at tinytest and testthat