Data Visualization

Checking your work

Andrew Irwin,

Math & Stats, Dalhousie University

2021-03-05 (updated: 2021-03-09)

Plan

  • Why is testing important?

  • Testing data

  • Testing code

  • Examples

  • Application in this course

Data error

Computing error

Why is testing important?

  • Visualizations are powerful and help people draw conclusions

  • Data errors and misunderstandings can corrupt your work

  • Mistakes in analysis (summarizing, grouping, calculations) can lead to wrong conclusions

  • Checking your work, both manually and automatically, improves confidence in the results

  • If you return to a project later, or give it to someone else, misunderstandings can lead to misinterpretation

Testing data

Things to check

  • Were the data read correctly by R (numbers, text, dates, missing values)?

  • Are the expected numbers of rows and columns present? Is there a way to check?

  • Are any values impossible? (Negative counts or lengths.)

  • Do the units change partway through? (A warning sign: lots of "outliers".) Are numbers human-coded (commas, spaces, suffixes like M and K for millions and thousands)?

  • Spelling errors, abbreviations, or variants? Capitalization. Extra spaces.

  • Duplicated data?

  • Date formatting. Times and time zones.
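
Several of the checks above can be sketched in base R. The data frame `obs` and its columns are invented for illustration and are not course data; the problems in it are planted on purpose.

```r
# Toy data frame (hypothetical) with several deliberate problems
obs <- data.frame(
  site  = c("A", "A", "B", "b ", "B"),
  count = c("12", "12", "1,200", "-3", "7"),
  date  = c("2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02", "03/05/2021"),
  stringsAsFactors = FALSE
)

# Were the data read correctly? Here the counts arrived as text, not numbers.
str(obs)

# Are the expected numbers of rows and columns present?
stopifnot(nrow(obs) == 5, ncol(obs) == 3)

# Human-coded numbers: strip commas before converting
obs$count_num <- as.numeric(gsub(",", "", obs$count))

# Impossible values: negative counts
impossible <- obs[obs$count_num < 0, ]

# Capitalization and extra spaces
obs$site_clean <- toupper(trimws(obs$site))

# Duplicated rows (compared on the original columns only)
n_dup <- sum(duplicated(obs[, c("site", "count", "date")]))

# Dates that do not match the expected format become NA
parsed <- as.Date(obs$date, format = "%Y-%m-%d")
bad_dates <- obs$date[is.na(parsed)]
```

Each check produces something small to inspect (`impossible`, `n_dup`, `bad_dates`) rather than silently repairing the data, so the decision about what to do stays with the analyst.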

How to test data?

Most powerful and easiest to use techniques:

  • Summary tables: counts, means, ranges

  • Simple visualizations: histograms, boxplots, scatter plots

Check the "obvious" things.
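
As a rough illustration of both techniques, the sketch below uses simulated lengths with one value deliberately entered in the wrong units. The variable names and numbers are assumptions made up for this example.

```r
# Simulated measurements (hypothetical): lengths in cm, with one entered in mm
set.seed(1)
len <- c(rnorm(50, mean = 20, sd = 2), 205)
group <- rep(c("a", "b", "c"), length.out = length(len))

# Summary tables: counts, means, ranges
table(group)
tapply(len, group, mean)
range(len)

# Simple visualizations: the stray value stands apart from the rest
hist(len, breaks = 30)
boxplot(len ~ group)

# boxplot.stats() returns the points that would be drawn beyond the whiskers
outliers <- boxplot.stats(len)$out
```

The range alone is enough to reveal the problem here; the plots make it hard to miss.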

Testing calculations

  • Test your code on sample or simulated data

  • Perform a part of the calculation by hand to independently check a result

  • Positive and negative controls (just like experiments)

  • Provide a test dataset and the correct report it should produce, so future users can check their results
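
One possible way to apply the first three ideas, sketched in base R. The function `group_means()` is a hypothetical stand-in for whatever calculation is being tested, and the simulated values are assumptions chosen so the right answer is known in advance.

```r
# Hypothetical function under test: mean of y within each group g
group_means <- function(y, g) tapply(y, g, mean)

# Small sample where the answer is easy to work out by hand
y <- c(1, 2, 3, 10, 20)
g <- c("a", "a", "a", "b", "b")
result <- group_means(y, g)
stopifnot(isTRUE(all.equal(result[["a"]], 2)),    # (1 + 2 + 3) / 3
          isTRUE(all.equal(result[["b"]], 15)))   # (10 + 20) / 2

# Positive control: simulated data with a built-in difference of 5
set.seed(42)
sim_g <- rep(c("ctrl", "trt"), each = 500)
sim_y <- rnorm(1000, mean = ifelse(sim_g == "trt", 15, 10), sd = 1)
pos <- group_means(sim_y, sim_g)
stopifnot(abs((pos[["trt"]] - pos[["ctrl"]]) - 5) < 0.5)

# Negative control: shuffle the labels, so the difference should nearly vanish
neg <- group_means(sim_y, sample(sim_g))
stopifnot(abs(neg[["trt"]] - neg[["ctrl"]]) < 1)
```

If the code reports a difference when the labels are shuffled, or misses the one that was built in, the calculation needs another look before it is trusted on real data.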

Example: Jelly bean data

Key variables: treatment, flavour, reaction time, accuracy.

jelly %>% count(treatment) %>% kable() %>% kable_styling(full_width = FALSE)
treatment       n
control       112
Control         1
experimental  114
NA              1
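
The counts show one stray capitalized label and one missing value. A minimal sketch of handling them follows, using a short made-up vector as a stand-in for the treatment column, since the real data are not reproduced here.

```r
# Toy stand-in for the treatment column (hypothetical values, not the course data)
treatment <- c("control", "Control", "experimental", NA, "control")

# Counting, including missing values, exposes the stray label
table(treatment, useNA = "ifany")

# Standardize capitalization, then recount
treatment_clean <- tolower(treatment)
counts <- table(treatment_clean, useNA = "ifany")

# Missing values need a decision (drop, or fix at the source), not a recode
n_missing <- sum(is.na(treatment_clean))
```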

Example: Jelly bean data

jelly %>% count(flavour) %>% kable() %>% kable_styling(full_width = FALSE)
flavour             n
apple               3
Apple               1
banana              7
Banana              9
blueberry           2
bubblegum           2
Bubblegum           1
cherry             19
Cherry             10
cinnamon            1
cocnut              1
coconut            14
Coconut             5
Coffee              1
grape              18
Grape               9
lemon              23
Lemon               9
lemon/lime          1
licorice            1
lime                4
Lime                1
marshmallow         1
orange             32
Orange             20
Pinapple            1
Plum                1
purple              4
Purple              2
raspberry           1
red                 2
Red                 1
strawberry          3
watermelon          1
Watermelon          1
white               2
White               2
yellow              2
yellow + white      1
yellow and brown    2
yellow and white    2
Yellow Brown        2
Yellow White        3

Example: Jelly bean data

jelly %>% count(tolower(flavour)) %>% kable() %>% kable_styling(full_width = FALSE)
tolower(flavour)    n
apple               4
banana             16
blueberry           2
bubblegum           3
cherry             29
cinnamon            1
cocnut              1
coconut            19
coffee              1
grape              27
lemon              32
lemon/lime          1
licorice            1
lime                5
marshmallow         1
orange             52
pinapple            1
plum                1
purple              6
raspberry           1
red                 3
strawberry          3
watermelon          2
white               4
yellow              2
yellow + white      1
yellow and brown    2
yellow and white    2
yellow brown        2
yellow white        3

Example: Sorting

jelly %>% count(tolower(flavour)) %>% arrange(n) %>%
  kable() %>% kable_styling(full_width = FALSE)
tolower(flavour)    n
cinnamon            1
cocnut              1
coffee              1
lemon/lime          1
licorice            1
marshmallow         1
pinapple            1
plum                1
raspberry           1
yellow + white      1
blueberry           2
watermelon          2
yellow              2
yellow and brown    2
yellow and white    2
yellow brown        2
bubblegum           3
red                 3
strawberry          3
yellow white        3
apple               4
white               4
lime                5
purple              6
banana             16
coconut            19
grape              27
cherry             29
lemon              32
orange             52
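
Sorting puts the rare labels first, which is where misspellings and variants tend to sit. One way to fix them is a small lookup table, sketched below in base R. The vector `flavour` is a toy sample of labels taken from the tables above, and the mappings in `fixes` are illustrative judgment calls, not decisions made in the course.

```r
# Toy stand-in for the flavour column (a hypothetical sample of the labels above)
flavour <- c("coconut", "cocnut", "Coconut", "Pinapple", "orange",
             "Yellow White", "yellow + white", "yellow and white")

# Step 1: lower case and trim stray spaces
clean <- trimws(tolower(flavour))

# Step 2: a hand-written lookup from variant to preferred label
# (assumed mappings, recorded in code so they can be reviewed)
fixes <- c("cocnut"         = "coconut",
           "pinapple"       = "pineapple",
           "yellow + white" = "yellow and white",
           "yellow white"   = "yellow and white")

hit <- clean %in% names(fixes)
clean[hit] <- fixes[clean[hit]]

# Rarest labels first: anything still counted once deserves a second look
sort(table(clean))
```

Keeping the corrections in a named vector means the original column is never edited by hand and every change can be traced.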

Summary: testing in this course

  • We are not routinely testing data or code in this course

  • You should be aware of the way errors get into analyses and that there are methods for guarding against them

  • You should do some checking of the data in your term project, but it's not an assigned part of the work

Further reading

  • Course notes

Task

  • No task for this lesson