class: center, middle, inverse, title-slide

# Data Visualization
## Using the grammar of graphics (ggplot2)
### Andrew Irwin, a.irwin@dal.ca
### Math & Stats, Dalhousie University
### 2021-01-20 (updated: 2021-01-06)

---
class: middle

# ggplot2 examples

A series of examples demonstrating how to use the grammar of graphics to plan and create visualizations.

Using `gss_sm` from the `socviz` package, a subset of the General Social Survey with 2867 observations (rows) and 32 variables (columns), and `gss_cat` from the `forcats` package.

Basic examples using selected geometries, plus some elaborations.

---
class: middle, inverse

# What data do you have?

* One (or two) categorical: `geom_bar`
* One quantitative: `geom_histogram`
* One categorical and one quantitative: `geom_boxplot`
* Two quantitative: `geom_point`

---
class: middle

### One categorical variable

```r
gss_sm %>%
  ggplot(aes(y = region)) +
  geom_bar()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

---
class: middle

### Two categorical variables

```r
gss_sm %>%
  ggplot(aes(y = region, fill = degree)) +
  geom_bar()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---
class: middle

#### Two categorical variables, using area for counts

```r
gss_sm %>%
  ggplot(aes(y = income16, x = happy)) +
  geom_count()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---
class: middle

### Same data using bars

```r
gss_sm %>%
  ggplot(aes(y = income16, fill = happy)) +
  geom_bar()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---
class: middle

#### Same data, too many colors

```r
gss_sm %>%
  ggplot(aes(fill = income16, x = happy)) +
  geom_bar()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---
class: middle

### One quantitative variable

```r
gss_sm %>%
  ggplot(aes(x = age)) +
  geom_histogram()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---
class: middle

#### One categorical, one quantitative variable

```r
gss_sm %>%
  ggplot(aes(x = age, y = factor(childs))) +
*  geom_boxplot() +
*  labs(y = "Number of children")
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
class: middle

#### Two quantitative variables

```r
gss_cat %>%
  slice_sample(n = 2000) %>%
  ggplot(aes(x = age, y = tvhours)) +
*  geom_point(size = 0.2)
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---
class: middle

#### Two quantitative variables

```r
gss_cat %>%
  slice_sample(n = 2000) %>%
  ggplot(aes(x = age, y = tvhours)) +
*  geom_jitter(size = 0.2) +
*  geom_smooth()
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---
class: middle

#### Two quantitative variables, show variability in one

```r
gss_cat %>%
  ggplot(aes(x = age, y = tvhours)) +
*  stat_summary(fun.data = "mean_se")
```

<img src="08-ggplot-intro_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---
class: middle

# Summary

* Showed variations on the introductory example with one or two categorical and quantitative variables
* Demonstrated a few embellishments (colours, standard errors, symbol area, smooths, jittered points)
* Showed some visualizations that "don't work": too many categories

---
class: middle

# Further reading

* Lots more examples with different data in the course notes
* "Extras" including axis labels, text on figures, and other annotations

---
class: middle, inverse

## Task

Practice these graphs by

* reproducing them, and
* modifying them by changing the variables used in aesthetic mappings.

If you like, use other data sources described in the "Data Sources" chapter of the course notes.
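---
class: middle

#### Sketch: changing an aesthetic mapping

A minimal sketch of the task, assuming the `tidyverse` and `socviz` packages are installed. It reuses the "two categorical variables" example and swaps the `fill` mapping from `degree` to `sex`, another categorical column in `gss_sm`. The choice of `sex` is only illustrative.

```r
# Assumes the tidyverse and socviz packages are installed
library(tidyverse)  # loads ggplot2, dplyr and forcats (gss_cat)
library(socviz)     # provides gss_sm

# Same plot as "Two categorical variables", but fill is now
# mapped to sex instead of degree (illustrative choice)
p <- gss_sm %>%
  ggplot(aes(y = region, fill = sex)) +
  geom_bar()

p  # printing the object draws the plot
```

The same pattern works with other categorical columns of `gss_sm`, such as `race` or `marital`: only the variable named inside `aes()` changes.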