Using the grammar of graphics (ggplot2)

Andrew Irwin, a.irwin@dal.ca

2024-01-23

ggplot2 examples

A series of examples to demonstrate how to use the grammar of graphics to plan and create visualizations.

Using two extracts from the General Social Survey: gss_sm from the socviz package, with 2,867 observations (rows) and 32 variables (columns), and gss_cat from the forcats package, with 21,483 rows and 9 variables.

Basic examples using selected geometries, plus some elaborations.

General Social Survey data, 2016

A dataset containing an extract from the 2016 General Social Survey. See http://gss.norc.org/Get-Documentation for full documentation of the variables.

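library(tidyverse)  # ggplot2, dplyr (glimpse, slice_sample), forcats (gss_cat)
library(socviz)     # gss_sm
glimpse(gss_sm)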
Rows: 2,867
Columns: 32
$ year        <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
$ id          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ ballot      <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 3, 3, 2,…
$ age         <dbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60, 76…
$ childs      <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5, 4, 3, 5, 7, 2, 6, 5, 0, 2…
$ sibs        <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5, 1, 4, 4, 3, 6, 0, 1, 3, 8,…
$ degree      <fct> Bachelor, High School, Bachelor, High School, Graduate, Ju…
$ race        <fct> White, White, White, White, White, White, White, Other, Bl…
$ sex         <fct> Male, Male, Male, Female, Female, Female, Male, Female, Ma…
$ region      <fct> New England, New England, New England, New England, New En…
$ income16    <fct> $170000 or over, $50000 to 59999, $75000 to $89999, $17000…
$ relig       <fct> None, None, Catholic, Catholic, None, None, None, Catholic…
$ marital     <fct> Married, Never Married, Married, Married, Married, Married…
$ padeg       <fct> Graduate, Lt High School, High School, NA, Bachelor, NA, H…
$ madeg       <fct> High School, High School, Lt High School, High School, Hig…
$ partyid     <fct> "Independent", "Ind,near Dem", "Not Str Republican", "Not …
$ polviews    <fct> Moderate, Liberal, Conservative, Moderate, Slightly Libera…
$ happy       <fct> Pretty Happy, Pretty Happy, Very Happy, Pretty Happy, Very…
$ partners    <fct> NA, "1 Partner", "1 Partner", NA, "1 Partner", "1 Partner"…
$ grass       <fct> NA, Legal, Not Legal, NA, Legal, Legal, NA, Not Legal, NA,…
$ zodiac      <fct> Aquarius, Scorpio, Pisces, Cancer, Scorpio, Scorpio, Capri…
$ pres12      <labelled> 3, 1, 2, 2, 1, 1, NA, NA, NA, 2, NA, NA, 1, 1, 2, 1, …
$ wtssall     <dbl> 0.9569935, 0.4784968, 0.9569935, 1.9139870, 1.4354903, 0.9…
$ income_rc   <fct> Gt $170000, Gt $50000, Gt $75000, Gt $170000, Gt $170000, …
$ agegrp      <fct> Age 45-55, Age 55-65, Age 65+, Age 35-45, Age 45-55, Age 4…
$ ageq        <fct> Age 34-49, Age 49-62, Age 62+, Age 34-49, Age 49-62, Age 4…
$ siblings    <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, 4, 4, 3, 6+, 0, 1, 3, 6+, 2…
$ kids        <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+, 4+, 4+, 3, 4+, 4+, 2, 4+, …
$ religion    <fct> None, None, Catholic, Catholic, None, None, None, Catholic…
$ bigregion   <fct> Northeast, Northeast, Northeast, Northeast, Northeast, Nor…
$ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, 3, 1, NA, 1, NA, 0, 1, 0, N…
$ obama       <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0, NA, NA, 1, 1, 0, 1, 0, 1,…

gss_cat overview

glimpse(gss_cat)
Rows: 21,483
Columns: 9
$ year    <int> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 20…
$ marital <fct> Never married, Divorced, Widowed, Never married, Divorced, Mar…
$ age     <int> 26, 48, 67, 39, 25, 25, 36, 44, 44, 47, 53, 52, 52, 51, 52, 40…
$ race    <fct> White, White, White, White, White, White, White, White, White,…
$ rincome <fct> $8000 to 9999, $8000 to 9999, Not applicable, Not applicable, …
$ partyid <fct> "Ind,near rep", "Not str republican", "Independent", "Ind,near…
$ relig   <fct> Protestant, Protestant, Protestant, Orthodox-christian, None, …
$ denom   <fct> "Southern baptist", "Baptist-dk which", "No denomination", "No…
$ tvhours <int> 12, NA, 2, 4, 1, NA, 3, NA, 0, 3, 2, NA, 1, NA, 1, 7, NA, 3, 3…

What data do you have?

  • One (or two) categorical: geom_bar

  • One quantitative: geom_histogram

  • One categorical and one quantitative: geom_boxplot

  • Two quantitative: geom_point
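The decision list above can be written out as a small function. This is only a sketch: suggest_geom is a hypothetical helper, not part of ggplot2, and it assumes that factors, character vectors, and logicals count as categorical while everything else counts as quantitative.

```r
# Hypothetical helper (not part of ggplot2): suggest a starting geometry
# from the types of one or two variables.
suggest_geom <- function(x, y = NULL) {
  # Assumption: factors, characters, and logicals are categorical;
  # anything else is treated as quantitative.
  kind <- function(v) {
    if (is.factor(v) || is.character(v) || is.logical(v)) {
      "categorical"
    } else {
      "quantitative"
    }
  }
  kinds <- c(kind(x), if (!is.null(y)) kind(y))
  n_cat <- sum(kinds == "categorical")
  n_quant <- sum(kinds == "quantitative")

  if (n_quant == 0) {
    "geom_bar"        # one or two categorical variables
  } else if (n_cat == 0 && n_quant == 1) {
    "geom_histogram"  # one quantitative variable
  } else if (n_cat == 1 && n_quant == 1) {
    "geom_boxplot"    # one categorical, one quantitative
  } else {
    "geom_point"      # two quantitative variables
  }
}

suggest_geom(factor(c("a", "b")))   # one categorical variable
suggest_geom(c(1.5, 2.5), c(3, 4))  # two quantitative variables
```

The suggestion is a starting point only; the examples that follow show cases where a different geometry for the same variable types works better.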

One categorical variable

gss_sm |> ggplot(aes(y = region)) + geom_bar()

Two categorical variables

gss_sm |> ggplot(aes(y = region, fill = degree)) + geom_bar() 

Two categorical variables, using area for counts

gss_sm |> ggplot(aes(y=income16, x = happy)) + geom_count()

Same data using bars

gss_sm |> ggplot(aes(y=income16, fill = happy)) + geom_bar() 

Same data, too many colors

gss_sm |> ggplot(aes(fill=income16, x = happy)) + geom_bar() 
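One possible way to reduce the number of colors is to pool the less common income categories before plotting. The sketch below uses fct_lump_n() from forcats; the new column name income_lumped and the choice of n = 5 are assumptions made for illustration only.

```r
library(tidyverse)  # ggplot2, dplyr, forcats
library(socviz)     # gss_sm

# Keep the 5 most common income categories and pool the rest as "Other".
# n = 5 is an arbitrary, illustrative choice.
gss_lumped <- gss_sm |>
  mutate(income_lumped = fct_lump_n(income16, n = 5))

p_lumped <- gss_lumped |>
  ggplot(aes(x = happy, fill = income_lumped)) +
  geom_bar()
p_lumped
```

Pooling by frequency ignores the natural ordering of income, so "Other" mixes low and high incomes; whether that is acceptable depends on the question being asked.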

One quantitative variable

gss_sm |> ggplot(aes(x = age)) + geom_histogram()
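Called with no arguments, geom_histogram() uses 30 bins and prints a message suggesting you choose a bin width. A minimal variation sets the width explicitly; the 5-year width here is an illustrative assumption, not a recommendation.

```r
library(tidyverse)  # ggplot2
library(socviz)     # gss_sm

# binwidth is in the units of the x variable (years of age here)
p_hist <- gss_sm |>
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = 5)
p_hist
```

Trying a few widths is a quick way to check whether features of the distribution depend on the binning.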

One categorical and one quantitative variable

gss_sm |> ggplot(aes(x = age, y=factor(childs))) + 
  geom_boxplot() + 
  labs(y="Number of children") 

Two quantitative variables

gss_cat |> slice_sample(n=2000) |> 
  ggplot(aes(x = age, y= tvhours)) + 
  geom_point(size=0.2) 

Two quantitative variables, with jittered points and a smoother

gss_cat |> slice_sample(n=2000) |> 
  ggplot(aes(x = age, y= tvhours)) + 
  geom_jitter(size=0.2) + 
  geom_smooth() 
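Because slice_sample() draws a random subset, the figure changes a little every time the code runs. A sketch of one way to make it reproducible is to set the random seed first. The seed value and the switch to a straight-line fit with method = "lm" are illustrative assumptions, not part of the original example.

```r
library(tidyverse)  # ggplot2, dplyr; gss_cat comes from forcats

set.seed(2024)  # arbitrary seed, chosen only for illustration
gss_sample <- gss_cat |> slice_sample(n = 2000)

p_smooth <- gss_sample |>
  ggplot(aes(x = age, y = tvhours)) +
  geom_jitter(size = 0.2) +
  geom_smooth(method = "lm")  # straight-line fit instead of the default smoother
p_smooth
```

Rows with a missing age or tvhours are dropped by ggplot2 with a warning, so fewer than 2000 points appear.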

Two quantitative variables, showing the mean and standard error of one

gss_cat |> ggplot(aes(x = age, y = tvhours )) + 
  stat_summary(fun.data = "mean_se")
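With fun.data = "mean_se", stat_summary() draws the mean of y at each x value, with a range of one standard error on either side. The sketch below checks this by hand against ggplot2's mean_se() helper. The vector tv holds made-up values, not data from gss_cat.

```r
library(ggplot2)

tv <- c(2, 4, 6, 8, NA)  # made-up values for illustration

# By hand: drop missing values, then compute the mean and standard error
x <- tv[!is.na(tv)]
se <- sd(x) / sqrt(length(x))
by_hand <- data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)

# The ggplot2 helper used when fun.data = "mean_se"
from_ggplot <- mean_se(tv)

by_hand
from_ggplot
```

The two results should agree, which confirms that the ranges in the figure are standard errors of the mean rather than standard deviations.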

Summary

  • Showed variations on the introductory examples, using one or two categorical and quantitative variables

  • Demonstrated a few embellishments (colours, standard errors, symbol area, smooths, jittered points)

  • Showed some visualizations that “don’t work” – too many categories

Further reading

  • Lots more examples with different data in the course notes

  • “Extras” including axis labels, text on figures, other annotations

Task

Practice these graphs by

  • reproducing them, and

  • modifying them by changing variables used in aesthetic mappings.

If you would like, use other data sources described in the “Data Sources” chapter of the course notes.