Grammar of Graphics practice

Author

Andrew Irwin

Published

January 27, 2026

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(socviz) # data
library(datasets) # data
library(palmerpenguins) # data

Attaching package: 'palmerpenguins'

The following objects are masked from 'package:datasets':

    penguins, penguins_raw
library(gapminder) # data

Practice making plots with these datasets: gss_sm and gss_lon (socviz), penguins and penguins_raw (palmerpenguins), gapminder (gapminder), and many others in the datasets package.

Examine each dataset with View() and glimpse(), and read its help page (for example, ?gss_sm).

glimpse(ToothGrowth)
Rows: 60
Columns: 3
$ len  <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16.5, 16.5,…
$ supp <fct> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, V…
$ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, …
glimpse(gss_sm)
Rows: 2,867
Columns: 32
$ year        <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016…
$ id          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ ballot      <labelled> 1, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 3, 3, 2,…
$ age         <dbl> 47, 61, 72, 43, 55, 53, 50, 23, 45, 71, 33, 86, 32, 60, 76…
$ childs      <dbl> 3, 0, 2, 4, 2, 2, 2, 3, 3, 4, 5, 4, 3, 5, 7, 2, 6, 5, 0, 2…
$ sibs        <labelled> 2, 3, 3, 3, 2, 2, 2, 6, 5, 1, 4, 4, 3, 6, 0, 1, 3, 8,…
$ degree      <fct> Bachelor, High School, Bachelor, High School, Graduate, Ju…
$ race        <fct> White, White, White, White, White, White, White, Other, Bl…
$ sex         <fct> Male, Male, Male, Female, Female, Female, Male, Female, Ma…
$ region      <fct> New England, New England, New England, New England, New En…
$ income16    <fct> $170000 or over, $50000 to 59999, $75000 to $89999, $17000…
$ relig       <fct> None, None, Catholic, Catholic, None, None, None, Catholic…
$ marital     <fct> Married, Never Married, Married, Married, Married, Married…
$ padeg       <fct> Graduate, Lt High School, High School, NA, Bachelor, NA, H…
$ madeg       <fct> High School, High School, Lt High School, High School, Hig…
$ partyid     <fct> "Independent", "Ind,near Dem", "Not Str Republican", "Not …
$ polviews    <fct> Moderate, Liberal, Conservative, Moderate, Slightly Libera…
$ happy       <fct> Pretty Happy, Pretty Happy, Very Happy, Pretty Happy, Very…
$ partners    <fct> NA, "1 Partner", "1 Partner", NA, "1 Partner", "1 Partner"…
$ grass       <fct> NA, Legal, Not Legal, NA, Legal, Legal, NA, Not Legal, NA,…
$ zodiac      <fct> Aquarius, Scorpio, Pisces, Cancer, Scorpio, Scorpio, Capri…
$ pres12      <labelled> 3, 1, 2, 2, 1, 1, NA, NA, NA, 2, NA, NA, 1, 1, 2, 1, …
$ wtssall     <dbl> 0.9569935, 0.4784968, 0.9569935, 1.9139870, 1.4354903, 0.9…
$ income_rc   <fct> Gt $170000, Gt $50000, Gt $75000, Gt $170000, Gt $170000, …
$ agegrp      <fct> Age 45-55, Age 55-65, Age 65+, Age 35-45, Age 45-55, Age 4…
$ ageq        <fct> Age 34-49, Age 49-62, Age 62+, Age 34-49, Age 49-62, Age 4…
$ siblings    <fct> 2, 3, 3, 3, 2, 2, 2, 6+, 5, 1, 4, 4, 3, 6+, 0, 1, 3, 6+, 2…
$ kids        <fct> 3, 0, 2, 4+, 2, 2, 2, 3, 3, 4+, 4+, 4+, 3, 4+, 4+, 2, 4+, …
$ religion    <fct> None, None, Catholic, Catholic, None, None, None, Catholic…
$ bigregion   <fct> Northeast, Northeast, Northeast, Northeast, Northeast, Nor…
$ partners_rc <fct> NA, 1, 1, NA, 1, 1, NA, 1, NA, 3, 1, NA, 1, NA, 0, 1, 0, N…
$ obama       <dbl> 0, 1, 0, 0, 1, 1, NA, NA, NA, 0, NA, NA, 1, 1, 0, 1, 0, 1,…
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Aesthetic mappings to use

x, y, color, fill, shape, size, group
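One quick way to see what a mapping does is to build the plot and inspect the data ggplot2 actually draws with layer_data(). The sketch below maps five of these aesthetics at once. It uses the built-in mtcars data (an assumption, so it runs with ggplot2 alone) rather than the course datasets. Discrete mappings such as colour and shape also define the groups, which is what the group aesthetic lets you set by hand.

```r
library(ggplot2)

# Map five aesthetics at once
p <- ggplot(mtcars, aes(x = wt, y = mpg,
                        colour = factor(am),  # 2 levels: automatic / manual
                        shape = factor(cyl),  # 3 levels: 4, 6, 8 cylinders
                        size = hp)) +         # quantitative: continuous size scale
  geom_point()

# One row per point, with mapped colours, shapes and sizes filled in
ld <- layer_data(p)
length(unique(ld$colour))  # 2
length(unique(ld$shape))   # 3
length(unique(ld$group))   # 6: one per am x cyl combination
```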

Geometries to try

  • geom_bar, geom_count,
  • geom_point, geom_line, geom_boxplot, geom_jitter,
  • geom_histogram, geom_density,
  • geom_hline, geom_vline, geom_rug,
  • stat_summary,
  • geom_function
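geom_rug, geom_vline and geom_function do not appear in the class examples, so here is a minimal sketch that layers them over a histogram. It uses ToothGrowth so it runs with ggplot2 alone. The normal curve with the sample mean and standard deviation is an illustrative choice, not a claim that tooth length is normally distributed; the histogram is put on the density scale with after_stat(density) so the curve is comparable.

```r
library(ggplot2)

mean_len <- mean(ToothGrowth$len)
sd_len <- sd(ToothGrowth$len)

p <- ggplot(ToothGrowth, aes(x = len)) +
  # histogram on the density scale so it can be compared with a density curve
  geom_histogram(aes(y = after_stat(density)), binwidth = 3,
                 fill = "grey80", colour = "white") +
  # normal curve drawn by evaluating dnorm() across the x range
  geom_function(fun = dnorm, args = list(mean = mean_len, sd = sd_len)) +
  # dashed vertical reference line at the mean
  geom_vline(xintercept = mean_len, linetype = "dashed") +
  # one tick per observation along the x axis
  geom_rug() +
  theme_bw(base_size = 14)
```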

Which should you use with categorical data? quantitative data? A mixture?

What happens if you use the wrong kind of data?
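One way to explore these questions, sketched with ToothGrowth: supp is categorical (a factor), while len and dose are quantitative. geom_bar() counts each distinct value, so it also "works" on a numeric column with only a few values, while geom_histogram() needs a continuous x and stops with an error when given a factor. The error only appears when the plot is built or printed, which is why ggplot_build() is called here.

```r
library(ggplot2)

# Categorical x: one bar per level of supp
bars_cat <- layer_data(ggplot(ToothGrowth, aes(x = supp)) + geom_bar())
nrow(bars_cat)  # 2 bars: OJ and VC

# Quantitative x with only three distinct values: geom_bar() still counts them
bars_num <- layer_data(ggplot(ToothGrowth, aes(x = dose)) + geom_bar())
nrow(bars_num)  # 3 bars: 0.5, 1 and 2

# Categorical x with a histogram: the error is raised when the plot is built
hist_fails <- tryCatch({
  ggplot_build(ggplot(ToothGrowth, aes(x = supp)) + geom_histogram())
  FALSE
}, error = function(e) TRUE)
hist_fails
```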

Read the help page for each geom in the RStudio Help panel (for example, run ?geom_bar) or on the ggplot2 website.

Try a few simple themes:

  • theme_bw(base_size = 14)
  • theme_minimal(base_family = "serif") # or "sans", "mono", or an installed font name
  • theme_classic()

As we practice, we will describe each geom here and provide some examples.

Examples

Created in class using the suggestions above.

gss_sm |> ggplot(aes(y = region)) + geom_bar() + 
  theme_bw(base_size=14)

gss_sm |> ggplot() +
  geom_bar(aes(y = region, fill = degree ))

gss_sm |> ggplot() +
  geom_bar(aes(y = region, colour = degree ))

gss_sm |> ggplot() +
  geom_bar(aes(y = region, fill = degree ), color = "white")

gss_sm |> ggplot() +
  geom_bar(aes(y = region, fill = degree, color = "white")) # inside aes(), "white" is mapped as a data value, not set as the outline colour
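The last two bar charts differ in whether color is set or mapped. Outside aes(), color = "white" is a fixed setting applied to every bar. Inside aes(), "white" is treated as a variable with one level, so the outlines get a colour from the default discrete scale and a legend appears. A minimal check with ToothGrowth (standing in for gss_sm so the block runs with ggplot2 alone):

```r
library(ggplot2)

# Setting: colour outside aes() is one fixed value for every bar
p_set <- ggplot(ToothGrowth, aes(x = supp, fill = factor(dose))) +
  geom_bar(colour = "white")

# Mapping: colour inside aes() is treated as a one-level variable,
# so the outline gets a hue from the default discrete scale plus a legend
p_mapped <- ggplot(ToothGrowth, aes(x = supp, fill = factor(dose))) +
  geom_bar(aes(colour = "white"))

unique(layer_data(p_set)$colour)     # "white"
unique(layer_data(p_mapped)$colour)  # a scale colour, not "white"
```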

gss_sm |> ggplot() +
  geom_bar(aes(y = degree, fill = region ))

penguins |> ggplot() + 
  geom_histogram(aes(body_mass_g)) 
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

penguins |> ggplot() + 
  geom_histogram(aes(body_mass_g), binwidth = 225) 

penguins |> ggplot() + 
  geom_histogram(aes(y = body_mass_g)) 
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

penguins |> ggplot() + 
  geom_histogram(aes(y = body_mass_g, fill = species)) 
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

penguins |> ggplot() + 
  geom_histogram(aes(x = body_mass_g, fill = species), color = "white") 
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.
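The message asks for a better binwidth. Besides a fixed number, geom_histogram() also accepts a function of the x values as binwidth. The sketch below uses the Freedman-Diaconis rule (bin width = 2 x IQR / n^(1/3)), one common rule of thumb that is not part of the original notes; fd_width is a hypothetical helper name, and ToothGrowth stands in for the penguins so the block runs with ggplot2 alone.

```r
library(ggplot2)

# Hypothetical helper: Freedman-Diaconis bin width, ignoring missing values
fd_width <- function(x) 2 * IQR(x, na.rm = TRUE) / sum(!is.na(x))^(1/3)

fd_width(ToothGrowth$len)

p <- ggplot(ToothGrowth, aes(x = len)) +
  geom_histogram(binwidth = fd_width, colour = "white")

# Every observation falls in exactly one bin
sum(layer_data(p)$count)  # 60
```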

penguins |> ggplot() + 
  geom_density(aes(flipper_length_mm)) 

penguins |> ggplot() + 
  geom_density(aes(flipper_length_mm), bw = 2) 

penguins |> ggplot() + 
  geom_density(aes(x = body_mass_g, fill = species)) 

penguins |> ggplot() + 
  geom_density(aes(x = body_mass_g, fill = species), alpha = 0.4) 

penguins |> ggplot() + 
  geom_density(aes(x = body_mass_g, colour = species)) 
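geom_density() smooths the data with a kernel whose width is the bandwidth. By default the width comes from the bw.nrd0() rule of thumb. bw = 2 (as above) sets the width directly, in the units of x, while adjust multiplies the default width, so adjust = 0.5 gives a wigglier curve. A sketch with ToothGrowth in place of the penguins:

```r
library(ggplot2)

# Default bandwidth rule, in the units of len
bw.nrd0(ToothGrowth$len)

# bw sets the bandwidth directly; adjust rescales the default one
p_bw     <- ggplot(ToothGrowth, aes(x = len)) + geom_density(bw = 2)
p_adjust <- ggplot(ToothGrowth, aes(x = len)) + geom_density(adjust = 0.5)

# One curve per fill group, each evaluated at 512 points by default
p_groups <- ggplot(ToothGrowth, aes(x = len, fill = supp)) +
  geom_density(alpha = 0.4)
nrow(layer_data(p_groups))  # 1024: 2 groups x 512 points
```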

penguins |> ggplot() + 
  geom_boxplot(aes(body_mass_g)) 
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

penguins |> ggplot() + 
  geom_boxplot(aes(y = body_mass_g)) 
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

penguins |> ggplot() + 
  geom_boxplot(aes(y = body_mass_g, x = species, fill = species)) 
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

penguins |> ggplot() + 
  geom_boxplot(aes(y = body_mass_g, x = species, fill = species), color = "white") 
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
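With a few hundred observations it can help to show the raw data on top of the boxes. A sketch with ToothGrowth (an assumption, in place of penguins): outlier.shape = NA hides the outlier points the boxplot would otherwise draw, because the jittered layer already shows every observation, and height = 0 keeps the jitter horizontal so the y values are not altered.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  # hide the boxplot's own outlier points; the next layer shows every value
  geom_boxplot(outlier.shape = NA) +
  # spread points sideways only, so the y values stay as measured
  geom_jitter(width = 0.2, height = 0, alpha = 0.6)

nrow(layer_data(p, 1))  # 2 boxes
nrow(layer_data(p, 2))  # 60 points
```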

penguins |> ggplot() + 
  geom_count(aes(species, island))

penguins |> ggplot() + 
  geom_bar(aes(species, fill = sex))

gapminder |> ggplot() + 
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  theme_minimal()

gapminder |> ggplot() + 
  geom_point(aes(x = log10(gdpPercap), y = lifeExp, size = pop, color = continent)) +
  theme_minimal()

gapminder |> ggplot() + 
  geom_point(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  scale_x_log10() + 
  theme_minimal()
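The last two plots place the points identically but label the x axis differently. Putting log10() inside aes() transforms the data, so the axis shows values like 3 and 4. scale_x_log10() transforms the scale instead, so the axis is labelled in the original units (dollars). A quick check with mtcars (standing in for gapminder) that the drawn positions agree:

```r
library(ggplot2)

# Transform inside aes(): the axis shows log10 values
p_aes <- ggplot(mtcars, aes(x = log10(disp), y = mpg)) + geom_point()

# Transform the scale: the axis is labelled in original units
p_scale <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point() +
  scale_x_log10()

# In both cases the x positions ggplot2 draws are log10(disp)
all.equal(as.numeric(layer_data(p_aes)$x), as.numeric(layer_data(p_scale)$x))
```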

gapminder |> filter(year == 2002) |>
  ggplot() + 
  geom_point(aes(x = gdpPercap, y = lifeExp, size = log10(pop), color = continent)) +
  scale_x_log10() + 
  theme_minimal()

gapminder |> filter(country == "Canada") |>
  ggplot() + 
  geom_line(aes(x = year, y = lifeExp, color = continent)) +
  theme_minimal()

gapminder |> filter(country == "Canada") |>
  ggplot() + 
  geom_line(aes(x = year, y = lifeExp, color = country)) +
  ylim(0, NA) + 
  theme_minimal()

gapminder |> filter(country == "Canada"  | country == "China") |>
  ggplot() + 
  geom_line(aes(x = year, y = lifeExp, color = country)) +
  ylim(0, NA) + 
  theme_minimal()

gapminder |> filter(country %in% c("Canada", "China", "Brazil", "United States")) |>
  ggplot() + 
  geom_line(aes(x = year, y = lifeExp, color = country), linewidth = 1.2) +
  geom_point(aes(x = year, y = lifeExp, color = country), size = 4, shape = "o") +
  ylim(0, NA) + 
  theme_minimal() +
  labs(x = "Year", y = "Life Expectancy at birth", 
       colour = "Country",
       title = "Life expectancy over time",
       subtitle = "From gapminder")

Previous year’s explorations

An example to warm up with

gss_sm |> ggplot(aes(y = region, color = sex)) + geom_bar()

gss_sm |> ggplot(aes(y = region, fill = sex)) + geom_bar()

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
penguins |> ggplot(aes(x = flipper_length_mm)) + geom_histogram(binwidth = 5)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).

penguins |> ggplot(aes(x = flipper_length_mm, fill = species)) + geom_histogram(binwidth = 5)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).

penguins |> ggplot(aes(x = flipper_length_mm, fill = island)) + geom_histogram(binwidth = 5)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).

Make lots of your own experiments

Can we show two categorical variables on one plot?

penguins |> ggplot(aes(species, island)) + geom_count()

penguins |> ggplot(aes(species, island)) + geom_jitter()

penguins |> ggplot(aes(species, island)) + geom_jitter(aes(color = sex)) # a bit random for my taste

penguins |> ggplot(aes(species, island)) + geom_count(aes(color = sex)) # not sure I understand this one
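geom_count() counts the rows at each x-y combination and maps the count n to point size. Adding color = sex splits the data into one group per colour, so the counts are computed separately for each group and the coloured points for the same species-island pair are drawn on top of each other. A sketch with mtcars (standing in for penguins) makes the split visible in the computed data; facetting is one way to avoid the overlap.

```r
library(ggplot2)

# One point per cyl x gear combination, sized by the number of cars
p <- ggplot(mtcars, aes(x = factor(cyl), y = factor(gear))) + geom_count()
ld <- layer_data(p)
sum(ld$n)  # 32: every car counted once
nrow(ld)   # 8 combinations present

# With colour, counts are split by transmission type (am) into overlapping points
p_col <- ggplot(mtcars, aes(x = factor(cyl), y = factor(gear),
                            colour = factor(am))) + geom_count()
ld_col <- layer_data(p_col)
sum(ld_col$n)  # still 32, spread over more points

# One panel per transmission type instead of overlapping colours
p_facet <- p + facet_wrap(~ am)
```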

Make the plots look better

  • Add labels with labs(x = "...", y = "...", title = "...", color = "...", fill = "...").
penguins |> ggplot(aes(species, island, color = sex)) + geom_jitter() +
  labs(x = "Species", y = "Island", color = "Sex")

  • Make the text bigger with theme(text = element_text(size = 15)).

  • Remove the grey panel background with theme_bw().

  • Always use + to “add” parts of a ggplot together.
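Putting these together. One detail worth knowing: a complete theme such as theme_bw() replaces every theme setting made before it, so adjustments with theme() should be added after it. A sketch with ToothGrowth (an assumption, so it runs with ggplot2 alone):

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, colour = supp)) +
  geom_jitter(width = 0.15, height = 0) +
  labs(x = "Dose (mg/day)", y = "Tooth length",
       colour = "Supplement",
       title = "Tooth growth in guinea pigs") +
  theme_bw() +                           # complete theme: white panel background
  theme(text = element_text(size = 15))  # added after theme_bw(), so it is kept
```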

gapminder experiments

str(gapminder)
tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
 $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
 $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
 $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
gapminder |> ggplot(aes(x = gdpPercap, y = lifeExp )) + geom_point() +
  labs(x = "GDP per capita ($)", y = "Life expectancy (years)")

gapminder |> ggplot(aes(x = gdpPercap, y = lifeExp,
                        color = year, shape = continent)) + geom_point() +
  labs(x = "GDP per capita ($)", y = "Life expectancy (years)")

gapminder |> ggplot(aes(x = log10(gdpPercap), y = lifeExp,
                        color = year, shape = continent)) + geom_point() +
  labs(x = "log10 of GDP per capita ($)", y = "Life expectancy (years)")

Can we see how GDP per capita varies with year? And maybe continent too?

gapminder |> ggplot(aes(y = log10(gdpPercap), x = year)) + geom_point()

gapminder |> ggplot(aes(y = log10(gdpPercap), x = year)) + geom_boxplot() # ugh!!
Warning: Orientation is not uniquely specified when both the x and y aesthetics are
continuous. Picking default orientation 'x'.
Warning: Continuous x aesthetic
ℹ did you forget `aes(group = ...)`?
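The "ugh" plot draws a single box because year is numeric: with a continuous x and no discrete variable mapped, all rows fall into one group. Converting to a factor, as in the next plots, gives one box per value; supplying group by hand keeps a numeric axis. A sketch with ToothGrowth, where dose is numeric with three values (suppressWarnings() hides the orientation warning shown above):

```r
library(ggplot2)

# Numeric x and no grouping: everything forms one box
p_one <- ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot()

# Factor x: one box per distinct dose, on a discrete axis
p_factor <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot()

# Numeric axis kept, groups supplied explicitly
p_group <- ggplot(ToothGrowth, aes(x = dose, y = len, group = dose)) + geom_boxplot()

nrow(suppressWarnings(layer_data(p_one)))     # 1
nrow(suppressWarnings(layer_data(p_factor)))  # 3
nrow(suppressWarnings(layer_data(p_group)))   # 3
```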

gapminder |> ggplot(aes(y = log10(gdpPercap), x = factor(year))) + geom_boxplot()

gapminder |> ggplot(aes(y = log10(gdpPercap), x = factor(year), fill = continent)) + geom_boxplot()

gapminder |> ggplot(aes(y = log10(gdpPercap), fill = factor(year), x = continent)) + geom_boxplot()

Examples from previous courses

Left here as more examples.

penguins |> ggplot(aes(x = sex)) + geom_bar()

penguins |> ggplot(aes(x = sex, color = species)) + geom_bar()

penguins |> ggplot(aes(x = sex, fill = species)) + geom_bar()

penguins |> ggplot(aes(fill = sex, x = species)) + geom_bar()

penguins |> ggplot(aes(fill = sex, x = species)) + geom_bar(position = "dodge") 
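geom_bar() stacks by default; position = "dodge" puts the bars side by side, and position = "fill" stacks them and rescales each stack to proportions, which makes shares easier to compare across groups of different sizes. A sketch with mtcars so the counts are easy to check:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am)))

p_stack <- base + geom_bar()                    # default: stacked counts
p_dodge <- base + geom_bar(position = "dodge")  # side by side
p_fill  <- base + geom_bar(position = "fill")   # stacked, rescaled to proportions

# Dodge: one bar per cyl x am combination, counts unchanged
ld_dodge <- layer_data(p_dodge)
sum(ld_dodge$count)  # 32

# Fill: the top of every stack is at 1
ld_fill <- layer_data(p_fill)
tapply(ld_fill$ymax, as.numeric(ld_fill$x), max)
```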

penguins |> ggplot(aes(x = species, y = sex)) + geom_count()

# penguins |> ggplot(aes(x = species)) + geom_count() # fails: can't draw with only one of x or y
penguins |> ggplot(aes(x = species, y = island, color = sex)) + geom_count(position = "jitter") # terrible, but fun!

penguins |> ggplot(aes(flipper_length_mm, bill_length_mm)) + geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

penguins |> ggplot(aes(flipper_length_mm, bill_length_mm, color = species)) + geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

penguins |> ggplot(aes(flipper_length_mm, bill_length_mm, color = sex)) + geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

penguins |> na.omit() |> ggplot(aes(flipper_length_mm, bill_length_mm, color = species, size = sex)) + geom_point()
Warning: Using size for a discrete variable is not advised.

penguins |> ggplot(aes(body_mass_g, bill_length_mm)) + geom_line() # works, but silly
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
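geom_line() "works" on any two numeric variables because it sorts the data by x and joins the points in that order. geom_path() joins the points in the order they appear in the data instead, which is what you want for a trajectory. A check with mtcars:

```r
library(ggplot2)

# geom_line() sorts by x before connecting points
p_line <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line()
# geom_path() connects points in data order
p_path <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_path()

is.unsorted(layer_data(p_line)$x)  # FALSE
is.unsorted(layer_data(p_path)$x)  # TRUE: mtcars is not ordered by wt
```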

gapminder |> filter(country %in% c("China", "Canada", "India", "France", "Argentina", "Libya")) |>
  ggplot(aes(year, lifeExp)) + geom_line()

gapminder |> filter(country %in% c("China", "Canada", "India", "France", "Argentina", "Libya")) |>
  ggplot(aes(year, lifeExp, color = country)) + geom_line()

gapminder |> 
  ggplot(aes(year, lifeExp, group = country)) + geom_line()
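group = country is what makes this plot work: without it, no discrete variable is mapped, so ggplot2 would join all 1,704 points into a single line that zig-zags up and down within each year. group splits the data the way color does, but without assigning colours or a legend. A smaller sketch with the built-in Orange data (5 trees, each measured at 7 ages):

```r
library(ggplot2)

# Without group, all 35 points form one line
p_one <- ggplot(Orange, aes(x = age, y = circumference)) + geom_line()
# group = Tree draws one line per tree, all in the same colour
p_tree <- ggplot(Orange, aes(x = age, y = circumference, group = Tree)) + geom_line()

length(unique(layer_data(p_one)$group))   # 1
length(unique(layer_data(p_tree)$group))  # 5
```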

gapminder |> ggplot(aes(x = continent, y = lifeExp)) + geom_boxplot()

gapminder |> ggplot(aes(x = year, y = lifeExp)) + geom_boxplot()  # doesn't work the way I hoped!
Warning: Orientation is not uniquely specified when both the x and y aesthetics are
continuous. Picking default orientation 'x'.
Warning: Continuous x aesthetic
ℹ did you forget `aes(group = ...)`?

gapminder |> ggplot(aes(x = factor(year), y = lifeExp)) + geom_boxplot()

gapminder |> ggplot(aes(x = factor(year), y = lifeExp, color = continent)) + geom_boxplot()

gapminder |> ggplot(aes(lifeExp, fill = continent)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value `binwidth`.

gapminder |> ggplot(aes(lifeExp, color = continent)) + geom_density()

gapminder |> ggplot(aes(y = lifeExp, x = continent, color = factor(year))) + stat_summary() +
  labs(x = "Continent", y = "Life Expectancy (years)", color = "Year") +
  theme_bw()
No summary function supplied, defaulting to `mean_se()`
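The message means stat_summary() fell back to mean_se(): a point at the mean and a line from one standard error below it to one above. Naming the function explicitly silences the message and records the choice. A sketch with ToothGrowth that checks the result against tapply():

```r
library(ggplot2)

# mean_se() returns y (the mean) plus ymin and ymax (mean -/+ one standard error)
p <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  stat_summary(fun.data = mean_se)

ld <- layer_data(p)
ld$y                                             # group means drawn as points
tapply(ToothGrowth$len, ToothGrowth$supp, mean)  # the same values, for comparison
```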