Goals:

Penguin example

We will make one plot that shows two categorical variables at once, using colour for species and shape for island.

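library(tidyverse)       # ggplot2, dplyr, tidyr, tibble
library(palmerpenguins)  # penguins data with body_mass_g, bill_length_mm columns

p1 <- penguins |> na.omit() |>
  ggplot() +
  geom_point(aes(body_mass_g, bill_length_mm, shape = island, color = species)) +
  theme_minimal()
p1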

The species are easy to distinguish, partly because we used colour for them and partly because their clusters are well separated. Island, shown only by point shape, is much harder to see. Breaking the plot into facets, one panel per island, can highlight the effect of island.

p1 + facet_wrap(~ island)

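Or we could use species for the facets:

p1 + facet_wrap(~ species)

When species defines the facets, colouring by species repeats information the panels already show. Swapping the aesthetics, so that colour shows island and shape shows species, makes the island differences within each species easier to see:

p2 <- penguins |> na.omit() |>
  ggplot() +
  geom_point(aes(body_mass_g, bill_length_mm, shape = species, color = island)) +
  theme_minimal()
p2 + facet_wrap(~ species)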

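There are two ways to facet by two categorical variables.

First, a grid of facets, with one row per species and one column per island:

p1 + facet_grid(species ~ island) + theme_bw()

Only five of the nine species and island combinations occur in the data, so four of the facets are empty. Second, a wrapped layout draws a panel only for the combinations that are present, which may be more useful here:

p1 + facet_wrap(~ species + island) + theme_bw()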
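To check the claim about empty facets, we can count the species and island combinations directly. This is a small sketch using dplyr's count(), assuming the palmerpenguins package supplies penguins as in the plots above; the names combos, n_cells and n_empty are just for illustration.

```r
library(dplyr)
library(palmerpenguins)

# Count penguins in each species-island combination, dropping rows with
# missing values as the plots above do. count() keeps only the
# combinations that actually occur in the data.
combos <- penguins |> na.omit() |> count(species, island)
print(combos)

# facet_grid() draws one cell for every species x island pair;
# the pairs missing from combos are the empty facets.
n_cells <- nlevels(penguins$species) * nlevels(penguins$island)
n_empty <- n_cells - nrow(combos)
n_empty
```

A base R alternative is table(species, island), which lists the missing combinations explicitly as zeros.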

Effect of reshaping data

The data in wide form, with one row per country and year and separate columns for cases and population:

df1 <- tribble(
  ~country,      ~year, ~cases, ~population,
  "Afghanistan",  1999,    745,    19987071,
  "Afghanistan",  2000,   2666,    20595360,
  "Brazil",       1999,  37737,   172006362,
  "Brazil",       2000,  80488,   174504898,
  "China",        1999, 212258,  1272915272,
  "China",        2000, 213766,  1280428583
)
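These six rows appear to be the same tuberculosis counts that tidyr ships as its table1 example dataset, so you could start from that instead of typing them in. A quick check, assuming a tidyr version that includes table1: tribble() stores the numbers as doubles while table1 may store them as integers, so the sketch converts them before comparing (t1 is an illustrative name).

```r
library(tidyverse)

df1 <- tribble(
  ~country,      ~year, ~cases, ~population,
  "Afghanistan",  1999,    745,    19987071,
  "Afghanistan",  2000,   2666,    20595360,
  "Brazil",       1999,  37737,   172006362,
  "Brazil",       2000,  80488,   174504898,
  "China",        1999, 212258,  1272915272,
  "China",        2000, 213766,  1280428583
)

# table1 may use integer columns; convert so only the values are compared
t1 <- table1 |> mutate(across(c(year, cases, population), as.double))

# TRUE if both tables hold the same values in the same row order
isTRUE(all.equal(as.data.frame(t1), as.data.frame(df1)))
```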

Here is a plot of the cases. Each variable is its own column, so year and country map directly onto aesthetics (x position and colour). Cases are shown as points; later we add population as a line.

df1 |> ggplot() + 
  geom_point(aes(factor(year), cases, color = country)) + 
  theme_classic()

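Can we add population to this graph? Population is thousands of times larger than the case counts, so it needs a different vertical scale. In ggplot2 that means a secondary axis, which must be a transformation of the primary axis: we divide population by 1e4 so it fits on the cases scale, then label the secondary axis by multiplying back by 1e4. Let’s also put the y axis on a log scale while we are making changes. (Change scale_y_log10 to scale_y_continuous to see the linear scale.)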

df1 |> ggplot() + 
  geom_point(aes(factor(year), cases, color = country)) + 
  geom_line(aes(factor(year), population/1e4, group = country, color = country)) + 
  scale_y_log10(sec.axis = sec_axis(transform = ~ .*1e4, name = "Population")) +
  theme_classic()
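The factor 1e4 above was picked by hand. One alternative, shown here as a sketch rather than the only reasonable choice, is to derive it from the data as the ratio of the largest population to the largest case count; k and p are illustrative names. Whatever factor is used, population is divided by it and the secondary axis multiplies by the same factor, so the two axes stay consistent.

```r
library(tidyverse)

df1 <- tribble(
  ~country,      ~year, ~cases, ~population,
  "Afghanistan",  1999,    745,    19987071,
  "Afghanistan",  2000,   2666,    20595360,
  "Brazil",       1999,  37737,   172006362,
  "Brazil",       2000,  80488,   174504898,
  "China",        1999, 212258,  1272915272,
  "China",        2000, 213766,  1280428583
)

# Scale factor mapping population onto the range of the case counts
k <- max(df1$population) / max(df1$cases)

p <- df1 |> ggplot() +
  geom_point(aes(factor(year), cases, color = country)) +
  # divide by k to draw population on the cases scale...
  geom_line(aes(factor(year), population / k, group = country, color = country)) +
  # ...and multiply by k to label the secondary axis in population units
  scale_y_log10(sec.axis = sec_axis(transform = ~ . * k, name = "Population")) +
  theme_classic()
p
```

On the log scale, changing k only shifts the population lines up or down (log(population / k) = log(population) - log(k)), so the exact value matters less than it would on a linear scale.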

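Is there an easier way to show these data? Perhaps cases and population could go in separate facets. To do that we need a variable that says whether a number is a case count or a population. So we pivot longer! By default, pivot_longer() puts the old column names in a column called name and the numbers in a column called value: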

df1 |> pivot_longer(cases:population)
## # A tibble: 12 × 4
##    country      year name            value
##    <chr>       <dbl> <chr>           <dbl>
##  1 Afghanistan  1999 cases             745
##  2 Afghanistan  1999 population   19987071
##  3 Afghanistan  2000 cases            2666
##  4 Afghanistan  2000 population   20595360
##  5 Brazil       1999 cases           37737
##  6 Brazil       1999 population  172006362
##  7 Brazil       2000 cases           80488
##  8 Brazil       2000 population  174504898
##  9 China        1999 cases          212258
## 10 China        1999 population 1272915272
## 11 China        2000 cases          213766
## 12 China        2000 population 1280428583
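Reshaping is reversible. As a quick sketch, pivot_wider() turns the long table back into the original wide one; names_from and values_from point at the name and value columns that pivot_longer() created (long and wide are illustrative names).

```r
library(tidyverse)

df1 <- tribble(
  ~country,      ~year, ~cases, ~population,
  "Afghanistan",  1999,    745,    19987071,
  "Afghanistan",  2000,   2666,    20595360,
  "Brazil",       1999,  37737,   172006362,
  "Brazil",       2000,  80488,   174504898,
  "China",        1999, 212258,  1272915272,
  "China",        2000, 213766,  1280428583
)

# Wide -> long: one row per country, year and measurement
long <- df1 |> pivot_longer(cases:population)

# Long -> wide: spread the name column back into cases and population
wide <- long |> pivot_wider(names_from = name, values_from = value)

isTRUE(all.equal(as.data.frame(wide), as.data.frame(df1)))
```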

Now we plot:

df1 |> pivot_longer(cases:population) |>
  ggplot() + 
  geom_point(aes(factor(year), value, color = country)) + 
  facet_wrap(~ name) + 
  theme_bw()

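By default, all facets share the same y scale. That is usually what you want, because it makes the panels directly comparable, but here the two panels show very different quantities: on a shared scale the case counts are squashed against the bottom. So we let each facet have its own y axis with scales = "free_y", and use a log scale so the three countries are spread out within each panel.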

df1 |> pivot_longer(cases:population) |>
  ggplot() + 
  geom_point(aes(factor(year), value, color = country)) + 
  facet_wrap(~ name, scales = "free_y") + 
  scale_y_log10() + 
  theme_bw() + 
  labs(x = "Year", y = "")
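With the y axis title left blank, the facet strips are what tell the reader which panel is which. One optional refinement, sketched here, is to move the strips to the left of each panel so they read as y axis labels. strip.position is an argument of facet_wrap(), and the theme setting strip.placement = "outside" puts the strips outside the axis text; the wording passed to as_labeller() is just an illustrative choice.

```r
library(tidyverse)

df1 <- tribble(
  ~country,      ~year, ~cases, ~population,
  "Afghanistan",  1999,    745,    19987071,
  "Afghanistan",  2000,   2666,    20595360,
  "Brazil",       1999,  37737,   172006362,
  "Brazil",       2000,  80488,   174504898,
  "China",        1999, 212258,  1272915272,
  "China",        2000, 213766,  1280428583
)

p <- df1 |> pivot_longer(cases:population) |>
  ggplot() +
  geom_point(aes(factor(year), value, color = country)) +
  # strips on the left act as per-panel y labels; labeller renames them
  facet_wrap(~ name, scales = "free_y", strip.position = "left",
             labeller = as_labeller(c(cases = "Cases", population = "Population"))) +
  scale_y_log10() +
  theme_bw() +
  # draw strips outside the axis text, without the grey box
  theme(strip.placement = "outside", strip.background = element_blank()) +
  labs(x = "Year", y = NULL)
p
```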