Goals:
We will make one plot, using shape and colour to show two categorical variables.
p1 <- penguins |> na.omit() |>
  ggplot() +
  geom_point(aes(body_mass_g, bill_length_mm, shape = island, color = species)) +
  theme_minimal()
p1
The species are easy to see, partly because we used colour, and partly because they are clearly separated. But the island variable is harder to see. Breaking the plot into facets can highlight the effect of island.
p1 + facet_wrap(~ island)
Or we could use species for the facets:
p1 + facet_wrap(~ species)
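We can also swap the two aesthetics, using shape for species and colour for island, and facet by species again:
p2 <- penguins |> na.omit() |>
  ggplot() +
  geom_point(aes(body_mass_g, bill_length_mm, shape = species, color = island)) +
  theme_minimal()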
p2 + facet_wrap(~ species)
There are two ways of using two categorical variables for facets.
First, a grid of facets.
p1 + facet_grid(species ~ island) + theme_bw()
Four of the facets have no data, so a wrapped linear layout may be more useful:
p1 + facet_wrap(~ species + island) + theme_bw()
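To check the claim about empty facets without looking at the plot, we can count the species and island combinations that actually occur. This is a minimal sketch, assuming `penguins` comes from the palmerpenguins package and that dplyr is available; the names `combos` and `n_empty` are illustrative only.

```r
library(dplyr)

# Assumption: penguins is the palmerpenguins data set used for p1
combos <- palmerpenguins::penguins |>
  na.omit() |>
  count(species, island)   # one row per combination that has data

combos

# Panels in the full grid minus combinations that have data
n_empty <- n_distinct(combos$species) * n_distinct(combos$island) - nrow(combos)
n_empty
```

The wrapped layout draws one panel per row of `combos`, while the grid also draws the `n_empty` panels with no points.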
The original data:
df1 <- tribble(
  ~country,      ~year, ~cases,  ~population,
  "Afghanistan", 1999,  745,     19987071,
  "Afghanistan", 2000,  2666,    20595360,
  "Brazil",      1999,  37737,   172006362,
  "Brazil",      2000,  80488,   174504898,
  "China",       1999,  212258,  1272915272,
  "China",       2000,  213766,  1280428583
)
Here is a first plot, where it is easy to connect year and country to aesthetics. We start with points for cases; population will be drawn as a line once we add it.
df1 |> ggplot() +
  geom_point(aes(factor(year), cases, color = country)) +
  theme_classic()
Can we add population to this graph? We need a different vertical scale. That requires some fiddling around with something called a secondary axis. Let’s go ahead and put the y axis on a log scale while we are making changes. (Change scale_y_log10 to scale_y_continuous to see the linear scale.)
df1 |> ggplot() +
  geom_point(aes(factor(year), cases, color = country)) +
  geom_line(aes(factor(year), population/1e4, group = country, color = country)) +
  scale_y_log10(sec.axis = sec_axis(transform = ~ .*1e4, name = "Population")) +
  theme_classic()
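As a rough sketch of the linear-scale variant mentioned above, the block below swaps scale_y_log10 for scale_y_continuous. Population is divided by 1e4 so that it fits on the primary (cases) axis, and the secondary axis multiplies the labels back by 1e4. It assumes ggplot2 3.5.0 or later, where sec_axis() takes a `transform` argument (older versions call it `trans`). The object name `p_linear` and the axis title "Cases" are illustrative choices, not part of the original plot.

```r
library(ggplot2)
library(tibble)

# Same data as df1 above, repeated so the sketch runs on its own
df1 <- tribble(
  ~country,      ~year, ~cases,  ~population,
  "Afghanistan", 1999,  745,     19987071,
  "Afghanistan", 2000,  2666,    20595360,
  "Brazil",      1999,  37737,   172006362,
  "Brazil",      2000,  80488,   174504898,
  "China",       1999,  212258,  1272915272,
  "China",       2000,  213766,  1280428583
)

p_linear <- df1 |> ggplot() +
  geom_point(aes(factor(year), cases, color = country)) +
  # population is rescaled by 1e4 to share the primary axis
  geom_line(aes(factor(year), population / 1e4, group = country, color = country)) +
  scale_y_continuous(
    name = "Cases",
    # the secondary axis undoes the rescaling: labels = primary value * 1e4
    sec.axis = sec_axis(transform = ~ . * 1e4, name = "Population")
  ) +
  theme_classic()

p_linear
```

On the linear scale the smaller countries are squeezed towards the bottom of the panel, which is one reason to prefer the log scale here.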
Is there an easier way to show these data? Perhaps cases and population can be put in separate facets? To do that we need a variable that says whether a number is a case count or a population. So we pivot_longer!
df1 |> pivot_longer(cases:population)
## # A tibble: 12 × 4
##    country      year name            value
##    <chr>       <dbl> <chr>           <dbl>
##  1 Afghanistan  1999 cases             745
##  2 Afghanistan  1999 population   19987071
##  3 Afghanistan  2000 cases            2666
##  4 Afghanistan  2000 population   20595360
##  5 Brazil       1999 cases           37737
##  6 Brazil       1999 population  172006362
##  7 Brazil       2000 cases           80488
##  8 Brazil       2000 population  174504898
##  9 China        1999 cases          212258
## 10 China        1999 population 1272915272
## 11 China        2000 cases          213766
## 12 China        2000 population 1280428583
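The columns `name` and `value` in this output are the defaults that pivot_longer() uses for its `names_to` and `values_to` arguments. As a small sketch, they can be set explicitly. The column names "measure" and "count" below are hypothetical, chosen only for illustration, and `tidyr::table1` is used as a stand-in because it holds the same six rows as df1.

```r
library(tidyr)

# Stand-in for df1: tidyr's built-in table1 has the same six rows
df1 <- tidyr::table1

df1_long <- df1 |>
  pivot_longer(
    cases:population,
    names_to = "measure",   # hypothetical name; the default is "name"
    values_to = "count"     # hypothetical name; the default is "value"
  )

df1_long
```

With these names the faceting call would become facet_wrap(~ measure) and the y aesthetic would be `count`; the rest of this section keeps the defaults.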
Now we plot:
df1 |> pivot_longer(cases:population) |>
  ggplot() +
  geom_point(aes(factor(year), value, color = country)) +
  facet_wrap(~ name) +
  theme_bw()
The default is to use the same scale for the y axis in every facet. That’s usually what you want, but here we are plotting very different quantities, so we will let each facet have its own y axis. We also put the y axis on a log scale again and tidy up the axis labels.
df1 |> pivot_longer(cases:population) |>
  ggplot() +
  geom_point(aes(factor(year), value, color = country)) +
  facet_wrap(~ name, scales = "free_y") +
  scale_y_log10() +
  theme_bw() +
  labs(x = "Year", y = "")