Demo slide presentations

Andrew Irwin

2024-03-12

First slide

  • Try the demo document from File > New File … > Quarto presentation …

Second slide

Title at the top of the page

Add code just like in regular markdown. Hide a code block with echo = FALSE.

Many other options are available: eval, echo, output, warning, error.

species    n
Adelie     152
Chinstrap  68
Gentoo     124
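
The code that produced this table is hidden on the slide. Here is a minimal sketch of a chunk that hides its code and makes a similar table. It assumes a small made-up `species` vector rather than the penguins data, and the `#|` lines are Quarto chunk options that only take effect inside a Quarto code chunk.

```r
#| echo: false
#| warning: false

# Hypothetical stand-in for the species column of the penguins data
species <- c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Gentoo")

# Count observations per species; responseName sets the name of the count column
counts <- as.data.frame(table(species), responseName = "n")
counts
```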

Linear model formula

Make a linear model that predicts flipper length from bill length.

Linear model formula

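Show code with echo = TRUE. Note that the formula below has bill_length_mm on the left, so it predicts bill length from flipper length, which is the reverse of what was asked.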

m1 <- lm( bill_length_mm ~ flipper_length_mm, data = penguins)
broom::tidy(m1, conf.int = TRUE) |> knitr::kable(digits = 3)
term               estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)        -7.265    3.200      -2.270     0.024    -13.559   -0.970
flipper_length_mm  0.255     0.016      16.034     0.000    0.224     0.286

Linear model formula

m2 <- lm( flipper_length_mm ~ bill_length_mm, data = penguins)
broom::tidy(m2, conf.int = TRUE) |> knitr::kable(digits = 3)
term            estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)     126.684   4.665      27.156     0        117.508   135.860
bill_length_mm  1.690     0.105      16.034     0        1.483     1.897
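
The two model tables have different slopes but the same t statistic for the slope (16.034), because both fits describe the same linear association from opposite directions. Here is a small sketch with made-up numbers (`x` and `y` are hypothetical, not penguin measurements) showing that the variable on the left of `~` is the response.

```r
# Made-up data for illustration only
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, 6.2, 7.8, 10.1))

fit_y <- lm(y ~ x, data = d)  # predicts y from x
fit_x <- lm(x ~ y, data = d)  # predicts x from y

slope_y <- unname(coef(fit_y)["x"])
slope_x <- unname(coef(fit_x)["y"])

# The slopes differ, but their product equals the squared correlation
c(slope_y = slope_y, slope_x = slope_x,
  product = slope_y * slope_x, r_squared = cor(d$x, d$y)^2)

# The t statistic for the slope is the same in both fits
t_y <- summary(fit_y)$coefficients["x", "t value"]
t_x <- summary(fit_x)$coefficients["y", "t value"]
c(t_y = t_y, t_x = t_x)
```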

Change background colour

And the text changes colour too

Make large text

What is data visualization?

Two columns

Stuff on the left

Stuff on the right

Another way to do 2 columns

List One

  • Item A
  • Item B
  • Item C

List Two

  • Item X
  • Item Y
  • Item Z

Exercises

The next few slides are some dplyr and visualization exercises.

Translating English to dplyr

Make a table that counts the number of observations per country.

library(gapminder)
gapminder |> group_by(country) |> summarize(n_observations = n())
# A tibble: 142 × 2
   country     n_observations
   <fct>                <int>
 1 Afghanistan             12
 2 Albania                 12
 3 Algeria                 12
 4 Angola                  12
 5 Argentina               12
 6 Australia               12
 7 Austria                 12
 8 Bahrain                 12
 9 Bangladesh              12
10 Belgium                 12
# ℹ 132 more rows
gapminder |> count(country)
# A tibble: 142 × 2
   country         n
   <fct>       <int>
 1 Afghanistan    12
 2 Albania        12
 3 Algeria        12
 4 Angola         12
 5 Argentina      12
 6 Australia      12
 7 Austria        12
 8 Bahrain        12
 9 Bangladesh     12
10 Belgium        12
# ℹ 132 more rows
gapminder |> count(is.na(country))
# A tibble: 1 × 2
  `is.na(country)`     n
  <lgl>            <int>
1 FALSE             1704
gapminder |> group_by(country) |> 
  summarize(n_missing_lifeExp = sum(is.na(lifeExp)) )
# A tibble: 142 × 2
   country     n_missing_lifeExp
   <fct>                   <int>
 1 Afghanistan                 0
 2 Albania                     0
 3 Algeria                     0
 4 Angola                      0
 5 Argentina                   0
 6 Australia                   0
 7 Austria                     0
 8 Bahrain                     0
 9 Bangladesh                  0
10 Belgium                     0
# ℹ 132 more rows
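
The first two results above are the same table under different column names: count() is a shorthand for grouping and then counting rows. Here is a minimal sketch on a made-up tibble (the `obs` data and `site` column are hypothetical).

```r
library(dplyr)

# Made-up data: one row per observation
obs <- tibble(site = c("a", "a", "b", "c", "c", "c"))

# Long form: group, then count the rows in each group
long_form <- obs |> group_by(site) |> summarize(n = n())

# Short form: count() does the grouping and counting in one step
short_form <- obs |> count(site)

long_form
short_form
```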

Translating English to dplyr

Find the countries with the highest and lowest life expectancy in each year.

table3 <- gapminder |> group_by(year) |>
  mutate(max_life_exp = max(lifeExp),
         min_life_exp = min(lifeExp)) |>
  filter(near(lifeExp, max_life_exp) | near(lifeExp, min_life_exp)) 
table3 |> kable() |> kable_styling()
country       continent  year  lifeExp  pop        gdpPercap   max_life_exp  min_life_exp
Afghanistan   Asia       1952  28.801   8425333    779.4453    72.670        28.801
Afghanistan   Asia       1957  30.332   9240934    820.8530    73.470        30.332
Afghanistan   Asia       1962  31.997   10267083   853.1007    73.680        31.997
Afghanistan   Asia       1967  34.020   11537966   836.1971    74.160        34.020
Angola        Africa     1987  39.906   7874230    2430.2083   78.670        39.906
Cambodia      Asia       1977  31.220   6978607    524.9722    76.110        31.220
Iceland       Europe     1957  73.470   165110     9244.0014   73.470        30.332
Iceland       Europe     1962  73.680   182053     10350.1591  73.680        31.997
Iceland       Europe     1977  76.110   221823     19654.9625  76.110        31.220
Japan         Asia       1982  77.110   118454974  19384.1057  77.110        38.445
Japan         Asia       1987  78.670   122091325  22375.9419  78.670        39.906
Japan         Asia       1992  79.360   124329269  26824.8951  79.360        23.599
Japan         Asia       1997  80.690   125956499  28816.5850  80.690        36.087
Japan         Asia       2002  82.000   127065841  28604.5919  82.000        39.193
Japan         Asia       2007  82.603   127467972  31656.0681  82.603        39.613
Norway        Europe     1952  72.670   3327728    10095.4217  72.670        28.801
Rwanda        Africa     1992  23.599   7290203    737.0686    79.360        23.599
Rwanda        Africa     1997  36.087   7212583    589.9445    80.690        36.087
Sierra Leone  Africa     1972  35.400   2879013    1353.7598   74.720        35.400
Sierra Leone  Africa     1982  38.445   3464522    1465.0108   77.110        38.445
Swaziland     Africa     2007  39.613   1133066    4513.4806   82.603        39.613
Sweden        Europe     1967  74.160   7867931    15258.2970  74.160        34.020
Sweden        Europe     1972  74.720   8122293    17832.0246  74.720        35.400
Zambia        Africa     2002  39.193   10595811   1071.6139   82.000        39.193
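
The same idea on a tiny made-up table may make the pattern easier to see. This sketch puts max() and min() directly inside filter() instead of adding columns with mutate() first, which is a variation on the slide's code; the `scores` data are hypothetical.

```r
library(dplyr)

# Made-up data: three countries observed in two years
scores <- tibble(
  year    = c(2000, 2000, 2000, 2005, 2005, 2005),
  country = c("A", "B", "C", "A", "B", "C"),
  life    = c(50, 60, 70, 55, 52, 75)
)

# Within each year, keep only the rows at the maximum or minimum
extremes <- scores |>
  group_by(year) |>
  filter(near(life, max(life)) | near(life, min(life))) |>
  ungroup()
extremes
```

Because the data are grouped by year, max() and min() are computed separately for each year, so two rows per year survive the filter when there are no ties.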

Translating English to dplyr

Graph these data.

table3 |> ggplot(aes(y = country, x = lifeExp, color = year)) + geom_point()

Looking ahead to the lesson on factors, we can make this plot look a bit nicer by rearranging the countries according to life expectancy:

table3 |> ggplot(aes(y = fct_reorder(country, lifeExp),
                     x = lifeExp, 
                     color = year)) +
  geom_point()

Translating English to dplyr

Make a table showing the number of missing values for each penguin species.

Values are missing for sex and four quantitative variables. This table counts the four quantitative ones:

penguins |> group_by(species) |>
  summarize(bill_length_na = sum(is.na(bill_length_mm)),
            bill_depth_na = sum(is.na(bill_depth_mm)),
            flipper_length_na = sum(is.na(flipper_length_mm)),
            body_mass_na = sum(is.na(body_mass_g))
            )
# A tibble: 3 × 5
  species   bill_length_na bill_depth_na flipper_length_na body_mass_na
  <fct>              <int>         <int>             <int>        <int>
1 Adelie                 1             1                 1            1
2 Chinstrap              0             0                 0            0
3 Gentoo                 1             1                 1            1
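
Writing one sum(is.na()) line per column gets repetitive. A possible shortcut is dplyr's across(), sketched here on a made-up tibble (the `birds` data and its column names are hypothetical, not the penguins data).

```r
library(dplyr)

# Made-up data with some missing values
birds <- tibble(
  species = c("x", "x", "y", "y"),
  length  = c(1.2, NA, 3.4, 2.2),
  mass    = c(NA, NA, 10, 12)
)

# Count missing values in every non-grouping column, per species
na_counts <- birds |>
  group_by(species) |>
  summarize(across(everything(), ~ sum(is.na(.x)), .names = "{.col}_na"))
na_counts
```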

The naniar package has a nice function to show this too:

library(naniar)
gg_miss_var(penguins)

Bonus topics

The remaining slides address questions that often arise when you are working on term projects.

Comparisons

You can compare numbers and “text numbers” (numbers in quotation marks) but you shouldn’t make a habit of it.

library(gapminder)
gapminder |> filter(country == "Canada", year == 2007)
# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.
gapminder |> filter(country == "Canada", year == "2007")
# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.
gapminder |> filter(country == "Canada", lifeExp == "80.653")
# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.
gapminder |> filter(country == "Canada", near(lifeExp, 80.7, tol = 0.1))
# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.
gapminder |> filter(country == "Canada", abs(lifeExp - 80.7) < 0.1)
# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

Comparisons

Some surprises exist out there…

4 == 4
[1] TRUE
4 == "4"
[1] TRUE
TRUE == 1
[1] TRUE
TRUE == "1"
[1] FALSE
3E4
[1] 30000
3E4 == "30000"
[1] TRUE
3E4 == "3E4"
[1] FALSE
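
These results follow from one rule: when a number or logical value is compared with a string, R converts the other side to a string and compares the text. A short sketch of what those conversions produce:

```r
# What R compares against the string on the other side of ==
as.character(4)     # "4", so 4 == "4" is TRUE
as.character(TRUE)  # "TRUE", so TRUE == "1" is FALSE
as.character(3E4)   # "30000", so 3E4 == "3E4" is FALSE

# Converting the string to a number first compares values instead of text
num_equal <- as.numeric("3E4") == 3E4
num_equal
```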

More comparisons

Numbers with decimal points (floating point) are stored approximately, so exact comparisons with == can fail.

2/10 - 1/10 == 0.1
[1] TRUE
3/10 - 1/10 == 0.2
[1] FALSE
3/10 - 1/10
[1] 0.2
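
The printed 0.2 hides the difference because R shows only 7 significant digits by default. A small sketch that prints more digits and then compares with a tolerance, using base R's all.equal():

```r
difference <- 3/10 - 1/10

# Print more digits than the default to see the stored values
sprintf("%.17f", difference)
sprintf("%.17f", 0.2)

# The two values differ by a tiny amount
gap <- abs(difference - 0.2)
gap

# all.equal() compares with a tolerance; isTRUE() turns its result into TRUE/FALSE
approx_equal <- isTRUE(all.equal(difference, 0.2))
approx_equal
```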

Comparisons

What if you really want to know if two things are the same?

identical(4, "4")
[1] FALSE
identical(4, 4)
[1] TRUE
near(2/10 - 1/10, 0.1)
[1] TRUE
near(3/10 - 1/10, 0.2)
[1] TRUE
near(pi, 22/7)
[1] FALSE
near(pi, 355/113, tol = 1e-6)
[1] TRUE
pi - 355/113
[1] -2.667642e-07
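
near() asks whether two numbers differ by less than a tolerance. A rough base R equivalent is sketched here. The helper name `is_near` is made up for this example, and its default tolerance is assumed to match dplyr's (the square root of machine epsilon, about 1.5e-8).

```r
# Hypothetical helper: TRUE when x and y differ by less than tol
is_near <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

checks <- c(
  tenths   = is_near(3/10 - 1/10, 0.2),           # difference far below tol
  pi_rough = is_near(pi, 22/7),                   # off by about 0.001
  pi_close = is_near(pi, 355/113, tol = 1e-6)     # off by about 2.7e-7
)
checks
```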

Combining two tables

t1 <- gapminder |> group_by(country) |> summarize(max_life_exp = max(lifeExp)) 
t1
# A tibble: 142 × 2
   country     max_life_exp
   <fct>              <dbl>
 1 Afghanistan         43.8
 2 Albania             76.4
 3 Algeria             72.3
 4 Angola              42.7
 5 Argentina           75.3
 6 Australia           81.2
 7 Austria             79.8
 8 Bahrain             75.6
 9 Bangladesh          64.1
10 Belgium             79.4
# ℹ 132 more rows
country_codes |> head()
# A tibble: 6 × 3
  country     iso_alpha iso_num
  <chr>       <chr>       <int>
1 Afghanistan AFG             4
2 Albania     ALB             8
3 Algeria     DZA            12
4 Angola      AGO            24
5 Argentina   ARG            32
6 Armenia     ARM            51

Combining two tables

left_join(t1, country_codes, by = "country") |> DT::datatable()

Combining two tables

left_join(t1, rgbif::isocodes, by = c("country" = "name")) |> DT::datatable()

Other kinds of joins

  • full_join: keep all rows from both tables
  • left_join: keep all rows from the left table
  • right_join: keep all rows from the right table
  • inner_join: keep only rows with a match in both tables
  • anti_join: keep rows in the left table that have no match in the right table
  • semi_join: keep rows in the left table that have a match in the right table
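
Counting the rows each join returns is a quick way to check these descriptions. This sketch uses two made-up tables that share the keys "b" and "c"; the table and column names are hypothetical.

```r
library(dplyr)

# Made-up tables: "a" appears only on the left, "d" only on the right
left  <- tibble(key = c("a", "b", "c"), left_value  = 1:3)
right <- tibble(key = c("b", "c", "d"), right_value = 4:6)

# Number of rows returned by each kind of join
join_rows <- c(
  full  = nrow(full_join(left, right, by = "key")),
  left  = nrow(left_join(left, right, by = "key")),
  right = nrow(right_join(left, right, by = "key")),
  inner = nrow(inner_join(left, right, by = "key")),
  anti  = nrow(anti_join(left, right, by = "key")),
  semi  = nrow(semi_join(left, right, by = "key"))
)
join_rows
```

anti_join and semi_join only filter the left table, so their results contain no columns from the right table.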

Animated versions of joins

R4DS textbook explanation of joins