Demo slide presentations

Andrew Irwin

2024-03-12

First slide

Try the demo document from File > New File … > Quarto presentation …

Second slide

Title at the top of the page

Add code just like in regular markdown. Hide a code block with echo = FALSE.

Many other options are available: eval, echo, output, warning, error.

species	n
Adelie	152
Chinstrap	68
Gentoo	124
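As a rough sketch, a chunk like the following could produce the table above while hiding its own code. It uses Quarto's `#|` comment form of the options, which is an alternative to writing `echo = FALSE` in the chunk header. It assumes `penguins` comes from the palmerpenguins package, which the slides do not state.

```r
#| echo: false
#| warning: false
# Assumption: penguins is the data set from the palmerpenguins package.
library(dplyr)
library(palmerpenguins)

# One row per species with the number of observations.
species_counts <- penguins |> count(species)
species_counts
```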

Linear model formula

Make a linear model that predicts flipper length from bill length.

Linear model formula

Show code with echo = TRUE.

m1 <- lm( bill_length_mm ~ flipper_length_mm, data = penguins)
broom::tidy(m1, conf.int = TRUE) |> knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	-7.265	3.200	-2.270	0.024	-13.559	-0.970
flipper_length_mm	0.255	0.016	16.034	0.000	0.224	0.286

Linear model formula

m2 <- lm( flipper_length_mm ~ bill_length_mm, data = penguins)
broom::tidy(m2, conf.int = TRUE) |> knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	126.684	4.665	27.156	0	117.508	135.860
bill_length_mm	1.690	0.105	16.034	0	1.483	1.897
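The two fits differ only in which variable sits to the left of `~`: the left side is the response being predicted and the right side is the predictor. The sketch below uses simulated data (the names and numbers are made up for illustration and are not the penguins measurements) to show that swapping the sides changes the slope, while the product of the two slopes equals the squared correlation. That symmetry is why both tables above report the same t statistic.

```r
set.seed(42)
# Simulated stand-ins for two body measurements (hypothetical values).
flipper <- rnorm(200, mean = 200, sd = 14)
bill <- 0.25 * flipper - 7 + rnorm(200, sd = 4)
sim <- data.frame(flipper, bill)

fit_bill <- lm(bill ~ flipper, data = sim)      # predicts bill from flipper
fit_flipper <- lm(flipper ~ bill, data = sim)   # predicts flipper from bill

slope_bill <- unname(coef(fit_bill)["flipper"])
slope_flipper <- unname(coef(fit_flipper)["bill"])

# The slopes are not reciprocals; their product is the squared correlation.
all.equal(slope_bill * slope_flipper, cor(sim$bill, sim$flipper)^2)
```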

Change background colour

And the text changes colour too

Make large text

What is data visualization?

Two columns

Stuff on the left

Stuff on the right

Another way to do 2 columns

List One

Item A
Item B
Item C

List Two

Item X
Item Y
Item Z

Exercises

The next few slides are some dplyr and visualization exercises.

Translating English to dplyr

Make a table that counts the number of observations per country.

library(dplyr)
library(gapminder)
gapminder |> group_by(country) |> summarize(n_observations = n())

# A tibble: 142 × 2
   country     n_observations
   <fct>                <int>
 1 Afghanistan             12
 2 Albania                 12
 3 Algeria                 12
 4 Angola                  12
 5 Argentina               12
 6 Australia               12
 7 Austria                 12
 8 Bahrain                 12
 9 Bangladesh              12
10 Belgium                 12
# ℹ 132 more rows

gapminder |> count(country)

# A tibble: 142 × 2
   country         n
   <fct>       <int>
 1 Afghanistan    12
 2 Albania        12
 3 Algeria        12
 4 Angola         12
 5 Argentina      12
 6 Australia      12
 7 Austria        12
 8 Bahrain        12
 9 Bangladesh     12
10 Belgium        12
# ℹ 132 more rows

gapminder |> count(is.na(country))

# A tibble: 1 × 2
  `is.na(country)`     n
  <lgl>            <int>
1 FALSE             1704

gapminder |> group_by(country) |> 
  summarize(n_missing_lifeExp = sum(is.na(lifeExp)) )

# A tibble: 142 × 2
   country     n_missing_lifeExp
   <fct>                   <int>
 1 Afghanistan                 0
 2 Albania                     0
 3 Algeria                     0
 4 Angola                      0
 5 Argentina                   0
 6 Australia                   0
 7 Austria                     0
 8 Bahrain                     0
 9 Bangladesh                  0
10 Belgium                     0
# ℹ 132 more rows
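The first two results on this slide match because `count()` is shorthand for `group_by()` followed by `summarize(n = n())`. A minimal sketch on a small made-up table (the `obs` data are hypothetical, standing in for gapminder):

```r
library(dplyr)

# A small made-up table standing in for gapminder.
obs <- tibble(
  country = c("A", "A", "B", "C", "C", "C"),
  lifeExp = c(70, NA, 65, 80, 81, NA)
)

long_form <- obs |> group_by(country) |> summarize(n = n())
short_form <- obs |> count(country)

# Both forms give the same counts.
identical(long_form$n, short_form$n)

# sort = TRUE puts the largest groups first.
sorted_counts <- obs |> count(country, sort = TRUE)
sorted_counts
```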

Translating English to dplyr

Find the country with the highest and lowest life expectancy in each year.

table3 <- gapminder |> group_by(year) |>
  mutate(max_life_exp = max(lifeExp),
         min_life_exp = min(lifeExp)) |>
  filter(near(lifeExp, max_life_exp) | near(lifeExp, min_life_exp)) 
table3 |> kable() |> kable_styling()

country	continent	year	lifeExp	pop	gdpPercap	max_life_exp	min_life_exp
Afghanistan	Asia	1952	28.801	8425333	779.4453	72.670	28.801
Afghanistan	Asia	1957	30.332	9240934	820.8530	73.470	30.332
Afghanistan	Asia	1962	31.997	10267083	853.1007	73.680	31.997
Afghanistan	Asia	1967	34.020	11537966	836.1971	74.160	34.020
Angola	Africa	1987	39.906	7874230	2430.2083	78.670	39.906
Cambodia	Asia	1977	31.220	6978607	524.9722	76.110	31.220
Iceland	Europe	1957	73.470	165110	9244.0014	73.470	30.332
Iceland	Europe	1962	73.680	182053	10350.1591	73.680	31.997
Iceland	Europe	1977	76.110	221823	19654.9625	76.110	31.220
Japan	Asia	1982	77.110	118454974	19384.1057	77.110	38.445
Japan	Asia	1987	78.670	122091325	22375.9419	78.670	39.906
Japan	Asia	1992	79.360	124329269	26824.8951	79.360	23.599
Japan	Asia	1997	80.690	125956499	28816.5850	80.690	36.087
Japan	Asia	2002	82.000	127065841	28604.5919	82.000	39.193
Japan	Asia	2007	82.603	127467972	31656.0681	82.603	39.613
Norway	Europe	1952	72.670	3327728	10095.4217	72.670	28.801
Rwanda	Africa	1992	23.599	7290203	737.0686	79.360	23.599
Rwanda	Africa	1997	36.087	7212583	589.9445	80.690	36.087
Sierra Leone	Africa	1972	35.400	2879013	1353.7598	74.720	35.400
Sierra Leone	Africa	1982	38.445	3464522	1465.0108	77.110	38.445
Swaziland	Africa	2007	39.613	1133066	4513.4806	82.603	39.613
Sweden	Europe	1967	74.160	7867931	15258.2970	74.160	34.020
Sweden	Europe	1972	74.720	8122293	17832.0246	74.720	35.400
Zambia	Africa	2002	39.193	10595811	1071.6139	82.000	39.193
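An alternative sketch, not the approach used on this slide: `slice_max()` and `slice_min()` pick the extreme rows in each group directly, so no helper columns or `near()` comparisons are needed. The `life` table is made up for illustration.

```r
library(dplyr)

# Made-up values for illustration only.
life <- tibble(
  year = c(2000, 2000, 2000, 2005, 2005, 2005),
  country = c("A", "B", "C", "A", "B", "C"),
  lifeExp = c(70.1, 55.2, 81.3, 71.0, 54.8, 82.4)
)

by_year <- life |> group_by(year)
extremes <- bind_rows(
  by_year |> slice_max(lifeExp, n = 1),   # highest in each year
  by_year |> slice_min(lifeExp, n = 1)    # lowest in each year
) |>
  ungroup() |>
  arrange(year, lifeExp)
extremes
```

By default both functions keep ties, so a year with two countries sharing the maximum would return both rows.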

Translating English to dplyr

Graph these data.

table3 |> ggplot(aes(y = country, x = lifeExp, color = year)) + geom_point()

Looking ahead to the lesson on factors, we can make this plot look a bit nicer by rearranging the countries according to life expectancy:

table3 |> ggplot(aes(y = fct_reorder(country, lifeExp),
                     x = lifeExp, 
                     color = year)) +
  geom_point()
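To see what `fct_reorder()` does without drawing a plot, here is a small sketch with made-up values. By default it orders the factor levels by the median of the second argument within each level.

```r
library(forcats)

# Made-up values: two observations per country.
country <- factor(c("A", "B", "C", "A", "B", "C"))
lifeExp <- c(80, 40, 60, 82, 42, 62)

levels(country)   # alphabetical by default

# Levels reordered by median lifeExp per country (B = 41, C = 61, A = 81).
reordered <- fct_reorder(country, lifeExp)
levels(reordered)
```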

Translating English to dplyr

Make a table showing the number of missing values for each penguin species.

There are missing data for sex and four quantitative variables:

penguins |> group_by(species) |>
  summarize(bill_length_na = sum(is.na(bill_length_mm)),
            bill_depth_na = sum(is.na(bill_depth_mm)),
            flipper_length_na = sum(is.na(flipper_length_mm)),
            body_mass_na = sum(is.na(body_mass_g))
            )

# A tibble: 3 × 5
  species   bill_length_na bill_depth_na flipper_length_na body_mass_na
  <fct>              <int>         <int>             <int>        <int>
1 Adelie                 1             1                 1            1
2 Chinstrap              0             0                 0            0
3 Gentoo                 1             1                 1            1

The naniar package has a nice function to show this too:

library(naniar)
gg_miss_var(penguins)
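The `summarize()` call on this slide names each column by hand. A possible shortcut, sketched here on a small made-up table (`birds` is hypothetical, standing in for `penguins`), is `across()`, which applies the same missing-value count to every non-grouping column, including `sex`.

```r
library(dplyr)

# A small made-up table standing in for penguins.
birds <- tibble(
  species = c("X", "X", "Y", "Y"),
  sex = c("female", NA, "male", NA),
  bill_length_mm = c(39.1, NA, 46.5, 50.0),
  body_mass_g = c(3750, NA, 3500, 5000)
)

# Count NA values in every column except the grouping column.
na_counts <- birds |>
  group_by(species) |>
  summarize(across(everything(), ~ sum(is.na(.x)), .names = "{.col}_na"))
na_counts
```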

Bonus topics

The remaining slides address some questions that sometimes arise when you are working on term projects.

Comparisons

You can compare numbers and “text numbers” (numbers in quotation marks) but you shouldn’t make a habit of it.

library(gapminder)
gapminder |> filter(country == "Canada", year == 2007)

# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

gapminder |> filter(country == "Canada", year == "2007")

# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

gapminder |> filter(country == "Canada", lifeExp == "80.653")

# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

gapminder |> filter(country == "Canada", near(lifeExp, 80.7, tol = 0.1))

# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

gapminder |> filter(country == "Canada", abs(lifeExp - 80.7) < 0.1)

# A tibble: 1 × 6
  country continent  year lifeExp      pop gdpPercap
  <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
1 Canada  Americas   2007    80.7 33390141    36319.

Comparisons

Some surprises exist out there…

4 == 4

[1] TRUE

4 == "4"

[1] TRUE

TRUE == 1

[1] TRUE

TRUE == "1"

[1] FALSE

3E4

[1] 30000

3E4 == "30000"

[1] TRUE

3E4 == "3E4"

[1] FALSE
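These results follow from how R coerces types: when a number or logical value is compared with a string, it is converted to a string first and the two strings are compared. A short sketch of the conversions involved:

```r
# A number or logical compared with a string is first converted to a string.
as.character(4)      # "4"
as.character(3E4)    # "30000", not "3E4"
as.character(TRUE)   # "TRUE", not "1"

# So these are really string comparisons, and both are TRUE:
TRUE == "TRUE"
3E4 == as.character(3E4)

# Converting the string to a number compares numeric values instead.
3E4 == as.numeric("3E4")
```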

More comparisons

Numbers with decimal points (floating point) can cause challenges for comparisons.

2/10 - 1/10 == 0.1

[1] TRUE

3/10 - 1/10 == 0.2

[1] FALSE

3/10 - 1/10

[1] 0.2
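The printed `0.2` is rounded for display. A small sketch showing the hidden difference, plus base R's `all.equal()`, which compares numbers within a small tolerance:

```r
x <- 3/10 - 1/10

# The default print shows 7 significant digits, hiding the difference.
print(x)
# Asking for more digits shows that x is not exactly 0.2.
print(x, digits = 17)
x - 0.2

# all.equal() allows a small tolerance; wrap it in isTRUE() inside if().
isTRUE(all.equal(x, 0.2))
```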

Comparisons

What if you really want to know if two things are the same?

identical(4, "4")

[1] FALSE

identical(4, 4)

[1] TRUE

near(2/10 - 1/10, 0.1)

[1] TRUE

near(3/10 - 1/10, 0.2)

[1] TRUE

near(pi, 22/7)

[1] FALSE

near(pi, 355/113, tol = 1e-6)

[1] TRUE

pi - 355/113

[1] -2.667642e-07
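`near(x, y)` tests whether `abs(x - y)` is below a tolerance, which defaults to the square root of machine epsilon. A brief sketch of why 355/113 needs a looser tolerance than the default:

```r
library(dplyr)

# Default tolerance used by near(): roughly 1.5e-8.
default_tol <- .Machine$double.eps^0.5
default_tol

gap <- abs(pi - 355/113)   # about 2.7e-7
gap < default_tol          # FALSE: too far apart for the default tolerance
near(pi, 355/113)
near(pi, 355/113, tol = 1e-6)
```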

Combining two tables

t1 <- gapminder |> group_by(country) |> summarize(max_life_exp = max(lifeExp)) 
t1

# A tibble: 142 × 2
   country     max_life_exp
   <fct>              <dbl>
 1 Afghanistan         43.8
 2 Albania             76.4
 3 Algeria             72.3
 4 Angola              42.7
 5 Argentina           75.3
 6 Australia           81.2
 7 Austria             79.8
 8 Bahrain             75.6
 9 Bangladesh          64.1
10 Belgium             79.4
# ℹ 132 more rows

country_codes |> head()

# A tibble: 6 × 3
  country     iso_alpha iso_num
  <chr>       <chr>       <int>
1 Afghanistan AFG             4
2 Albania     ALB             8
3 Algeria     DZA            12
4 Angola      AGO            24
5 Argentina   ARG            32
6 Armenia     ARM            51

Combining two tables

left_join(t1, country_codes, by = "country") |> DT::datatable()

Combining two tables

left_join(t1, rgbif::isocodes, by = c("country" = "name")) |> DT::datatable()

Other kinds of joins

full_join: keep all rows from both tables
left_join: keep all rows from the left table
right_join: keep all rows from the right table
inner_join: keep only rows with a match in both tables
anti_join: keep rows in the left table that have no match in the right table
semi_join: keep rows in the left table that have a match in the right table
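A minimal sketch of how the joins differ, using two made-up tables that share only countries B and C (the `left` and `right` tables are hypothetical):

```r
library(dplyr)

# Made-up tables for illustration.
left <- tibble(country = c("A", "B", "C"), max_life_exp = c(70, 80, 75))
right <- tibble(country = c("B", "C", "D"), iso_alpha = c("BBB", "CCC", "DDD"))

nrow(full_join(left, right, by = "country"))   # A, B, C, D
nrow(left_join(left, right, by = "country"))   # A, B, C
nrow(right_join(left, right, by = "country"))  # B, C, D
nrow(inner_join(left, right, by = "country"))  # B, C

# The filtering joins never add columns from the right table.
only_left <- anti_join(left, right, by = "country")   # only A
in_both <- semi_join(left, right, by = "country")     # B and C
only_left
in_both
```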

Animated versions of joins

R4DS textbook explanation of joins