2024-03-12
Title at the top of the page
Add code just as you would in regular Markdown. Hide a code block with echo = FALSE.
Many other options are available: eval, echo, output, warning, error.
species | n |
---|---|
Adelie | 152 |
Chinstrap | 68 |
Gentoo | 124 |
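The chunk below is a minimal sketch of how a table like this one could be produced with its code hidden. It assumes a Quarto document with the dplyr, knitr, and palmerpenguins packages installed. The `#| echo: false` line is a Quarto chunk option, and plain R treats it as an ordinary comment. In R Markdown the same option can also go in the chunk header, as {r, echo = FALSE}.

```r
#| echo: false
# Chunk options sit at the top of the chunk; eval, output, warning,
# and error are set the same way
library(dplyr)
library(knitr)
library(palmerpenguins)

# Count penguins per species and render the counts as a table
species_counts <- penguins |> count(species)
kable(species_counts)
```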
Make a linear model that predicts flipper length from bill length.
Show code with echo = TRUE.
And the text changes color too
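A minimal sketch of the linear model exercise, assuming the palmerpenguins package is installed. Setting `echo: true` keeps the code visible on the slide. The model object name `flipper_model` is illustrative.

```r
#| echo: true
library(palmerpenguins)

# Predict flipper length (mm) from bill length (mm);
# lm() drops rows with missing values by default
flipper_model <- lm(flipper_length_mm ~ bill_length_mm, data = penguins)
summary(flipper_model)
```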
What is data visualization?
Stuff on the left
Stuff on the right
List One
List Two
The next few slides are some dplyr and visualization exercises.
Make a table that counts the number of observations per country.
# A tibble: 142 × 2
country n_observations
<fct> <int>
1 Afghanistan 12
2 Albania 12
3 Algeria 12
4 Angola 12
5 Argentina 12
6 Australia 12
7 Austria 12
8 Bahrain 12
9 Bangladesh 12
10 Belgium 12
# ℹ 132 more rows
# A tibble: 142 × 2
country n
<fct> <int>
1 Afghanistan 12
2 Albania 12
3 Algeria 12
4 Angola 12
5 Argentina 12
6 Australia 12
7 Austria 12
8 Bahrain 12
9 Bangladesh 12
10 Belgium 12
# ℹ 132 more rows
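Two ways to build tables like the two above, sketched under the assumption that the data come from the gapminder package and that dplyr is loaded. The object name `obs_per_country` is illustrative. According to the dplyr documentation, `count()` is roughly equivalent to `group_by()` followed by `summarize(n = n())`, which is why the second table's column is called `n`.

```r
library(dplyr)
library(gapminder)

# Option 1: group by country, then count the rows in each group with n()
obs_per_country <- gapminder |>
  group_by(country) |>
  summarize(n_observations = n())
obs_per_country

# Option 2: count() does the grouping and counting in one step
gapminder |> count(country)
```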
# A tibble: 1 × 2
`is.na(country)` n
<lgl> <int>
1 FALSE 1704
# A tibble: 142 × 2
country n_missing_lifeExp
<fct> <int>
1 Afghanistan 0
2 Albania 0
3 Algeria 0
4 Angola 0
5 Argentina 0
6 Australia 0
7 Austria 0
8 Bahrain 0
9 Bangladesh 0
10 Belgium 0
# ℹ 132 more rows
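The two missing-data checks above can be sketched as follows, again assuming gapminder and dplyr. Passing an expression such as `is.na(country)` to `count()` creates a temporary column named after that expression, which matches the column header in the first table. The name `missing_life_exp` is illustrative.

```r
library(dplyr)
library(gapminder)

# How many rows have a missing country? (FALSE = not missing)
gapminder |> count(is.na(country))

# Number of missing life-expectancy values within each country
missing_life_exp <- gapminder |>
  group_by(country) |>
  summarize(n_missing_lifeExp = sum(is.na(lifeExp)))
missing_life_exp
```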
Find the country with the highest and lowest life expectancy in each year.
table3 <- gapminder |> group_by(year) |>
mutate(max_life_exp = max(lifeExp),
min_life_exp = min(lifeExp)) |>
filter(near(lifeExp, max_life_exp) | near(lifeExp, min_life_exp))
table3 |> kable() |> kable_styling()
country | continent | year | lifeExp | pop | gdpPercap | max_life_exp | min_life_exp |
---|---|---|---|---|---|---|---|
Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 | 72.670 | 28.801 |
Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 | 73.470 | 30.332 |
Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 | 73.680 | 31.997 |
Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 | 74.160 | 34.020 |
Angola | Africa | 1987 | 39.906 | 7874230 | 2430.2083 | 78.670 | 39.906 |
Cambodia | Asia | 1977 | 31.220 | 6978607 | 524.9722 | 76.110 | 31.220 |
Iceland | Europe | 1957 | 73.470 | 165110 | 9244.0014 | 73.470 | 30.332 |
Iceland | Europe | 1962 | 73.680 | 182053 | 10350.1591 | 73.680 | 31.997 |
Iceland | Europe | 1977 | 76.110 | 221823 | 19654.9625 | 76.110 | 31.220 |
Japan | Asia | 1982 | 77.110 | 118454974 | 19384.1057 | 77.110 | 38.445 |
Japan | Asia | 1987 | 78.670 | 122091325 | 22375.9419 | 78.670 | 39.906 |
Japan | Asia | 1992 | 79.360 | 124329269 | 26824.8951 | 79.360 | 23.599 |
Japan | Asia | 1997 | 80.690 | 125956499 | 28816.5850 | 80.690 | 36.087 |
Japan | Asia | 2002 | 82.000 | 127065841 | 28604.5919 | 82.000 | 39.193 |
Japan | Asia | 2007 | 82.603 | 127467972 | 31656.0681 | 82.603 | 39.613 |
Norway | Europe | 1952 | 72.670 | 3327728 | 10095.4217 | 72.670 | 28.801 |
Rwanda | Africa | 1992 | 23.599 | 7290203 | 737.0686 | 79.360 | 23.599 |
Rwanda | Africa | 1997 | 36.087 | 7212583 | 589.9445 | 80.690 | 36.087 |
Sierra Leone | Africa | 1972 | 35.400 | 2879013 | 1353.7598 | 74.720 | 35.400 |
Sierra Leone | Africa | 1982 | 38.445 | 3464522 | 1465.0108 | 77.110 | 38.445 |
Swaziland | Africa | 2007 | 39.613 | 1133066 | 4513.4806 | 82.603 | 39.613 |
Sweden | Europe | 1967 | 74.160 | 7867931 | 15258.2970 | 74.160 | 34.020 |
Sweden | Europe | 1972 | 74.720 | 8122293 | 17832.0246 | 74.720 | 35.400 |
Zambia | Africa | 2002 | 39.193 | 10595811 | 1071.6139 | 82.000 | 39.193 |
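As an alternative to comparing each value against the group maximum and minimum, dplyr's `slice_max()` and `slice_min()` pick the extreme rows directly. This is a sketch, not necessarily the approach used in the course; it should return the same 24 country-years as the table above, ordered by year. The name `extremes` is illustrative.

```r
library(dplyr)
library(gapminder)

# Highest and lowest life expectancy in each year, stacked into one table
extremes <- bind_rows(
  gapminder |> group_by(year) |> slice_max(lifeExp, n = 1),
  gapminder |> group_by(year) |> slice_min(lifeExp, n = 1)
) |>
  ungroup() |>
  arrange(year, lifeExp)
extremes
```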
Graph these data.
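One way to graph these data, assuming ggplot2, dplyr, and gapminder are installed, is a dot plot with one row per country. The plot type, colors, and labels are illustrative choices, not necessarily what the slide used. The table is rebuilt here so the example runs on its own. It is also ungrouped, because `table3` from the earlier code is still grouped by year.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Yearly highest and lowest life expectancy (same logic as table3)
table3 <- gapminder |>
  group_by(year) |>
  mutate(max_life_exp = max(lifeExp),
         min_life_exp = min(lifeExp)) |>
  filter(near(lifeExp, max_life_exp) | near(lifeExp, min_life_exp)) |>
  ungroup()

# One point per country-year; color shows the year
life_plot <- ggplot(table3, aes(x = lifeExp, y = country, color = year)) +
  geom_point() +
  labs(x = "Life expectancy (years)", y = NULL, color = "Year")
life_plot
```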
Looking ahead to the lesson on factors, we can make this plot look a bit nicer by reordering the countries according to life expectancy:
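A sketch of the reordering step, assuming the forcats package (part of the tidyverse). `fct_reorder()` sorts factor levels by a summary of another variable, by default the median. `fct_drop()` first removes the 131 gapminder countries that do not appear in the table. The data are ungrouped before `mutate()`, so that each country's level position is computed from all of its rows rather than separately within each year.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(gapminder)

table3_sorted <- gapminder |>
  group_by(year) |>
  filter(near(lifeExp, max(lifeExp)) | near(lifeExp, min(lifeExp))) |>
  ungroup() |>
  # Drop unused levels, then order countries by median life expectancy
  mutate(country = fct_reorder(fct_drop(country), lifeExp))

ggplot(table3_sorted, aes(x = lifeExp, y = country, color = year)) +
  geom_point() +
  labs(x = "Life expectancy (years)", y = NULL, color = "Year")
```

Because the y axis lists the first factor level at the bottom, the country with the lowest median life expectancy appears at the bottom of the plot.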
Make a table showing the number of missing values for each penguin species.
There are missing data for sex and four quantitative variables:
penguins |> group_by(species) |>
summarize(bill_length_na = sum(is.na(bill_length_mm)),
bill_depth_na = sum(is.na(bill_depth_mm)),
flipper_length_na = sum(is.na(flipper_length_mm)),
body_mass_na = sum(is.na(body_mass_g))
)
# A tibble: 3 × 5
species bill_length_na bill_depth_na flipper_length_na body_mass_na
<fct> <int> <int> <int> <int>
1 Adelie 1 1 1 1
2 Chinstrap 0 0 0 0
3 Gentoo 1 1 1 1
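Writing one `sum(is.na(...))` per column gets long. dplyr's `across()` applies the same function to many columns at once. This version also counts the missing `sex` values that the code above leaves out. It is a sketch that assumes dplyr 1.0 or later and palmerpenguins. Inside `summarize()` on grouped data, `everything()` selects all columns except the grouping column.

```r
library(dplyr)
library(palmerpenguins)

# Missing-value count for every column, per species
na_by_species <- penguins |>
  group_by(species) |>
  summarize(across(everything(), ~ sum(is.na(.x))))
na_by_species
```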
The naniar package has a nice function to show this too:
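The slide's exact naniar function is not shown here. As one possibility, assuming naniar is installed, `miss_var_summary()` gives a numeric summary and `gg_miss_var()` draws the counts, with `facet` splitting the plot by species.

```r
library(naniar)
library(palmerpenguins)

# Table: number and percent of missing values in each variable
miss_summary <- miss_var_summary(penguins)
miss_summary

# Plot: missing values per variable, one panel per species
gg_miss_var(penguins, facet = species)
```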
The remaining slides address some questions that sometimes arise when you are working on term projects.
You can compare numbers and “text numbers” (numbers in quotation marks), but you shouldn’t make a habit of it.
# A tibble: 1 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Canada Americas 2007 80.7 33390141 36319.
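A sketch of why this works and why it is risky, assuming dplyr and gapminder. When one side of `==` or `<` is text, R converts the number to text before comparing. For `year == "2007"` the strings happen to match. For `<` and `>`, however, the comparison becomes alphabetical.

```r
library(dplyr)
library(gapminder)

# Both return the Canada 2007 row: 2007 is turned into "2007" before comparing
filter(gapminder, country == "Canada", year == 2007)
filter(gapminder, country == "Canada", year == "2007")

# Why not to rely on it: once one side is text, ordering is alphabetical
10 < 9      # FALSE (numeric comparison)
10 < "9"    # TRUE  ("10" sorts before "9" as text)
```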
Some surprises exist out there…
Numbers with decimal points (floating-point numbers) can make exact comparisons fail, because many decimal values cannot be stored exactly in binary.
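A short illustration in R, assuming dplyr for `near()`. The exact comparisons fail because of tiny rounding errors, while comparisons that allow a small tolerance succeed.

```r
library(dplyr)

# Exact comparisons fail because of tiny rounding errors
0.1 + 0.2 == 0.3                    # FALSE
sqrt(2)^2 == 2                      # FALSE

# Compare with a tolerance instead
near(0.1 + 0.2, 0.3)                # TRUE (dplyr)
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE (base R)
```

This is also why the life expectancy table earlier uses `near()` inside `filter()`.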
What if you really want to know if two things are the same?
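One base-R answer, offered as a sketch rather than the slide's own example, is `identical()`. `==` compares element by element and returns a vector. `identical()` returns a single TRUE or FALSE for whole objects and also checks the type.

```r
# == compares values element by element and returns a vector
c(1, 2, 3) == c(1, 2, 4)            # TRUE TRUE FALSE

# identical() compares whole objects, including their type:
# 1L is an integer, 1 is a double
identical(c(1, 2, 3), c(1, 2, 3))   # TRUE
identical(1L, 1)                    # FALSE
1L == 1                             # TRUE
```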
# A tibble: 142 × 2
country max_life_exp
<fct> <dbl>
1 Afghanistan 43.8
2 Albania 76.4
3 Algeria 72.3
4 Angola 42.7
5 Argentina 75.3
6 Australia 81.2
7 Austria 79.8
8 Bahrain 75.6
9 Bangladesh 64.1
10 Belgium 79.4
# ℹ 132 more rows
# A tibble: 6 × 3
country iso_alpha iso_num
<chr> <chr> <int>
1 Afghanistan AFG 4
2 Albania ALB 8
3 Algeria DZA 12
4 Angola AGO 24
5 Argentina ARG 32
6 Armenia ARM 51
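The two tables above share a `country` column, so they can be joined. Here is a minimal sketch, assuming dplyr and gapminder (which provides `country_codes`). The name `max_life_codes` is illustrative. In gapminder, `country` is a factor, while in `country_codes` it is character, so the key is converted to character first to give both sides the same type.

```r
library(dplyr)
library(gapminder)

# Highest life expectancy per country, with a character key
max_life <- gapminder |>
  group_by(country) |>
  summarize(max_life_exp = max(lifeExp)) |>
  mutate(country = as.character(country))

# Keep every row of max_life; add ISO codes where the country names match
max_life_codes <- max_life |>
  left_join(country_codes, by = "country")
max_life_codes
```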
Animated versions of joins
R4DS textbook explanation of joins