17  GAM and LOESS smoothing

In this lesson I will show you how to create GAM and LOESS models and perform some basic tasks to interact with the R model objects that the functions create. In keeping with the goals of the course, we will primarily focus on using the models for visualization and not attempt a detailed statistical analysis of when and why you might use a particular model for inference. This means we will restrict our attention to some basic uses of these models using one predictor (x) and one response (y) variable.

17.1 Generalized Additive Models

Generalized additive models are a kind of linear regression, but instead of finding coefficients of predictor variables (e.g., intercepts, slopes), the model finds a “smooth” response function for each predictor. Typically this means that a piecewise cubic function (spline) is used to approximate the relationship between two variables. We can compute predicted values, confindence and prediction intervals, and show the smooth response function that arose from the model. We don’t provide a simple list of coefficients like we did with linear regression, because the spline curve is defined by many numbers which are not usually informative on their own.

Generalized additive models are most commonly used when there is no theoretical motivation for a functional relationship between the variables being studied. We’ll look at the Mauna Loa atmospheric CO2 concentration. These data increase year over year and have well-established interannual oscillations, but there is no clear function for either of these patterns.

We will start by using the last decade of data from Lesson 1.

Here is a plot of atmospheric CO2 concentration over time.

my_theme = theme_linedraw() + theme(text = element_text(size = 14))
p1 <- co2 |> ggplot(aes(decimal_date, co2_avg)) + geom_point() + my_theme +
  labs(x = "Year (decimal)", y = "Atmospheric CO2 (ppm)")
p1

The next plot shows each observation, minus the annual mean, as a function of the time of year but not the actual year. Each year’s observations are overlapped.

p2 <- co2 |> ggplot(aes(year_fraction, co2_anomaly)) + geom_point() + my_theme +
  labs(x = "Year fraction (decimal)", y = "Atmospheric CO2 anomaly (ppm)")
p2

Here are GAM fits to both of these pairs of data using the gam function from the mgcv package. There is a plot function in the mgcv package, but I’m using a ggplot function called draw in the gratia package.

g1 <- gam(co2_anomaly ~ s(year_fraction, bs="cs"), data = co2)
draw(g1, residuals=TRUE, rug=FALSE)

g2 <- gam(co2_avg ~ s(decimal_date, bs="cs"),  data = co2)
draw(g2, residuals=TRUE, rug=FALSE) 

Depending on your goal, you may find that the second plot is too smooth – the interannual oscillations are smoothed out completely. We can increase the number of “knots” (by setting k=25) in the spline to capture the oscillation, but this is not the way the smoothing is normally used:

g3 <- gam(co2_avg ~ s(decimal_date, bs="cs", k = 25),  data = co2)
draw(g3, residuals=TRUE, rug=FALSE) 

There is a summary function for GAM fits and the broom functions glance and tidy give access to these data, but the interpretation of this output is beyond the scope of the course:


Family: gaussian 
Link function: identity 

Formula:
co2_anomaly ~ s(year_fraction, bs = "cs")

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.353e-15  2.673e-02       0        1

Approximate significance of smooth terms:
                   edf Ref.df     F p-value    
s(year_fraction) 8.296      9 530.7  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.976   Deviance explained = 97.8%
GCV = 0.092244  Scale est. = 0.085038  n = 119
glance(g1) |> kable()
df logLik AIC BIC deviance df.residual nobs
9.296045 -17.36692 55.32592 83.9399 9.328996 109.704 119
tidy(g1) |> kable()
term edf ref.df statistic p.value
s(year_fraction) 8.296045 9 530.7004 0

It is often useful to compute residuals and extract predicted values.

co2 |> add_residuals(g1) |> add_fitted(g1) |> kable() |>  scroll_box(height = "200px")
year month decimal_date co2_avg year_fraction co2_anomaly .residual .value
2011 1 2011.042 391.33 0.0417 -0.3216667 0.4287558 -0.7504224
2011 2 2011.125 391.86 0.1250 0.2083333 0.3028663 -0.0945330
2011 3 2011.208 392.60 0.2083 0.9483333 0.1440665 0.8042668
2011 4 2011.292 393.25 0.2917 1.5983333 -0.6010913 2.1994246
2011 5 2011.375 394.19 0.3750 2.5383333 -0.5071835 3.0455169
2011 6 2011.458 393.74 0.4583 2.0883333 -0.1757645 2.2640978
2011 7 2011.542 392.51 0.5417 0.8583333 0.3875896 0.4707438
2011 8 2011.625 390.13 0.6250 -1.5216667 0.0541232 -1.5757899
2011 9 2011.708 389.08 0.7083 -2.5716667 0.3920423 -2.9637089
2011 10 2011.792 388.99 0.7917 -2.6616667 0.0151210 -2.6767877
2011 11 2011.875 390.28 0.8750 -1.3716667 -0.2731113 -1.0985553
2011 12 2011.958 391.86 0.9583 0.2083333 -0.2091638 0.4174971
2012 1 2012.042 393.12 0.0417 -0.7350000 0.0154224 -0.7504224
2012 2 2012.125 393.86 0.1250 0.0050000 0.0995330 -0.0945330
2012 3 2012.208 394.40 0.2083 0.5450000 -0.2592668 0.8042668
2012 4 2012.292 396.18 0.2917 2.3250000 0.1255754 2.1994246
2012 5 2012.375 396.74 0.3750 2.8850000 -0.1605169 3.0455169
2012 6 2012.458 395.71 0.4583 1.8550000 -0.4090978 2.2640978
2012 7 2012.542 394.36 0.5417 0.5050000 0.0342562 0.4707438
2012 8 2012.625 392.39 0.6250 -1.4650000 0.1107899 -1.5757899
2012 9 2012.708 391.13 0.7083 -2.7250000 0.2387089 -2.9637089
2012 10 2012.792 391.05 0.7917 -2.8050000 -0.1282123 -2.6767877
2012 11 2012.875 392.98 0.8750 -0.8750000 0.2235553 -1.0985553
2012 12 2012.958 394.34 0.9583 0.4850000 0.0675029 0.4174971
2013 1 2013.042 395.55 0.0417 -0.9700000 -0.2195776 -0.7504224
2013 2 2013.125 396.80 0.1250 0.2800000 0.3745330 -0.0945330
2013 3 2013.208 397.43 0.2083 0.9100000 0.1057332 0.8042668
2013 4 2013.292 398.41 0.2917 1.8900000 -0.3094246 2.1994246
2013 5 2013.375 399.78 0.3750 3.2600000 0.2144831 3.0455169
2013 6 2013.458 398.60 0.4583 2.0800000 -0.1840978 2.2640978
2013 7 2013.542 397.32 0.5417 0.8000000 0.3292562 0.4707438
2013 8 2013.625 395.20 0.6250 -1.3200000 0.2557899 -1.5757899
2013 9 2013.708 393.45 0.7083 -3.0700000 -0.1062911 -2.9637089
2013 10 2013.792 393.70 0.7917 -2.8200000 -0.1432123 -2.6767877
2013 11 2013.875 395.16 0.8750 -1.3600000 -0.2614447 -1.0985553
2013 12 2013.958 396.84 0.9583 0.3200000 -0.0974971 0.4174971
2014 1 2014.042 397.85 0.0417 -0.7925000 -0.0420776 -0.7504224
2014 2 2014.125 398.01 0.1250 -0.6325000 -0.5379670 -0.0945330
2014 3 2014.208 399.71 0.2083 1.0675000 0.2632332 0.8042668
2014 4 2014.292 401.33 0.2917 2.6875000 0.4880754 2.1994246
2014 5 2014.375 401.78 0.3750 3.1375000 0.0919831 3.0455169
2014 6 2014.458 401.25 0.4583 2.6075000 0.3434022 2.2640978
2014 7 2014.542 399.11 0.5417 0.4675000 -0.0032438 0.4707438
2014 8 2014.625 397.03 0.6250 -1.6125000 -0.0367101 -1.5757899
2014 9 2014.708 395.38 0.7083 -3.2625000 -0.2987911 -2.9637089
2014 10 2014.792 396.07 0.7917 -2.5725000 0.1042877 -2.6767877
2014 11 2014.875 397.28 0.8750 -1.3625000 -0.2639447 -1.0985553
2014 12 2014.958 398.91 0.9583 0.2675000 -0.1499971 0.4174971
2015 1 2015.042 399.98 0.0417 -0.8458333 -0.0954109 -0.7504224
2015 2 2015.125 400.35 0.1250 -0.4758333 -0.3813003 -0.0945330
2015 3 2015.208 401.52 0.2083 0.6941667 -0.1101001 0.8042668
2015 4 2015.292 403.15 0.2917 2.3241667 0.1247421 2.1994246
2015 5 2015.375 403.96 0.3750 3.1341667 0.0886498 3.0455169
2015 6 2015.458 402.80 0.4583 1.9741667 -0.2899311 2.2640978
2015 7 2015.542 401.29 0.5417 0.4641667 -0.0065771 0.4707438
2015 8 2015.625 398.93 0.6250 -1.8958333 -0.3200435 -1.5757899
2015 9 2015.708 397.63 0.7083 -3.1958333 -0.2321244 -2.9637089
2015 10 2015.792 398.29 0.7917 -2.5358333 0.1409543 -2.6767877
2015 11 2015.875 400.16 0.8750 -0.6658333 0.4327220 -1.0985553
2015 12 2015.958 401.85 0.9583 1.0241667 0.6066695 0.4174971
2016 1 2016.042 402.50 0.0417 -1.7225000 -0.9720776 -0.7504224
2016 2 2016.125 404.07 0.1250 -0.1525000 -0.0579670 -0.0945330
2016 3 2016.208 404.87 0.2083 0.6475000 -0.1567668 0.8042668
2016 4 2016.292 407.42 0.2917 3.1975000 0.9980754 2.1994246
2016 5 2016.375 407.72 0.3750 3.4975000 0.4519831 3.0455169
2016 6 2016.458 406.81 0.4583 2.5875000 0.3234022 2.2640978
2016 7 2016.542 404.40 0.5417 0.1775000 -0.2932438 0.4707438
2016 8 2016.625 402.26 0.6250 -1.9625000 -0.3867101 -1.5757899
2016 9 2016.708 401.05 0.7083 -3.1725000 -0.2087911 -2.9637089
2016 10 2016.792 401.60 0.7917 -2.6225000 0.0542877 -2.6767877
2016 11 2016.875 403.53 0.8750 -0.6925000 0.4060553 -1.0985553
2016 12 2016.958 404.44 0.9583 0.2175000 -0.1999971 0.4174971
2017 1 2017.042 406.17 0.0417 -0.3850000 0.3654224 -0.7504224
2017 2 2017.125 406.47 0.1250 -0.0850000 0.0095330 -0.0945330
2017 3 2017.208 407.23 0.2083 0.6750000 -0.1292668 0.8042668
2017 4 2017.292 409.03 0.2917 2.4750000 0.2755754 2.1994246
2017 5 2017.375 409.69 0.3750 3.1350000 0.0894831 3.0455169
2017 6 2017.458 408.89 0.4583 2.3350000 0.0709022 2.2640978
2017 7 2017.542 407.13 0.5417 0.5750000 0.1042562 0.4707438
2017 8 2017.625 405.12 0.6250 -1.4350000 0.1407899 -1.5757899
2017 9 2017.708 403.37 0.7083 -3.1850000 -0.2212911 -2.9637089
2017 10 2017.792 403.63 0.7917 -2.9250000 -0.2482123 -2.6767877
2017 11 2017.875 405.12 0.8750 -1.4350000 -0.3364447 -1.0985553
2017 12 2017.958 406.81 0.9583 0.2550000 -0.1624971 0.4174971
2018 1 2018.042 407.96 0.0417 -0.5600000 0.1904224 -0.7504224
2018 2 2018.125 408.32 0.1250 -0.2000000 -0.1054670 -0.0945330
2018 3 2018.208 409.39 0.2083 0.8700000 0.0657332 0.8042668
2018 4 2018.292 410.25 0.2917 1.7300000 -0.4694246 2.1994246
2018 5 2018.375 411.24 0.3750 2.7200000 -0.3255169 3.0455169
2018 6 2018.458 410.79 0.4583 2.2700000 0.0059022 2.2640978
2018 7 2018.542 408.70 0.5417 0.1800000 -0.2907438 0.4707438
2018 8 2018.625 406.97 0.6250 -1.5500000 0.0257899 -1.5757899
2018 9 2018.708 405.52 0.7083 -3.0000000 -0.0362911 -2.9637089
2018 10 2018.792 406.00 0.7917 -2.5200000 0.1567877 -2.6767877
2018 11 2018.875 408.02 0.8750 -0.5000000 0.5985553 -1.0985553
2018 12 2018.958 409.08 0.9583 0.5600000 0.1425029 0.4174971
2019 1 2019.042 410.83 0.0417 -0.6041667 0.1462558 -0.7504224
2019 2 2019.125 411.75 0.1250 0.3158333 0.4103663 -0.0945330
2019 3 2019.208 411.97 0.2083 0.5358333 -0.2684335 0.8042668
2019 4 2019.292 413.33 0.2917 1.8958333 -0.3035913 2.1994246
2019 5 2019.375 414.64 0.3750 3.2058333 0.1603165 3.0455169
2019 6 2019.458 413.93 0.4583 2.4958333 0.2317355 2.2640978
2019 7 2019.542 411.74 0.5417 0.3058333 -0.1649104 0.4707438
2019 8 2019.625 409.95 0.6250 -1.4841667 0.0916232 -1.5757899
2019 9 2019.708 408.54 0.7083 -2.8941667 0.0695423 -2.9637089
2019 10 2019.792 408.52 0.7917 -2.9141667 -0.2373790 -2.6767877
2019 11 2019.875 410.25 0.8750 -1.1841667 -0.0856113 -1.0985553
2019 12 2019.958 411.76 0.9583 0.3258333 -0.0916638 0.4174971
2020 1 2020.042 413.39 0.0417 -0.6154545 0.1349679 -0.7504224
2020 2 2020.125 414.11 0.1250 0.1045455 0.1990785 -0.0945330
2020 3 2020.208 414.51 0.2083 0.5045455 -0.2997213 0.8042668
2020 4 2020.292 416.21 0.2917 2.2045455 0.0051209 2.1994246
2020 5 2020.375 417.07 0.3750 3.0645455 0.0190286 3.0455169
2020 6 2020.458 416.38 0.4583 2.3745455 0.1104477 2.2640978
2020 7 2020.542 414.38 0.5417 0.3745455 -0.0961983 0.4707438
2020 8 2020.625 412.55 0.6250 -1.4554545 0.1203353 -1.5757899
2020 9 2020.708 411.29 0.7083 -2.7154545 0.2482544 -2.9637089
2020 10 2020.792 411.28 0.7917 -2.7254545 -0.0486669 -2.6767877
2020 11 2020.875 412.89 0.8750 -1.1154545 -0.0168992 -1.0985553

If we generate new data (dates), we can plot predictions too. Since the model is a piecewise cubic function, extrapolations are often dramatically unreliable. The downward facing cubic in the last “bump” is simply continued, with comically bad results. Extrapolation of models, and especially smooths, is somewhere between risky and foolish!

new_data <- tibble(decimal_date = seq(2017, 2022, by = 0.05))
new_data |> add_fitted(g3) |>
  ggplot(aes(decimal_date, .value)) + 
  geom_line(color = "blue", size = 2) +
  geom_point(aes(y= co2_avg), data = co2) + my_theme + xlim(2017, 2022)

If you want confidence intervals on the fitted values, use the confint function together with the name of the smooth you are extracting. Be aware that this function does not include the intercept (or grand mean) from the model, so the values are all centred on zero.

confint(g1, "s(year_fraction)", level = 0.95) |> kable() |>  scroll_box(height = "200px")
smooth type by year_fraction est se crit lower upper
s(year_fraction) CRS (shrink) NA 0.0417000 -0.7504224 0.0911641 1.959964 -0.9291007 -0.5717441
s(year_fraction) CRS (shrink) NA 0.0509586 -0.6817731 0.0809576 1.959964 -0.8404472 -0.5230990
s(year_fraction) CRS (shrink) NA 0.0602172 -0.6128047 0.0733925 1.959964 -0.7566513 -0.4689581
s(year_fraction) CRS (shrink) NA 0.0694758 -0.5431982 0.0690818 1.959964 -0.6785960 -0.4078004
s(year_fraction) CRS (shrink) NA 0.0787343 -0.4726346 0.0681204 1.959964 -0.6061481 -0.3391211
s(year_fraction) CRS (shrink) NA 0.0879929 -0.4007948 0.0699211 1.959964 -0.5378376 -0.2637519
s(year_fraction) CRS (shrink) NA 0.0972515 -0.3273597 0.0734371 1.959964 -0.4712938 -0.1834257
s(year_fraction) CRS (shrink) NA 0.1065101 -0.2520105 0.0775478 1.959964 -0.4040013 -0.1000197
s(year_fraction) CRS (shrink) NA 0.1157687 -0.1744279 0.0812996 1.959964 -0.3337721 -0.0150836
s(year_fraction) CRS (shrink) NA 0.1250273 -0.0942929 0.0839688 1.959964 -0.2588687 0.0702829
s(year_fraction) CRS (shrink) NA 0.1342859 -0.0112865 0.0850507 1.959964 -0.1779829 0.1554099
s(year_fraction) CRS (shrink) NA 0.1435444 0.0749103 0.0842570 1.959964 -0.0902303 0.2400509
s(year_fraction) CRS (shrink) NA 0.1528030 0.1646477 0.0816674 1.959964 0.0045826 0.3247128
s(year_fraction) CRS (shrink) NA 0.1620616 0.2583992 0.0780514 1.959964 0.1054213 0.4113771
s(year_fraction) CRS (shrink) NA 0.1713202 0.3566685 0.0744216 1.959964 0.2108048 0.5025323
s(year_fraction) CRS (shrink) NA 0.1805788 0.4599596 0.0717896 1.959964 0.3192547 0.6006645
s(year_fraction) CRS (shrink) NA 0.1898374 0.5687763 0.0709348 1.959964 0.4297468 0.7078059
s(year_fraction) CRS (shrink) NA 0.1990960 0.6836225 0.0721337 1.959964 0.5422429 0.8250020
s(year_fraction) CRS (shrink) NA 0.2083545 0.8050020 0.0750446 1.959964 0.6579174 0.9520866
s(year_fraction) CRS (shrink) NA 0.2176131 0.9334187 0.0788581 1.959964 0.7788597 1.0879777
s(year_fraction) CRS (shrink) NA 0.2268717 1.0693765 0.0825778 1.959964 0.9075269 1.2312261
s(year_fraction) CRS (shrink) NA 0.2361303 1.2133792 0.0852429 1.959964 1.0463061 1.3804523
s(year_fraction) CRS (shrink) NA 0.2453889 1.3659307 0.0860566 1.959964 1.1972628 1.5345985
s(year_fraction) CRS (shrink) NA 0.2546475 1.5269277 0.0846482 1.959964 1.3610204 1.6928350
s(year_fraction) CRS (shrink) NA 0.2639061 1.6938561 0.0816106 1.959964 1.5339022 1.8538101
s(year_fraction) CRS (shrink) NA 0.2731646 1.8636034 0.0779236 1.959964 1.7108759 2.0163308
s(year_fraction) CRS (shrink) NA 0.2824232 2.0330568 0.0746644 1.959964 1.8867173 2.1793963
s(year_fraction) CRS (shrink) NA 0.2916818 2.1991039 0.0728081 1.959964 2.0564026 2.3418053
s(year_fraction) CRS (shrink) NA 0.3009404 2.3586321 0.0729366 1.959964 2.2156789 2.5015852
s(year_fraction) CRS (shrink) NA 0.3101990 2.5085287 0.0750029 1.959964 2.3615256 2.6555317
s(year_fraction) CRS (shrink) NA 0.3194576 2.6456811 0.0783537 1.959964 2.4921106 2.7992516
s(year_fraction) CRS (shrink) NA 0.3287162 2.7669769 0.0819889 1.959964 2.6062816 2.9276722
s(year_fraction) CRS (shrink) NA 0.3379747 2.8693034 0.0848476 1.959964 2.7030052 3.0356015
s(year_fraction) CRS (shrink) NA 0.3472333 2.9495480 0.0859986 1.959964 2.7809937 3.1181022
s(year_fraction) CRS (shrink) NA 0.3564919 3.0052962 0.0849386 1.959964 2.8388196 3.1717728
s(year_fraction) CRS (shrink) NA 0.3657505 3.0369263 0.0821713 1.959964 2.8758735 3.1979791
s(year_fraction) CRS (shrink) NA 0.3750091 3.0455143 0.0786208 1.959964 2.8914204 3.1996083
s(year_fraction) CRS (shrink) NA 0.3842677 3.0321367 0.0753298 1.959964 2.8844929 3.1797804
s(year_fraction) CRS (shrink) NA 0.3935263 2.9978695 0.0732707 1.959964 2.8542615 3.1414775
s(year_fraction) CRS (shrink) NA 0.4027848 2.9437891 0.0730697 1.959964 2.8005751 3.0870030
s(year_fraction) CRS (shrink) NA 0.4120434 2.8709716 0.0747628 1.959964 2.7244393 3.0175039
s(year_fraction) CRS (shrink) NA 0.4213020 2.7804934 0.0777794 1.959964 2.6280486 2.9329383
s(year_fraction) CRS (shrink) NA 0.4305606 2.6734307 0.0811703 1.959964 2.5143399 2.8325215
s(year_fraction) CRS (shrink) NA 0.4398192 2.5508597 0.0838920 1.959964 2.3864344 2.7152849
s(year_fraction) CRS (shrink) NA 0.4490778 2.4138566 0.0850124 1.959964 2.2472353 2.5804779
s(year_fraction) CRS (shrink) NA 0.4583364 2.2634823 0.0840211 1.959964 2.0988041 2.4281606
s(year_fraction) CRS (shrink) NA 0.4675949 2.1007371 0.0814055 1.959964 1.9411853 2.2602890
s(year_fraction) CRS (shrink) NA 0.4768535 1.9266061 0.0780703 1.959964 1.7735912 2.0796210
s(year_fraction) CRS (shrink) NA 0.4861121 1.7420745 0.0750318 1.959964 1.5950148 1.8891341
s(year_fraction) CRS (shrink) NA 0.4953707 1.5481273 0.0732232 1.959964 1.4046125 1.6916421
s(year_fraction) CRS (shrink) NA 0.5046293 1.3457498 0.0732233 1.959964 1.2022348 1.4892649
s(year_fraction) CRS (shrink) NA 0.5138879 1.1359272 0.0750321 1.959964 0.9888669 1.2829875
s(year_fraction) CRS (shrink) NA 0.5231465 0.9196446 0.0780707 1.959964 0.7666287 1.0726604
s(year_fraction) CRS (shrink) NA 0.5324051 0.6978871 0.0814060 1.959964 0.5383343 0.8574399
s(year_fraction) CRS (shrink) NA 0.5416636 0.4716399 0.0840214 1.959964 0.3069609 0.6363189
s(year_fraction) CRS (shrink) NA 0.5509222 0.2418882 0.0850126 1.959964 0.0752666 0.4085099
s(year_fraction) CRS (shrink) NA 0.5601808 0.0096723 0.0838920 1.959964 -0.1547530 0.1740976
s(year_fraction) CRS (shrink) NA 0.5694394 -0.2237447 0.0811702 1.959964 -0.3828354 -0.0646541
s(year_fraction) CRS (shrink) NA 0.5786980 -0.4570436 0.0777794 1.959964 -0.6094884 -0.3045988
s(year_fraction) CRS (shrink) NA 0.5879566 -0.6889047 0.0747631 1.959964 -0.8354377 -0.5423718
s(year_fraction) CRS (shrink) NA 0.5972152 -0.9180088 0.0730706 1.959964 -1.0612245 -0.7747931
s(year_fraction) CRS (shrink) NA 0.6064737 -1.1430364 0.0732722 1.959964 -1.2866474 -0.9994254
s(year_fraction) CRS (shrink) NA 0.6157323 -1.3626681 0.0753319 1.959964 -1.5103159 -1.2150204
s(year_fraction) CRS (shrink) NA 0.6249909 -1.5755845 0.0786231 1.959964 -1.7296830 -1.4214861
s(year_fraction) CRS (shrink) NA 0.6342495 -1.7804663 0.0821735 1.959964 -1.9415234 -1.6194091
s(year_fraction) CRS (shrink) NA 0.6435081 -1.9759939 0.0849404 1.959964 -2.1424740 -1.8095138
s(year_fraction) CRS (shrink) NA 0.6527667 -2.1608480 0.0859997 1.959964 -2.3294043 -1.9922916
s(year_fraction) CRS (shrink) NA 0.6620253 -2.3336718 0.0848479 1.959964 -2.4999706 -2.1673729
s(year_fraction) CRS (shrink) NA 0.6712838 -2.4929588 0.0819889 1.959964 -2.6536541 -2.3322636
s(year_fraction) CRS (shrink) NA 0.6805424 -2.6371655 0.0783541 1.959964 -2.7907368 -2.4835942
s(year_fraction) CRS (shrink) NA 0.6898010 -2.7647478 0.0750048 1.959964 -2.9117546 -2.6177411
s(year_fraction) CRS (shrink) NA 0.6990596 -2.8741622 0.0729410 1.959964 -3.0171239 -2.7312005
s(year_fraction) CRS (shrink) NA 0.7083182 -2.9638648 0.0728152 1.959964 -3.1065800 -2.8211495
s(year_fraction) CRS (shrink) NA 0.7175768 -3.0323118 0.0746736 1.959964 -3.1786694 -2.8859541
s(year_fraction) CRS (shrink) NA 0.7268354 -3.0779594 0.0779336 1.959964 -3.2307065 -2.9252124
s(year_fraction) CRS (shrink) NA 0.7360939 -3.0992640 0.0816197 1.959964 -3.2592356 -2.9392923
s(year_fraction) CRS (shrink) NA 0.7453525 -3.0946816 0.0846547 1.959964 -3.2606017 -2.9287615
s(year_fraction) CRS (shrink) NA 0.7546111 -3.0626686 0.0860598 1.959964 -3.2313426 -2.8939946
s(year_fraction) CRS (shrink) NA 0.7638697 -3.0024297 0.0852433 1.959964 -3.1695036 -2.8353559
s(year_fraction) CRS (shrink) NA 0.7731283 -2.9161857 0.0825785 1.959964 -3.0780367 -2.7543348
s(year_fraction) CRS (shrink) NA 0.7823869 -2.8069168 0.0788644 1.959964 -2.9614882 -2.6523454
s(year_fraction) CRS (shrink) NA 0.7916455 -2.6776030 0.0750628 1.959964 -2.8247235 -2.5304825
s(year_fraction) CRS (shrink) NA 0.8009040 -2.5312246 0.0721691 1.959964 -2.6726735 -2.3897757
s(year_fraction) CRS (shrink) NA 0.8101626 -2.3707617 0.0709880 1.959964 -2.5098957 -2.2316278
s(year_fraction) CRS (shrink) NA 0.8194212 -2.1991946 0.0718550 1.959964 -2.3400278 -2.0583613
s(year_fraction) CRS (shrink) NA 0.8286798 -2.0195033 0.0744885 1.959964 -2.1654980 -1.8735085
s(year_fraction) CRS (shrink) NA 0.8379384 -1.8346680 0.0781073 1.959964 -1.9877556 -1.6815804
s(year_fraction) CRS (shrink) NA 0.8471970 -1.6476689 0.0817030 1.959964 -1.8078037 -1.4875341
s(year_fraction) CRS (shrink) NA 0.8564556 -1.4614862 0.0842697 1.959964 -1.6266517 -1.2963207
s(year_fraction) CRS (shrink) NA 0.8657141 -1.2785516 0.0850508 1.959964 -1.4452480 -1.1118552
s(year_fraction) CRS (shrink) NA 0.8749727 -1.0990792 0.0839868 1.959964 -1.2636903 -0.9344681
s(year_fraction) CRS (shrink) NA 0.8842313 -0.9227229 0.0813945 1.959964 -1.0822532 -0.7631926
s(year_fraction) CRS (shrink) NA 0.8934899 -0.7491364 0.0778159 1.959964 -0.9016528 -0.5966200
s(year_fraction) CRS (shrink) NA 0.9027485 -0.5779735 0.0740203 1.959964 -0.7230506 -0.4328964
s(year_fraction) CRS (shrink) NA 0.9120071 -0.4088880 0.0710024 1.959964 -0.5480501 -0.2697259
s(year_fraction) CRS (shrink) NA 0.9212657 -0.2415336 0.0698892 1.959964 -0.3785140 -0.1045532
s(year_fraction) CRS (shrink) NA 0.9305242 -0.0755641 0.0716662 1.959964 -0.2160273 0.0648991
s(year_fraction) CRS (shrink) NA 0.9397828 0.0893667 0.0768070 1.959964 -0.0611722 0.2399055
s(year_fraction) CRS (shrink) NA 0.9490414 0.2536050 0.0851237 1.959964 0.0867656 0.4204444
s(year_fraction) CRS (shrink) NA 0.9583000 0.4174971 0.0959746 1.959964 0.2293904 0.6056038
confint(g1, "s(year_fraction)", level = 0.95) |> 
  ggplot(aes(year_fraction)) + 
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.25) + 
  geom_line(aes(y = est)) + my_theme

You can also use lm to fit splines; these are similar to GAMs but there are some important differences. The mgcv package has a lot of features not available with lm.

s1 <- lm(co2_avg ~ splines::bs(decimal_date, df = 5), data = co2)
tidy(s1) |> kable(digits = 2)
term estimate std.error statistic p.value
(Intercept) 391.73 1.13 347.53 0.00
splines::bs(decimal_date, df = 5)1 0.64 2.16 0.30 0.77
splines::bs(decimal_date, df = 5)2 6.87 1.47 4.69 0.00
splines::bs(decimal_date, df = 5)3 14.49 1.98 7.31 0.00
splines::bs(decimal_date, df = 5)4 21.01 1.62 13.01 0.00
splines::bs(decimal_date, df = 5)5 22.14 1.64 13.50 0.00
glance(s1) |> kable(digits = 2)
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.91 0.9 2.31 219.14 0 5 -265.4 544.8 564.26 602.91 113 119

We can use augment to generate data to plot and combine it with the original data.

a1 <- augment(s1, data = co2, se_fit = TRUE, interval="prediction") 
a1 |>  ggplot(aes(decimal_date)) + 
  geom_ribbon(aes(ymin = .lower, ymax = .upper), alpha = 0.2) + 
  geom_line(aes(y= .fitted)) +
  geom_point(aes(y = co2_avg)) + 
  my_theme

For a spline that closely traces the data, increase df to 26 or more. This increases the numbers of knots or separate cubics used to approximate the data.

17.2 Locally Estimated Scatterplot Smoothing (LOESS)

LOESS smooths are constructed by making a large number of quadratic (or possibly linear) regression lines as a window moves along the x-axis. The degree of the fits and the width of the window (and other details) can be adjusted. The predictions from these many local regressions are then computed and plotted as a “smooth” of the data. We usually just imagine the line is drawn through the data in a way that allows it to track fluctuations without specifying a model.

l1 <- loess(co2_anomaly ~ year_fraction, data = co2)

As with the GAMs, the summary function is not particularly easy to interpret and we will not explore the details, but you can see that this is a quadratic model with a “span” of 0.75, meaning that 75% of the data are used for each regression. (The weighting of each point in the regression varies as a function of x, resulting a continuous change in the predictions.)

Call:
loess(formula = co2_anomaly ~ year_fraction, data = co2)

Number of Observations: 119 
Equivalent Number of Parameters: 4.58 
Residual Standard Error: 0.4647 
Trace of smoother matrix: 5  (exact)

Control settings:
  span     :  0.75 
  degree   :  2 
  family   :  gaussian
  surface  :  interpolate     cell = 0.2
  normalize:  TRUE
 parametric:  FALSE
drop.square:  FALSE 

We can use predict, residuals and augment on these model objects as we did for lm fits in the previous lesson.

augment(l1, se_fit = TRUE) |> kable() |>  scroll_box(height = "200px")
co2_anomaly year_fraction .fitted .se.fit .resid
-0.3216667 0.0417 -1.1874816 0.1278142 0.8658149
0.2083333 0.1250 0.3297748 0.0783764 -0.1214414
0.9483333 0.2083 1.4858927 0.0756092 -0.5375594
1.5983333 0.2917 2.2386031 0.0811237 -0.6402698
2.5383333 0.3750 2.7453971 0.0876098 -0.2070637
2.0883333 0.4583 2.1417191 0.0876044 -0.0533857
0.8583333 0.5417 0.4302506 0.0876044 0.4280828
-1.5216667 0.6250 -1.5283681 0.0876098 0.0067014
-2.5716667 0.7083 -2.4561226 0.0811840 -0.1155441
-2.6616667 0.7917 -2.3873488 0.0757368 -0.2743178
-1.3716667 0.8750 -1.4172800 0.0803002 0.0456133
0.2083333 0.9583 0.4905675 0.1329762 -0.2822341
-0.7350000 0.0417 -1.1874816 0.1278142 0.4524816
0.0050000 0.1250 0.3297748 0.0783764 -0.3247748
0.5450000 0.2083 1.4858927 0.0756092 -0.9408927
2.3250000 0.2917 2.2386031 0.0811237 0.0863969
2.8850000 0.3750 2.7453971 0.0876098 0.1396029
1.8550000 0.4583 2.1417191 0.0876044 -0.2867191
0.5050000 0.5417 0.4302506 0.0876044 0.0747494
-1.4650000 0.6250 -1.5283681 0.0876098 0.0633681
-2.7250000 0.7083 -2.4561226 0.0811840 -0.2688774
-2.8050000 0.7917 -2.3873488 0.0757368 -0.4176512
-0.8750000 0.8750 -1.4172800 0.0803002 0.5422800
0.4850000 0.9583 0.4905675 0.1329762 -0.0055675
-0.9700000 0.0417 -1.1874816 0.1278142 0.2174816
0.2800000 0.1250 0.3297748 0.0783764 -0.0497748
0.9100000 0.2083 1.4858927 0.0756092 -0.5758927
1.8900000 0.2917 2.2386031 0.0811237 -0.3486031
3.2600000 0.3750 2.7453971 0.0876098 0.5146029
2.0800000 0.4583 2.1417191 0.0876044 -0.0617191
0.8000000 0.5417 0.4302506 0.0876044 0.3697494
-1.3200000 0.6250 -1.5283681 0.0876098 0.2083681
-3.0700000 0.7083 -2.4561226 0.0811840 -0.6138774
-2.8200000 0.7917 -2.3873488 0.0757368 -0.4326512
-1.3600000 0.8750 -1.4172800 0.0803002 0.0572800
0.3200000 0.9583 0.4905675 0.1329762 -0.1705675
-0.7925000 0.0417 -1.1874816 0.1278142 0.3949816
-0.6325000 0.1250 0.3297748 0.0783764 -0.9622748
1.0675000 0.2083 1.4858927 0.0756092 -0.4183927
2.6875000 0.2917 2.2386031 0.0811237 0.4488969
3.1375000 0.3750 2.7453971 0.0876098 0.3921029
2.6075000 0.4583 2.1417191 0.0876044 0.4657809
0.4675000 0.5417 0.4302506 0.0876044 0.0372494
-1.6125000 0.6250 -1.5283681 0.0876098 -0.0841319
-3.2625000 0.7083 -2.4561226 0.0811840 -0.8063774
-2.5725000 0.7917 -2.3873488 0.0757368 -0.1851512
-1.3625000 0.8750 -1.4172800 0.0803002 0.0547800
0.2675000 0.9583 0.4905675 0.1329762 -0.2230675
-0.8458333 0.0417 -1.1874816 0.1278142 0.3416483
-0.4758333 0.1250 0.3297748 0.0783764 -0.8056081
0.6941667 0.2083 1.4858927 0.0756092 -0.7917261
2.3241667 0.2917 2.2386031 0.0811237 0.0855636
3.1341667 0.3750 2.7453971 0.0876098 0.3887696
1.9741667 0.4583 2.1417191 0.0876044 -0.1675524
0.4641667 0.5417 0.4302506 0.0876044 0.0339161
-1.8958333 0.6250 -1.5283681 0.0876098 -0.3674653
-3.1958333 0.7083 -2.4561226 0.0811840 -0.7397107
-2.5358333 0.7917 -2.3873488 0.0757368 -0.1484845
-0.6658333 0.8750 -1.4172800 0.0803002 0.7514466
1.0241667 0.9583 0.4905675 0.1329762 0.5335992
-1.7225000 0.0417 -1.1874816 0.1278142 -0.5350184
-0.1525000 0.1250 0.3297748 0.0783764 -0.4822748
0.6475000 0.2083 1.4858927 0.0756092 -0.8383927
3.1975000 0.2917 2.2386031 0.0811237 0.9588969
3.4975000 0.3750 2.7453971 0.0876098 0.7521029
2.5875000 0.4583 2.1417191 0.0876044 0.4457809
0.1775000 0.5417 0.4302506 0.0876044 -0.2527506
-1.9625000 0.6250 -1.5283681 0.0876098 -0.4341319
-3.1725000 0.7083 -2.4561226 0.0811840 -0.7163774
-2.6225000 0.7917 -2.3873488 0.0757368 -0.2351512
-0.6925000 0.8750 -1.4172800 0.0803002 0.7247800
0.2175000 0.9583 0.4905675 0.1329762 -0.2730675
-0.3850000 0.0417 -1.1874816 0.1278142 0.8024816
-0.0850000 0.1250 0.3297748 0.0783764 -0.4147748
0.6750000 0.2083 1.4858927 0.0756092 -0.8108927
2.4750000 0.2917 2.2386031 0.0811237 0.2363969
3.1350000 0.3750 2.7453971 0.0876098 0.3896029
2.3350000 0.4583 2.1417191 0.0876044 0.1932809
0.5750000 0.5417 0.4302506 0.0876044 0.1447494
-1.4350000 0.6250 -1.5283681 0.0876098 0.0933681
-3.1850000 0.7083 -2.4561226 0.0811840 -0.7288774
-2.9250000 0.7917 -2.3873488 0.0757368 -0.5376512
-1.4350000 0.8750 -1.4172800 0.0803002 -0.0177200
0.2550000 0.9583 0.4905675 0.1329762 -0.2355675
-0.5600000 0.0417 -1.1874816 0.1278142 0.6274816
-0.2000000 0.1250 0.3297748 0.0783764 -0.5297748
0.8700000 0.2083 1.4858927 0.0756092 -0.6158927
1.7300000 0.2917 2.2386031 0.0811237 -0.5086031
2.7200000 0.3750 2.7453971 0.0876098 -0.0253971
2.2700000 0.4583 2.1417191 0.0876044 0.1282809
0.1800000 0.5417 0.4302506 0.0876044 -0.2502506
-1.5500000 0.6250 -1.5283681 0.0876098 -0.0216319
-3.0000000 0.7083 -2.4561226 0.0811840 -0.5438774
-2.5200000 0.7917 -2.3873488 0.0757368 -0.1326512
-0.5000000 0.8750 -1.4172800 0.0803002 0.9172800
0.5600000 0.9583 0.4905675 0.1329762 0.0694325
-0.6041667 0.0417 -1.1874816 0.1278142 0.5833149
0.3158333 0.1250 0.3297748 0.0783764 -0.0139414
0.5358333 0.2083 1.4858927 0.0756092 -0.9500594
1.8958333 0.2917 2.2386031 0.0811237 -0.3427698
3.2058333 0.3750 2.7453971 0.0876098 0.4604363
2.4958333 0.4583 2.1417191 0.0876044 0.3541143
0.3058333 0.5417 0.4302506 0.0876044 -0.1244172
-1.4841667 0.6250 -1.5283681 0.0876098 0.0442014
-2.8941667 0.7083 -2.4561226 0.0811840 -0.4380441
-2.9141667 0.7917 -2.3873488 0.0757368 -0.5268178
-1.1841667 0.8750 -1.4172800 0.0803002 0.2331133
0.3258333 0.9583 0.4905675 0.1329762 -0.1647341
-0.6154545 0.0417 -1.1874816 0.1278142 0.5720271
0.1045455 0.1250 0.3297748 0.0783764 -0.2252293
0.5045455 0.2083 1.4858927 0.0756092 -0.9813473
2.2045455 0.2917 2.2386031 0.0811237 -0.0340576
3.0645455 0.3750 2.7453971 0.0876098 0.3191484
2.3745455 0.4583 2.1417191 0.0876044 0.2328264
0.3745455 0.5417 0.4302506 0.0876044 -0.0557051
-1.4554545 0.6250 -1.5283681 0.0876098 0.0729135
-2.7154545 0.7083 -2.4561226 0.0811840 -0.2593319
-2.7254545 0.7917 -2.3873488 0.0757368 -0.3381057
-1.1154545 0.8750 -1.4172800 0.0803002 0.3018254
augment(l1, se_fit = TRUE) |>
  ggplot(aes(x = year_fraction, y = .fitted)) + 
  geom_line() + 
  geom_ribbon(aes(ymin = .fitted - 2*.se.fit, ymax = .fitted + 2*.se.fit), alpha = 0.20) + 
  my_theme

This line is only drawn using 12 points, so we might want to generate a smoother prediction by creating a new set of dates. We also add the original data to the plot.

new_data = tibble(year_fraction = seq(min(co2$year_fraction), 
                                      max(co2$year_fraction),
                                      length = 100))
augment(l1, newdata = new_data, se_fit = TRUE) |>
  ggplot(aes(x = year_fraction, y = .fitted)) + 
  geom_line() + 
  geom_ribbon(aes(ymin = .fitted - 2*.se.fit,
                  ymax = .fitted + 2*.se.fit), 
              alpha = 0.20) + 
  geom_point(aes(y = co2_anomaly), data = co2) +
  my_theme

The LOESS function will not predict outside of the range of data provided, so I had to select the range of new_data above carefully to get the line to start and end near the first and last points.

Now we make two plots that contrast the results with different sized windows (span) and degree of the local regression models.

l2 <- loess(co2_avg ~ decimal_date, data = co2, 
            degree = 1, span = 0.05)
l3 <- loess(co2_avg ~ decimal_date, data = co2, 
            degree = 1, span = 0.25)
new_data = tibble(decimal_date = seq(min(co2$decimal_date), 
                                      max(co2$decimal_date),
                                     length = 300))
a2 <- augment(l2, newdata = new_data, se_fit = TRUE) 
a3 <- augment(l3, newdata = new_data, se_fit = TRUE) 
ggplot(a2, aes(x = decimal_date, y = .fitted)) + 
  geom_line(col="green") + 
  geom_ribbon(aes(ymin = .fitted - 2*.se.fit, 
                  ymax = .fitted + 2*.se.fit), 
              alpha = 0.20, fill="green") + 
  geom_point(aes(y = co2_avg), data = co2) +
  geom_line(data = a3, 
            col="blue") + 
  geom_ribbon(data = a3, 
              aes(ymin = .fitted - 2*.se.fit, 
                  ymax = .fitted + 2*.se.fit), 
              alpha = 0.20, fill="blue") + 
  my_theme

LOESS fits are slow and take a lot of memory compared to other methods (both time and storage requirements increase like the square of the number of points), so they are usually only used for small data sets.

17.3 Packages used

In addition to tidyverse, kableExtra and gapminder, in this lesson we used

  • broom
  • mgcv for GAMs
  • gratia for plotting and extracting information from GAMs

17.4 Further reading