class: center, middle, inverse, title-slide

# S3 smoother and ggplot2
### Niels Richard Hansen
### September 8, 2020

---
## Recall the temperature time series

```r
p <- qplot(Year, Temperature, data = Nuuk_year); p
```

<img src="S3smoother_files/figure-html/unnamed-chunk-2-1.png" height="400" style="display: block; margin: auto;" />

---
## A loess smoother

```r
p + geom_smooth()
```

```
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="S3smoother_files/figure-html/unnamed-chunk-3-1.png" height="400" style="display: block; margin: auto;" />

---
## Loess

The nonlinear loess estimator is implemented in the `loess()` function. It does not automatically select the span (tuning parameter) but has a default of 0.75, which is often too large.

Since loess is nonlinear, the formulas for linear smoothers do not apply. One can instead implement 5- or 10-fold cross validation for tuning.

The `loess()` function does return an object with a `trace.hat` entry, which might be used as a surrogate for `\(\mathrm{trace}(\mathbf{S})\)` for GCV, say.

Loess is a robust smoother (linear smoothers are not) and relatively insensitive to outliers.
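---
## Tuning the loess span (sketch)

The two tuning strategies mentioned for loess can be sketched as follows. This is a minimal illustration, not a definitive implementation: the data are simulated as a stand-in for the temperature series, and the helper names `loess_gcv()` and `loess_cv()` are hypothetical.

```r
# Simulated stand-in for the temperature series (an assumption); with the
# real data one would use Year and Temperature from Nuuk_year instead.
set.seed(1)
n <- 150
dat <- data.frame(x = seq(1870, 2019, length.out = n))
dat$y <- sin((dat$x - 1870) / 25) + rnorm(n, sd = 0.5)

# GCV with trace.hat used as a surrogate for trace(S)
loess_gcv <- function(span, data) {
  fit <- loess(y ~ x, data = data, span = span)
  mean(residuals(fit)^2) / (1 - fit$trace.hat / nrow(data))^2
}

# K-fold cross-validation; 'folds' assigns each observation to a fold.
# surface = "direct" lets predict() extrapolate when a held-out point
# lies outside the range of the training x-values.
loess_cv <- function(span, data, folds) {
  sq_err <- numeric(nrow(data))
  for (j in unique(folds)) {
    test <- folds == j
    fit <- loess(y ~ x, data = data[!test, ], span = span,
                 control = loess.control(surface = "direct"))
    sq_err[test] <- (data$y[test] - predict(fit, newdata = data[test, ]))^2
  }
  mean(sq_err)
}

spans <- seq(0.2, 1, by = 0.05)
gcv <- sapply(spans, loess_gcv, data = dat)
folds <- sample(rep(1:10, length.out = n))
cv <- sapply(spans, loess_cv, data = dat, folds = folds)

span_gcv <- spans[which.min(gcv)]  # span minimizing GCV
span_cv <- spans[which.min(cv)]    # span minimizing 10-fold CV
```

A selected span could then be passed on to the plot, for instance as `geom_smooth(method = "loess", span = span_gcv)`.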
---
## Another loess smoother

```r
p + geom_smooth(method = "loess", span = 0.5)
```

```
`geom_smooth()` using formula 'y ~ x'
```

<img src="S3smoother_files/figure-html/unnamed-chunk-4-1.png" height="400" style="display: block; margin: auto;" />

---
## A linear "smoother"

```r
p + geom_smooth(method = "lm")
```

```
`geom_smooth()` using formula 'y ~ x'
```

<img src="S3smoother_files/figure-html/unnamed-chunk-5-1.png" height="400" style="display: block; margin: auto;" />

---
## A polynomial smoother

```r
p + geom_smooth(method = "lm", formula = y ~ poly(x, 5))
```

<img src="S3smoother_files/figure-html/unnamed-chunk-6-1.png" height="400" style="display: block; margin: auto;" />

---
## Another polynomial smoother

```r
p + geom_smooth(method = "lm", formula = y ~ poly(x, 20))
```

<img src="S3smoother_files/figure-html/unnamed-chunk-7-1.png" height="400" style="display: block; margin: auto;" />

---
## A spline smoother

```r
p + geom_smooth(method = "gam", formula = y ~ s(x))
```

<img src="S3smoother_files/figure-html/unnamed-chunk-8-1.png" height="400" style="display: block; margin: auto;" />

---
## Another spline smoother

```r
p + geom_smooth(method = "gam", formula = y ~ s(x, k = 100))
```

<img src="S3smoother_files/figure-html/unnamed-chunk-9-1.png" height="400" style="display: block; margin: auto;" />

---
## Smoothing with ggplot2

The `geom_smooth()` function makes it easy to add various model fits or scatter plot smoothers to a scatter plot.

Spline smoothing is performed via the `gam()` function in the mgcv package, whereas loess smoothing is performed via the `loess()` function in the stats package.

Any "smoother" can be used, provided that it supports a formula interface and has a prediction method that follows the conventions of `predict.lm()`.
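---
## The smoother contract (sketch)

To make the requirement on a "smoother" concrete, the sketch below mimics the two steps in plain R: fit through a formula interface, then predict on a grid via `predict(fit, newdata)`. The helper `smooth_curve()` is hypothetical and not part of ggplot2, the data are simulated, and the actual internals of `geom_smooth()` may differ in detail.

```r
# Hypothetical helper, NOT part of ggplot2: fit a model through its
# formula interface and evaluate it on an equidistant grid of n points.
smooth_curve <- function(method, formula, data, n = 80, ...) {
  fit <- method(formula, data = data, ...)
  grid <- data.frame(x = seq(min(data$x), max(data$x), length.out = n))
  grid$y <- as.numeric(predict(fit, newdata = grid))
  grid
}

# Simulated data (an assumption, for illustration only)
set.seed(1)
dat <- data.frame(x = 1:100)
dat$y <- sin(dat$x / 15) + rnorm(100, sd = 0.3)

# Any fitting function with a formula interface and a predict method works
curve_lm <- smooth_curve(lm, y ~ poly(x, 5), dat)
curve_loess <- smooth_curve(loess, y ~ x, dat, span = 0.5)
```

The returned data frames can be drawn on top of a scatter plot, for example with `plot(dat)` followed by `lines(curve_loess)`.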
---
## Running means

```r
# The vector 'y' must be sorted according to the x-values
run_mean <- function(y, k) {
  n <- length(y)
  m <- floor((k - 1) / 2)
  k <- 2 * m + 1  # Ensures k to be odd and m = (k-1) / 2
  y <- y / k
  s <- rep(NA, n)
  s[m + 1] <- sum(y[1:k])
  for(i in (m + 1):(n - m - 1))
    s[i + 1] <- s[i] - y[i - m] + y[i + 1 + m]
  s
}
```

---
## An interface for `geom_smooth()`.

```r
running_mean <- function(..., data, k = 5) {
  ord <- order(data$x)
  s <- run_mean(data$y[ord], k = k)
  structure(list(x = data$x[ord], y = s), class = "running_mean")
}
```

And a predict method.

```r
predict.running_mean <- function(object, newdata, ...)
  approx(object$x, object$y, newdata$x)$y  # Linear interpolation
```

---
## A running mean

```r
p + geom_smooth(method = "running_mean", se = FALSE, n = 200)
```

```
`geom_smooth()` using formula 'y ~ x'
```

<img src="S3smoother_files/figure-html/unnamed-chunk-12-1.png" height="400" style="display: block; margin: auto;" />

---
## Another running mean

```r
p + geom_smooth(method = "running_mean", se = FALSE, n = 200,
                method.args = list(k = 13))
```

```
`geom_smooth()` using formula 'y ~ x'
```

<img src="S3smoother_files/figure-html/unnamed-chunk-13-1.png" height="400" style="display: block; margin: auto;" />

---
## Boundary

```r
running_mean <- function(..., data, k = 5, boundary = NULL) {
  ord <- order(data$x)
  y <- data$y[ord]
  n <- length(y)
  m <- floor((k - 1) / 2)
  if (m > 0 && !is.null(boundary)) {
    if (boundary == "pad")
*     y <- c(rep(y[1], m), y, rep(y[n], m))
    if (boundary == "rev")
*     y <- c(y[m:1], y, y[n:(n - m + 1)])
  }
  s <- run_mean(y, k = k)
  if(!is.null(boundary))
    s <- na.omit(s)
  structure(list(x = data$x[ord], y = s), class = "running_mean")
}
```

---
## No boundary

<img src="S3smoother_files/figure-html/unnamed-chunk-15-1.png" height="400" style="display: block; margin: auto;" />

---
## Boundary, padding

<img src="S3smoother_files/figure-html/unnamed-chunk-16-1.png" height="400" style="display: block; margin: auto;" />

---
## Boundary, reversion

<img src="S3smoother_files/figure-html/unnamed-chunk-17-1.png" height="400" style="display: block; margin: auto;" />
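---
## Checking the running mean (sketch)

As a sanity check, the recursive update in `run_mean()` can be compared with the centered moving average computed by `stats::filter()`, and the effect of padding can be inspected directly. This is a sketch on simulated data; `run_mean()` is copied from the slides so that the block is self-contained.

```r
# run_mean() copied from the slides
run_mean <- function(y, k) {
  n <- length(y)
  m <- floor((k - 1) / 2)
  k <- 2 * m + 1
  y <- y / k
  s <- rep(NA, n)
  s[m + 1] <- sum(y[1:k])
  for (i in (m + 1):(n - m - 1))
    s[i + 1] <- s[i] - y[i - m] + y[i + 1 + m]
  s
}

# Simulated data (an assumption, for illustration only)
set.seed(1)
y <- rnorm(50)
k <- 5
m <- (k - 1) / 2

# Reference: centered moving average with equal weights 1 / k
ref <- as.numeric(stats::filter(y, rep(1 / k, k)))
same_as_filter <- all.equal(run_mean(y, k), ref)

# Padding with the end values, as for boundary = "pad", followed by
# na.omit() gives one smoothed value per original observation
y_pad <- c(rep(y[1], m), y, rep(y[length(y)], m))
s_pad <- as.numeric(na.omit(run_mean(y_pad, k)))
```

Without padding, the first and last `m` entries of the result are `NA`, which is why the unpadded curve stops short of the boundary.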