class: center, middle, inverse, title-slide

# Bivariate smoothing

### Niels Richard Hansen

### September 11, 2019

---

## A scatter plot

```r
p <- qplot(Year, Temperature, data = Nuuk_year); p
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-2-1.png" height="400" style="display: block; margin: auto;" />

---

## Nearest neighbor estimation

Data: `\((x_1, y_1), \ldots, (x_n, y_n)\)`

The `\(k\)` nearest neighbor smoother in `\(x\)` is defined as

`$$\hat{f}(x) = \frac{1}{k} \sum_{j \in N(x)} y_j$$`

where `\(N(x)\)` is the set of indices for the `\(k\)` nearest neighbors of `\(x\)`.

This is an estimator of

`$$f(x) = E(Y \mid X = x).$$`

---

## Nearest neighbor estimation

The estimator in `\(x_i\)` is

`$$\hat{f}_i = \frac{1}{k} \sum_{j \in N_i} y_j$$`

where `\(N_i = N(x_i)\)`.

With `\(S_{ij} = \frac{1}{k} 1(j \in N_i)\)` and `\(\mathbf{S} = (S_{ij})\)`,

`$$\hat{\mathbf{f}} = (\hat{f}_i) = \mathbf{S} \mathbf{y}.$$`

`\(\hat{\mathbf{f}}\)` is an estimator of the vector `\((f(x_i))\)`.

---

## Linear smoothers

Any estimator of `\((f(x_i))\)` of the form `\(\mathbf{S} \mathbf{y}\)` for a *smoother matrix* `\(\mathbf{S}\)` is called a *linear smoother*.

The `\(k\)` nearest neighbor smoother is a simple example of a linear smoother that works for `\(x\)`-values in any metric space.

The representation of a linear smoother as a matrix-vector product,

`$$\mathbf{S} \mathbf{y},$$`

is theoretically useful but often not the best way to actually compute `\(\hat{\mathbf{f}}\)`.

---

## Running mean

When `\(x_i \in \mathbb{R}\)` we can sort the data according to the `\(x\)`-values and then use a *symmetric* neighbor definition:

`$$N_i = \{i - (k - 1) / 2, i - (k - 1) / 2 + 1, \ldots, \\ i - 1 , i, i + 1, \ldots, i + (k - 1) / 2\}$$`

(for `\(k\)` odd). This simplifies computations: we don't need to keep track of metric comparisons, only the order matters.

---

## Bias-variance tradeoff

If data is i.i.d. with `\(V(Y \mid X) = \sigma^2\)`:

`\begin{aligned}
& \mathrm{MSE}(x) = E((f(x) - \hat{f}(x))^2 \mid X=x) \\
& = E((f(x)-E(\hat{f}(x)))^2 \mid X=x) + E((\hat{f}(x)-E(\hat{f}(x)))^2 \mid X=x) \\
& = \underbrace{\left[f(x) - \frac{1}{k} \sum_{l \in N(x)} f(x_l)\right]^2}_{\textrm{squared bias}} + \underbrace{\frac{\sigma^2}{k}}_{\textrm{variance}}
\end{aligned}`

Here `\(f(x) = E(Y \mid X = x)\)`.

Small `\(k\)` gives large variance and small bias (if `\(f\)` is smooth). Large `\(k\)` gives small variance and potentially large bias (if `\(f\)` is not constant).
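---

## The smoother matrix in R

To make the linear smoother representation concrete, here is a minimal sketch constructing `\(\mathbf{S}\)` for the `\(k\)` nearest neighbor smoother. The helper `smoother_matrix` is hypothetical (not from the notes), but dense matrices of this kind are presumably what the benchmarks below multiply with (`S1`, ..., `S4`).

```r
## Sketch: S[i, j] = 1/k if j is among the k nearest neighbors of x[i]
smoother_matrix <- function(x, k) {
  n <- length(x)
  S <- matrix(0, n, n)
  for (i in seq_len(n))
    S[i, order(abs(x - x[i]))[1:k]] <- 1 / k  ## Ties broken by order()
  S
}
f_hat_S <- smoother_matrix(Nuuk_year$Year, 11) %*% Nuuk_year$Temperature
```

Constructing and multiplying with `\(\mathbf{S}\)` costs `\(O(n^2)\)` time and memory, which is what the direct implementations on the following slides avoid.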
---

## Running mean

Implementation (assuming `y` is in the correct order) using the identity

`$$\hat{f}_{i+1} = \hat{f}_{i} + \frac{1}{k}\left(y_{i + (k + 1)/2} - y_{i - (k-1)/2}\right),$$`

```r
## The vector 'y' must be sorted according to the x-values
runMean <- function(y, k) {
  n <- length(y)
  m <- floor((k - 1) / 2)
  k <- 2 * m + 1           ## Ensures that k is odd
  y <- y / k               ## Scaling once up front makes the update a plain sum
  s <- rep(NA, n)
  s[m + 1] <- sum(y[1:k])  ## The first complete window
  for (i in (m + 1):(n - m - 1)) 
    s[i + 1] <- s[i] - y[i - m] + y[i + 1 + m]
  s
}
```

---

## Visualization

```r
f_hat <- runMean(Nuuk_year$Temperature, 11)
p + geom_line(aes(y = f_hat), color = "blue")
```

<img src="BivariateSmoothers_files/figure-html/Nuuk-NN-plot2-1.png" height="400" style="display: block; margin: auto;" />

---

## Using 'filter'

The R function `filter` computes moving averages, and thus in particular running means.

```r
f_hat_filter <- stats::filter(Nuuk_year$Temperature, rep(1/11, 11))
range(f_hat_filter - f_hat, na.rm = TRUE)
```

```
[1] -1.332268e-15  4.440892e-16
```

```r
f_hat_filter[c(1:10, 137:147)]
```

```
 [1]    NA    NA    NA    NA    NA -1.85 -1.72 -1.55 -1.52 -1.48 -0.55
[12] -0.52 -0.15 -0.23 -0.19 -0.12    NA    NA    NA    NA    NA
```

---

## Benchmarking

```
                                       expr     median
1                           S1 %*% y[1:512]   324.9540
2                          S2 %*% y[1:1024]  1327.9415
3                          S3 %*% y[1:2048]  5429.9425
4                          S4 %*% y[1:4196] 23142.1245
5                 runMean(y[1:512], k = 11)    80.4405
6                runMean(y[1:1024], k = 11)   160.9555
7                runMean(y[1:2048], k = 11)   302.6750
8                runMean(y[1:4196], k = 11)   604.9305
9    stats::filter(y[1:512], rep(1/11, 11))    91.1445
10  stats::filter(y[1:1024], rep(1/11, 11))   114.6415
11  stats::filter(y[1:2048], rep(1/11, 11))   172.9890
12  stats::filter(y[1:4196], rep(1/11, 11))   263.9965
```

The matrix-vector multiplication is `\(O(n^2)\)`. The two other algorithms are `\(O(n)\)`.

---

## LOOCV

The running mean / nearest neighbor smoother is a *linear smoother*, `\(\hat{\mathbf{f}} = \mathbf{S} \mathbf{y}.\)`

How do we predict `\(y_i\)` if `\((x_i, y_i)\)` is left out? For a linear smoother we can take

`$$\hat{f}^{-i}_i = \sum_{j \neq i} \frac{S_{ij}y_j}{1 - S_{ii}}$$`

as a *definition*. For many smoothing procedures with a natural "out-of-sample" prediction method, this definition agrees with actually leaving out `\((x_i, y_i)\)` and refitting.

---

## LOOCV

Since `\(\hat{f}_i = \sum_j S_{ij} y_j\)`, we get `\(y_i - \hat{f}^{-i}_i = (y_i - \hat{f}_i) / (1 - S_{ii})\)`, and it follows that for *leave-one-out cross-validation*

`$$\mathrm{LOOCV} = \sum_{i} (y_i - \hat{f}^{-i}_i)^2 = \sum_{i} \left(\frac{y_i - \hat{f}_i}{1 - S_{ii}}\right)^2$$`

```r
loocv <- function(k, y) {
  f_hat <- runMean(y, k)
  ## For the running mean S_ii = 1/k; the NAs at the boundary are dropped
  mean(((y - f_hat) / (1 - 1/k))^2, na.rm = TRUE)
}
```

---

## LOOCV

```r
k <- seq(3, 40, 2)
CV <- sapply(k, function(kk) loocv(kk, Nuuk_year$Temperature))
k_opt <- k[which.min(CV)]
qplot(k, CV) + geom_line() + geom_vline(xintercept = k_opt, color = "red")
```

<img src="BivariateSmoothers_files/figure-html/Nuuk-running-loocv-1.png" width="500" height="333" style="display: block; margin: auto;" />

---

## LOOCV

The optimal choice of `\(k\)` is 15 (blue); the smoother with `\(k = 25\)` (red) is shown for comparison.

```r
p + geom_line(aes(y = runMean(Nuuk_year$Temperature, 25)), color = "red") +
  geom_line(aes(y = runMean(Nuuk_year$Temperature, k_opt)), color = "blue")
```

<img src="BivariateSmoothers_files/figure-html/Nuuk-NN-plot3-1.png" height="400" style="display: block; margin: auto;" />

---

## Smoothing splines

The minimizer of

`$$L(f) = \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \underbrace{\int f''(z)^2 \mathrm{d} z}_{\|f''\|_2^2}$$`

is a cubic spline with *knots* in the data points `\(x_i\)`, that is, a function

`$$f = \sum_i \beta_i \varphi_i$$`

where the `\(\varphi_i\)` are basis functions for the `\(n\)`-dimensional space of such splines. A cubic spline is a piecewise degree 3 polynomial between its knots.

For `\(\lambda \to 0\)` the minimizer interpolates the data, while for `\(\lambda \to \infty\)` it approaches the least squares straight line, for which `\(f'' = 0\)`.
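---

## Computing the penalty matrix

The matrix `\(\mathbf{\Omega}\)` with entries `\(\mathbf{\Omega}_{ij} = \int \varphi_i''(z) \varphi_j''(z) \mathrm{d}z\)`, formalized on the next slide, can be computed exactly: second derivatives of cubic B-splines are piecewise linear, so the integrand is piecewise quadratic, and Simpson's rule on each inter-knot interval is exact. Below is a sketch along the lines of the implementation in the notes, using `splineDesign` from the splines package (introduced on a later slide); sorted, distinct `inner_knots` are an assumption.

```r
pen_mat <- function(inner_knots) {
  knots <- sort(c(rep(range(inner_knots), 3), inner_knots))
  d <- diff(inner_knots)  ## Interval lengths
  ## Second derivatives of the B-spline basis in the knots and the midpoints
  g_ab <- splineDesign(knots, inner_knots, derivs = 2)
  g_ab_mid <- splineDesign(knots, inner_knots[-length(inner_knots)] + d / 2,
                           derivs = 2)
  g_a <- g_ab[-nrow(g_ab), ]
  g_b <- g_ab[-1, ]
  ## Simpson's rule on each interval, summed via weighted crossproducts
  (crossprod(d * g_a, g_a) +
     4 * crossprod(d * g_ab_mid, g_ab_mid) +
     crossprod(d * g_b, g_b)) / 6
}
```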
---

## Smoothing splines

In vector notation

`$$\hat{\mathbf{f}} = \boldsymbol{\Phi}\hat{\beta}$$`

with `\(\boldsymbol{\Phi}_{ij} = \varphi_j(x_i)\)`, and

`\begin{aligned}
L(\mathbf{f}) & = (\mathbf{y} - \mathbf{f})^T (\mathbf{y} - \mathbf{f}) + \lambda \|f''\|_2^2 \\
& = ( \mathbf{y} - \boldsymbol{\Phi}\beta)^T (\mathbf{y} - \boldsymbol{\Phi}\beta) + \lambda \beta^T \mathbf{\Omega} \beta
\end{aligned}`

with

`$$\mathbf{\Omega}_{ij} = \int \varphi_i''(z) \varphi_j''(z) \mathrm{d}z.$$`

---

## Smoothing splines

The minimizer is

`$$\hat{\beta} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{\Omega})^{-1}\boldsymbol{\Phi}^T \mathbf{y}$$`

with resulting smoother

`$$\hat{\mathbf{f}} = \underbrace{\boldsymbol{\Phi} (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{\Omega})^{-1}\boldsymbol{\Phi}^T}_{\mathbf{S}_{\lambda}} \mathbf{y}.$$`

---

## Splines in R

```r
library(splines)
## Note the specification of repeated boundary knots
knots <- c(0, 0, 0, seq(0, 1, 0.2), 1, 1, 1)
xx <- seq(0, 1, 0.005)
B_splines <- splineDesign(knots, xx)
matplot(xx, B_splines, type = "l", lty = 1)
```

<img src="BivariateSmoothers_files/figure-html/splines-1.png" height="333" style="display: block; margin: auto;" />

---

## Penalty matrix

```r
## See the notes for the implementation of pen_mat (a sketch was given above)
Omega <- pen_mat(seq(0, 1, 0.1))
image(Matrix(Omega))  ## Matrix() from the Matrix package
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-4-1.png" height="400" style="display: block; margin: auto;" />

---

## Fitting a smoothing spline

We implement the matrix algebra directly for computing `\(\mathbf{S}_{\lambda} \mathbf{y}\)`.

```r
inner_knots <- Nuuk_year$Year
knots <- c(rep(range(inner_knots), 3), inner_knots)
Phi <- splineDesign(knots, inner_knots)
Omega <- pen_mat(inner_knots)
smoother <- function(lambda) 
  Phi %*% solve(
    crossprod(Phi) + lambda * Omega, 
    t(Phi) %*% Nuuk_year$Temperature
  )
```

---

## Fitting a smoothing spline

```r
p + geom_line(aes(y = smoother(0.1)), color = "blue") +   ## Undersmooth
  geom_line(aes(y = smoother(10)), color = "red") +       ## Smooth
  geom_line(aes(y = smoother(1000)), color = "purple")    ## Oversmooth
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-5-1.png" height="400" style="display: block; margin: auto;" />

---

## Generalized cross-validation

With `\(\mathrm{df} = \mathrm{trace}(\mathbf{S}) = \sum_{i=1}^n S_{ii}\)` we replace `\(S_{ii}\)` in LOOCV by `\(\mathrm{df} / n\)` to get the *generalized* cross-validation criterion

`$$\mathrm{GCV} = \sum_{i=1}^n \left(\frac{y_i - \hat{f}_i}{1 - \mathrm{df} / n}\right)^2.$$`

```r
gcv <- function(lambda, y) {
  S <- Phi %*% solve(crossprod(Phi) + lambda * Omega, t(Phi))
  df <- sum(diag(S))  ## The trace of the smoother matrix
  sum(((y - S %*% y) / (1 - df / length(y)))^2, na.rm = TRUE)
}
```
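---

## Generalized cross-validation

Instead of the grid search on the next slide, `gcv` can also be handed to a one-dimensional minimizer such as `optimize`. A minimal sketch; the search interval is an assumption, and GCV need not be unimodal, so a grid or a plot remains a useful sanity check.

```r
lambda_gcv <- optimize(gcv, interval = c(1, 1000), y = Nuuk_year$Temperature)
lambda_gcv$minimum  ## The GCV-optimal lambda within the search interval
```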
---

## Generalized cross-validation

Here we apply `gcv` to a grid of `\(\lambda\)`-values and choose the value of `\(\lambda\)` that minimizes GCV.

```r
lambda <- seq(50, 250, 2)
GCV <- sapply(lambda, gcv, y = Nuuk_year$Temperature)
lambda_opt <- lambda[which.min(GCV)]
```

---

## Generalized cross-validation

```r
qplot(lambda, GCV) + geom_vline(xintercept = lambda_opt, color = "red")
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-6-1.png" height="400" style="display: block; margin: auto;" />

---

## Optimal smoothing spline

```r
temp_smooth_opt <- Phi %*% solve(
  crossprod(Phi) + lambda_opt * Omega, 
  t(Phi) %*% Nuuk_year$Temperature
)
p + geom_line(aes(y = temp_smooth_opt), color = "blue")
```

<img src="BivariateSmoothers_files/figure-html/Nuuk-spline-opt-1.png" height="400" style="display: block; margin: auto;" />

---

## Using `smooth.spline`

```r
temp_smooth_splines <- smooth.spline(
  Nuuk_year$Year, 
  Nuuk_year$Temperature, 
  all.knots = TRUE  ## Use all x-values as knots; don't use the fast heuristic
)
p + geom_line(aes(y = temp_smooth_splines$y), color = "blue")
```

<img src="BivariateSmoothers_files/figure-html/Nuuk-spline-opt2-1.png" height="400" style="display: block; margin: auto;" />

---

## Efficient computations

In practice we use `\(p < n\)` basis functions. Using the singular value decomposition

`$$\Phi = \mathbf{U} D \mathbf{V}^T$$`

it holds that

`$$\mathbf{S}_{\lambda} = \widetilde{\mathbf{U}} (I + \lambda \Gamma)^{-1} \widetilde{\mathbf{U}}^T$$`

where `\(\widetilde{\mathbf{U}} = \mathbf{U} \mathbf{W}\)` and

`$$D^{-1} \mathbf{V}^T \mathbf{\Omega} \mathbf{V} D^{-1} = \mathbf{W} \Gamma \mathbf{W}^T.$$`

---

## Efficient computations

The interpretation of this representation is as follows.

* First, the coefficients, `\(\hat{\beta} = \widetilde{\mathbf{U}}^T \mathbf{y}\)`, are computed for expanding `\(\mathbf{y}\)` in the orthonormal basis given by the columns of `\(\widetilde{\mathbf{U}}\)`.

* Second, the `\(i\)`th coefficient is shrunk towards 0,

  `$$\hat{\beta}_i(\lambda) = \frac{\hat{\beta}_i}{1 + \lambda \gamma_i}.$$`

* Third, the smoothed values, `\(\widetilde{\mathbf{U}} \hat{\beta}(\lambda)\)`, are computed as an expansion using the shrunken coefficients.

---

### The Demmler-Reinsch basis (columns of `\(\widetilde{\mathbf{U}}\)`)

<img src="BivariateSmoothers_files/figure-html/spline-diagonalization-1.png" height="500" style="display: block; margin: auto;" />

---

## and the eigenvalues `\(\gamma_i\)`

<img src="BivariateSmoothers_files/figure-html/spline-gamma-1.png" height="400" style="display: block; margin: auto;" />
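---

### Computing the Demmler-Reinsch basis

The basis and the eigenvalues shown in the two preceding figures can be computed directly from the singular value decomposition. A minimal sketch; the thinning of the knots to get `\(p < n\)` is an assumption, and `inner_knots` and `pen_mat` are as defined earlier.

```r
## Thin the knots to p < n basis functions so that Phi has full column rank
inner_thin <- inner_knots[seq(1, length(inner_knots), 2)]
Phi_p <- splineDesign(c(rep(range(inner_thin), 3), inner_thin), inner_knots)
Omega_p <- pen_mat(inner_thin)
Phi_svd <- svd(Phi_p)
## D^{-1} V^T Omega V D^{-1} and its eigendecomposition W Gamma W^T
Omega_tilde <- crossprod(Phi_svd$v, Omega_p %*% Phi_svd$v) /
  outer(Phi_svd$d, Phi_svd$d)
Omega_tilde_eig <- eigen(Omega_tilde, symmetric = TRUE)
gamma <- Omega_tilde_eig$values                   ## The eigenvalues gamma_i
U_tilde <- Phi_svd$u %*% Omega_tilde_eig$vectors  ## The Demmler-Reinsch basis
```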
---

## Nadaraya-Watson

The `ksmooth` function implements the Nadaraya-Watson smoother. There is no automatic choice of bandwidth.

Kernel smoothing for bivariate smoothing is easy to implement along the same lines as kernel smoothing for density estimation. Binning can be useful for handling large data sets.

Local regression smoothing can be implemented using `lm` and its `weights` argument, or perhaps more efficiently using `lm.wfit`. Refits are in principle needed at all points.

The `locpoly` function in the KernSmooth package does local polynomial regression using binning. It does not do automatic bandwidth selection.

For kernel smoothing it is quite easy to compute the diagonals in the smoother matrix and thus LOOCV or GCV.

---

## Loess

The nonlinear loess estimator is implemented in the `loess` function. It could be implemented using `lm.wfit`.

The `loess` function does not automatically select the span but has a default of 0.75, which is often too large.

Since loess is nonlinear, the formulas for linear smoothers do not apply. One can instead implement 5- or 10-fold cross-validation.

The `loess` function does return an object with a `trace.hat` entry, which might be used as a surrogate for `\(\mathrm{trace}(\mathbf{S})\)` for GCV, say.

Loess is a robust smoother (linear smoothers are not) and relatively insensitive to outliers.

---

## A loess smoother

```r
p + geom_smooth()
```

```
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-7-1.png" height="400" style="display: block; margin: auto;" />

---

## Another loess smoother

```r
p + geom_smooth(method = "loess", span = 0.5)
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-8-1.png" height="400" style="display: block; margin: auto;" />

---

## A linear smoother

```r
p + geom_smooth(method = "lm")
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-9-1.png" height="400" style="display: block; margin: auto;" />

---

## A polynomial smoother

```r
p + geom_smooth(method = "lm", formula = y ~ poly(x, 5))
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-10-1.png" height="400" style="display: block; margin: auto;" />

---

## Another polynomial smoother

```r
p + geom_smooth(method = "lm", formula = y ~ poly(x, 20))
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-11-1.png" height="400" style="display: block; margin: auto;" />

---

## A spline smoother

```r
p + geom_smooth(method = "gam", formula = y ~ s(x))
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-12-1.png" height="400" style="display: block; margin: auto;" />

---

## Another spline smoother

```r
p + geom_smooth(method = "gam", formula = y ~ s(x, k = 40))
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-13-1.png" height="400" style="display: block; margin: auto;" />

---

## Smoothing with ggplot2

The `geom_smooth` function easily adds miscellaneous model fits or scatter plot smoothers to a scatter plot. The `stat_smooth` function is responsible for delegating the computations. See `?stat_smooth`.

Spline smoothing is performed via the `gam` function in the mgcv package, whereas loess smoothing is done via the `loess` function in the stats package.

Any "smoother" can be used that supports a formula interface and has a prediction function adhering to the standards of `predict.lm`.
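---

## A robust polynomial smoother

As an example of this flexibility, `MASS::rlm` supports the formula interface and has a `predict` method, so it can be plugged directly into `geom_smooth`. A minimal sketch, assuming the MASS package is installed; the polynomial degree is arbitrary:

```r
p + geom_smooth(method = MASS::rlm, formula = y ~ poly(x, 5), se = FALSE)
```

The next slides build a custom smoother along the same lines.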
---

## An interface for `geom_smooth`

```r
rMean <- function(..., data, k = 5) {
  ## Extra arguments (e.g. the formula passed by stat_smooth) go into '...'
  ord <- order(data$x)
  structure(
    list(x = data$x[ord], y = runMean(data$y[ord], k = k)), 
    class = "rMean"
  )
}

predict.rMean <- function(object, newdata, ...)
  approx(object$x, object$y, newdata$x)$y  ## Linear interpolation
```

---

## A running mean

```r
p + geom_smooth(method = "rMean", se = FALSE, n = 200)
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-15-1.png" height="400" style="display: block; margin: auto;" />

---

## Another running mean

```r
p + geom_smooth(method = "rMean", se = FALSE, n = 200, 
                method.args = list(k = 13))
```

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-16-1.png" height="400" style="display: block; margin: auto;" />

---

## Boundary

```r
rMean <- function(..., data, k = 5, boundary = NULL) {
  ord <- order(data$x)
  y <- data$y[ord]
  n <- length(y)
  m <- floor((k - 1) / 2)
  if (m > 0 && !is.null(boundary)) {
    ## Extend the series by m values at each end before smoothing
    if (boundary == "pad")  ## Repeat the end points
      y <- c(rep(y[1], m), y, rep(y[n], m))
    if (boundary == "rev")  ## Reflect the m values nearest each end
      y <- c(y[m:1], y, y[n:(n - m + 1)])
  }
  s <- runMean(y, k = k)
  if (!is.null(boundary)) 
    s <- na.omit(s)  ## Drop the NAs produced at the extended ends
  structure(list(x = data$x[ord], y = s), class = "rMean")
}
```

---

## No boundary

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-18-1.png" height="400" style="display: block; margin: auto;" />

---

## Boundary, padding

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-19-1.png" height="400" style="display: block; margin: auto;" />

---

## Boundary, reversion

<img src="BivariateSmoothers_files/figure-html/unnamed-chunk-20-1.png" height="400" style="display: block; margin: auto;" />
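---

## Reproducing the boundary plots

The three preceding figures can be produced with calls along the following lines. This is a sketch: the chunk code behind the figures is not shown here, so the bandwidth `k = 13` and the remaining settings are assumptions.

```r
p + geom_smooth(method = "rMean", se = FALSE, n = 200,
                method.args = list(k = 13))                    ## No boundary
p + geom_smooth(method = "rMean", se = FALSE, n = 200,
                method.args = list(k = 13, boundary = "pad"))  ## Padding
p + geom_smooth(method = "rMean", se = FALSE, n = 200,
                method.args = list(k = 13, boundary = "rev"))  ## Reversion
```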