Spline smoothing

class: center, middle, inverse, title-slide

# Spline smoothing
### Niels Richard Hansen
### September 8, 2020

---

## Recall the temperature time series

```r
p <- qplot(Year, Temperature, data = Nuuk_year); p
```

---
## Smoothing splines

The minimizer of 
`$$L(f) = \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \underbrace{\int f''(z)^2 \mathrm{d} z}_{\|f''\|_2^2}$$`
is a cubic spline

with *knots* in the data points `$x_i$`, that is, a function 
`$$f = \sum_i \beta_i \varphi_i$$`
where `$\varphi_i$` are basis functions for the `$n$`-dimensional space of such splines.

Cubic splines are piecewise degree 3 polynomials in between knots.

---
## Smoothing splines

In vector notation 
`$$\hat{\mathbf{f}} = \boldsymbol{\Phi}\hat{\beta}$$`
with `$\boldsymbol{\Phi}_{ij} = \varphi_j(x_i)$`,

and
`\begin{aligned}
L(\mathbf{f}) & = (\mathbf{y} - \mathbf{f})^T (\mathbf{y} - \mathbf{f}) + \lambda \|f''\|_2^2 \\
& = ( \mathbf{y} -  \boldsymbol{\Phi}\beta)^T (\mathbf{y} -  \boldsymbol{\Phi}\beta) + \lambda \beta^T \mathbf{\Omega} \beta
\end{aligned}`

with 
`$$\mathbf{\Omega}_{ij} = \int \varphi_i''(z) \varphi_j''(z) \mathrm{d}z.$$`

---
## Smoothing splines

The minimizer is 
`$$\hat{\beta} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{\Omega})^{-1}\boldsymbol{\Phi}^T \mathbf{y}$$`

with resulting smoother
`$$\hat{\mathbf{f}} = \underbrace{\boldsymbol{\Phi} ((\boldsymbol{\Phi}^T \boldsymbol{\Phi} + \lambda \mathbf{\Omega})^{-1}\boldsymbol{\Phi}^T}_{\mathbf{S}_{\lambda}} \mathbf{y}.$$`

We recognize this as a *linear smoother* with smoother matrix `$\mathbf{S}_{\lambda}$`.

---
## Splines in R

```r
library(splines)
# Note the specification of repeated boundary knots
knots <- c(0, 0, 0, seq(0, 1, 0.2), 1, 1, 1)
xx <- seq(0, 1, 0.005)
B_splines <- splineDesign(knots, xx)
matplot(xx, B_splines, type = "l", lty = 1)
```

---
## Penalty matrix

```r
# See notes for implementation of penalty matrix 
Omega <- pen_mat(seq(0, 1, 0.1))
image(Matrix(Omega))
```

---
## Fitting a smoothing spline

We implement the matrix-algebra directly 
for computing `$\mathbf{S}_{\lambda} \mathbf{y}$`.

```r
inner_knots <- Nuuk_year$Year
knots <- c(rep(range(inner_knots), 3), inner_knots)
*Phi <- splineDesign(knots, inner_knots)
```

```r
Omega <- pen_mat(inner_knots)
smoother <- function(lambda) 
  Phi %*% solve(
    crossprod(Phi) + lambda * Omega,    # Phi^T Phi + lambda Omega
    t(Phi) %*% Nuuk_year$Temperature    # Phi^T y
  )
```

---
## Fitting a smoothing spline

```r
p + geom_line(aes(y = smoother(10)), color = "blue") +   # Undersmooth
  geom_line(aes(y = smoother(1000)), color = "red") +       # Smooth
  geom_line(aes(y = smoother(100000)), color = "purple")    # Oversmooth
```

---
## Generalized cross-validation

With `$\mathrm{df} = \mathrm{trace}(\mathbf{S}) = \sum_{i=1}^n S_{ii}$`
we  replace `$S_{ii}$` in LOOCV by `$\mathrm{df} / n$` to get the
*generalized* cross-validation criterion
`$$\mathrm{GCV} = \sum_{i=1}^n \left(\frac{y_i - \hat{f}_i}{1 - \mathrm{df} / n}\right)^2.$$`

```r
gcv <- function(lambda, y) {
  S <- Phi %*% solve(crossprod(Phi) + lambda * Omega, t(Phi))
  df <- sum(diag(S))  # The trace of the smoother matrix
  sum(((y - S %*% y) / (1 - df / length(y)))^2, na.rm = TRUE) 
}
```

---
## Generalized cross-validation

Then we apply this function to a grid of `$\lambda$`-values and 
choose the value of `$\lambda$` that minimizes GCV.

```r
lambda <- seq(50, 250, 2)
GCV <- sapply(lambda, gcv, y = Nuuk_year$Temperature)
lambda_opt <- lambda[which.min(GCV)]
```

---
## Generalized cross-validation

```r
qplot(lambda, GCV) + geom_vline(xintercept = lambda_opt, color = "red")
```

---
## Optimal smoothing spline

```r
# Optimal smoother using our implementation
smooth_opt <- smoother(lambda_opt) 
p + geom_line(aes(y = smooth_opt), color = "blue")
```

---
## Using `smooth.spline()`

```r
# and using 'smooth.spline()' without fast heuristic
smooth <- smooth.spline(Nuuk_year$Year, Nuuk_year$Temperature, all.knots = TRUE)  
p + geom_line(aes(y = smooth$y), color = "blue")
```

---
## Efficient computations

In practice we use `$p < n$` basis functions.

Using the singular value decomposition 
`$$\Phi = \mathbf{U} D \mathbf{V}^T$$`

it holds that 
`$$\mathbf{S}_{\lambda} = \widetilde{\mathbf{U}}  (I + \lambda  \Gamma)^{-1} \widetilde{\mathbf{U}}^T$$`
where `$\widetilde{\mathbf{U}} = \mathbf{U} \mathbf{W}$` and 
`$$D^{-1} \mathbf{V}^T \mathbf{\Omega} \mathbf{V} D^{-1} = \mathbf{W} \Gamma \mathbf{W}^T.$$`

---
## Efficient computations

The interpretation of this representation is as follows.

* First, the coefficients, `$\hat{\beta} = \widetilde{\mathbf{U}}^Ty$`, are computed for 
expanding `$y$` in the basis given by the columns of `$\widetilde{\mathbf{U}}$`.

* Second, the `$i$`-th coefficient is shrunk towards 0,
`$$\hat{\beta}_i(\lambda) = \frac{\hat{\beta}_i}{1 + \lambda \gamma_i}.$$`

* Third, the smoothed values, `$\widetilde{\mathbf{U}} \hat{\beta}(\lambda)$`, 
are computed as an expansion using the shrunken coefficients.

---
### The Demmler-Reinsch basis (columns of `$\widetilde{\mathbf{U}}$`)

---
## and the eigenvalues `$\gamma_i$`