class: center, middle, inverse, title-slide

# Poisson regression and numerical optimization

### Niels Richard Hansen

### September 24, 2020

---

## Poisson regression

Consider observations `\(y_i \in \mathbb{N}_0\)`, `\(x_i \in \mathbb{R}^p\)` for `\(i = 1, \ldots, n\)`, and Poisson point probabilities

`$$f_i(y_i) = e^{-\mu(x_i)} \frac{\mu(x_i)^{y_i}}{y_i!}.$$`

If `\(\log(\mu(x_i)) = x_i^T \beta\)` we can rewrite the point probabilities as

`$$f_i(y_i) = e^{y_i x_i^T \beta - \exp(x_i^T \beta)} \frac{1}{y_i!}.$$`

This is a *Poisson regression model*.

---

## Example

```r
summary(vegetables)
```

```
      sale          normalSale          store     
 Min.   :  1.00   Min.   :  0.20   1      :   4  
 1st Qu.: 12.00   1st Qu.:  4.20   102    :   4  
 Median : 21.00   Median :  7.25   106    :   4  
 Mean   : 40.29   Mean   : 11.72   107    :   4  
 3rd Qu.: 40.00   3rd Qu.: 12.25   11     :   4  
 Max.   :571.00   Max.   :102.00   110    :   4  
                                   (Other):1042  
```

```r
dim(vegetables)
```

```
[1] 1066    3
```

---

## Poisson model

`$$\log(E(\text{sale})) = \beta_0 + \beta_1 \log(\text{normalSale}) + \beta_{\text{store}}.$$`

```r
## Note, the variable store is a factor with 352 levels!
pois_model <- glm(
  sale ~ log(normalSale) + store,
  data = vegetables,
  family = poisson()
)
```

```r
summary(pois_model) %>% coefficients() %>% head()
```

```
                Estimate Std. Error z value  Pr(>|z|)
(Intercept)        2.718      0.128    21.3  9.4e-101
log(normalSale)    0.202      0.031     6.5   9.5e-11
store10            1.577      0.114    13.9   1.1e-43
store100           0.768      0.125     6.2   7.0e-10
store101           0.583      0.131     4.5   8.2e-06
store102          -0.029      0.149    -0.2   8.4e-01
```

---

## Exponential families

Joint density

`\begin{equation} f(\mathbf{y} \mid \theta) = \prod_{i=1}^n \frac{1}{\varphi_i(\theta)} e^{\theta^T t_i(y_i)} = e^{\theta^T \sum_{i=1}^n t_i(y_i) - \sum_{i=1}^n \kappa_i(\theta)} \end{equation}`

where `\(\varphi_i(\theta) = \int e^{\theta^T t_i(u)} \mu_i(\mathrm{d}u)\)` and `\(\kappa_i(\theta) = \log(\varphi_i(\theta)).\)`

The log-likelihood is

`$$\ell(\theta) = \theta^T t(\mathbf{y}) - \kappa(\theta)$$`

where

`$$t(\mathbf{y}) = \sum_{i=1}^n t_i(y_i) \quad \text{and} \quad \kappa(\theta) = \sum_{i=1}^n \log \varphi_i(\theta).$$`

The gradient is

`$$\nabla \ell(\theta) = t(\mathbf{y}) - \nabla \kappa(\theta).$$`

---

## The Poisson regression model

For the Poisson regression model we find that

`$$t(\mathbf{y}) = \sum_{i=1}^n y_i x_i = \mathbf{X}^T \mathbf{y} \quad \text{and} \quad \kappa(\beta) = \sum_{i=1}^n e^{x_i^T \beta}.$$`

Moreover,

`$$\nabla \kappa(\beta) = \sum_{i=1}^n x_i e^{x_i^T \beta} = \mathbf{X}^T \mathbf{w}(\beta)$$`

where

`$$\mathbf{w}(\beta) = \exp(\mathbf{X} \beta)$$`

with `\(\exp\)` applied coordinatewise.

---

## Model matrix and `glm.fit()`

```r
X <- model.matrix(
  sale ~ log(normalSale) + store,
  data = vegetables
)
y <- vegetables$sale
```

```r
system.time(
  pois_fit <- glm.fit(X, y, family = poisson())
)
```

```
   user  system elapsed 
  0.540   0.008   0.550 
```

---

## `glm.fit()`

```r
pois_fit$iter
```

```
[1] 5
```

```r
pois_fit$converged
```

```
[1] TRUE
```

The objective function is not the negative log-likelihood itself but twice the negative log-likelihood plus an additive constant: the *deviance*. For comparisons it is therefore easiest to compute deviance differences, in which the constant cancels.
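To make the relation concrete, the deviance can be reconstructed from Poisson log-likelihoods, since the saturated model fits `\(\mu_i = y_i\)`. A minimal sketch, assuming the `y` and `pois_fit` objects from above (`loglik` and `dev_manual` are just illustrative names):

```r
## Sketch: the deviance is twice the log-likelihood of the saturated
## model (mu_i = y_i) minus twice that of the fitted model; the
## constant sum(log(y_i!)) cancels in this difference.
loglik <- function(mu, y) sum(dpois(y, lambda = mu, log = TRUE))
dev_manual <- 2 * (loglik(y, y) - loglik(pois_fit$fitted.values, y))
all.equal(dev_manual, pois_fit$deviance)  # expected: TRUE
```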
```r
pois_fit0 <- glm.fit(X[, 1], y, family = poisson())
pois_fit0$deviance - pois_fit$deviance
```

```
[1] 42592.68
```

---

## `glm.fit()`

```r
pois_fit <- glm.fit(X, y, family = poisson(), control = list(trace = TRUE))
```

```
Deviance = 9410.28 Iterations - 1
Deviance = 8595.537 Iterations - 2
Deviance = 8584.647 Iterations - 3
Deviance = 8584.638 Iterations - 4
Deviance = 8584.638 Iterations - 5
```

The stopping criterion is *small relative descent* in terms of deviance, that is,

`$$|\mathrm{deviance}_{n-1} - \mathrm{deviance}_n| < \varepsilon (\mathrm{deviance}_n + 0.1)$$`

with the default tolerance parameter `\(\varepsilon = 10^{-8}\)` and a maximal number of iterations set to 25.

---

## Implementation of objective function

```r
t_map <- drop(crossprod(X, y))
H <- function(beta)
  drop(sum(exp(X %*% beta)) - beta %*% t_map) / nrow(X)
grad_H <- function(beta)
  (drop(crossprod(X, exp(X %*% beta))) - t_map) / nrow(X)
```

Recomputing the deviance difference

```r
2 * nrow(X) * (H(c(pois_fit0$coefficients, rep(0, 352))) -
                 H(pois_fit$coefficients))
```

```
[1] 42592.68
```

and the value of the objective function (the negative log-likelihood per observation, up to an additive constant)

```r
H(pois_fit$coefficients)
```

```
[1] -128.5895
```

---

## Using `optim()` with conjugate gradient

```r
system.time(
  pois_CG <- optim(
    rep(0, ncol(X)),
    H,
    grad_H,
    method = "CG",
    control = list(maxit = 10000)
  )
)
```

```
   user  system elapsed 
 10.989   0.068  11.090 
```

```r
pois_CG$value
```

```
[1] -128.5895
```

```r
pois_CG$counts
```

```
function gradient 
   10858     4641 
```

---

## Using `optim()` with BFGS

```r
system.time(
  pois_BFGS <- optim(
    rep(0, ncol(X)),
    H,
    grad_H,
    method = "BFGS",
    control = list(maxit = 10000)
  )
)
```

```
   user  system elapsed 
  0.262   0.001   0.264 
```

```r
pois_BFGS$value
```

```
[1] -128.5894
```

```r
pois_BFGS$counts
```

```
function gradient 
     126      114 
```

---

## Using sparse matrices

```r
library(Matrix)
```

```r
X <- Matrix(X)
```

```r
system.time(
  pois_BFGS_sparse <- optim(
    rep(0, ncol(X)),
    H,
    grad_H,
    method = "BFGS",
    control = list(maxit = 10000)
  )
)
```

```
   user  system elapsed 
  0.096   0.003   0.099 
```

```r
range(pois_BFGS$par - pois_BFGS_sparse$par)
```

```
[1] 0 0
```
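Converting the dense model matrix with `Matrix()` works, but with a 352-level factor the dense matrix need never be formed at all. A minimal sketch using `Matrix::sparse.model.matrix()` to build the sparse representation directly, assuming the `vegetables` data from above (`X_sparse` is just an illustrative name):

```r
## Sketch: construct the sparse model matrix directly; the store
## dummies make the matrix very sparse, so this also saves memory
## compared to densifying first and converting afterwards.
X_sparse <- sparse.model.matrix(
  sale ~ log(normalSale) + store,
  data = vegetables
)
## Fraction of structurally nonzero entries -- expected to be tiny
nnzero(X_sparse) / prod(dim(X_sparse))
```

Both `sparse.model.matrix()` and `nnzero()` come with the `Matrix` package loaded above, so no further imports are needed.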