class: center, middle, inverse, title-slide

# Computational Statistics
Introduction
### Niels Richard Hansen
### September 1, 2020

---

## Computational Statistics

**Problems**

* Compute nonparametric estimates.
* Compute integrals (and probabilities)
  `\(\ \ \ \ \ \ E f(X) = \int f(X) \ dP \ \ \ \left(P(X \in A) = \int_{(X \in A)} \ dP\right)\)`.
* Optimize the likelihood.

**Methods**

* Numerical linear algebra.
* Monte Carlo integration.
* EM-algorithm, stochastic gradient.

---

## Example, amino acid angles

<img src="PhiPsi_creative.jpg" width="400" height="400" style="display: block; margin: auto;" />

---

## Ramachandran plot

.two-column-left[
```r
qplot(phi, psi, data = phipsi)
```
]

.two-column-right[
```r
qplot(phi, psi, data = phipsi2)
```
]

---

## Example, amino acid angles

.two-column-left[
```r
hist(phipsi$phi, prob = TRUE)
rug(phipsi$phi)
```
]

.two-column-right[
```r
hist(phipsi$psi, prob = TRUE)
rug(phipsi$psi)
```
]

---

## Example, amino acid angles

.two-column-left[
```r
lines(density(phipsi$phi), col = "red", lwd = 2)
```
]

.two-column-right[
```r
lines(density(phipsi$psi), col = "red", lwd = 2)
```
]

---

## Statistical topics of the course

* **Smoothing:** what does `density` do?
    + How do we compute nonparametric estimators?
    + How do we choose tuning parameters?

* **Simulation:** how do we efficiently simulate from a target distribution?
    + How do we assess results from Monte Carlo methods?
    + What if we cannot compute the density?

* **Optimization:** how do we compute the MLE?
    + What if we cannot compute the likelihood?
    + How to deal with very large data sets?

---

## Computational topics of the course

* **Implementation**: writing statistical software.
    + R data structures and functions
    + S3 object oriented programming

* **Correctness**: does the implementation do the right thing?
    + testing
    + debugging
    + accuracy of numerical computations

* **Efficiency**: minimize memory and time usage.
    + benchmark code for comparison
    + profile code for identifying bottlenecks
    + optimize code (Rcpp)

---

## Prerequisites in R

Good working knowledge of:

* Data structures (vectors, lists, data frames).
* Control structures (loops, if-then-else).
* Function calling.
* Interactive and script use (`source`) of R.
* You don't need to be an experienced programmer.

---

## Assignments

The 8 assignments covering 4 topics will form the backbone of the course. Many lectures and practicals will be built around these assignments.

You all need to register (in Absalon) for the presentation of one assignment solution.

* Presentations are done in groups of two to three people.
* On four Wednesdays there will be presentations with discussion and feedback.
* For the exam you need to prepare four *individual* presentations, one for each topic assignment.

---

## Exam

For each of the four topics you choose one out of two assignments to prepare for the exam.

* The exam assessment is based on your presentation *on the basis of the entire content of the course*.
* Get started immediately and work continuously on the assignments as the course progresses.

---

## R programming

R functions are fundamental. They don't do anything until they are called and the call is evaluated.

An R function takes a number of *arguments*, and when a function call is evaluated it computes a *return value*.

An R program consists of a hierarchy of function calls. When the program is executed, function calls are evaluated and replaced by their return values.

Implementations of R functions are collected into source files, which can be organized into R packages.

An R script (or R Markdown document) is a collection of R function calls, which, when evaluated, compute a desired result.

R programming includes activities at many different levels of sophistication and abstraction.
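
---

## R programming

A minimal sketch of arguments, return values and nested calls. The function `mc_mean` and its arguments are hypothetical, chosen for illustration only. It approximates `\(E f(X)\)` by a Monte Carlo average, assuming `\(X\)` is standard normal by default.

```r
# Hypothetical example; the name mc_mean and its arguments are illustrative.
# Defining the function evaluates nothing in the body, it only creates an object.
mc_mean <- function(f, n = 1000, sampler = rnorm) {
  x <- sampler(n)   # nested call: simulate n draws of X
  mean(f(x))        # the value of the last expression is the return value
}

set.seed(1)
# The call is evaluated only now; the inner calls sampler(n), f(x) and mean()
# are evaluated and replaced by their return values.
est <- mc_mean(function(x) x^2, n = 10000)
est  # approximates E X^2 = 1 for X ~ N(0, 1)
```

Passing `f` and `sampler` as arguments keeps the sketch general: another integrand or another distribution needs a different call, not a different function.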

---

## R programming

R programming can be

* writing R scripts (for data analysis), interactively running scripts and building reports or other output.
* writing R functions to ease repetitive tasks (avoid copy-paste), to abstract computations, to improve overview by modularization, etc.
* developing R packages to ease distribution, to improve usability and documentation, to clarify dependencies, etc.
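
---

## R programming

As a sketch of the second point, the histogram, rug and density calls that were repeated for `phi` and `psi` in the amino acid example could be wrapped in one function. The helper `hist_density` is hypothetical, and the data below is simulated as a stand-in, since the angle data is not included here.

```r
# Hypothetical helper: histogram, rug and kernel density estimate in one call.
# The name and the arguments are illustrative, not part of the course code.
hist_density <- function(x, col = "red", lwd = 2, ...) {
  dens <- density(x)                 # kernel density estimate, default bandwidth
  hist(x, prob = TRUE, ...)          # histogram on the density scale
  rug(x)                             # mark the individual observations
  lines(dens, col = col, lwd = lwd)  # overlay the density estimate
  invisible(dens)                    # return the estimate without printing it
}

# Simulated stand-in for the angle data (an assumption, not the real data).
set.seed(1)
angles <- data.frame(phi = rnorm(100, -1.5, 0.5), psi = rnorm(100, 2, 0.8))

dens_phi <- hist_density(angles$phi, main = "phi")
dens_psi <- hist_density(angles$psi, main = "psi")
```

Returning the density estimate invisibly lets the same call serve both for plotting and for later computations on the estimate.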