class: center, middle, inverse, title-slide # Exercises 1.4-1.10
solution ### Niels Richard Hansen ### September 4, 2019 --- ## Problem 1.4 ```r infrared <- read.table("../data/infrared.txt", header = TRUE) F12 <- infrared$F12 hist(log(F12)) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.4 ```r hist(log(F12), breaks = 5) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.4 ```r hist(log(infrared$F12), 20); rug(log(infrared$F12)) hist(log(infrared$F12), 100); rug(log(infrared$F12)) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.4 We observe: * The number of cells changes the impression of the distribution. * The default gives relatively few and large cells. * The default uses *Sturges' formula*: the number of cells for `\(n\)` observations is `$$\lceil \log_2(n) + 1 \rceil.$$` --- ## Problem 1.5 ```r myBreaks <- function(x, h = 5) { x <- sort(x) ux <- unique(x) i <- seq(from = 1, to = length(ux), by = h) ux[i] } ``` ```r myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2) ``` ``` [1] 1 3 10 ``` Note, we missed the largest value 11 in `x`. --- ## Problem 1.5, correction ```r myBreaks <- function(x, h = 5) { x <- sort(x) ux <- unique(x) i <- seq(from = 1, to = length(ux), by = h) * if (i[length(i)] < length(ux)) * i[length(i) + 1] <- length(ux) ux[i] } ``` ```r myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2) ``` ``` [1] 1 3 10 11 ``` --- ## Problem 1.5 ```r hist(log(F12), myBreaks); rug(log(F12)) hist(log(F12), function(x) myBreaks(x, 40)); rug(log(F12)) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> <img src="Ex_Solution_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.6, first version ```r myBreaks <- function(x, h = 5) { x <- sort(x) breaks <- xb <- x[1] k <- 1 for(i in seq_along(x)[-1]) { if (k < h) { k <- k + 1 } else { if (xb < x[i - 1] && x[i - 1] < x[i]) { xb <- x[i - 1] breaks <- c(breaks, xb) k <- 1 } } } breaks[length(breaks) + 1] <- x[length(x)] breaks } ``` --- ## Problem 1.6, testing ```r myBreaks(1:11, 1) ``` ``` [1] 1 2 3 4 5 6 7 8 9 10 11 ``` ```r myBreaks(1:2, 1) ``` ``` [1] 1 2 ``` ```r myBreaks(1:10, 5) ``` ``` [1] 1 5 10 ``` ```r myBreaks(1:11) ``` ``` [1] 1 5 10 11 ``` Last interval doesn't have five elements! --- ## Problem 1.6, second version ```r myBreaks <- function(x, h = 5) { x <- sort(x) breaks <- xb <- x[1] k <- 1 for(i in seq_along(x)[-1]) { if (k < h) { k <- k + 1 } else { if (xb < x[i - 1] && x[i - 1] < x[i]) { xb <- x[i - 1] breaks <- c(breaks, xb) k <- 1 } } } * last <- length(breaks) * if(k == min(h, length(x) - 1)) last <- last + 1 * breaks[last] <- x[length(x)] breaks } ``` --- ## Problem 1.6, testing again ```r myBreaks(1:11, 1) ``` ``` [1] 1 2 3 4 5 6 7 8 9 10 11 ``` ```r myBreaks(1:2, 1) ``` ``` [1] 1 2 ``` ```r myBreaks(1:10, 5) ``` ``` [1] 1 5 10 ``` ```r myBreaks(1:11) ``` ``` [1] 1 5 11 ``` --- ## Problem 1.6, more testing ```r myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2) ``` ``` [1] 1 2 3 11 ``` ```r test <- c(3, 1, 4.2, 3, 2, 4.2, 3, 1, 2, 4, 5, 3, 3.1, 3, 4.3) myBreaks(test, 2) ``` ``` [1] 1.0 2.0 3.0 4.0 4.2 5.0 ``` ```r myBreaks(test, 3) ``` ``` [1] 1 2 3 5 ``` ```r sort(test) ``` ``` [1] 1.0 1.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0 3.1 4.0 4.2 4.2 4.3 5.0 ``` --- ## Problem 1.6, testing on data ```r hh <- seq(1, 100, 1) breaks <- lapply(hh, function(h) myBreaks(log(F12), h)) counts <- lapply(breaks, function(b) hist(log(F12), b, plot = FALSE)$counts) any(sapply(breaks, function(x) any(duplicated(x)))) ``` ``` [1] FALSE ``` ```r all(sapply(seq_along(hh), function(i) all(counts[[i]] >= hh[i]))) ``` ``` [1] TRUE ``` First there is a test for duplicated breaks, and second there is a test for the number of observations in each interval to be larger than `\(h\)`. --- ## Problem 1.6 ```r hist(log(F12), myBreaks); rug(log(F12)) hist(log(F12), function(x) myBreaks(x, 40)); rug(log(F12)) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.7 We define a `myHist` function as requested. ```r myHist <- function(h, ...) hist(log(F12), function(x) myBreaks(x, h), ...) ``` --- ## Problem 1.7 ```r myHist(30) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.7, testing ```r myHist() ``` ``` Error in myBreaks(x, h): argument "h" is missing, with no default ``` --- ## Problem 1.7, testing ```r myHist(h = 5, freq = TRUE) ``` ``` Warning in plot.histogram(r, freq = freq1, col = col, border = border, angle = angle, : the AREAS in the plot are wrong -- rather use 'freq = FALSE' ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.7, testing ```r myHist(h = 0) ## Result depends on implementation of 'myBreaks' ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.7, testing ```r rm(F12) ``` ```r myHist(30) ``` ``` Error in hist(log(F12), function(x) myBreaks(x, h), ...): object 'F12' not found ``` --- ## Problem 1.8 ```r environment(myHist) ``` ``` <environment: R_GlobalEnv> ``` ```r environment(myHist) <- new.env() assign("F12", infrared$F12, environment(myHist)) ``` ```r ls() ``` ``` [1] "breaks" "counts" "hh" "infrared" "myBreaks" "myHist" [7] "test" ``` ```r ls(environment(myHist)) ``` ``` [1] "F12" ``` --- ## Problem 1.8 ```r myHist(40) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.9 ```r histFactory <- function(x) { function(h, ...) hist(x, myBreaks(x, h), ...) } F12 <- infrared$F12 myHist2 <- histFactory(log(F12)) ``` --- ## Problem 1.9 ```r myHist2(40) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.9 ```r myHist2 <- histFactory(log(F12)) rm(F12) ls() ``` ``` [1] "breaks" "counts" "hh" "histFactory" "infrared" [6] "myBreaks" "myHist" "myHist2" "test" ``` ```r ls(environment(myHist2)) ``` ``` [1] "x" ``` --- ## Problem 1.9 ```r myHist2(40) ``` ``` Error in hist(x, myBreaks(x, h), ...): object 'F12' not found ``` Perhaps surprisingly, a reference to `F12` was still around, and removing `F12` from the global environment resulted in an error. ```r substitute(x, environment(myHist2)) ``` ``` log(F12) ``` The explantion is that the R expression `log(F12)` was never evaluated. (It has nothing to do with the log-transformation.) --- ## Problem 1.9 tricky point ```r histFactory <- function(x) { ## force evaluation of argument `x` * force(x) function(h, ...) hist(x, myBreaks(x, h), ...) } F12 <- infrared$F12 myHist2 <- histFactory(log(F12)) ``` Due to *lazy evaluation* arguments are not evaluated until used. The value of `F12` is, without `force(x)` as above, not looked up until the first call of `myHist2`. If `F12` is removed before the first call of `myHist2`, the call will result in an error! --- ## Problem 1.9 ```r rm(F12) myHist2(40) ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" /> --- ## Summary * A function can look up variables in its *enclosing environment*. * The enclosing environment is suitable for storing *local variables*. * The default enclosing environment is where the function is defined. * The global environment (the workspace) is not suitable for *local variables*, and dependence upon *global variables* should be avoided. * The enclosing environment can be set and populated manually (using `new.env` and `assign`). * A *function factory* like `histFactory` is a systematic way of creating enclosing environments for local variables. --- ## Problem 1.10 ```r tmp <- myHist(10, plot = FALSE) typeof(tmp) ``` ``` [1] "list" ``` ```r class(tmp) ``` ``` [1] "histogram" ``` --- ## Problem 1.10, internal structure ```r ## 'str' gives an overview of the internal structure of an R object str(tmp) ``` ``` List of 6 $ breaks : num [1:53] -3 -2.41 -2.21 -1.9 -1.66 ... $ counts : int [1:52] 11 10 12 12 15 10 19 11 18 10 ... $ density : num [1:52] 0.0298 0.0794 0.0616 0.0808 0.2387 ... $ mids : num [1:52] -2.7 -2.31 -2.05 -1.78 -1.61 ... $ xname : chr "log(F12)" $ equidist: logi FALSE - attr(*, "class")= chr "histogram" ``` --- ## Problem 1.10, plotting ```r plot(tmp, col = "red") ``` <img src="Ex_Solution_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" /> --- ## Problem 1.10, getting help You can find documentation for `plot` using e.g. ```r ?plot ``` However, this will be uninformative on how an object of class histogram is plotted. Try instead ```r ?plot.histogram ``` This will give the documentation for the plot method for objects of class histogram.