Exercises 1.4-1.10 solution

class: center, middle, inverse, title-slide

# Exercises 1.4-1.10 <br> solution
### Niels Richard Hansen
### September 4, 2019

---

## Problem 1.4

```r
infrared <- read.table("../data/infrared.txt", header = TRUE)
F12 <- infrared$F12
hist(log(F12))
```

---
## Problem 1.4

```r
hist(log(F12), breaks = 5)
```

---
## Problem 1.4

```r
hist(log(infrared$F12), 20); rug(log(infrared$F12))
hist(log(infrared$F12), 100); rug(log(infrared$F12))
```

---
## Problem 1.4

We observe:

* The number of cells changes the impression of the distribution.

* The default gives relatively few and large cells.

* The default uses *Sturges' formula*: the number of cells for `$n$` observations is `$$\lceil \log_2(n) + 1 \rceil.$$`

---
## Problem 1.5

```r
myBreaks <- function(x, h = 5) {
  x <- sort(x)
  ux <- unique(x)
  i <- seq(from = 1, to = length(ux), by = h)
  ux[i]
}
```

```r
myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2)
```

```
[1]  1  3 10
```

Note, we missed the largest value 11 in `x`.

---
## Problem 1.5, correction

```r
myBreaks <- function(x, h = 5) {
  x <- sort(x)
  ux <- unique(x)
  i <- seq(from = 1, to = length(ux), by = h)
* if (i[length(i)] < length(ux))
*   i[length(i) + 1] <- length(ux)
  ux[i]
}
```

```r
myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2)
```

```
[1]  1  3 10 11
```

---
## Problem 1.5

```r
hist(log(F12), myBreaks); rug(log(F12))
hist(log(F12), function(x) myBreaks(x, 40)); rug(log(F12))
```

---
## Problem 1.6, first version

```r
myBreaks <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]
  k <- 1
  for(i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1
    } else {
      if (xb < x[i - 1] && x[i - 1] < x[i]) {
        xb <- x[i - 1]
        breaks <- c(breaks, xb)
        k <- 1
      }
    }
  }
  breaks[length(breaks) + 1] <- x[length(x)]
  breaks
}
```

---
## Problem 1.6, testing

```r
myBreaks(1:11, 1)
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11
```

```r
myBreaks(1:2, 1)
```

```
[1] 1 2
```

```r
myBreaks(1:10, 5)
```

```
[1]  1  5 10
```

```r
myBreaks(1:11)
```

```
[1]  1  5 10 11
```

Last interval doesn't have five elements!

---
## Problem 1.6, second version

```r
myBreaks <- function(x, h = 5) {
  x <- sort(x)
  breaks <- xb <- x[1]
  k <- 1
  for(i in seq_along(x)[-1]) {
    if (k < h) {
      k <- k + 1
    } else {
      if (xb < x[i - 1] && x[i - 1] < x[i]) {
        xb <- x[i - 1]
        breaks <- c(breaks, xb)
        k <- 1
      }
    }
  }
* last <- length(breaks)
* if(k == min(h, length(x) - 1)) last <- last + 1
* breaks[last] <- x[length(x)]
  breaks
}
```

---
## Problem 1.6, testing again

```r
myBreaks(1:11, 1)
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11
```

```r
myBreaks(1:2, 1)
```

```
[1] 1 2
```

```r
myBreaks(1:10, 5)
```

```
[1]  1  5 10
```

```r
myBreaks(1:11)
```

```
[1]  1  5 11
```

---
## Problem 1.6, more testing

```r
myBreaks(c(1, 3, 2, 5, 10, 11, 1, 1, 3), 2)
```

```
[1]  1  2  3 11
```

```r
test <- c(3, 1, 4.2, 3, 2, 4.2, 3, 1, 2, 4, 5, 3, 3.1, 3, 4.3)
myBreaks(test, 2)
```

```
[1] 1.0 2.0 3.0 4.0 4.2 5.0
```

```r
myBreaks(test, 3)
```

```
[1] 1 2 3 5
```

```r
sort(test)
```

```
 [1] 1.0 1.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0 3.1 4.0 4.2 4.2 4.3 5.0
```

---
## Problem 1.6, testing on data

```r
hh <- seq(1, 100, 1)
breaks <- lapply(hh, function(h) myBreaks(log(F12), h))
counts <- lapply(breaks, 
                 function(b) hist(log(F12), b, plot = FALSE)$counts)
any(sapply(breaks, function(x) any(duplicated(x))))
```

```
[1] FALSE
```

```r
all(sapply(seq_along(hh), function(i) all(counts[[i]] >= hh[i])))
```

```
[1] TRUE
```

First there is a test for duplicated breaks, and second there is a test for the number 
of observations in each interval to be larger than `$h$`.

---
## Problem 1.6

```r
hist(log(F12), myBreaks); rug(log(F12))
hist(log(F12), function(x) myBreaks(x, 40)); rug(log(F12))
```

---
## Problem 1.7

We define a `myHist` function as requested.

```r
myHist <- function(h, ...) 
  hist(log(F12), function(x) myBreaks(x, h), ...)
```

---
## Problem 1.7

```r
myHist(30)
```

---
## Problem 1.7, testing

```r
myHist()
```

```
Error in myBreaks(x, h): argument "h" is missing, with no default
```

---
## Problem 1.7, testing

```r
myHist(h = 5, freq = TRUE) 
```

```
Warning in plot.histogram(r, freq = freq1, col = col, border = border,
angle = angle, : the AREAS in the plot are wrong -- rather use 'freq =
FALSE'
```

---
## Problem 1.7, testing

```r
myHist(h = 0) ## Result depends on implementation of 'myBreaks'
```

---
## Problem 1.7, testing

```r
rm(F12)
```

```r
myHist(30)
```

```
Error in hist(log(F12), function(x) myBreaks(x, h), ...): object 'F12' not found
```

---
## Problem 1.8

```r
environment(myHist)
```

```
<environment: R_GlobalEnv>
```

```r
environment(myHist) <- new.env()
assign("F12", infrared$F12, environment(myHist))
```

```r
ls()
```

```
[1] "breaks"   "counts"   "hh"       "infrared" "myBreaks" "myHist"  
[7] "test"    
```

```r
ls(environment(myHist))
```

```
[1] "F12"
```

---
## Problem 1.8

```r
myHist(40)
```

---
## Problem 1.9

```r
histFactory <- function(x) {
  function(h, ...) 
     hist(x, myBreaks(x, h), ...)
}
F12 <- infrared$F12
myHist2 <- histFactory(log(F12))
```

---
## Problem 1.9

```r
myHist2(40)
```

---
## Problem 1.9

```r
myHist2 <- histFactory(log(F12))
rm(F12)
ls()
```

```
[1] "breaks"      "counts"      "hh"          "histFactory" "infrared"   
[6] "myBreaks"    "myHist"      "myHist2"     "test"       
```

```r
ls(environment(myHist2))
```

```
[1] "x"
```

---
## Problem 1.9

```r
myHist2(40)
```

```
Error in hist(x, myBreaks(x, h), ...): object 'F12' not found
```

Perhaps surprisingly, a reference to `F12` was still around, and removing `F12` from 
the global environment resulted in an error.

```r
substitute(x, environment(myHist2))
```

```
log(F12)
```

The explantion is that the R expression `log(F12)` was never evaluated. (It
has nothing to do with the log-transformation.)

---
## Problem 1.9 tricky point

```r
histFactory <- function(x) {
  ## force evaluation of argument `x`
* force(x)
  function(h, ...) 
     hist(x, myBreaks(x, h), ...)
}
F12 <- infrared$F12
myHist2 <- histFactory(log(F12))
```

Due to *lazy evaluation* arguments are not evaluated until used.

The value of `F12` is, without `force(x)` as above, not looked up until the first call of `myHist2`.

If `F12` is removed before the first call of `myHist2`, the call will result in an error!

---
## Problem 1.9

```r
rm(F12)
myHist2(40)
```

---
## Summary

* A function can look up variables in its *enclosing environment*.

* The enclosing environment is suitable for storing *local variables*.

* The default enclosing environment is where the function is defined.

* The global environment (the workspace) is not suitable for *local variables*, and dependence 
upon *global variables* should be avoided.

* The enclosing environment can be set and populated manually (using `new.env` and `assign`).

* A *function factory* like `histFactory` is a systematic way of creating enclosing environments 
for local variables.

---
## Problem 1.10

```r
tmp <- myHist(10, plot = FALSE)
typeof(tmp)
```

```
[1] "list"
```

```r
class(tmp)
```

```
[1] "histogram"
```

---
## Problem 1.10, internal structure

```r
## 'str' gives an overview of the internal structure of an R object 
str(tmp) 
```

```
List of 6
 $ breaks  : num [1:53] -3 -2.41 -2.21 -1.9 -1.66 ...
 $ counts  : int [1:52] 11 10 12 12 15 10 19 11 18 10 ...
 $ density : num [1:52] 0.0298 0.0794 0.0616 0.0808 0.2387 ...
 $ mids    : num [1:52] -2.7 -2.31 -2.05 -1.78 -1.61 ...
 $ xname   : chr "log(F12)"
 $ equidist: logi FALSE
 - attr(*, "class")= chr "histogram"
```

---
## Problem 1.10, plotting

```r
plot(tmp, col = "red") 
```

---
## Problem 1.10, getting help

You can find documentation for `plot` using e.g.

```r
?plot
```

However, this will be uninformative on how an object of class histogram is plotted. Try instead

```r
?plot.histogram
```

This will give the documentation for the plot method for objects of class histogram.