University of Copenhagen

Estrogen Data II
A gene set test

Gordon Smyth
6 November 2007

1. Aims

We continue this case study by considering significance tests using gene sets.

2. Required data

The estrogen data set is required for this lab. Set your R working directory to be ku2007limma/data/estrogen. If you saved your fitted model object in the previous exercise, then you can load it now to get started. Otherwise you will need to recreate it.

> library(limma)
> setwd("estrogen")
> load("estrogen.rdata")

3. Gene Set Tests

In this lab, we move beyond the analysis of individual genes and consider sets of genes in microarray experiments. Gene sets can be formed from a priori knowledge of biological features that the genes share. We consider a particular approach called gene set enrichment: we begin with a known set of genes and then test whether this set as a whole is differentially expressed in a microarray experiment. This type of test is useful when comparing one's microarray data with that of previous authors who have performed similar experiments. The lists of most differentially expressed genes reported by the previous authors can be regarded as a "gene set" and tested to determine whether the genes are also differentially expressed in the current context.

Gene set testing was introduced by Mootha et al (2003) and Lamb et al (2003). Mootha et al define the concept of a gene set enrichment test. For a given set of genes, one can test whether the set as a whole is up-regulated, down-regulated or differentially expressed with individual genes possibly going in either direction. Sometimes performing the traditional differential expression analysis of individual genes will yield no statistically significant results, but there may be stronger evidence for differential expression of gene sets.

Now we turn our attention to tests for differential expression involving a set of genes. Mootha et al (2003) and Lamb et al (2003) made these methods popular. We will use a "gene set enrichment test" that is closely based on the one defined by Mootha et al. The test can be used to check whether previous authors' lists of differentially expressed genes are also differentially expressed in a current, similar experiment. Another possible application is to microarray experiments that show no strong differential expression when genes are tested individually, but that might show more evidence when a predefined set of genes is tested. Defining a useful gene set for this sort of analysis is not always trivial. One possibility is to use genes that share common gene ontologies, i.e. to choose genes that are all associated with GO terms below a certain node in the GO DAG (Directed Acyclic Graph). We will begin with some artificial examples to illustrate the concept of gene set tests with a small number of made-up t-statistics. Then we will use two sets of genes thought to be regulated by the Estrogen Receptor (ERalpha) to demonstrate testing for differential expression of gene sets in the Estrogen data set.
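
As an illustration of the idea with made-up t-statistics, here is a minimal sketch that uses only base R. It does not call geneSetTest itself; it assumes that a one-sided Wilcoxon rank sum test is a reasonable stand-in for a rank-based gene set test. All numbers and the object names (tstat, in.set and so on) are invented for this example.

```r
# Twenty made-up t-statistics; the first five genes form the hypothetical set.
tstat <- c(3.1, 2.4, 2.9, 3.6, 2.2,
           -0.5, 0.3, 1.1, -1.4, 0.8, -0.2, 0.6, -0.9, 1.5, -1.2,
           0.1, -0.7, 0.4, -1.8, 1.0)
in.set <- rep(c(TRUE, FALSE), c(5, 15))

# "up": do the set members tend to rank above the other genes?
p.up <- wilcox.test(tstat[in.set], tstat[!in.set],
                    alternative = "greater")$p.value

# "mixed": do the set members tend to have larger absolute values,
# regardless of sign?
p.mixed <- wilcox.test(abs(tstat[in.set]), abs(tstat[!in.set]),
                       alternative = "greater")$p.value

# Same set, but now two of its genes go down instead of up.
tstat2 <- tstat
tstat2[c(2, 4)] <- -tstat2[c(2, 4)]
p.up2 <- wilcox.test(tstat2[in.set], tstat2[!in.set],
                     alternative = "greater")$p.value
p.mixed2 <- wilcox.test(abs(tstat2[in.set]), abs(tstat2[!in.set]),
                        alternative = "greater")$p.value

print(c(up = p.up, mixed = p.mixed, up2 = p.up2, mixed2 = p.mixed2))
```

In the first case both tests give a small p-value because every set member outranks every other gene. In the second case the "up" test loses its evidence while the "mixed" test is unchanged, which is why the choice of alternative matters when genes in a set may move in either direction.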

We will again use the Estrogen data set through the fit2 linear model fit object.

We will use two sets of genes which are thought to be ER-regulated, i.e. regulated by the Estrogen Receptor alpha. The first set, from Jin et al (2004), contains genes which have been experimentally verified to be ER-regulated.

This gene set should be differentially expressed between the breast cancer cells with estrogen reintroduced and the serum-starved breast cancer cells with no estrogen. In the cells reintroduced to estrogen, the estrogen receptors (ERs) bind the estrogen and become activated, gaining the ability to regulate gene expression. This should result in differential expression between the cells with and without estrogen.

The data required for this exercise is available from ERgenes.txt. Read the gene lists into R:

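> library(hgu95av2)
> library(annotate)
> # Map each probe ID to its LocusLink gene identifier
> fit2$genes$LocusLink <- getLL(fit2$genes$ID, "hgu95av2")
> # Read the gene set and flag the probes that belong to it
> ERgenes <- read.delim("ERgenes.txt", as.is=TRUE)
> inSet <- fit2$genes$LocusLink %in% ERgenes$LocusLink
> # Test the set on the E10 t-statistics, then on the E48 t-statistics
> geneSetTest(inSet, fit2$t[,"E10"], alternative="mixed")
> geneSetTest(inSet, fit2$t[,"E10"], alternative="up")
> geneSetTest(inSet, fit2$t[,"E48"], alternative="mixed")
> geneSetTest(inSet, fit2$t[,"E48"], alternative="up")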
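
The membership step can be tried on toy data without the annotation packages. The following is a minimal sketch in base R. The probe IDs, gene identifiers and object names (genes, set.ids, in.set) are invented for illustration and are not taken from the estrogen data.

```r
# Hypothetical probe annotation: two probes map to gene 102, one has no gene.
genes <- data.frame(ID = c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at"),
                    LocusLink = c(101, 102, NA, 102, 105),
                    stringsAsFactors = FALSE)

# Hypothetical gene set, given as gene identifiers.
set.ids <- c(102, 105, 999)

# One TRUE/FALSE per probe; probes with a missing identifier come out FALSE.
in.set <- genes$LocusLink %in% set.ids
print(in.set)

# Number of probes that fall in the set (can exceed the number of genes matched).
print(sum(in.set))

# Set members that are not represented on the array at all.
print(setdiff(set.ids, genes$LocusLink))
```

Checking these counts before running a gene set test is a cheap way to confirm that the identifiers in the gene list and on the array are of the same type.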

We can see the results graphically:

> boxplot(fit2$t[,"E10"] ~ inSet)
> boxplot(fit2$t[,"E48"] ~ inSet)
> boxplot(fit2$t[,"Time"] ~ inSet)

Acknowledgements

Thanks to Hui Tang and Terry Speed for suggesting the known ER-regulated gene sets of Jin et al.

References on gene sets

  1. Jin VX, Leu YW, Liyanarachchi S, Sun H, Fan M, Nephew KP, Huang TH, Davuluri RV (2004). Identifying estrogen receptor alpha target genes using integrated computational genomics and chromatin immunoprecipitation microarray. Nucleic Acids Research 32, 6627-6635.
  2. Lamb J, Ramaswamy S, Ford HL, Contreras B, Martinez RV, Kittrell FS, Zahnow CA, Patterson N, Golub TR, Ewen ME (2003). A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell 114, 323-34.
  3. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman BM, Lander ES, Hirschhorn JN, Altshuler D, Groop LC (2003). PGC-1alpha responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34, 267-73.
