Statistical Analysis of
Microarray Expression Data with
R and Bioconductor

Organizers and sponsors:

Niels Richard Hansen

The Graduate School in Mathematics and Applications

Department of Mathematical Sciences
University of Copenhagen
Universitetsparken 5
DK-2100 Copenhagen Ø

Danish Graduate School in Biostatistics
Institute of Public Health
University of Copenhagen

The Bioinformatics Centre
University of Copenhagen

Last Minute Info

  • Hand-out material
    The link to the factDesign.pdf vignette in the Limma lab (estrogen data) is obsolete. Instead, take a look at the factDesign package.
  • Access to wireless network. The room is covered by a wireless network. You will need an eduroam account from your home university. You should consult your own institution for guides on how to set up your computer with an eduroam account. You may also take a look at our local guide for the Department of Mathematical Sciences, which in addition provides links to further information.

    If you don't have an eduroam account, we have a guest account that you will get upon arrival. There is a PDF guide (in Danish) on how to set up Windows XP for using the guest account.

    If everything else fails, we should be able to connect at least some of the computers with cables.

Time and Place

The course takes place from Monday, November 5 to Friday, November 9, 2007.

The lectures and exercises will take place at the

University of Copenhagen
Copenhagen Biocenter
Ole Maaløes Vej 5
DK-2200 Copenhagen N
Rooms 4-0-24/4-0-5/4-0-13
The course is organized as a mix of lectures and supervised computer exercise sessions each day.

Course Description

The course aims to give Ph.D.-students in statistics as well as other Ph.D.-students a good introduction to microarray data analysis using R and Bioconductor. This is achieved by inviting some of the leading researchers in statistical analysis of microarray data and developers of R-packages to give the main lectures, and by combining these lectures with hands-on computer exercises.
The following distinguished experts in the use of R and Bioconductor for the analysis of microarray expression data have agreed to present the five main morning lectures:
  • Wolfgang Huber
  • Gordon Smyth
  • Vincent Carey
  • Denise Scholtens
  • Steffen Durinck

The computer lab sessions will be supervised by
  • Kasper Daniel-Hansen, Ph.D.-student, UC-Berkeley
  • Niels Richard Hansen, Associate Professor, University of Copenhagen

The tentative list of topics that will be covered looks as follows:
  • Monday 5/11. Introduction to microarray data and the biological questions, data-formats and representations in R, S4-classes (Wolfgang Huber).

    Introduction to microarrays:
    - technologies, data formats, the data representation in Bioconductor, preprocessing and normalisation, quality assessment

    Introduction to R and Bioconductor

    Simple differential expression

  • Tuesday 6/11. Linear models and Limma (Gordon Smyth).

    Morning Lectures
    - Introduction, Background correction, Moderated t-tests, Linear models I

    Morning Practical
    - Weaver data, Estrogen data I

    Afternoon lectures
    - Linear models II, Moderated F-tests, Multiple testing, Gene set tests, Duplicate spots

    Afternoon practical
    - Estrogen data II, SAHA depsi data

  • Wednesday 7/11. Machine Learning and microarray data analysis (Vincent Carey).

    Clustering and predictive modeling with high-throughput biological experiments: Basic concepts and tools
    - Motivating examples: Clustering and the yeast cell cycle; Golub's paper; some aCGH data
    - Clustering: definitions and algorithms; distances
    - Predictive modeling: definitions and algorithms

    Evaluation of methods and models. Simple and realistic simulations; example of B. Frey's message-based clustering algorithm (Science 2007). Michiels Lancet 2005. Molinaro, Simon et al. Bioinformatics 2005. Scope and limits of cross-validation.

  • Thursday 8/11. Graphs and dependence structures (Denise Scholtens).

    Graphs and Dependence
    Graphs consisting of nodes and edges are commonly used structures to represent high-throughput data in systems biology research. Often nodes are used to represent genes and/or proteins and edges are used to represent relationships among them. In this series of lectures and tutorials, we will learn about object classes in R specifically designed for handling graph-type data, and the collection of graph traversal algorithms and visualization capabilities available in Bioconductor. We will study applications of graphs to microarray and other high-throughput data, and will explore the benefits of placing commonly performed microarray data analyses into a graph theoretic context.

  • Friday 9/11. Querying external databases and metadata and generating reports (Steffen Durinck).

    Database mining with biomaRt
    - The BioMart software suite, the biomaRt package for R
    - Automatic report generation using Sweave


The participants are expected to have prior experience with R and statistics in general, and an interest in the biological questions. The participants are also expected to bring their own laptop for the exercises. We require that all participants install the latest version of R (R-2.6.0, released October 3, 2007) and the latest version of Bioconductor (Bioc-2.1, scheduled to be released October 5, 2007). The precise details for installation of R and the required packages can be found here: Installation instructions.

All participants will receive a certificate of participation.

ECTS-credits: 5


Date            Time         Room          Subject
Monday 5/11     8.45-9.00                  Welcome and general information
                9.00-10.45                 Introduction to R and Bioconductor
                11.00-12.30  4-0-5/4-0-13  Computer lab
                13.15-15.00                Introduction to microarray normalisation and quality assessment
                15.15-17.00  4-0-5/4-0-13  Computer lab

Tuesday 6/11    9.00-10.45                 Limma, Lecture I
                11.00-12.30  4-0-5/4-0-13  Computer lab
                13.15-15.00                Limma, Lecture II
                15.15-17.00  4-0-5/4-0-13  Computer lab

Wednesday 7/11  9.00-10.45   4-0-24        Clustering and predictive modeling with high-throughput biological experiments: Basic concepts and tools
                11.00-12.30                Computer lab: Bioconductor facilities for creating and interpreting cluster analyses.
                13.15-14.30  4-0-24        Evaluation of methods and models.
                14.45-16.30                Computer lab: General exercises in machine learning with genomic data.

Thursday 8/11   9.00-10.45   4-0-24        Graphs and Dependence. Lecture I.
                11.00-12.30  4-0-5/4-0-13  Computer lab.
                13.15-14.30  4-0-24        Graphs and dependence. Lecture II.
                14.45-16.30                Computer lab.

Friday 9/11     9.00-10.00                 biomaRt, Lecture I
                10.15-11.15                Computer lab
                11.30-12.30                biomaRt, Lecture II
                13.15-14.15  4-0-5/4-0-13  Computer lab
                14.30-15.30                Report generation

Registration and Payment

Registration is closed.

The registration fee is:

Danish Ph.D.-students: No registration fee.
Academia: DKK 1000,-
Non-academia: DKK 3000,-

which, among other things, will cover coffee/tea and snacks, lunch, and hand-out material. Non-Ph.D.-students can apply for a reduced fee, see the registration form.

The deadline for payment is

October 17, 2007

The details for payment are as follows:

Payment to:

Department of Mathematical Sciences
University of Copenhagen
Universitetsparken 5
DK-2100 Copenhagen Ø

For Danish bank transfers:

Account number: 3001 4115212125

For international bank transfer:

IBAN-code: DK41 3000 4115212125
Bank Address: Danske Bank A/S, Holmens Kanal 2, 1090 København K, Denmark

IMPORTANT: the following reference and information must be included for all transfers:

Bioconductor 2007
Your name

It is VERY IMPORTANT that you provide the number given above. If space is limited, you should provide ONLY the reference number 501000 as a minimal reference. Do NOT try to write other parts of the full reference number if you cannot write the full number.

Remember also that for INTERNATIONAL bank transfers you must pay ALL the additional costs.



Besides the hand-out material, we recommend the book:

Bioinformatics and Computational Biology Solutions Using R and Bioconductor

Directions and accommodation

Please find information on directions and accommodation on our website. Note that we are unable to offer financial support to participants.

Last modified: Tue Nov 06 13:35:47 Romance Standard Time 2007