MATH 3880-5880 - Intro. to Statistics and DNA
Semester 2 2018/2019



Welcome to MATH3880/5880 (Intro. to) Statistics and DNA

Welcome to the webpage for the module MATH3880/5880 (Intro. to) Statistics and DNA. On this webpage you will find information about the lectures and workshops (tutorials). It also contains the datasets that you will need for your R sessions. Electronic copies of the handouts, exercises, and solutions are made available in Minerva when they are due.


Lectures

Part I - Analysis of microarray data

Part II - Statistical Genetics

Part III - Phylogenetics

Lecture slides for MATH5880 only



Handouts

The handouts are now available from Minerva only. You can access them by selecting "Learning resources" in the left tab, and then "Handouts".


Exercise sheets

The exercises are available from Minerva only. You can access them by selecting "Learning resources" in the left tab, and then "Exercises".


Practical Session

The practical session will be held in the Fourman Cluster on Monday 4 March 2019, 4-6pm (Week 6 of teaching). The tasks will be distributed on Monday 11 February 2019, both as printed copies and as electronic copies via Minerva. The coursework deadline is Monday 11 March 2019 at 5pm (one week after the practical).

Please note that we have a strict policy on late submission: unexcused late submission of coursework normally results in a deduction of 5% for each calendar day (not working day) past the submission deadline. No marks will be given for coursework submitted on or after Monday 18 March 2019.
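The arithmetic of this policy can be sketched in a few lines. This is only an illustration, written in Python rather than R, and it rests on two assumptions that the policy text does not settle: that the 5% is deducted as percentage points from the mark awarded, and that `days_late` has already been counted as whole calendar days past Monday 11 March (so Monday 18 March is day 7). The function name `penalised_mark` is hypothetical.

```python
PENALTY_PER_DAY = 5  # percentage points per calendar day late (from the policy above)
ZERO_MARK_DAY = 7    # Monday 18 March is 7 calendar days after Monday 11 March


def penalised_mark(raw_mark: float, days_late: int) -> float:
    """Return the mark after the late deduction.

    Assumption: the deduction is taken in percentage points from the raw mark.
    Assumption: days_late is already counted in whole calendar days.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late >= ZERO_MARK_DAY:
        return 0.0  # submitted on or after 18 March: no marks
    return max(0.0, raw_mark - PENALTY_PER_DAY * days_late)


print(penalised_mark(68, 3))  # 68 - 3 * 5 = 53
```

If in doubt about how a particular submission time is counted, ask rather than rely on this sketch.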



Data sets

LPS experiment in cDNA microarrays ("LPS data")

Description: Lipopolysaccharide (LPS) is a toxin produced by Gram-negative bacteria. It can trigger our immune system. Monocytes are white blood cells involved in immunity (there are three types of white blood cells: granulocytes, monocytes and lymphocytes; our focus in this experiment is on monocytes). They are triggered by LPS.

Our aim was to determine what effect LPS has on gene expression in monocytes.

In this study an LPS bolus was injected into healthy male volunteers. Blood samples were taken just before injection (t=0) and one hour after injection (t=1). Monocytes were isolated from this blood, and RNA was extracted. Using SMART cDNA Amplification, cDNA was made from the RNA.

Please note that there are 18 arrays in total in the study. However, for illustrative purposes, we use only 4 arrays in this module.

File format: GenePix output (.gpr).

To download a file, right-click its name, choose "Save target as ..." or "Save link as ...", and save the file in your folder. Be aware that your browser may have a default location for downloaded files. If you cannot find a file, look in the 'Desktop' or 'Downloads' folder.

Array 1: 1355-5.gpr

Array 2: 1355-12.gpr

Array 3: 1358-13.gpr

Array 4: 1358-17.gpr

Description file: LPS1-info.txt

Paper: Sivapalaratnam et al. (2011) Identification of candidate genes linking systemic inflammation to atherosclerosis; results of a human in vivo LPS infusion study, BMC Medical Genomics 4: 64 (The link to the article is here, and the PDF of the paper is available in Minerva)

Breast Cancer dataset

Description: A study of breast cancer in Stockholm, Sweden. Platform: Affymetrix HGU133A. Total: 159 arrays. Note that the data are not in raw format; they are already normalised and 'summarised'. To download the data, click here. Many research questions are involved in this study. For our purposes, the objective is to identify genes that are differentially expressed between ER+ and ER- patients. ER is the Estrogen Receptor, a protein that is activated by the hormone estrogen. ER+ basically means that the patient's cancer proliferates in response to estrogen, and ER- means that it does not.

A subset of the data, containing only 5 arrays from ER-positive patients and 5 arrays from ER-negative patients in approx. 10,000 probesets can be downloaded here.
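As a rough illustration of what "differentially expressed" means for a 5 versus 5 comparison like this subset, the sketch below computes a Welch two-sample t-statistic for each probeset and ranks the probesets by its absolute value. This is one common approach and not necessarily the method used in the module. The expression values and probeset names are hypothetical, and Python is used only for illustration (in R, `t.test` performs the Welch test by default).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t-statistic for one probeset (unequal variances allowed)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical log2 expression values: (5 ER+ arrays, 5 ER- arrays) per probeset.
expression = {
    "probeset_A": ([9.1, 9.4, 8.9, 9.3, 9.0], [6.2, 6.0, 6.5, 6.1, 6.3]),
    "probeset_B": ([7.0, 7.3, 6.8, 7.1, 6.9], [7.1, 6.9, 7.2, 6.8, 7.0]),
}

# One t-statistic per probeset, then rank by absolute size (largest first).
t_stats = {name: welch_t(pos, neg) for name, (pos, neg) in expression.items()}
ranked = sorted(t_stats, key=lambda name: abs(t_stats[name]), reverse=True)

for name in ranked:
    print(f"{name}: t = {t_stats[name]:.2f}")
```

With about 10,000 probesets, the same calculation is repeated for each one, so the resulting p-values need a multiple-testing adjustment before any gene is declared differentially expressed.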

Paper: Pawitan et al. (2005) Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts, Breast Cancer Research, 7:R953-R964 (The link to the article is here, and the PDF of the paper is available in Minerva)

Prolymphocytic leukaemia (PLL)

Description: Prolymphocytic leukaemia (PLL) is a specific type of leukaemia. There are three types of lymphocytes (a type of white blood cell): T cells, B cells, and natural killer (NK) cells.

Some explanation of PLL can be found here. This link in Wikipedia will lead you to an explanation of T-PLL, which is PLL affecting T lymphocytes.

A microarray study has been conducted to identify genes that are differentially expressed in T cells between normal individuals and individuals with T-PLL. The experiment is discussed here (in ArrayExpress). The raw data of the experiment are in CEL format, zipped to reduce the file size, and can be downloaded from the ArrayExpress link or directly from here. You need to unzip the download first to get a folder that contains the CEL files. The experimental description of the files is available at this link from ArrayExpress. If you wish to open the experiment description in a web browser, click this link in ArrayExpress.

Paper: Durig et al. (2007) Combined single nucleotide polymorphism-based genomic mapping and global gene expression profiling identifies novel chromosomal imbalances, mechanisms and candidate genes important in the pathogenesis of T-cell prolymphocytic leukemia with inv(14)(q11q32), Leukemia, 21, 2153-2163 (The link to the article is here, and the PDF of the paper is available in Minerva)


Help with R

Getting Started

To start R on an ISS computer, go to Start -> Programs -> Statistics -> R 3.x.y (x and y will change as new versions of R are released).

The School of Mathematics computing documentation web page includes links to some guides to help you get started with R. As a minimum, I recommend working through the document "An Introduction to R on Windows XP".

Percentage points

I've made some notes on finding percentage points in R available to help with exercise sheet 1.
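The notes themselves are not reproduced on this page. As a small illustration of the idea, the sketch below assumes that the upper 100α% point of a standard normal distribution is the value z with P(Z > z) = α, which is a quantile of the distribution. The helper name `upper_percentage_point` is hypothetical, and Python is used only for illustration; in R, `qnorm(0.975)` returns the same value.

```python
from statistics import NormalDist


def upper_percentage_point(alpha: float) -> float:
    """Return z such that P(Z > z) = alpha for a standard normal Z."""
    # The upper alpha point is the (1 - alpha) quantile of the distribution.
    return NormalDist().inv_cdf(1 - alpha)


print(round(upper_percentage_point(0.025), 3))  # 1.96
```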


Useful links

You might like to look at the University's official descriptions of the modules MATH3880 and MATH5880.

Journal articles

Some journal articles are relevant to our discussion of linear models for microarray data.

Other links


Contact details

My office is Room 10.14 in the Maths Satellite (EC Stoner, level 10, between staircases 1 and 2). My email address is a.gusnanto (at) leeds dot ac dot uk. I am generally available on Tuesdays 4-5pm. For other times, please let me know in advance by email or talk to me after the lecture.

[University of Leeds] [School of Mathematics] [Department of Statistics]
Last modified: Tue Sept 22 17:13:05 GMT 2015