# MATH3714 — Linear Regression and Robustness

In this module we will study how linear regression can be used to describe and analyse the relationship between explanatory variables $x_1, \ldots, x_p$ (input) and a response variable $y$ (output). The models we will consider are of the form

$y = \beta_0 + x_1 \beta_1 + \cdots + x_p \beta_p + \varepsilon$,
where $\beta_0$ is the intercept, the coefficients $\beta_i$ for $i = 1, \ldots, p$ describe how strongly the response depends on the feature $x_i$, and the error term $\varepsilon$ represents the noise, i.e. the component of the data not explicitly described by the model. We will consider the following questions:
• How to estimate the coefficients $\beta_0, \ldots, \beta_p$ from data?
• How much of the variance in $y$ is described by the $x_i$? How much by the noise $\varepsilon$?
• Is a linear model appropriate for the data?
• What happens if there are outliers in the data?
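
The questions above can be explored with a small experiment in R. The following is only a sketch on simulated data: the sample size, the "true" coefficients, the noise level and all variable names are assumptions made for illustration, not part of the module material. It fits the model with `lm()`, reads off the estimated coefficients and the $R^2$ value, and then refits after contaminating a single observation.

```r
# Simulated data; all names and numbers here are illustrative assumptions.
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
eps <- rnorm(n, sd = 0.5)             # noise term
y <- 1 + 2 * x1 - 3 * x2 + eps        # beta_0 = 1, beta_1 = 2, beta_2 = -3

# Least squares fit of y = beta_0 + x1 * beta_1 + x2 * beta_2 + eps
fit <- lm(y ~ x1 + x2)
beta_hat <- coef(fit)                 # estimated coefficients
r_squared <- summary(fit)$r.squared   # share of the variance in y explained by x1, x2

print(beta_hat)
print(r_squared)

# Effect of a single outlier: shift one response value and refit
y_out <- y
y_out[1] <- y_out[1] + 50
fit_out <- lm(y_out ~ x1 + x2)
print(coef(fit_out) - beta_hat)       # change in the estimates caused by the outlier
```

To judge whether a linear model is appropriate, one common first step is to plot the residuals against the fitted values, e.g. `plot(fitted(fit), resid(fit))`, and to look for systematic patterns.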

## Time Table

There will be 27 lectures (L1 to L27) and 6 example classes (E1 to E6). The schedule is given in the following table.

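| w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 | w11 |
|----|----|----|----|----|----|----|----|----|-----|-----|
| 25.09. L1 | 02.10. L3 | 09.10. L6 | 16.10. L8 | 23.10. L11 | 30.10. L13 | 06.11. L16 | 13.11. L18 | 20.11. L21 | 27.11. L23 | 04.12. L26 |
| 26.09. L2 | 03.10. L4 | 10.10. L7 | 17.10. L9 | 24.10. L12 | 31.10. L14 | 07.11. L17 | 14.11. L19 | 21.11. L22 | 28.11. L24 | 05.12. L27 |
| 27.09. E1 | 04.10. L5 | 11.10. E2 | 18.10. L10 | 25.10. E3 | 01.11. L15 | 08.11. E4 | 15.11. L20 | 22.11. E5 | 29.11. L25 | 06.12. E6 |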

## Handouts

The following links point to PDF copies of the handouts from the lectures.

Paper copies of the handouts are usually available from the blue drawers in front of the taught students office on level 8 of the maths building.

## Software

For the module we will use the statistical computing package R. This program is free software, and you can find the program and documentation at the R project homepage.

I recommend installing the RStudio environment, which includes R, on your own computer and using it for the project. (Choose the open source version, "RStudio Desktop", on the download page.) Alternatively, you can use RStudio or plain R on the university computers.

Below you can find the RStudio notebook files from the tutorials. I recommend downloading the "RStudio Notebook" to your own computer (right click on the link and choose "Save link as …") and experimenting with it in RStudio yourself; there is also a non-interactive "HTML version" which you can read in a browser.

Useful resources for learning R include the following:

• Some introductory notes I wrote for the 2015/16 version of the MATH1712 module.
• The R manual.
• The R online help, accessed by typing help() or help.start() in R.
• The departmental web page has a list with some R tutorials.

## Data

The following data sets were used in the module.

1. A toy data set for use in homework 9: ex02-q09.csv
2. The stackloss data set built into R.
3. A toy data set for use in homework 20: ex05-q20.csv
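
As a starting point for working with these data sets, the following sketch loads the built-in stackloss data and fits a linear model with `stack.loss` as the response. The choice of model is an assumption made for illustration and is not necessarily the one used in the lectures. The last part shows how one of the CSV files could be read, assuming it has been saved to the current working directory; its column layout is not described here, so the code only inspects it.

```r
# stackloss ships with R (package "datasets"): 21 observations of 4 variables
data(stackloss)
str(stackloss)

# Illustrative model: stack.loss explained by the three other variables
fit_stack <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)
print(summary(fit_stack))

# Reading a CSV file, assuming it was downloaded to the working directory
csv_file <- "ex02-q09.csv"
if (file.exists(csv_file)) {
  toy <- read.csv(csv_file)
  str(toy)                  # inspect the columns before fitting anything
}
```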