In this tutorial we connect the summary output of the lm() command to the mathematical methods we have learned in lectures. Our aim is to make sure we know what each bit of summary output means, and how it is computed. For illustration we use the stackloss data set which is built into R.

The Summary Output of lm()

As we have done in past tutorials, we fit a linear model using lm() and consider the output of summary(), applied to this model:

m <- lm(stack.loss ~ ., data=stackloss)
summary(m)

Call:
lm(formula = stack.loss ~ ., data = stackloss)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.2377 -1.7117 -0.4551  2.3614  5.6978 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -39.9197    11.8960  -3.356  0.00375 ** 
Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
Water.Temp    1.2953     0.3680   3.520  0.00263 ** 
Acid.Conc.   -0.1521     0.1563  -0.973  0.34405    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.243 on 17 degrees of freedom
Multiple R-squared:  0.9136,    Adjusted R-squared:  0.8983 
F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09

We now will go through the output section by section.

Residuals

The three lines starting with “Residuals:” give information about the range of the fitted residuals \(\hat\varepsilon_i\). We can compute the same information ourselves:

summary(resid(m))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-7.2377 -1.7117 -0.4551  0.0000  2.3614  5.6978 

The output can be used to assess whether the residuals are “small”, indicating good model fit. For this, the range of values should be compared to the magnitude of values of the original response variable.

Coefficients

The next section lists the estimated values for the regression coefficients, together with some additional information about each coefficient.

Estimate

We manually check that the coefficients are computed in the same way we learned in lecture. We use model.matrix() to get the design matrix, t() to transpose a matrix, and solve() to invert a matrix.

X <- model.matrix(m)  # design matrix
C <- solve(t(X) %*% X)
C %*% t(X) %*% stackloss$stack.loss
                   [,1]
(Intercept) -39.9196744
Air.Flow      0.7156402
Water.Temp    1.2952861
Acid.Conc.   -0.1521225

These values agree with the values listed in the saummary output, above. Thus, we have confirmed that R uses the rule we learned in lectures to compute the parameter estimates.

Standard Error

The Standard Error column gives the standard deviation of the estimate \(\beta_j\). We can verify that these values are what we expect, by using the formula \(\mathrm{Var}(\hat\beta_j) = \sigma^2 C_{jj}\), replacing the exact value \(\sigma^2\) with the estimate \(\hat\sigma^2\), and taking square roots:

sqrt(diag(C)) * 3.243
(Intercept)    Air.Flow  Water.Temp  Acid.Conc. 
 11.8946621   0.1348431   0.3679830   0.1562765 

t value

The column “t value” gives the test statistic for the hypthesis that \(\beta_j = 0\). ??t is computed at the first column divided by the second column.

Residual Standard Error

The next line lists the estimated standard deviation of the residuals. To verify, we use the formula from lectures:

sqrt(t(resid(m)) %*% resid(m) / 17)
         [,1]
[1,] 3.243364

The final two lines of the output, concerning \(R^2\) values and F-statistic show information which we have not yet discussed in lectures. We will cover the corresponding topics in the coming weeks.

LS0tCnRpdGxlOiAiVHV0b3JpYWwgMyIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKSW4gdGhpcyB0dXRvcmlhbCB3ZSBjb25uZWN0IHRoZSBzdW1tYXJ5IG91dHB1dCBvZiB0aGUgYGxtKClgIGNvbW1hbmQKdG8gdGhlIG1hdGhlbWF0aWNhbCBtZXRob2RzIHdlIGhhdmUgbGVhcm5lZCBpbiBsZWN0dXJlcy4gIE91ciBhaW0gaXMgdG8KbWFrZSBzdXJlIHdlIGtub3cgd2hhdCBlYWNoIGJpdCBvZiBzdW1tYXJ5IG91dHB1dCBtZWFucywgYW5kIGhvdyBpdAppcyBjb21wdXRlZC4gIEZvciBpbGx1c3RyYXRpb24gd2UgdXNlIHRoZSBgc3RhY2tsb3NzYCBkYXRhIHNldCB3aGljaAppcyBidWlsdCBpbnRvIFIuCgoKIyBUaGUgU3VtbWFyeSBPdXRwdXQgb2YgYGxtKClgCgpBcyB3ZSBoYXZlIGRvbmUgaW4gcGFzdCB0dXRvcmlhbHMsIHdlIGZpdCBhIGxpbmVhciBtb2RlbCB1c2luZyBgbG0oKWAKYW5kIGNvbnNpZGVyIHRoZSBvdXRwdXQgb2YgYHN1bW1hcnkoKWAsIGFwcGxpZWQgdG8gdGhpcyBtb2RlbDoKCmBgYHtyfQptIDwtIGxtKHN0YWNrLmxvc3MgfiAuLCBkYXRhPXN0YWNrbG9zcykKc3VtbWFyeShtKQpgYGAKCldlIG5vdyB3aWxsIGdvIHRocm91Z2ggdGhlIG91dHB1dCBzZWN0aW9uIGJ5IHNlY3Rpb24uCgojIyBSZXNpZHVhbHMKClRoZSB0aHJlZSBsaW5lcyBzdGFydGluZyB3aXRoICJSZXNpZHVhbHM6IiBnaXZlIGluZm9ybWF0aW9uIGFib3V0IHRoZSByYW5nZQpvZiB0aGUgZml0dGVkIHJlc2lkdWFscyAkXGhhdFx2YXJlcHNpbG9uX2kkLiAgV2UgY2FuIGNvbXB1dGUgdGhlIHNhbWUKaW5mb3JtYXRpb24gb3Vyc2VsdmVzOgoKYGBge3J9CnN1bW1hcnkocmVzaWQobSkpCmBgYAoKVGhlIG91dHB1dCBjYW4gYmUgdXNlZCB0byBhc3Nlc3Mgd2hldGhlciB0aGUgcmVzaWR1YWxzIGFyZSAic21hbGwiLCBpbmRpY2F0aW5nCmdvb2QgbW9kZWwgZml0LiAgRm9yIHRoaXMsIHRoZSByYW5nZSBvZiB2YWx1ZXMgc2hvdWxkIGJlIGNvbXBhcmVkIHRvIHRoZQptYWduaXR1ZGUgb2YgdmFsdWVzIG9mIHRoZSBvcmlnaW5hbCByZXNwb25zZSB2YXJpYWJsZS4KCgojIyBDb2VmZmljaWVudHMKClRoZSBuZXh0IHNlY3Rpb24gbGlzdHMgdGhlIGVzdGltYXRlZCB2YWx1ZXMgZm9yIHRoZSByZWdyZXNzaW9uCmNvZWZmaWNpZW50cywgdG9nZXRoZXIgd2l0aCBzb21lIGFkZGl0aW9uYWwgaW5mb3JtYXRpb24KYWJvdXQgZWFjaCBjb2VmZmljaWVudC4KCiMjIyBFc3RpbWF0ZQoKV2UgbWFudWFsbHkgY2hlY2sgdGhhdCB0aGUgY29lZmZpY2llbnRzIGFyZSBjb21wdXRlZCBpbiB0aGUKc2FtZSB3YXkgd2UgbGVhcm5lZCBpbiBsZWN0dXJlLiAgV2UgdXNlIGBtb2RlbC5tYXRyaXgoKWAgdG8KZ2V0IHRoZSBkZXNpZ24gbWF0cml4LCBgdCgpYCB0byB0cmFuc3Bvc2UgYSBtYXRyaXgsIGFuZApgc29sdmUoKWAgdG8gaW52ZXJ0IGEgbWF0cml4LgoKYGBge3J9ClggPC0gbW9kZWwubWF0cml4KG0pICAjIGRlc2lnbiBtYXRyaXgKQyA8LSBzb2x2ZSh0KFgpICUqJSBYKQpDICUqJSB0KFgpICUqJSBzdGFja2xvc3Mkc3RhY2subG9zcwpgYGAKVGhlc2UgdmFsdWVzIGFncmVlIHdpdGggdGhlIHZhbHVlcyBsaXN0ZWQgaW4gdGhlIHNhdW1tYXJ5IG91dHB1dCwgYWJvdmUuClRodXMsIHdlIGhhdmUgY29uZmlybWVkIHRoYXQgUiB1c2VzIHRoZSBydWxlIHdlIGxlYXJuZWQgaW4gbGVjdHVyZXMgdG8KY29tcHV0ZSB0aGUgcGFyYW1ldGVyIGVzdGltYXRlcy4KCiMjIyBTdGFuZGFyZCBFcnJvcgoKVGhlIFN0YW5kYXJkIEVycm9yIGNvbHVtbiBnaXZlcyB0aGUgc3RhbmRhcmQgZGV2aWF0aW9uIG9mIHRoZSBlc3RpbWF0ZQokXGJldGFfaiQuICBXZSBjYW4gdmVyaWZ5IHRoYXQgdGhlc2UgdmFsdWVzIGFyZSB3aGF0IHdlIGV4cGVjdCwgYnkKdXNpbmcgdGhlIGZvcm11bGEgJFxtYXRocm17VmFyfShcaGF0XGJldGFfaikgPSBcc2lnbWFeMiBDX3tqan0kLApyZXBsYWNpbmcgdGhlIGV4YWN0IHZhbHVlICRcc2lnbWFeMiQgd2l0aCB0aGUgZXN0aW1hdGUgJFxoYXRcc2lnbWFeMiQsCmFuZCB0YWtpbmcgc3F1YXJlIHJvb3RzOgpgYGB7cn0Kc3FydChkaWFnKEMpKSAqIDMuMjQzCmBgYAoKIyMjIHQgdmFsdWUKClRoZSBjb2x1bW4gInQgdmFsdWUiIGdpdmVzIHRoZSB0ZXN0IHN0YXRpc3RpYyBmb3IgdGhlIGh5cHRoZXNpcwp0aGF0ICRcYmV0YV9qID0gMCQuICA/P3QgaXMgY29tcHV0ZWQgYXQgdGhlIGZpcnN0IGNvbHVtbiBkaXZpZGVkCmJ5IHRoZSBzZWNvbmQgY29sdW1uLgoKCiMjIFJlc2lkdWFsIFN0YW5kYXJkIEVycm9yCgpUaGUgbmV4dCBsaW5lIGxpc3RzIHRoZSBlc3RpbWF0ZWQgc3RhbmRhcmQgZGV2aWF0aW9uIG9mIHRoZSByZXNpZHVhbHMuClRvIHZlcmlmeSwgd2UgdXNlIHRoZSBmb3JtdWxhIGZyb20gbGVjdHVyZXM6CgpgYGB7cn0Kc3FydCh0KHJlc2lkKG0pKSAlKiUgcmVzaWQobSkgLyAxNykKYGBgCgpUaGUgZmluYWwgdHdvIGxpbmVzIG9mIHRoZSBvdXRwdXQsIGNvbmNlcm5pbmcgJFJeMiQgdmFsdWVzIGFuZApGLXN0YXRpc3RpYyBzaG93IGluZm9ybWF0aW9uIHdoaWNoIHdlIGhhdmUgbm90IHlldCBkaXNjdXNzZWQgaW4gbGVjdHVyZXMuCldlIHdpbGwgY292ZXIgdGhlIGNvcnJlc3BvbmRpbmcgdG9waWNzIGluIHRoZSBjb21pbmcgd2Vla3MuCg==