The Projection Matrix and Deleted Residuals

25/06/2019 3-minute read

Introduction

Assessing the goodness-of-fit and the assumptions of the statistical model used is a crucial task for the data analyst. In the case of linear regression models, several measures can be used to evaluate the goodness-of-fit and the model assumptions. For instance,

  • deleted residuals,
  • Cook's distance,
  • DFFits,
  • DFBetas,
  • COVRATIO statistic,
  • PRESS statistic

among others.

In addition to being used to evaluate the model assumptions, these measures depend, in some way, on the projection matrix, also known as the hat matrix, which is defined as follows:

\[\mathbf{H} = \mathbf{X}\left(\mathbf{X}^\top\,\mathbf{X}\right)^{-1}\mathbf{X}^\top\] where \(\mathbf{X}\) is the \(n\times p\) design matrix, i.e., \[\begin{equation}\label{eq:X} \mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1(p-1)}\\ 1 & x_{21} & \cdots & x_{2(p-1)}\\ \vdots& \vdots & \cdots& \vdots \\ 1 & x_{n1} & \cdots & x_{n(p-1)} \end{bmatrix} \end{equation}\] where \(x_{ij}\) is the \(i\)-th observation of the \(j\)-th predictor.
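As a quick numerical illustration, here is a minimal sketch in Python/NumPy (the design matrix is simulated and the dimensions are arbitrary placeholders, chosen only for this example) of how \(\mathbf{H}\) can be computed and its basic properties checked:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated n x p design matrix: a column of ones plus two predictors.
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Projection (hat) matrix: H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# H is symmetric and idempotent, and its trace equals p.
print(np.allclose(H, H.T))          # True
print(np.allclose(H @ H, H))        # True
print(np.isclose(np.trace(H), p))   # True
```

In practice one would usually avoid the explicit inverse (e.g., by working with a QR decomposition), but the direct formula keeps the sketch close to the definition above.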

The definitions of the aforementioned measures and their interpretations can be found here.

In this post I shall explore the relationship between the projection matrix, \(\mathbf{H}\), and the deleted residuals. The latter are useful mainly in influence diagnostics, to detect outlying observations.

The relationship

The deleted residual is defined as \[\begin{equation}\label{eq:di_1} d_i = y_i - \widehat{y}_{i(i)} \end{equation}\] where \(y_i\) is the \(i\)-th value of the response variable and \(\widehat{y}_{i(i)}\) is the \(i\)-th predicted value of the response variable obtained without the \(i\)-th observation, that is, \[\begin{equation}\label{eq:yhat_i} \widehat{y}_{i(i)} = \mathbf{x}_i^\top \widehat{\boldsymbol{\beta}}_{(i)} \end{equation}\] where \(\mathbf{x}_i = (1, x_{i1}, \ldots, x_{i(p-1)} )^\top\) is the \(i\)-th row of the matrix \(\mathbf{X}\) and \(\widehat{\boldsymbol{\beta}}_{(i)}\) is the least squares estimator of the regression model fitted without the \(i\)-th observation.

As I discussed in this post, the least squares estimator of the vector \(\boldsymbol{\beta}\) is such that the vector of fitted values, \(\mathbf{X}\widehat{\boldsymbol{\beta}}\), is the orthogonal projection of \(\mathbf{y}\) onto the vector space generated by the columns of the matrix \(\mathbf{X}\), \(C(\mathbf{X})\). Hence, removing the \(i\)-th observation, we obtain \[\begin{equation}\label{eq:beta_ii} \widehat{\boldsymbol{\beta}}_{(i)} = \left(\mathbf{X}_{(i)}^\top\,\mathbf{X}_{(i)}\right)^{-1}\mathbf{X}_{(i)}^\top\,\mathbf{y}_{(i)} \end{equation}\] where \(\mathbf{X}_{(i)}\) is the design matrix \(\mathbf{X}\) without the \(i\)-th row and \(\mathbf{y}_{(i)}\) is the vector of response values without the \(i\)-th observation.

Note that the deleted residual prevents a supposedly aberrant value, \(y_i\), from influencing its own predicted value; as a result, the residual \(d_i\) tends to be larger and is more likely to flag an aberrant observation.

Also, note that to obtain the deleted residuals from this expression we would have to exclude the \(i\)-th observation and fit the model to the remaining \(n-1\) observations. This procedure is repeated \(n\) times, meaning that we must fit \(n\) regression models, as in the sketch below.
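To make this concrete, here is a small sketch of this naive procedure (Python/NumPy, with simulated data used purely as a placeholder), refitting the model \(n\) times:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated placeholder data: y = X beta + noise.
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# Naive deleted residuals: leave out row i, refit, predict y_i.
d = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # beta_hat_(i)
    d[i] = y[i] - X[i] @ beta_i                                # d_i = y_i - yhat_{i(i)}

print(d)
```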

Fortunately, for the linear regression model it is not necessary to perform such a procedure, which in practice could be computationally expensive, depending on the sample size \(n\). In particular, it is possible to express the deleted residuals in terms of the model fitted with all \(n\) observations.

Let \(e_i = y_i - \widehat{y}_i\) be the ordinary residual and \(h_{ii} = \mathbf{x}_i^\top\left(\mathbf{X}^\top\,\mathbf{X}\right)^{-1}\mathbf{x}_i\) the \(i\)-th diagonal element of the projection matrix \(\mathbf{H}\). To obtain a formula for \(d_i\) using the model fitted with the complete data, we must write \(\widehat{\boldsymbol{\beta}}_{(i)}\) in terms of the matrix \(\mathbf{X}\) and the vector \(\mathbf{y}\) with all observations. Note that \[ \mathbf{X}_{(i)}^\top\,\mathbf{X}_{(i)} = \mathbf{X}^\top\mathbf{X} - \mathbf{x}_i\mathbf{x}_i^\top \qquad \mbox{and} \qquad \mathbf{X}_{(i)}^\top\,\mathbf{y}_{(i)} = \mathbf{X}^\top\,\mathbf{y} - \mathbf{x}_iy_i \] Using the Sherman-Morrison formula we obtain \[ \left(\mathbf{X}_{(i)}^\top\,\mathbf{X}_{(i)}\right)^{-1} = \left(\mathbf{X}^\top\mathbf{X}\right)^{-1} + \dfrac{\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\mathbf{x}_i^\top \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}} {1 - \mathbf{x}^\top_i\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i} \] Hence, we can write \(\widehat{\boldsymbol{\beta}}_{(i)}\) as follows \[\begin{eqnarray} \widehat{\boldsymbol{\beta}}_{(i)} &=& \left[\left(\mathbf{X}^\top\mathbf{X}\right)^{-1} + \dfrac{\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\mathbf{x}_i^\top \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}} {1 - \mathbf{x}^\top_i\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i}\right]\, \left( \mathbf{X}^\top\,\mathbf{y} - \mathbf{x}_i y_i \right) \nonumber \\ \nonumber &=& \widehat{\boldsymbol{\beta}} - \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\, \left[\dfrac{y_i\,(1-h_{ii}) - \mathbf{x}_i^\top\widehat{\boldsymbol{\beta}} + h_{ii}\,y_i}{1-h_{ii}}\right]\\ &=& \widehat{\boldsymbol{\beta}} - \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\left[\dfrac{y_i - \mathbf{x}^\top_i\widehat{\boldsymbol{\beta}}}{1-h_{ii}}\right] \nonumber\\\label{eq:beta_ii_2} \therefore \widehat{\boldsymbol{\beta}}_{(i)}&=& \widehat{\boldsymbol{\beta}} - \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\cdot\dfrac{e_i}{1-h_{ii}}. \end{eqnarray}\]
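This identity for \(\widehat{\boldsymbol{\beta}}_{(i)}\) can be checked numerically against a direct refit without the \(i\)-th observation. The sketch below reuses the same kind of simulated placeholder data as before:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated placeholder data.
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                 # full-data estimate
e = y - X @ beta_hat                         # ordinary residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii

i = 4  # an arbitrary observation
# Closed form: beta_hat_(i) = beta_hat - (X'X)^{-1} x_i e_i / (1 - h_ii)
beta_i_closed = beta_hat - XtX_inv @ X[i] * e[i] / (1 - h[i])

# Direct refit without observation i.
keep = np.arange(n) != i
beta_i_refit = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]

print(np.allclose(beta_i_closed, beta_i_refit))  # True
```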

Substituting this expression for \(\widehat{\boldsymbol{\beta}}_{(i)}\) into the expression for \(\widehat{y}_{i(i)}\), we conclude that \[\begin{eqnarray} d_i &=& y_i - \mathbf{x}_i^\top\left[\widehat{\boldsymbol{\beta}} - \left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\cdot\dfrac{e_i}{1-h_{ii}}\right] \nonumber\\ &=& y_i - \mathbf{x}_i^\top\widehat{\boldsymbol{\beta}} + \mathbf{x}_i^\top\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{x}_i\,\cdot\dfrac{e_i}{1-h_{ii}} \nonumber\\ &=& e_i + \dfrac{h_{ii}\,e_i}{1 - h_{ii}} \nonumber\\\nonumber \therefore d_i &=& \dfrac{e_i}{1 - h_{ii}}. \end{eqnarray}\]
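So a single fit with all \(n\) observations is enough. As a final sketch (again with simulated placeholder data), the closed form \(d_i = e_i/(1 - h_{ii})\) can be compared with the \(n\)-refit procedure described earlier:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated placeholder data.
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# Single fit with all n observations.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat                         # ordinary residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii

# Deleted residuals from the single full-data fit.
d_closed = e / (1 - h)

# Deleted residuals from n leave-one-out refits.
d_loo = np.array([
    y[i] - X[i] @ np.linalg.lstsq(X[np.arange(n) != i],
                                  y[np.arange(n) != i], rcond=None)[0]
    for i in range(n)
])

print(np.allclose(d_closed, d_loo))  # True
```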