Centering matrix

In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the vector's components from every component.

Definition

The centering matrix of size n is defined as the n-by-n matrix

C_{n}=I_{n}-{\tfrac {1}{n}}{\mathbb {O}}

where $I_{n}\,$ is the identity matrix of size n and $\mathbb {O}$ is an n-by-n matrix of all 1's. This can also be written as:

C_{n}=I_{n}-{\tfrac {1}{n}}{\mathbf {1}}{\mathbf {1}}^{\top }

where $\mathbf {1}$ is the column-vector of n ones and where $\top$ denotes matrix transpose.

For example

C_{1}={\begin{bmatrix}0\end{bmatrix}}

C_{2}=\begin{bmatrix}1&0\\0&1\end{bmatrix}-\frac{1}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix}=\begin{bmatrix}\frac{1}{2}&-\frac{1}{2}\\-\frac{1}{2}&\frac{1}{2}\end{bmatrix}

C_{3}=\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}-\frac{1}{3}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}=\begin{bmatrix}\frac{2}{3}&-\frac{1}{3}&-\frac{1}{3}\\-\frac{1}{3}&\frac{2}{3}&-\frac{1}{3}\\-\frac{1}{3}&-\frac{1}{3}&\frac{2}{3}\end{bmatrix}
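The definition and the examples above can be reproduced with a minimal sketch in plain Python. The helper name `centering_matrix` is our own choice, not something from the source, and exact `fractions.Fraction` entries are used (an assumption made for illustration) so the results can be compared with $C_{1}$, $C_{2}$ and $C_{3}$ without floating-point rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


for n in (1, 2, 3):
    print(f"C_{n}:")
    for row in centering_matrix(n):
        # Render each entry as a fraction string such as '2/3' or '-1/3'.
        print("   ", [str(x) for x in row])
```

The printed rows agree with the matrices above; for instance, the first row of $C_{3}$ is printed as `['2/3', '-1/3', '-1/3']`.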

Properties

Given a column vector $\mathbf{v}$ of size n, the centering property of $C_{n}$ can be expressed as

C_{n}\mathbf{v}=\mathbf{v}-\left(\tfrac{1}{n}\mathbf{1}^{\top}\mathbf{v}\right)\mathbf{1}

where $\tfrac{1}{n}\mathbf{1}^{\top}\mathbf{v}$ is the mean of the components of $\mathbf{v}$.
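The centering property can be checked numerically by comparing $C_{n}\mathbf{v}$ with the vector obtained by subtracting the mean directly. This is a sketch with hypothetical helpers (`centering_matrix`, `mat_vec`) and an arbitrary example vector; it checks one instance and is not a proof.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Hypothetical helper: matrix-vector product a v for a matrix stored as rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


v = [Fraction(x) for x in (4, 7, 1)]         # arbitrary example vector
mean = sum(v) / len(v)                       # (1/n) 1^T v
via_matrix = mat_vec(centering_matrix(len(v)), v)
via_subtraction = [x - mean for x in v]      # v - mean * 1
print([str(x) for x in via_matrix])          # prints ['0', '3', '-3']
print(via_matrix == via_subtraction)         # prints True
```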

$C_{n}\,$ is symmetric positive semi-definite.
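Both claims follow directly from the definition. $I_{n}$ and $\mathbf{1}\mathbf{1}^{\top}$ are symmetric, so $C_{n}$ is too, and for any $\mathbf{v}$ with mean $\bar{v}=\tfrac{1}{n}\mathbf{1}^{\top}\mathbf{v}$ (a symbol introduced here for the derivation) the quadratic form is a sum of squares:

\mathbf{v}^{\top}C_{n}\mathbf{v}=\mathbf{v}^{\top}\mathbf{v}-\tfrac{1}{n}(\mathbf{1}^{\top}\mathbf{v})^{2}=\sum_{i=1}^{n}v_{i}^{2}-n\bar{v}^{2}=\sum_{i=1}^{n}(v_{i}-\bar{v})^{2}\geq 0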

$C_{n}$ is idempotent, so that $C_{n}^{k}=C_{n}$ for $k=1,2,\ldots$. Once the mean has been removed, it is zero, and removing it again has no effect.
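Idempotence can be spot-checked in exact arithmetic for small n. The sketch below assumes hypothetical helpers `centering_matrix` and `mat_mul`; it only tests $C_{n}^{2}=C_{n}$, from which $C_{n}^{k}=C_{n}$ follows by induction.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Hypothetical helper: matrix product a b for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


for n in range(1, 7):
    c = centering_matrix(n)
    # Exact fractions make an equality test meaningful (no rounding tolerance needed).
    print(n, mat_mul(c, c) == c)
```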

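$C_{n}$ is singular, so the transformation $\mathbf{v}\mapsto C_{n}\mathbf{v}$ cannot be reversed: vectors that differ only by a multiple of $\mathbf{1}$ (that is, only in their mean) are mapped to the same result.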

$C_{n}\,$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

$C_{n}$ has a one-dimensional nullspace, spanned by the vector $\mathbf{1}$.
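The eigenvalue, nullspace and singularity statements can be illustrated on example vectors. The vectors are chosen arbitrarily, so this checks instances rather than the general claims, and the helpers `centering_matrix` and `mat_vec` are hypothetical.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Hypothetical helper: matrix-vector product a v for a matrix stored as rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


n = 4
c = centering_matrix(n)
ones = [Fraction(1)] * n
w = [Fraction(x) for x in (3, -1, 0, -2)]   # example vector whose components sum to zero
v = [Fraction(x) for x in (5, 2, 9, 1)]     # arbitrary example vector
shifted = [x + 10 for x in v]               # v + 10 * 1: same vector with its mean raised by 10

print(mat_vec(c, ones) == [0] * n)            # C_n 1 = 0: eigenvalue 0, nullspace along 1
print(mat_vec(c, w) == w)                     # C_n w = w: eigenvalue 1 on zero-sum vectors
print(mat_vec(c, shifted) == mat_vec(c, v))   # distinct inputs, same output: C_n is singular
```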

$C_{n}$ is a projection matrix. That is, $C_{n}\mathbf{v}$ is the orthogonal projection of $\mathbf{v}$ onto the (n − 1)-dimensional subspace orthogonal to the nullspace spanned by $\mathbf{1}$. (This is the subspace of all n-vectors whose components sum to zero.)
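The projection picture can be illustrated by splitting an example vector into $C_{n}\mathbf{v}$ and the residual $\mathbf{v}-C_{n}\mathbf{v}$: the first part sums to zero, the residual is the mean times $\mathbf{1}$, and the two parts are orthogonal. This is a sketch with hypothetical helpers and an arbitrary example vector.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Hypothetical helper: matrix-vector product a v for a matrix stored as rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


v = [Fraction(x) for x in (2, 8, 3, 7)]      # arbitrary example vector, mean 5
p = mat_vec(centering_matrix(len(v)), v)     # C_n v, the projection
r = [x - y for x, y in zip(v, p)]            # residual v - C_n v

print(sum(p) == 0)                               # projection lies in the zero-sum subspace
print(r == [5] * len(v))                         # residual is the mean times 1
print(sum(x * y for x, y in zip(p, r)) == 0)     # projection and residual are orthogonal
```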

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector (a dense matrix-vector product takes $O(n^{2})$ operations, whereas computing and subtracting the mean takes $O(n)$), it is an analytical tool that expresses mean removal conveniently and succinctly. It can remove the mean not only of a single vector but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix $X$, the product $C_{m}X$ removes the mean from each of the n columns, while $XC_{n}$ removes the mean from each of the m rows.
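A small sketch of the row and column conventions, using a made-up 2-by-3 matrix and hypothetical helpers `centering_matrix` and `mat_mul`. It forms the full matrix products to mirror the algebra, which is exactly the inefficient route mentioned above; in practice one would subtract the means directly.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Hypothetical helper: matrix product a b for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Made-up example: m = 2 rows, n = 3 columns.
X = [[Fraction(x) for x in row] for row in ([1, 2, 3], [4, 6, 8])]
m, n = len(X), len(X[0])

cols_centered = mat_mul(centering_matrix(m), X)   # C_m X: each column now has mean 0
rows_centered = mat_mul(X, centering_matrix(n))   # X C_n: each row now has mean 0

print([[str(x) for x in row] for row in cols_centered])  # prints [['-3/2', '-2', '-5/2'], ['3/2', '2', '5/2']]
print([[str(x) for x in row] for row in rows_centered])  # prints [['-1', '0', '1'], ['-2', '0', '2']]
```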

The centering matrix provides, in particular, a succinct way to express the scatter matrix $S=(X-\mu\mathbf{1}^{\top})(X-\mu\mathbf{1}^{\top})^{\top}$ of a data sample $X$ whose n columns are the observations, where $\mu=\tfrac{1}{n}X\mathbf{1}$ is the sample mean. Because $X-\mu\mathbf{1}^{\top}=X-\tfrac{1}{n}X\mathbf{1}\mathbf{1}^{\top}=X\,C_{n}$, and $C_{n}$ is symmetric and idempotent, the scatter matrix can be written more compactly as

S=X\,C_{n}(X\,C_{n})^{\top}=X\,C_{n}\,C_{n}\,X^{\top}=X\,C_{n}\,X^{\top}.
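The two expressions for the scatter matrix can be compared on a small made-up data sample stored with one observation per column. The helper names are hypothetical and the data are invented for illustration; this is a check of the identity, not a library routine.

```python
from fractions import Fraction


def centering_matrix(n):
    """Hypothetical helper: C_n = I_n - (1/n) * 1 1^T with exact fraction entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Hypothetical helper: matrix product a b for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Hypothetical helper: transpose of a matrix stored as rows."""
    return [list(col) for col in zip(*a)]


# Made-up sample: m = 2 variables (rows), n = 4 observations (columns).
X = [[Fraction(x) for x in row] for row in ([1, 2, 3, 6], [0, 4, 4, 4])]
n = len(X[0])

mu = [sum(row) / n for row in X]                                   # mu = (1/n) X 1
centered = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]   # X - mu 1^T
s_direct = mat_mul(centered, transpose(centered))
s_compact = mat_mul(mat_mul(X, centering_matrix(n)), transpose(X))  # X C_n X^T

print(s_direct == s_compact)                          # prints True
print([[str(x) for x in row] for row in s_compact])   # prints [['14', '8'], ['8', '12']]
```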

$C_{n}$ is the covariance matrix of the multinomial distribution in the special case of n trials and $k=n$ categories with equal probabilities $p_{1}=p_{2}=\cdots=p_{n}=\tfrac{1}{n}$.
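This follows from the standard covariance of a multinomial distribution with $N$ trials and probability vector $\mathbf{p}$, namely $N\left(\operatorname{diag}(\mathbf{p})-\mathbf{p}\mathbf{p}^{\top}\right)$, where the symbol $N$ for the number of trials is introduced here for the derivation. Taking $N=n$ trials and $\mathbf{p}=\tfrac{1}{n}\mathbf{1}$ gives

n\left(\tfrac{1}{n}I_{n}-\tfrac{1}{n^{2}}\mathbf{1}\mathbf{1}^{\top}\right)=I_{n}-\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}=C_{n}

With a general number of trials $N$, the same calculation gives the covariance $\tfrac{N}{n}C_{n}$.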

References

[1] John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.

This article is issued from Wikipedia, version of 11/26/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.