Multivariate Gaussians are often presented as intimidating objects. The density function looks kind of scary:

\[f(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left( -\frac{1}{2} (\vec{x} - \vec{\mu})^{\top} \Sigma^{-1} (\vec{x} - \vec{\mu}) \right).\]

But there’s a clearer perspective: any multivariate Gaussian can be disentangled into independent 1-dimensional Gaussians. More precisely, there’s always a basis in which a multivariate Gaussian has independent coordinates. This basis is given by the eigenbasis of the covariance matrix, and the eigenvalues correspond to the variances along each axis.

Random vectors

Let \(X = (x_1, \ldots, x_k)^{\top} \in \mathbb{R}^{k}\) be a random vector. The expectation and covariance of \(X\) can be written as follows:

\[\mathbb{E}[X] = \left(\mathbb{E}[x_1], \ldots, \mathbb{E}[x_k]\right)^{\top},\] \[\text{Cov}(X) = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)\left(X - \mathbb{E}[X]\right)^{\top}\right].\]

Note that the covariance matrix \(\text{Cov}(X)\) is symmetric and positive semi-definite. To see this, let’s fix some vector \(a \in \mathbb{R}^{k}\) and consider \(a^{\top} \text{Cov}(X) a\):

\[\begin{align*} a^{\top} \text{Cov}(X) a &= a^{\top} \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)\left(X - \mathbb{E}[X]\right)^{\top}\right] a \\ &= \mathbb{E}\left[a^{\top} \left(X - \mathbb{E}[X]\right)\left(X - \mathbb{E}[X]\right)^{\top} a\right] \\ &= \mathbb{E}\left[(a^{\top} (X - \mathbb{E}[X]))^2\right] \\ &\geq 0. \end{align*}\]
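
To make this concrete, here is a minimal NumPy sketch (with an arbitrary, made-up distribution for \(X\)) that estimates the mean and covariance from samples and checks that the sample covariance is symmetric with nonnegative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal sketch: estimate E[X] and Cov(X) from samples of an arbitrary,
# made-up random vector X, and check the two claimed properties.
k, n_samples = 3, 100_000
M = rng.normal(size=(k, k))                      # mixing matrix, just to induce correlation
X = rng.exponential(size=(n_samples, k)) @ M.T   # rows are samples of X

mu = X.mean(axis=0)                              # componentwise estimate of E[X]
C = np.cov(X, rowvar=False)                      # sample covariance matrix

print(np.allclose(C, C.T))                       # symmetric: True
print(np.linalg.eigvalsh(C).min() >= -1e-10)     # eigenvalues >= 0 up to round-off: True
```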

If we multiply a random vector \(X \in \mathbb{R}^{k}\) by a matrix \(A \in \mathbb{R}^{n \times k}\), then the covariance of the resulting random vector \(Y = AX \in \mathbb{R}^{n}\) is given by:

\[\begin{align*} \text{Cov}(Y) &= \mathbb{E}\left[\left(Y - \mathbb{E}[Y]\right)\left(Y - \mathbb{E}[Y]\right)^{\top}\right] \\ &= \mathbb{E}\left[\left(AX - A\mathbb{E}[X]\right)\left(AX - A\mathbb{E}[X]\right)^{\top}\right] \\ &= \mathbb{E}\left[A\left(X - \mathbb{E}[X]\right)\left(X - \mathbb{E}[X]\right)^{\top} A^{\top}\right] \\ &= A \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)\left(X - \mathbb{E}[X]\right)^{\top}\right] A^{\top} \\ &= A \text{Cov}(X) A^{\top}. \end{align*}\]
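
A quick numerical check of this identity, again as a sketch with an arbitrary made-up \(X\) and \(A\). In fact the identity holds exactly even for sample covariances, because the sample covariance of \(AX\) is the same bilinear function \(A S A^{\top}\) of the data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch: check Cov(AX) = A Cov(X) A^T on simulated data (arbitrary X and A).
k, n, n_samples = 3, 2, 100_000
X = rng.normal(size=(n_samples, k)) @ rng.normal(size=(k, k))  # correlated samples of X
A = rng.normal(size=(n, k))                                    # an arbitrary n x k matrix

Y = X @ A.T                                  # each row is A x for one sample x

cov_X = np.cov(X, rowvar=False)
cov_Y = np.cov(Y, rowvar=False)
# The identity holds exactly for the sample covariances as well, since the
# sample covariance of the rows of X A^T equals A (sample covariance of X) A^T.
print(np.allclose(cov_Y, A @ cov_X @ A.T))   # True
```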

Multivariate Gaussian distribution

Now we can define the multivariate Gaussian distribution.

Let \(g = (g_1, \ldots, g_k)^{\top} \in \mathbb{R}^{k}\) be a vector of i.i.d. standard normal random variables. Given a matrix \(A \in \mathbb{R}^{n \times k}\), the covariance of \(Ag\) is \(A \text{Cov}(g) A^{\top} = A I A^{\top} = A A^{\top}\).

The distribution of \(Ag\) is called a multivariate normal distribution with covariance \(\Sigma = A A^{\top}\), denoted \(\mathcal{N}(0, \Sigma)\).
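
As a sketch (arbitrary dimensions and a randomly chosen \(A\)), we can sample \(Ag\) and compare the empirical covariance with \(A A^{\top}\):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch: the covariance of Ag is A A^T (here n = 2, k = 4, A chosen at random).
n, k, n_samples = 2, 4, 500_000
A = rng.normal(size=(n, k))

G = rng.normal(size=(n_samples, k))   # rows are i.i.d. standard normal vectors g
Y = G @ A.T                           # rows are samples of Ag ~ N(0, A A^T)

print(np.round(np.cov(Y, rowvar=False), 2))
print(np.round(A @ A.T, 2))           # agrees with the sample covariance up to sampling error
```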

Since \(\Sigma\) is symmetric positive semi-definite, we can write its eigendecomposition as

\[\Sigma = Q \Lambda Q^{\top},\]

where \(Q \in \mathbb{R}^{n \times n}\) is an orthogonal matrix, and \(\Lambda \in \mathbb{R}^{n \times n}\) is a diagonal matrix with eigenvalues \(\lambda_1, \ldots, \lambda_n\) on the diagonal.

Now we can take

\[A = Q \Lambda^{1/2} \quad \text{or} \quad A = Q \Lambda^{1/2} Q^{\top}\]

so that \(A A^{\top} = Q \Lambda^{1/2} \Lambda^{1/2} Q^{\top} = Q \Lambda Q^{\top} = \Sigma\) (for the second choice, the middle factor \(Q^{\top} Q = I\) cancels and we get the same result).

For example, take \(X = Q \Lambda^{1/2} g\) with \(g \sim \mathcal{N}(0, I_n)\) (so here \(k = n\)). Then \(X \sim \mathcal{N}(0, \Sigma)\).
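
Putting the pieces together, here is a sketch that starts from a made-up \(2 \times 2\) covariance matrix \(\Sigma\), computes \(Q\) and \(\Lambda\) with `np.linalg.eigh`, forms \(A = Q\Lambda^{1/2}\), and samples \(X = Ag\):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch: sample X ~ N(0, Sigma) for a made-up 2x2 covariance matrix Sigma.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])               # symmetric and positive definite

lam, Q = np.linalg.eigh(Sigma)               # eigenvalues (ascending) and orthonormal eigenvectors
A = Q @ np.diag(np.sqrt(lam))                # A = Q Lambda^{1/2}
print(np.allclose(A @ A.T, Sigma))           # True: A A^T = Sigma

G = rng.normal(size=(500_000, 2))            # rows are i.i.d. standard normal vectors g
X = G @ A.T                                  # rows are samples of X = Q Lambda^{1/2} g
print(np.round(np.cov(X, rowvar=False), 2))  # close to Sigma
```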

Visualizing the transformation \(X = Q\Lambda^{1/2}g\):

1. Start: \(g \sim \mathcal{N}(0, I)\), a vector of i.i.d. standard normals.
2. Scale: \(\Lambda^{1/2} g\) stretches each coordinate by \(\sqrt{\lambda_1}, \sqrt{\lambda_2}\).
3. Rotate: \(Q \Lambda^{1/2} g\) aligns the stretched axes with \(\vec{q}_1, \vec{q}_2\).

Disentangling the multivariate Gaussian

Now let \(\vec{q}_1, \ldots, \vec{q}_n\) denote the columns of \(Q\).

Expanding \(X = Q \Lambda^{1/2} g\):

\[X = \sqrt{\lambda_1} g_1 \vec{q}_1 + \ldots + \sqrt{\lambda_n} g_n \vec{q}_n = \sum_{i=1}^{n} \sqrt{\lambda_i} g_i \vec{q}_i.\]

In the orthonormal basis \(\vec{q}_1, \ldots, \vec{q}_n\), the vector \(X\) has coordinates \((\sqrt{\lambda_1} g_1, \ldots, \sqrt{\lambda_n} g_n)\). These coordinates are independent 1-dimensional Gaussian random variables, with variances \(\lambda_1, \ldots, \lambda_n\).
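
Numerically, this means the coordinates \(Q^{\top} X\) should have a (nearly) diagonal sample covariance with the eigenvalues on the diagonal. A sketch, reusing the same made-up \(\Sigma\) as above:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch: the eigenbasis coordinates Q^T X are uncorrelated, with variances
# equal to the eigenvalues (same made-up Sigma as in the earlier sketch).
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
lam, Q = np.linalg.eigh(Sigma)

G = rng.normal(size=(500_000, 2))
X = G @ np.diag(np.sqrt(lam)) @ Q.T               # rows are samples of X = Q Lambda^{1/2} g

coords = X @ Q                                    # coordinates of each sample in the basis q_1, q_2
print(np.round(np.cov(coords, rowvar=False), 2))  # approximately diag(lambda_1, lambda_2)
print(np.round(lam, 2))
```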

If \(\Sigma\) is singular (i.e. \(\det \Sigma = 0\)), some eigenvalues are zero, say \(\lambda_{r+1} = \ldots = \lambda_n = 0\), where \(r\) is the rank of \(\Sigma\). The random vector is then concentrated on the subspace spanned by \(\vec{q}_1, \ldots, \vec{q}_r\), and has no density on \(\mathbb{R}^n\). On this subspace, \(X\) has density (in the eigenbasis coordinates):

\[f(x_1, \ldots, x_r) = \prod_{i=1}^{r} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{x_i^2}{2\lambda_i} \right).\]
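
A sketch of the singular case, with a made-up rank-one \(\Sigma = vv^{\top}\) in \(\mathbb{R}^2\): the samples show essentially no spread along the null eigenvector, so all of the randomness lives on the line spanned by \(v\).

```python
import numpy as np

rng = np.random.default_rng(5)

# Sketch: a singular covariance. Sigma = v v^T has rank 1, so X ~ N(0, Sigma)
# is concentrated on the line spanned by v and has no density on R^2.
v = np.array([[2.0], [1.0]])
Sigma = v @ v.T                              # rank 1, det(Sigma) = 0

lam, Q = np.linalg.eigh(Sigma)               # eigenvalues: 0 and ||v||^2 = 5
lam = np.clip(lam, 0.0, None)                # clip tiny negative values from round-off

G = rng.normal(size=(100_000, 2))
X = G @ np.diag(np.sqrt(lam)) @ Q.T          # the zero eigenvalue kills the null direction

coords = X @ Q                               # eigenbasis coordinates
print(np.round(coords.var(axis=0), 2))       # ~[0. 5.]: no spread along the null eigenvector
```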
