Multivariate Distribution
In this post, we will build multivariate distributions. This is a summary of the lecture "Probabilistic Deep Learning with TensorFlow 2" from Imperial College London.
- Packages
- Multivariate Distribution
- Multivariate Normal Distribution with full covariance.
- Further reading and resources
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
tfd = tfp.distributions
plt.rcParams['figure.figsize'] = (10, 6)
print("Tensorflow Version: ", tf.__version__)
print("Tensorflow Probability Version: ", tfp.__version__)
A multivariate distribution is a joint distribution over multiple univariate random variables. The most widely used multivariate distribution is the multivariate normal distribution (or MVN for short). From Wikipedia:
The multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.
This distribution is a building block for neural network architectures such as the Variational AutoEncoder (VAE).
We can build a 2D multivariate normal distribution with a diagonal covariance matrix. Formally,
$$ X \sim \mathcal{N}(\mu, \Sigma) $$
This distribution is parameterized by a mean vector $\mu$,
$\mu = E[X] = (E[X_1], E[X_2], \dots, E[X_k])^T$
and a $k \times k$ covariance matrix,
$\Sigma_{i,j} = E[(X_i-\mu_i)(X_j - \mu_j)] = \text{Cov}[X_i, X_j]$
normal_diag = tfd.MultivariateNormalDiag(loc=[0, 1], scale_diag=[1, 2])
normal_diag
The usage is the same as for a univariate distribution: you define the loc (mean) and scale_diag (standard deviation) of each random variable. In the case above, we used two random variables, one with mean 0 and standard deviation 1, and the other with mean 1 and standard deviation 2. Note that this distribution has an event_shape of 2.
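As a quick sanity check (a minimal sketch; event_shape and covariance() are standard attributes of TFP distributions), we can confirm the event shape and the implied diagonal covariance matrix:
print(normal_diag.event_shape)   # (2,)
print(normal_diag.covariance())  # [[1., 0.], [0., 4.]]: the diagonal holds the squared scales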
normal_diag.sample(5)
samples = normal_diag.sample(10000)
plt.plot(0, 1, c='r', marker='.') # Center point (0, 1)
plt.scatter(samples[:, 0], samples[:, 1], marker='.', alpha=0.05)
plt.axis('equal')
plt.show()
normal_diag_batch = tfd.MultivariateNormalDiag(loc=[[0, 0], [0, 0], [0, 0]],
scale_diag=[[1, 2], [2, 1], [2, 2]])
normal_diag_batch
In this case, we built a batch of three multivariate normal distributions, which you can see in the batch_shape of 3.
samples = normal_diag_batch.sample(5)
samples
In the code above, the sampled tensor has shape (5, 3, 2): the first dimension is the number of samples, the second is the batch size, and the last is the event size (the number of random variables).
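To make these shape semantics concrete, we can read them directly off the distribution object (a small check using the normal_diag_batch defined above):
print(normal_diag_batch.batch_shape)      # (3,)
print(normal_diag_batch.event_shape)      # (2,)
print(normal_diag_batch.sample(5).shape)  # (5, 3, 2)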
normal_diag_batch.log_prob(samples)
samples_batch = normal_diag_batch.sample(10000)
samples_batch.shape
If you are unsure what these distributions look like, you can plot samples from each one in the batch.
fig, axes = plt.subplots(1, 3, sharex=True, sharey=True, figsize=(10, 5))
titles = ['cov_diag=[1, 2]', 'cov_diag=[2, 1]', 'cov_diag=[2, 2]']
for i, (ax, title) in enumerate(zip(axes, titles)):
    samples = samples_batch[:, i, :]  # samples from the i-th distribution in the batch
    ax.scatter(samples[:, 0], samples[:, 1], marker='.', alpha=0.05)
    ax.set_title(title)
plt.show()
Multivariate Normal Distribution with full covariance.
In the previous section, we covered multivariate distributions with a diagonal covariance matrix. In the special case $\Sigma = \sigma^2 I$, the distribution is known as a spherical or isotropic Gaussian. This name comes from the spherical (or circular) contours of its probability density function, as you can see from the plot below for the two-dimensional case.
spherical_2d_gaussian = tfd.MultivariateNormalDiag(loc=[0, 0])
samples = spherical_2d_gaussian.sample(10000)
sns.jointplot(x=samples[:, 0], y=samples[:, 1], kind='kde', space=0)
plt.show()
Note that a diagonal covariance matrix results in the components of the random vector being independent.
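We can check this numerically: for a diagonal Gaussian, the joint log-density is the sum of the marginal log-densities. Here is a quick sketch reusing the loc and scale values of normal_diag from earlier:
x = tf.constant([0.5, -1.0])
joint = tfd.MultivariateNormalDiag(loc=[0., 1.], scale_diag=[1., 2.]).log_prob(x)
marginals = tfd.Normal(loc=[0., 1.], scale=[1., 2.]).log_prob(x)  # independent components
print(joint.numpy(), tf.reduce_sum(marginals).numpy())  # equal up to float precision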
You can define a full-covariance Gaussian distribution in TensorFlow using the distribution tfd.MultivariateNormalTriL. For reference, the suffix TriL stands for the lower-triangular matrix that parameterizes the full covariance.
Mathematically, the parameters of a multivariate Gaussian are a mean $\mu$ and a covariance matrix $\Sigma$, and so the tfd.MultivariateNormalTriL constructor requires two arguments:
- loc, a Tensor of floats corresponding to $\mu$,
- scale_tril, a lower-triangular matrix $L$ such that $LL^T = \Sigma$.
For a $d$-dimensional random variable, the lower-triangular matrix $L$ looks like this:
\begin{equation} L = \begin{bmatrix} l_{1, 1} & 0 & 0 & \cdots & 0 \\ l_{2, 1} & l_{2, 2} & 0 & \cdots & 0 \\ l_{3, 1} & l_{3, 2} & l_{3, 3} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{d, 1} & l_{d, 2} & l_{d, 3} & \cdots & l_{d, d} \end{bmatrix}, \end{equation}where the diagonal entries are positive: $l_{i, i} > 0$ for $i=1,\ldots,d$.
Here is an example of creating a two-dimensional Gaussian with non-diagonal covariance:
mu = [0., 0.]
scale_tril = [[1., 0.],
[0.6, 0.8]]
sigma = tf.matmul(tf.constant(scale_tril), tf.transpose(tf.constant(scale_tril)))
sigma
nonspherical_2d_gaussian = tfd.MultivariateNormalTriL(loc=mu, scale_tril=scale_tril)
nonspherical_2d_gaussian
nonspherical_2d_gaussian.mean()
nonspherical_2d_gaussian.covariance()
samples = nonspherical_2d_gaussian.sample(10000)
sns.jointplot(x=samples[:, 0], y=samples[:, 1], kind='kde', space=0, color='r')
plt.show()
As you can see, the approximate density contours are now elliptical rather than circular. This is because the components of the Gaussian are correlated.
Also note that the marginal distributions (shown on the sides of the plot) are both univariate Gaussian distributions.
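We can verify the marginals numerically (a quick check on the samples drawn above): the marginal variances are the diagonal entries of $\Sigma$, which here are both equal to 1 even though the components are correlated.
print(tf.math.reduce_std(samples, axis=0))  # both entries close to 1.0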
In the above example, we defined the lower triangular matrix $L$ and used that to build the multivariate Gaussian distribution. The covariance matrix is easily computed from $L$ as $\Sigma = LL^T$.
The reason that we define the multivariate Gaussian distribution in this way - as opposed to directly passing in the covariance matrix - is that not every matrix is a valid covariance matrix. The covariance matrix must have the following properties:
- It is symmetric
- It is positive (semi-)definite
A symmetric matrix $M \in \mathbb{R}^{d\times d}$ is positive semi-definite if it satisfies $b^TMb \ge 0$ for all nonzero $b\in\mathbb{R}^d$. If, in addition, we have $b^TMb = 0 \Rightarrow b=0$ then $M$ is positive definite.
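One practical way to test these properties is to inspect the eigenvalues: a symmetric matrix is positive semi-definite if and only if all of its eigenvalues are non-negative, and positive definite if and only if they are all strictly positive. A small sketch (tf.linalg.eigvalsh assumes its input is symmetric):
M = tf.constant([[10., 5.], [5., 10.]])
print(tf.linalg.eigvalsh(M))  # [5., 15.]: all positive, so M is positive definite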
The Cholesky decomposition is a useful way of writing a covariance matrix. The decomposition is described by this result:
For every real-valued symmetric positive-definite matrix $M$, there is a unique lower-triangular matrix $L$ with positive diagonal entries for which
\begin{equation} LL^T = M \end{equation}
This is called the Cholesky decomposition of $M$.
This result shows us why Gaussian distributions with full covariance are completely represented by the MultivariateNormalTriL distribution.
In case you have a valid covariance matrix $\Sigma$ and would like to compute the lower-triangular matrix $L$ above to instantiate a MultivariateNormalTriL object, you can use the tf.linalg.cholesky function.
sigma = [[10., 5.], [5., 10.]]
scale_tril = tf.linalg.cholesky(sigma)
scale_tril
You can check the result against the formal definition, $LL^T = \Sigma$:
tf.linalg.matmul(scale_tril, tf.transpose(scale_tril))
If the argument to tf.linalg.cholesky is not positive definite, then it will fail:
bad_sigma = [[10., 11.], [11., 10.]]
try:
    scale_tril = tf.linalg.cholesky(bad_sigma)
except Exception as e:
    print(e)
What about positive semi-definite matrices?
In cases where the matrix is only positive semi-definite, the Cholesky decomposition exists (provided the diagonal entries of $L$ are allowed to be zero), but it is not unique.
For covariance matrices, this corresponds to the degenerate case where the probability density function collapses to a subspace of the event space. This is demonstrated in the following example:
psd_mvn = tfd.MultivariateNormalTriL(loc=[0., 0.], scale_tril=[[1., 0.], [0.4, 0.]])
psd_mvn
samples = psd_mvn.sample(10000)
plt.scatter(samples[:, 0], samples[:, 1], alpha=0.5)
plt.xlim(-5, 5)
plt.ylim(-5, 5)
plt.title('Scatter plot of samples')
plt.show()
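In fact, with this scale_tril every sample satisfies $X_2 = 0.4 X_1$ exactly, since the second column of $L$ is zero. A quick check on the samples drawn above:
print(tf.reduce_max(tf.abs(samples[:, 1] - 0.4 * samples[:, 0])))  # ~0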
If the input to tf.linalg.cholesky is positive semi-definite but not positive definite, it will also fail:
another_bad_sigma = [[10., 0.], [0., 0.]]
try:
    scale_tril = tf.linalg.cholesky(another_bad_sigma)
except Exception as e:
    print(e)
In summary: if the covariance matrix $\Sigma$ for your multivariate Gaussian distribution is positive-definite, then an algorithm that computes the Cholesky decomposition of $\Sigma$ returns a lower-triangular matrix $L$ such that $LL^T = \Sigma$. This $L$ can then be passed as the scale_tril argument of MultivariateNormalTriL.
You are now ready to put everything that you have learned in this reading together.
To create a multivariate Gaussian distribution with full covariance you need to:
- Specify parameters $\mu$ and either $\Sigma$ (a symmetric positive definite matrix) or $L$ (a lower-triangular matrix with positive diagonal elements), such that $\Sigma = LL^T$.
- If only $\Sigma$ is specified, compute scale_tril = tf.linalg.cholesky(sigma).
- Create the distribution: multivariate_normal = tfd.MultivariateNormalTriL(loc=mu, scale_tril=scale_tril).
mu = [1., 2., 3.]
sigma = [[0.5, 0.1, 0.1],
[0.1, 1., 0.6],
[0.1, 0.6, 2.]]
scale_tril = tf.linalg.cholesky(sigma)
multivariate_normal = tfd.MultivariateNormalTriL(loc=mu, scale_tril=scale_tril)
multivariate_normal.covariance()
multivariate_normal.mean()
MultivariateNormalFullCovariance
Deprecated: There was previously a class called tfd.MultivariateNormalFullCovariance, which took the full covariance matrix in its constructor, but it is being deprecated. Two reasons for this are:
- Covariance matrices are symmetric, so specifying one directly involves passing redundant information and writing unnecessary code.
- It is easier to enforce positive-definiteness through constraints on the elements of a decomposition than on the covariance matrix itself. The decomposition's only constraint is that its diagonal elements are positive, a condition that is easy to parameterize.
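As an illustration of the second point, here is a minimal sketch using tfp.bijectors.FillScaleTriL, a bijector that maps an unconstrained vector of $d(d+1)/2$ real numbers to a $d \times d$ lower-triangular matrix with positive diagonal:
tfb = tfp.bijectors
unconstrained = tf.constant([0.5, -0.3, 1.2])    # any 3 real values work for d=2
scale_tril = tfb.FillScaleTriL()(unconstrained)  # lower triangular, positive diagonal
print(scale_tril)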