Linear Discriminant Analysis (LDA)
In this post, we will implement the basics of Linear Discriminant Analysis (LDA) from scratch and compare the result with scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
# True parameters of the two classes: identical covariance, means separated by (2, 2)
mu0 = [0, 0]
sigma0 = np.array([[1, 0.5], [0.5, 1]])
sigma1 = np.array([[1, 0.5], [0.5, 1]])
# Sample 5000 points from class 0, then shift the last 3000 points to form class 1
X = np.random.multivariate_normal(mu0, sigma0, 5000)
X[2000:, 0] = X[2000:, 0] + 2
X[2000:, 1] = X[2000:, 1] + 2
# Labels: the first 2000 points belong to class 0, the remaining 3000 to class 1
y = np.ones((5000,))
y[:2000] = 0
X[:6, :], y[:6]
fig, ax = plt.subplots(1, 1, figsize=(16, 10))
plt.scatter(X[:, 0], X[:, 1], s=3, c=list(map(lambda x:'blue' if x==0 else 'red', y)))
plt.show()
Formula
To implement LDA, we need Bayes' rule, which in turn requires the class-conditional probability density function. For a multivariate Gaussian, the density is:
$$ f(x \vert y) = \frac{1}{\vert 2 \pi \Sigma \vert^{\frac{1}{2}}} \text{exp} \big( -\frac{1}{2} (x - \mu)^T (\Sigma)^{-1} (x - \mu)\big) $$
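Before specializing to the two-class case, here is a minimal sketch of this density for a general dimension $d$ (gaussian_pdf is just an illustrative helper name, not used elsewhere in this post):
def gaussian_pdf(x, mu, sigma):
    # x and mu are (d, 1) column vectors; sigma is a (d, d) covariance matrix
    d = mu.shape[0]
    # |2 pi Sigma|^(1/2) = (2 pi)^(d/2) * |Sigma|^(1/2)
    const = 1 / (np.power(2 * np.pi, d / 2) * np.sqrt(np.linalg.det(sigma)))
    quad = ((x - mu).T @ np.linalg.inv(sigma) @ (x - mu))[0, 0]
    return const * np.exp(-0.5 * quad)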
For binary classification (as in the example above), we can write the two class-conditional densities, one for $y=0$ and one for $y=1$:
$$ f(x \vert y=0) = \frac{1}{\vert 2 \pi \Sigma^{(0)} \vert^{\frac{1}{2}}} \text{exp} \big( -\frac{1}{2} (x - \mu^{(0)})^T (\Sigma^{(0)})^{-1} (x - \mu^{(0)})\big)\\ f(x \vert y=1) = \frac{1}{\vert 2 \pi \Sigma^{(1)} \vert^{\frac{1}{2}}} \text{exp} \big( -\frac{1}{2} (x - \mu^{(1)})^T (\Sigma^{(1)})^{-1} (x - \mu^{(1)})\big) $$
In Python, we can implement these densities with NumPy operations.
# True class means as column vectors (class 1 was shifted by (2, 2))
mu0 = np.array([[0], [0]])
mu1 = np.array([[2], [2]])
def f_x_y_0(x):
const = 1 / (2 * np.pi * np.power(np.linalg.det(sigma0), 1/2))
return const * np.exp(-0.5 * (x - mu0).T @ np.linalg.inv(sigma0) @ (x - mu0))[0, 0]
def f_x_y_1(x):
const = 1 / (2 * np.pi * np.power(np.linalg.det(sigma1), 1/2))
return const * np.exp(-0.5 * (x - mu1).T @ np.linalg.inv(sigma1) @ (x - mu1))[0, 0]
f_x_y_0(np.array([[0.5], [0.5]]))
f_x_y_1(np.array([[0.5], [0.5]]))
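As a sanity check (assuming SciPy is installed), the same values can be obtained from scipy.stats.multivariate_normal:
from scipy.stats import multivariate_normal
# pdf() expects 1-D points, so flatten the column-vector means
multivariate_normal(mean=mu0.ravel(), cov=sigma0).pdf([0.5, 0.5])
multivariate_normal(mean=mu1.ravel(), cov=sigma1).pdf([0.5, 0.5])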
After that, we can compute the sample means and covariances.
mu0_hat = np.mean(X[:2000, :], axis=0).reshape(2, 1)  # column vector, matching the shape of x
sigma0_hat = np.cov(X[:2000, :].T)
mu0_hat, sigma0_hat
mu1_hat = np.mean(X[2000:, :], axis=0).reshape(2, 1)  # column vector, matching the shape of x
sigma1_hat = np.cov(X[2000:, :].T)
mu1_hat, sigma1_hat
Based on these estimates, we can define the estimated PDFs of the two class-conditional distributions.
def fhat_x_y_0(x):
const = 1 / (2 * np.pi * np.power(np.linalg.det(sigma0_hat), 1/2))
return const * np.exp(-0.5 * (x - mu0_hat).T @ np.linalg.inv(sigma0_hat) @ (x - mu0_hat))[0, 0]
def fhat_x_y_1(x):
const = 1 / (2 * np.pi * np.power(np.linalg.det(sigma1_hat), 1/2))
return const * np.exp(-0.5 * (x - mu1_hat).T @ np.linalg.inv(sigma1_hat) @ (x - mu1_hat))[0, 0]
fhat_x_y_0(np.array([[0.5], [0.5]]))
fhat_x_y_1(np.array([[0.5], [0.5]]))
Since we constructed the dataset ourselves, we also know the distribution of the labels. Using it, we can compute the posterior probability of each label with Bayes' rule:
$$ P(Y = y \vert X = x) = \frac{f(x \vert Y = y)P(Y = y)}{f(x \vert Y = 0) P(Y = 0) + f(x \vert Y = 1)P(Y = 1)} $$
p0 = 2000/5000
p1 = 1 - p0
def bayes_prob(x, y):
if y == 1:
return f_x_y_1(x) * p1 / (f_x_y_0(x) * p0 + f_x_y_1(x) * p1)
else:
return f_x_y_0(x) * p0 / (f_x_y_0(x) * p0 + f_x_y_1(x) * p1)
bayes_prob(np.array([[0.5], [0.5]]), 0)
bayes_prob(np.array([[0.5], [0.5]]), 1)
We can also estimate the posterior using the sample estimates defined above.
p0_hat = 2000 / 5000
p1_hat = 3000 / 5000
def bayes_prob_hat(x, y):
if y == 1:
return fhat_x_y_1(x) * p1_hat / (fhat_x_y_0(x) * p0_hat + fhat_x_y_1(x) * p1_hat)
else:
return fhat_x_y_0(x) * p0_hat / (fhat_x_y_0(x) * p0_hat + fhat_x_y_1(x) * p1_hat)
bayes_prob_hat(np.array([[0.5], [0.5]]), 0), bayes_prob_hat(np.array([[0.5], [0.5]]), 1)
We can confirm that the sample point ([0.5], [0.5]) is classified as label 0.
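Because LDA assumes both classes share a covariance matrix, its decision boundary is linear in $x$. As a small sketch (assuming we pool the two sample covariances, as LDA does), the boundary's weight vector and intercept can be recovered directly from the estimates above:
# Pooled covariance estimate shared by the two classes
sigma_hat = ((2000 - 1) * sigma0_hat + (3000 - 1) * sigma1_hat) / (5000 - 2)
# Decision boundary w^T x + b = 0, where the posterior log-odds are zero
w = np.linalg.inv(sigma_hat) @ (mu1_hat - mu0_hat)
b = -0.5 * (mu1_hat + mu0_hat).T @ w + np.log(p1_hat / p0_hat)
w, b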
We can carry out the same procedure with scikit-learn.
lda = LinearDiscriminantAnalysis(n_components=1, solver='svd', store_covariance=True).fit(X, y)
lda.means_
mu0 = lda.means_[0, :]
mu1 = lda.means_[1, :]
lda_cov = lda.covariance_
mu0, mu1, lda_cov
Using the fitted model, we can score each label for a given point. To do this, we implement the LDA discriminant function $\delta_k$:
$$ \delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu^T_k \Sigma^{-1} \mu_k + \log \pi_k $$
All we need to do is take the argmax over these scores.
def discriminator(x):
d0 = x.T @ np.linalg.inv(lda_cov) @ mu0 - 0.5 * mu0.T @ np.linalg.inv(lda_cov) @ mu0 + np.log(p0)
d1 = x.T @ np.linalg.inv(lda_cov) @ mu1 - 0.5 * mu1.T @ np.linalg.inv(lda_cov) @ mu1 + np.log(p1)
return [d0[0], d1[0]]
discriminator(np.array([[0.5], [0.5]]))
np.argmax(discriminator(np.array([[0.5], [0.5]])))
This confirms that the LDA model also classifies the sample point ([0.5], [0.5]) as label 0.
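The same conclusion can be read directly from the fitted scikit-learn model; predict and predict_proba are part of its standard classifier interface (note that the input must be a 2-D array of shape (n_samples, n_features)):
x_test = np.array([[0.5, 0.5]])
lda.predict(x_test), lda.predict_proba(x_test)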
So how accurate is the fitted LDA model?
y_hat = np.apply_along_axis(lambda x: np.argmax(discriminator(np.reshape(x, (2, 1)))), 1, X)
fig, ax = plt.subplots(1, 2, figsize=(16, 10))
ax[0].scatter(X[:, 0], X[:, 1], s=3, c=list(map(lambda x:'blue' if x==0 else 'red', y)))
ax[1].scatter(X[:, 0], X[:, 1], s=3, c=list(map(lambda x:'blue' if x==0 else 'red', y_hat)))
plt.show()
(y == y_hat).mean()
LDA is a linear classifier, so we can see the result of the classification in the plot above, though some errors occur near the decision boundary. Overall, the LDA classifier achieves almost 87% accuracy on this randomly generated dataset.
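As a cross-check, scikit-learn's score method (which reports mean accuracy) should give roughly the same number:
lda.score(X, y)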