Sampling and log probs
In this post, we will take a look at how broadcasting rules apply to the prob and log_prob methods of a distribution object. This is a summary of the lecture "Probabilistic Deep Learning with TensorFlow 2" from Imperial College London.
import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt
tfd = tfp.distributions
plt.rcParams['figure.figsize'] = (10, 6)
print("Tensorflow Version: ", tf.__version__)
print("Tensorflow Probability Version: ", tfp.__version__)
We will build a simple Exponential distribution with a 2-by-3 batch shape. It is a univariate distribution.
exp = tfd.Exponential(rate=[[1., 1.5, 0.8], [0.3, 0.4, 1.8]])
exp
We can convert it into a multivariate distribution with the Independent distribution.
ind_exp = tfd.Independent(exp)
ind_exp
Note that we don't pass the reinterpreted_batch_ndims keyword, so by default all but the first batch dimension are converted into the event_shape.
ind_exp.sample(4)
Recall that when we convert a univariate distribution into an Independent distribution and sample it, the resulting shape is (sample_shape, batch_shape, event_shape).
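As a quick check (using the ind_exp defined above), the batch and event shapes line up with this pattern:
print(ind_exp.batch_shape)      # (2,)
print(ind_exp.event_shape)      # (3,)
print(ind_exp.sample(4).shape)  # (4, 2, 3)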
Let's take a look at another distribution, this time using the reinterpreted_batch_ndims keyword.
rates = [
[[[1., 1.5, 0.8], [0.3, 0.4, 1.8]]],
[[[0.2, 0.4, 1.4], [0.4, 1.1, 0.9]]]
]
np.array(rates).shape
exp = tfd.Exponential(rate=rates)
exp
ind_exp = tfd.Independent(exp, reinterpreted_batch_ndims=2)
ind_exp
Now we have a rank 2 batch_shape and rank 2 event_shape.
ind_exp.sample([4, 2])
When we pass a sample shape of [4, 2], the output is a rank-6 tensor that follows the same (sample_shape, batch_shape, event_shape) pattern as a single sample. In detail, it has a [4, 2] sample shape, a [2, 1] batch shape, and a [2, 3] event shape.
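We can verify this decomposition directly from the distribution's shape properties:
print(ind_exp.batch_shape)           # (2, 1)
print(ind_exp.event_shape)           # (2, 3)
print(ind_exp.sample([4, 2]).shape)  # (4, 2, 2, 1, 2, 3)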
ind_exp.log_prob(0.5)
When we compute the log probability of a single scalar value, the input is broadcast and the output has the batch shape. As a general rule, the log_prob (and prob) method broadcasts its input against the batch_shape and event_shape.
ind_exp.log_prob([[0.3, 0.5, 0.8]])
So what if we want to compute the log probability of a more complex input tensor?
tf.random.uniform((5, 1, 1, 2, 1)).shape
ind_exp.log_prob(tf.random.uniform((5, 1, 1, 2, 1)))
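Here the input shape (5, 1, 1, 2, 1) is broadcast against the combined batch and event shape (2, 1, 2, 3), giving (5, 2, 1, 2, 3); the event dimensions are then summed out, so the result has shape (5, 2, 1). A small check of this, reusing the ind_exp defined above:
# The log_prob output keeps the extra sample dimension followed by the batch shape
print(ind_exp.log_prob(tf.random.uniform((5, 1, 1, 2, 1))).shape)  # (5, 2, 1)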
loc = [[0.5, 1], [0.1, 0], [0, 0.2]]
scale_diag = [[2., 3], [1., 3], [4., 4]]
normal_distributions = tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale_diag)
normal_distributions
normal_distributions.sample(5)
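Here loc and scale_diag each supply three two-dimensional components, so the distribution has batch_shape (3,) and event_shape (2,), and the sample above has shape (5, 3, 2). A quick check:
print(normal_distributions.batch_shape)  # (3,)
print(normal_distributions.event_shape)  # (2,)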
# We are broadcasting batch shapes of `loc` and `scale_diag`
# against each other
loc = [[[0.3, 1.5, 1.], [0.2, 0.4, 2.8]],
[[2., 2.3, 8], [1.4, 1., 1.3]]]
scale_diag = [0.4, 1., 0.7]
normal_distributions = tfd.MultivariateNormalDiag(loc=loc, scale_diag=scale_diag)
normal_distributions
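Now loc has shape (2, 2, 3) and scale_diag has shape (3,); the batch shapes broadcast against each other, giving batch_shape (2, 2) and event_shape (3,). Again as a quick check:
print(normal_distributions.batch_shape)  # (2, 2)
print(normal_distributions.event_shape)  # (3,)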
ind_normal_distribution = tfd.Independent(normal_distributions,
reinterpreted_batch_ndims=1)
ind_normal_distribution
samples = ind_normal_distribution.sample(5)
samples.shape
inp = tf.random.uniform((2, 2, 3))
ind_normal_distribution.log_prob(inp)
inp = tf.random.uniform((2, 3))
ind_normal_distribution.log_prob(inp)
inp = tf.random.uniform((9, 2, 2, 3))
ind_normal_distribution.log_prob(inp)
inp = tf.random.uniform((5, 1, 2, 1))
ind_normal_distribution.log_prob(inp)
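In each of the calls above, the input is broadcast against the combined batch and event shape (2, 2, 3) of ind_normal_distribution, and the event dimensions (2, 3) are summed out. A short loop that prints the resulting log_prob shapes for the same inputs:
for shape in [(2, 2, 3), (2, 3), (9, 2, 2, 3), (5, 1, 2, 1)]:
    inp = tf.random.uniform(shape)
    print(shape, '->', ind_normal_distribution.log_prob(inp).shape)
# (2, 2, 3)    -> (2,)
# (2, 3)       -> (2,)
# (9, 2, 2, 3) -> (9, 2)
# (5, 1, 2, 1) -> (5, 2)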
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import f1_score
# 1) Fetches the 20 newsgroup dataset
# 2) Performs a word count on the articles and binarizes the result
# 3) Returns the data as a numpy matrix with the labels
def get_data(categories):
newsgroups_train_data = fetch_20newsgroups(data_home='./dataset/20_Newsgroup_Data/',
subset='train', categories=categories)
newsgroups_test_data = fetch_20newsgroups(data_home='./dataset/20_Newsgroup_Data/',
subset='test', categories=categories)
n_documents = len(newsgroups_train_data['data'])
count_vectorizer = CountVectorizer(input='content', binary=True,max_df=0.25, min_df=1.01/n_documents)
train_binary_bag_of_words = count_vectorizer.fit_transform(newsgroups_train_data['data'])
test_binary_bag_of_words = count_vectorizer.transform(newsgroups_test_data['data'])
return (train_binary_bag_of_words.todense(), newsgroups_train_data['target']), (test_binary_bag_of_words.todense(), newsgroups_test_data['target'])
# Laplace smoothing adds a pseudo-count so that every word is assumed
# to occur in every class.
def laplace_smoothing(labels, binary_data, n_classes):
# Compute the parameter estimates (adjusted fraction of documents in class that contain word)
n_words = binary_data.shape[1]
alpha = 1 # parameters for Laplace smoothing
theta = np.zeros([n_classes, n_words]) # stores parameter values - prob. word given class
for c_k in range(n_classes): # 0, 1, ..., 19
class_mask = (labels == c_k)
N = class_mask.sum() # number of articles in class
theta[c_k, :] = (binary_data[class_mask, :].sum(axis=0) + alpha)/(N + alpha*2)
return theta
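To make the smoothing formula concrete, here is a small hypothetical example (the numbers are made up, not from the dataset): if 3 out of 10 documents in a class contain a given word, the smoothed estimate with alpha=1 is (3 + 1) / (10 + 2) ≈ 0.33, so no word-given-class probability is ever exactly 0 or 1.
count, N, alpha = 3, 10, 1  # hypothetical counts, for illustration only
print((count + alpha) / (N + alpha * 2))  # 0.333...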
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
(train_data, train_labels), (test_data, test_labels) = get_data(categories=categories)
smoothed_counts = laplace_smoothing(labels=train_labels, binary_data=train_data, n_classes=len(categories))
Now, to make our Naive Bayes (NB) classifier, we need to build three functions:
- Compute the class priors
- Build our class conditional distributions
- Put it all together and classify our data
# Computes the class priors, i.e. the fraction of articles in each class of
# the dataset.
def class_priors(n_classes, labels):
counts = np.zeros(n_classes)
for c_k in range(n_classes):
counts[c_k] = np.sum(np.where(labels==c_k, 1, 0))
priors = counts / np.sum(counts)
print('The class priors are {}'.format(priors))
return priors
priors = class_priors(n_classes=len(categories), labels=train_labels)
# Builds an Independent Bernoulli distribution with
# batch_shape=number of classes and event_shape=number of features.
def make_distribution(probs):
batch_of_bernoulli = tfd.Bernoulli(probs=probs)
dist = tfd.Independent(batch_of_bernoulli, reinterpreted_batch_ndims=1)
return dist
tf_dist = make_distribution(smoothed_counts)
tf_dist
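With the four categories above, tf_dist should have a batch shape equal to the number of classes and an event shape equal to the vocabulary size (the exact size depends on the CountVectorizer settings):
print(tf_dist.batch_shape)  # (4,) - one Bernoulli batch entry per class
print(tf_dist.event_shape)  # (n_words,) - one event dimension per word in the vocabulary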
# 1) Computes the class conditional probabilities given the sample
# 2) Forms the joint likelihood
# 3) Normalises the joint likelihood and returns the log prob
def predict_sample(dist, sample, priors):
cond_probs = dist.log_prob(sample)
joint_likelihood = tf.add(np.log(priors), cond_probs)
norm_factor = tf.math.reduce_logsumexp(joint_likelihood, axis=-1, keepdims=True)
log_prob = joint_likelihood - norm_factor
return log_prob
log_probs = predict_sample(tf_dist, test_data[0], priors)
log_probs
probabilities = []
for sample, label in zip(test_data, test_labels):
probabilities.append(tf.exp(predict_sample(tf_dist, sample, priors)))
probabilities = np.asarray(probabilities)
predict_class = np.argmax(probabilities, axis=-1)
print('F1 score: ', f1_score(test_labels, predict_class, average='macro'))
clf = BernoulliNB(alpha=1)
clf.fit(train_data, train_labels)
pred = clf.predict(test_data)
print('F1 from sklearn: ', f1_score(test_labels, pred, average='macro'))