Improving Your Model Performance
In the previous chapters, you've trained a lot of models! You will now learn how to interpret learning curves to understand your models as they train. You will also visualize the effects of activation functions, batch sizes, and batch normalization. Finally, you will learn how to perform automatic hyperparameter optimization on your Keras models using sklearn. This is a summary of the lecture "Introduction to Deep Learning with Keras", via DataCamp.
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8, 8)
Learning the digits
You're going to build a model on the digits dataset, a sample dataset that comes pre-loaded with scikit-learn. The digits dataset consists of 8x8-pixel handwritten digits from 0 to 9.
You want to distinguish between each of the 10 possible digits given an image, so we are dealing with multi-class classification.
The dataset has already been partitioned into X_train, y_train, X_test, and y_test, using 30% of the data as testing data. The labels are already one-hot encoded vectors, so you don't need to use the Keras to_categorical() function.
Let's build this new model!
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
X = np.load('./dataset/digits_pixels.npy')
y = np.load('./dataset/digits_target.npy')
y = to_categorical(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
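Before building the model, it can help to eyeball a sample. This is an optional quick look (a minimal sketch using the arrays loaded above): each row of X is a flattened 8x8 image, so reshaping it recovers the picture.
# Optional check: reshape one flattened 64-pixel row back to 8x8 and display it
plt.imshow(X_train[0].reshape(8, 8), cmap='gray')
plt.title('Label: {}'.format(np.argmax(y_train[0])))  # labels are one-hot encoded
plt.show()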
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# Initialize a Sequential model
model = Sequential()
# Input and hidden layer with input_shape, 16 neurons, and relu
model.add(Dense(16, input_shape=(64, ), activation='relu'))
# Output layer with 10 neurons (one per digit) and softmax
model.add(Dense(10, activation='softmax'))
# Compile your model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Test if your model is well assembled by predicting before training
print(model.predict(X_train))
Predicting on training data inputs before training can help you quickly check that your model is assembled correctly.
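For instance (a minimal sketch reusing the model and X_train from the cells above), you can confirm that the untrained network already outputs valid softmax probabilities of the right shape:
# Optional check: softmax outputs are probability distributions, one row per
# sample and one column per digit, with each row summing to ~1
preds = model.predict(X_train)
print(preds.shape)                         # (n_samples, 10)
print(np.allclose(preds.sum(axis=1), 1))   # True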
def plot_loss(loss, val_loss):
    plt.figure();
    plt.plot(loss);
    plt.plot(val_loss);
    plt.title('Model loss');
    plt.ylabel('Loss');
    plt.xlabel('Epoch');
    plt.legend(['Train', 'Test'], loc='upper right');
h_callback = model.fit(X_train, y_train, epochs=60, validation_data=(X_test, y_test), verbose=0)
# Extract from the h_callback object loss and val_loss to plot the learning curve
plot_loss(h_callback.history['loss'], h_callback.history['val_loss'])
This graph doesn't show overfitting but convergence: it looks like your model has learned all it could from the data and no longer improves. The test loss, although higher than the training loss, is not getting worse, so we aren't overfitting the training data.
Do we need more data?
It's time to check whether the digits dataset model you built benefits from more training examples!
In order to keep code to a minimum, various things are already initialized and ready to use:
- The model you just built.
- X_train, y_train, X_test, and y_test.
- The initial_weights of your model, saved after using model.get_weights().
- A pre-defined list of training sizes: training_sizes.
- A pre-defined early stopping callback monitoring loss: early_stop.
- Two empty lists to store the evaluation results: train_accs and test_accs.
Train your model on the different training sizes and evaluate the results on X_test. End by plotting the results with plot_results().
def plot_results(train_accs, test_accs):
    plt.plot(training_sizes, train_accs, 'o-', label="Training Accuracy");
    plt.plot(training_sizes, test_accs, 'o-', label="Test Accuracy");
    plt.title('Accuracy vs Number of training samples');
    plt.xlabel('# of training samples');
    plt.ylabel('Accuracy');
    plt.legend(loc="best");
initial_weights = model.get_weights()
from tensorflow.keras.callbacks import EarlyStopping
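# Stop training as soon as 'loss' fails to improve for 1 consecutive epoch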
early_stop = EarlyStopping(monitor='loss', patience=1)
train_accs, test_accs = [], []
training_sizes = [125, 502, 879, 1255]
for size in training_sizes:
    # Get a fraction of training data (we only care about the training data)
    X_train_frac, y_train_frac = X_train[:size], y_train[:size]

    # Reset the model to the initial weights and train it on the new training data fraction
    model.set_weights(initial_weights)
    model.fit(X_train_frac, y_train_frac, epochs=50, callbacks=[early_stop])

    # Evaluate and store the results on both the training data fraction and the complete test set
    train_accs.append(model.evaluate(X_train_frac, y_train_frac)[1])
    test_accs.append(model.evaluate(X_test, y_test)[1])
# Plot train vs test accuracies
plot_results(train_accs, test_accs)
Comparing activation functions
Comparing activation functions involves a bit of coding, but nothing you can't do!
You will try out different activation functions on the multi-label model you built for your farm irrigation machine in chapter 2. The function get_model('relu') returns a copy of this model and applies the 'relu' activation function to its hidden layer.
You will loop through several activation functions, generating and training a new model for each. By storing the history callbacks in a dictionary, you will be able to visualize which activation function performed best in the next exercise!
irrigation = pd.read_csv('./dataset/irrigation_machine.csv', index_col=0)
irrigation.head()
parcels = irrigation[['parcel_0', 'parcel_1', 'parcel_2']].to_numpy()
sensors = irrigation.drop(['parcel_0', 'parcel_1', 'parcel_2'], axis=1).to_numpy()
X_train, X_test, y_train, y_test = \
train_test_split(sensors, parcels, test_size=0.3, stratify=parcels)
np.random.seed(1)
# Return a new model with the given activation
def get_model(act_function):
    model = Sequential()
    if act_function == 'leaky_relu':
        # 'leaky_relu' has no string alias here, so pass the function itself
        model.add(Dense(64, input_shape=(20, ), activation=tf.nn.leaky_relu))
    else:
        model.add(Dense(64, input_shape=(20, ), activation=act_function))
    # Add an output layer of 3 neurons with sigmoid activation (multi-label)
    model.add(Dense(3, activation='sigmoid'))
    # Compile your model with binary crossentropy loss
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
activations = ['relu', 'leaky_relu', 'sigmoid', 'tanh']
# Loop over the activation functions
activation_results = {}
for act in activations:
    # Get a new model with the current activation
    model = get_model(act)

    # Fit the model and store the history results
    h_callback = model.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test), verbose=0)
    activation_results[act] = h_callback
    print('Finished with {}...'.format(act))
val_loss_per_function = {}
val_acc_per_function = {}
for k, v in activation_results.items():
    val_loss_per_function[k] = v.history['val_loss']
    val_acc_per_function[k] = v.history['val_accuracy']
val_loss = pd.DataFrame(val_loss_per_function)
# Call plot on the dataframe
val_loss.plot(title='Validation Loss');
# Create a dataframe from val_acc_per_function
val_acc = pd.DataFrame(val_acc_per_function)
# Call plot on the dataframe
val_acc.plot(title='Validation Accuracy');
You've plotted both loss and accuracy curves. It looks like sigmoid worked best as the hidden layer's activation function for this particular model: it led to lower validation loss and higher validation accuracy after 100 epochs.
Batch size and batch normalization
- Mini-batches
  - Advantages
    - Networks train faster (more weight updates in the same amount of time)
    - Less RAM required; you can train on huge datasets
    - Noise can help networks reach a lower error, escaping local minima
  - Disadvantages
    - More iterations need to be run
    - The batch size needs to be adjusted; we need to find a good one
- Batch Normalization (a minimal numpy sketch of the transform follows this list)
  - Advantages
    - Improves gradient flow
    - Allows higher learning rates
    - Reduces dependence on weight initialization
    - Acts as an unintended form of regularization
    - Limits internal covariate shift
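Conceptually, batch normalization standardizes each feature over the current mini-batch and then applies a learned scale and shift. Here is a minimal numpy sketch of the training-time transform (not Keras's exact implementation; gamma, beta, and eps are shown at typical initial values):
# Batch norm per feature: x_hat = (x - mean) / sqrt(var + eps); out = gamma * x_hat + beta
x = np.random.rand(32, 50)            # a batch of 32 samples, 50 activations
gamma, beta, eps = 1.0, 0.0, 1e-3     # learned scale/shift and stability term
x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
out = gamma * x_hat + beta
print(out.mean(axis=0).round(3))      # ~0 for every feature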
Changing batch sizes
You've seen that models are usually trained in batches of a fixed size. The smaller the batch size, the more weight updates per epoch, but at the cost of a noisier gradient descent, especially if the batch size is too small to be representative of the entire training set.
Let's see how different batch sizes affect the accuracy of a simple binary classification model that separates red from blue dots.
You'll use a batch size of one, updating the weights once per sample in your training set for each epoch. Then you will use the entire dataset, updating the weights only once per epoch.
dots = pd.read_csv('./dataset/dots.csv')
dots.head()
X = dots.iloc[:, :-1]
y = dots.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
def get_model():
    model = Sequential()
    model.add(Dense(4, input_shape=(2,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile('sgd', 'binary_crossentropy', metrics=['accuracy'])
    return model
model = get_model()
# Train your model for 5 epochs with a batch size of 1
model.fit(X_train, y_train, epochs=5, batch_size=1)
print("\n The accuracy when using a batch of size 1 is: ", model.evaluate(X_test, y_test)[1])
model = get_model()
# Fit your model for 5 epochs with a batch of size the training set
model.fit(X_train, y_train, epochs=5, batch_size=X_train.shape[0])
print("\n The accuracy when using the whole training set as batch-size was: ",
model.evaluate(X_test, y_test)[1])
You can see that accuracy is lower when using a batch size equal to the training set size. This is not because the network had more trouble learning the mapping: even though the same number of epochs was used for both batch sizes, the number of resulting weight updates was very different! With the whole training set as a single batch and 5 epochs, we only get 5 updates in total, each computing an averaged gradient over all the training observations. To obtain similar results with this batch size, we should increase the number of epochs so that more weight updates take place.
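A quick back-of-the-envelope check of that claim (a minimal sketch reusing X_train from the cells above) shows just how lopsided the update counts are:
import math

# Weight updates per run = epochs * number of batches per epoch
n_samples, epochs = X_train.shape[0], 5
for batch_size in (1, n_samples):
    updates = epochs * math.ceil(n_samples / batch_size)
    print('batch_size={}: {} weight updates'.format(batch_size, updates))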
Batch normalizing a familiar model
Remember the digits dataset you trained on in the first exercise of this chapter? It was a multi-class classification problem that you solved using softmax and 10 neurons in your output layer.
You will now build a new, deeper model consisting of 3 hidden layers of 50 neurons each, using batch normalization in between layers. The kernel_initializer parameter is used to initialize the weights of every layer in a similar way.
from tensorflow.keras.layers import BatchNormalization
# Build your deep network
batchnorm_model = Sequential()
batchnorm_model.add(Dense(50, input_shape=(64, ), activation='relu', kernel_initializer='normal'))
batchnorm_model.add(BatchNormalization())
batchnorm_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
batchnorm_model.add(BatchNormalization())
batchnorm_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
batchnorm_model.add(BatchNormalization())
batchnorm_model.add(Dense(10, activation='softmax', kernel_initializer='normal'))
# Compile your model with sgd
batchnorm_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
Batch normalization effects
Batch normalization tends to increase the learning speed of our models and make their learning curves more stable. Let's see how two identical models with and without batch normalization compare.
The model you just built, batchnorm_model, is loaded for you to use. An exact copy of it without batch normalization, standard_model, is available as well.
You will compare the accuracy learning curves of both models, plotting them with compare_histories_acc().
def compare_histories_acc(h1, h2):
    plt.plot(h1.history['accuracy']);
    plt.plot(h1.history['val_accuracy']);
    plt.plot(h2.history['accuracy']);
    plt.plot(h2.history['val_accuracy']);
    plt.title('Batch Normalization Effects');
    plt.xlabel('Epoch');
    plt.ylabel('Accuracy');
    plt.legend(['Train', 'Test', 'Train with Batch Normalization', 'Test with Batch Normalization'], loc='best');
X = np.load('./dataset/digits_pixels.npy')
y = np.load('./dataset/digits_target.npy')
y = to_categorical(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
standard_model = Sequential()
standard_model.add(Dense(50, input_shape=(64, ), activation='relu', kernel_initializer='normal'))
standard_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
standard_model.add(Dense(50, activation='relu', kernel_initializer='normal'))
standard_model.add(Dense(10, activation='softmax', kernel_initializer='normal'))
# Compile your model with sgd
standard_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
h1_callback = standard_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, verbose=0)
# Train the batch normalized model you recently built, store its history callback
h2_callback = batchnorm_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, verbose=0)
# Call compare_histories_acc passing in both model histories
compare_histories_acc(h1_callback, h2_callback)
You can see that, for this deeper model, batch normalization proved useful, helping the model reach high accuracy values within just the first 10 training epochs.
Hyperparameter tuning
- Neural network hyperparameters
- Number of layers
- Number of neurons per layer
- Layer order
- Layer activations
- Batch sizes
- Learning rates
- Optimizers
- ...
- Tips for neural network hyperparameter tuning
  - Random search is preferred over grid search
  - Don't use many epochs
  - Use a smaller sample of your dataset (see the sketch after this list)
  - Play with batch sizes, activations, optimizers and learning rates
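For example, here is a minimal sketch of the "smaller sample" tip, assuming X and y are NumPy feature and label arrays (the names mirror the breast cancer data loaded below):
# Hypothetical sketch: tune on a random 30% subsample to keep the search cheap,
# then retrain the best configuration on the full dataset afterwards
sample_idx = np.random.choice(len(X), size=int(0.3 * len(X)), replace=False)
X_sample, y_sample = X[sample_idx], y[sample_idx]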
Preparing a model for tuning
Let's tune the hyperparameters of a binary classification model that does well classifying the breast cancer dataset.
You've seen that the first step to turn a model into a sklearn estimator is to build a function that creates it. The definition of this function is important since hyperparameter tuning is carried out by varying the arguments your function receives.
Build a simple create_model() function that receives both a learning rate and an activation function as arguments.
def create_model(learning_rate=0.01, activation='relu'):
    # Create an Adam optimizer with the given learning rate
    opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    # Create your binary classification model
    model = Sequential()
    model.add(Dense(128, input_shape=(30, ), activation=activation))
    model.add(Dense(256, activation=activation))
    model.add(Dense(1, activation='sigmoid'))

    # Compile your model with your optimizer, loss and metrics
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model
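As a quick optional check, you can instantiate the function with non-default arguments and inspect the resulting architecture:
# Build one model with non-default hyperparameters and inspect it
m = create_model(learning_rate=0.001, activation='tanh')
m.summary()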
Tuning the model parameters
It's time to try out different parameters on your model and see how well it performs!
The create_model() function you built in the previous exercise is ready for you to use.
Since fitting the RandomizedSearchCV object would take too long, the results you'd get are printed in the show_results() function. You could try random_search.fit(X, y) in the console yourself to check that it does work after you have built everything else, but you will probably time out the exercise (so copy your code first if you try this, or you could lose your progress!).
You don't need to use the optional epochs and batch_size parameters when building your KerasClassifier object, since you are passing them as params to the random search and this already works.
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import KFold
# Create a KerasClassifier
model = KerasClassifier(build_fn=create_model)
# Define the parameters to try out
params = {
    'activation': ['relu', 'tanh'],
    'batch_size': [32, 128, 256],
    'epochs': [50, 100, 200],
    'learning_rate': [0.1, 0.01, 0.001]
}
# Create a RandomizedSearchCV object, passing in the parameters to try out
random_search = RandomizedSearchCV(model, param_distributions=params, cv=KFold(3))
random_search_results = random_search.fit(X, y, verbose=0)
# Print results
print("Best: {} using {}".format(random_search_results.best_score_,
random_search_results.best_params_))
Training with cross-validation
Time to train your model with the best parameters found: 0.01 for the learning rate, 100 epochs, a batch_size of 128, and relu activations.
The create_model() function from the previous exercise is ready for you to use. X and y are loaded as features and labels.
Use the best values found for your model when creating your KerasClassifier object so that they are used when performing cross-validation.
End this chapter by training an awesome tuned model on the breast cancer dataset!
from sklearn.model_selection import cross_val_score
# Create a KerasClassifier
model = KerasClassifier(build_fn=create_model,
epochs=100, batch_size=128, verbose=0)
# Calculate the accuracy score for each fold
kfolds = cross_val_score(model, X, y, cv=3)
# Print the mean accuracy
print('The mean accuracy was: ', kfolds.mean())
# Print the accuracy standard deviation
print('With a standard deviation of: ', kfolds.std())