import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (8, 8)

Creating a keras model

  • Model building steps
    • Specify Architecture
    • Compile
    • Fit
    • Predict

Note: The lecture used the standalone Keras framework. This page uses the Keras API bundled with TensorFlow (tf.keras) instead.

Understanding your data

You will soon start building models in Keras to predict wages based on various professional and demographic factors. Before you start building a model, it's good to understand your data by performing some exploratory analysis.

The data is pre-loaded into a pandas DataFrame called df. Use the .head() and .describe() methods.

The target variable you'll be predicting is wage_per_hour. Some of the predictor variables are binary indicators, where a value of 1 represents True, and 0 represents False.

df = pd.read_csv('./dataset/hourly_wages.csv')
df.head()
   wage_per_hour  union  education_yrs  experience_yrs  age  female  marr  south  manufacturing  construction
0           5.10      0              8              21   35       1     1      0              1             0
1           4.95      0              9              42   57       1     1      0              1             0
2           6.67      0             12               1   19       0     0      0              1             0
3           4.00      0             12               4   22       0     0      0              0             0
4           7.50      0             12              17   35       0     1      0              0             0
df.describe()
       wage_per_hour       union  education_yrs  experience_yrs         age      female        marr       south  manufacturing  construction
count     534.000000  534.000000     534.000000      534.000000  534.000000  534.000000  534.000000  534.000000     534.000000    534.000000
mean        9.024064    0.179775      13.018727       17.822097   36.833333    0.458801    0.655431    0.292135       0.185393      0.044944
std         5.139097    0.384360       2.615373       12.379710   11.726573    0.498767    0.475673    0.455170       0.388981      0.207375
min         1.000000    0.000000       2.000000        0.000000   18.000000    0.000000    0.000000    0.000000       0.000000      0.000000
25%         5.250000    0.000000      12.000000        8.000000   28.000000    0.000000    0.000000    0.000000       0.000000      0.000000
50%         7.780000    0.000000      12.000000       15.000000   35.000000    0.000000    1.000000    0.000000       0.000000      0.000000
75%        11.250000    0.000000      15.000000       26.000000   44.000000    1.000000    1.000000    1.000000       0.000000      0.000000
max        44.500000    1.000000      18.000000       55.000000   64.000000    1.000000    1.000000    1.000000       1.000000      1.000000
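
Because the binary indicator columns hold only 0s and 1s, the mean that describe() reports for them is the share of rows where the indicator is true. As a quick check, the sketch below takes a few means and the row count from the table above and recovers the underlying counts; the variable names are only illustrative.

```python
# Values copied from the df.describe() output above.
n_rows = 534
indicator_means = {"union": 0.179775, "female": 0.458801, "construction": 0.044944}

# For a 0/1 column, mean = (number of ones) / (number of rows),
# so multiplying back by the row count gives the number of ones.
indicator_counts = {name: round(mean * n_rows) for name, mean in indicator_means.items()}
print(indicator_counts)  # {'union': 96, 'female': 245, 'construction': 24}
```

So roughly 18% of the workers are union members and about 4.5% work in construction.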

Specifying a model

Now you'll get to work with your first model in Keras, and you'll immediately be able to run more complex neural network models on larger datasets than in the first two chapters.

To start, you'll take the skeleton of a neural network and add a hidden layer and an output layer. You'll then fit that model and see Keras do the optimization so your model continually gets better.

As a start, you'll predict workers' wages based on characteristics like their industry, education and level of experience. You can find the dataset in a pandas DataFrame called df. For convenience, everything in df except for the target has been converted to a NumPy array called predictors. The target, wage_per_hour, is available as a NumPy array called target.

import tensorflow as tf

predictors = df.iloc[:, 1:].to_numpy()
target = df.iloc[:, 0].to_numpy()
n_cols = predictors.shape[1]

# Set up the model: model
model = tf.keras.Sequential()

# Add the first layer
model.add(tf.keras.layers.Dense(50, activation='relu', input_shape=(n_cols, )))

# Add the second layer
model.add(tf.keras.layers.Dense(32, activation='relu'))

# Add the output layer
model.add(tf.keras.layers.Dense(1))
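
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes n_cols is 9 (the ten columns of df minus the target) and uses the usual count for a fully connected layer: inputs times units, plus one bias per unit. The helper dense_params is a name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected (Dense) layer."""
    return n_inputs * n_units + n_units

n_cols = 9                 # assumption: 10 columns in df minus the wage_per_hour target
layer_units = [50, 32, 1]  # units in the three Dense layers above

params_per_layer = []
n_inputs = n_cols
for n_units in layer_units:
    params_per_layer.append(dense_params(n_inputs, n_units))
    n_inputs = n_units     # the next layer takes this layer's outputs as inputs

total_params = sum(params_per_layer)
print(params_per_layer, total_params)  # [500, 1632, 33] 2165
```

Calling model.summary() on the model above should report the same totals.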

Compiling and fitting a model

  • Why you need to compile your model
    • Specify the optimizer
      • Many options and mathematically complex
      • "Adam" is usually a good choice
    • Loss function
      • "mean_squared_error"
  • Fitting a model
    • Applying backpropagation and gradient descent with your data to update the weights
    • Scaling data before fitting can ease optimization
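
As a rough illustration of the two fitting bullets, the sketch below fits a single linear layer with plain gradient descent on mean squared error after standardizing the inputs. It uses synthetic data and NumPy only. It is not what Keras does internally (Keras backpropagates through every layer, and this page uses the Adam optimizer), and every name in it is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two features on very different scales, linear target plus noise.
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y = 3.0 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.1, 200)

# Standardize each column to mean 0 and standard deviation 1.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

weights = np.zeros(X_scaled.shape[1])
bias = 0.0
learning_rate = 0.1
losses = []

for _ in range(100):
    error = X_scaled @ weights + bias - y
    losses.append(np.mean(error ** 2))
    # Gradients of the mean squared error with respect to weights and bias.
    grad_w = 2 * X_scaled.T @ error / len(y)
    grad_b = 2 * error.mean()
    # Step against the gradient to reduce the loss.
    weights -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(losses[0], losses[-1])
```

Without the standardization step, the same learning rate would make the updates blow up, because the second feature is about a thousand times larger than the first.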

Compiling the model

You're now going to compile the model you specified earlier. To compile the model, you need to specify the optimizer and loss function to use. The Keras documentation covers the 'adam' optimizer as well as the other Keras optimizers, and if you are really curious to learn more, you can read the original paper that introduced the Adam optimizer.

In this exercise, you'll use the Adam optimizer and the mean squared error loss function. Go for it!

model.compile(optimizer='adam', loss='mean_squared_error')

# Verify that model contains information from compiling
print("Loss function: " + model.loss)
Loss function: mean_squared_error
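
For reference, mean squared error is the average of the squared differences between predictions and targets. A minimal NumPy sketch, using the first five wages from df.head() as targets and made-up predictions:

```python
import numpy as np

actual = np.array([5.10, 4.95, 6.67, 4.00, 7.50])     # wage_per_hour from df.head()
predicted = np.array([6.00, 5.00, 6.00, 5.00, 7.00])  # hypothetical model outputs

# Square each error, then average over all rows.
mse = np.mean((actual - predicted) ** 2)
print(mse)
```

Squaring means one prediction that is off by 1.00 contributes four times as much to the loss as one that is off by 0.50.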

Fitting the model

You've reached the most fun part: fitting the model. Recall that the data to be used as predictive features is loaded in a NumPy array called predictors and the data to be predicted is stored in a NumPy array called target. Your model is pre-written and has been compiled with the code from the previous exercise.

model.fit(predictors, target, epochs=10);
Epoch 1/10
17/17 [==============================] - 0s 1ms/step - loss: 31.8248
Epoch 2/10
17/17 [==============================] - 0s 1ms/step - loss: 26.8120
Epoch 3/10
17/17 [==============================] - 0s 1ms/step - loss: 24.3179
Epoch 4/10
17/17 [==============================] - 0s 984us/step - loss: 22.8884
Epoch 5/10
17/17 [==============================] - 0s 962us/step - loss: 21.6503
Epoch 6/10
17/17 [==============================] - 0s 1ms/step - loss: 21.3223
Epoch 7/10
17/17 [==============================] - 0s 878us/step - loss: 21.2016
Epoch 8/10
17/17 [==============================] - 0s 962us/step - loss: 21.2261
Epoch 9/10
17/17 [==============================] - 0s 967us/step - loss: 21.1657
Epoch 10/10
17/17 [==============================] - 0s 894us/step - loss: 21.3281
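
The 17/17 in the log is the number of batches processed per epoch. No batch_size was passed to fit, so Keras uses its default of 32, and 534 rows split into batches of 32 gives 17 batches, the last one partial. A quick check of that arithmetic:

```python
import math

n_samples = 534   # rows in the wage dataset
batch_size = 32   # Keras default when fit() is called without batch_size

steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch_size)  # 17 22
```

The weights are updated once per batch, so each epoch here performs 17 updates.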

Classification models

  • Classification
    • categorical_crossentropy loss function
    • Similar to log loss: Lower is better
    • Add metrics=['accuracy'] to compile step for easy-to-understand diagnostics
    • The output layer has a separate node for each possible outcome, and uses softmax activation
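
To make the bullets above concrete, here is a small NumPy sketch of what a softmax output and the categorical cross-entropy loss compute. The scores and labels are invented, and the two helper functions are written for illustration only; they are not the Keras implementations.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

logits = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]])  # hypothetical output-layer scores
y_true = np.array([[1, 0], [0, 1], [0, 1]])              # one-hot targets

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
accuracy = float(np.mean(probs.argmax(axis=1) == y_true.argmax(axis=1)))
print(probs.round(3), loss, accuracy)
```

The third row is predicted wrongly, which lowers the accuracy to two out of three and also contributes the largest term to the loss.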

Understanding your classification data

Now you will start modeling with a new dataset for a classification problem. This data includes information about passengers on the Titanic. You will use predictors such as age, fare and where each passenger embarked from to predict who will survive. This data is from a tutorial on data science competitions. Look here for descriptions of the features.

It's smart to review the maximum and minimum values of each variable to ensure the data isn't misformatted or corrupted. What was the maximum age of passengers on the Titanic?

df = pd.read_csv('./dataset/titanic_all_numeric.csv')
df.head()
   survived  pclass   age  sibsp  parch     fare  male  age_was_missing  embarked_from_cherbourg  embarked_from_queenstown  embarked_from_southampton
0         0       3  22.0      1      0   7.2500     1            False                        0                         0                          1
1         1       1  38.0      1      0  71.2833     0            False                        1                         0                          0
2         1       3  26.0      0      0   7.9250     0            False                        0                         0                          1
3         1       1  35.0      1      0  53.1000     0            False                        0                         0                          1
4         0       3  35.0      0      0   8.0500     1            False                        0                         0                          1
df.describe()
         survived      pclass         age       sibsp       parch        fare        male  embarked_from_cherbourg  embarked_from_queenstown  embarked_from_southampton
count  891.000000  891.000000  891.000000  891.000000  891.000000  891.000000  891.000000               891.000000                891.000000                 891.000000
mean     0.383838    2.308642   29.699118    0.523008    0.381594   32.204208    0.647587                 0.188552                  0.086420                   0.722783
std      0.486592    0.836071   13.002015    1.102743    0.806057   49.693429    0.477990                 0.391372                  0.281141                   0.447876
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000    0.000000                 0.000000                  0.000000                   0.000000
25%      0.000000    2.000000   22.000000    0.000000    0.000000    7.910400    0.000000                 0.000000                  0.000000                   0.000000
50%      0.000000    3.000000   29.699118    0.000000    0.000000   14.454200    1.000000                 0.000000                  0.000000                   1.000000
75%      1.000000    3.000000   35.000000    1.000000    0.000000   31.000000    1.000000                 0.000000                  0.000000                   1.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200    1.000000                 1.000000                  1.000000                   1.000000

Last steps in classification models

You'll now create a classification model using the titanic dataset, which has been pre-loaded into a DataFrame called df. You'll take information about the passengers and predict which ones survived.

The predictive variables are stored in a NumPy array predictors. The target to predict is in df.survived, though you'll have to manipulate it for keras. The number of predictive features is stored in n_cols.
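
The manipulation the target needs is one-hot encoding: each 0/1 label becomes a two-element row with a 1 in the position of its class, which matches a two-node softmax output. The code in this section uses to_categorical for that step. The NumPy sketch below shows an equivalent result for a handful of made-up labels.

```python
import numpy as np

labels = np.array([0, 1, 1, 0, 1])   # hypothetical survived values
n_classes = 2

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes all of them at once.
one_hot = np.eye(n_classes, dtype=np.float32)[labels]
print(one_hot)
```

Column 0 then marks "did not survive" and column 1 marks "survived".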

Here, you'll use the 'sgd' optimizer, which stands for Stochastic Gradient Descent.

predictors = df.iloc[:, 1:].astype(np.float32).to_numpy()
target = df.survived.astype(np.float32).to_numpy()
n_cols = predictors.shape[1]
from tensorflow.keras.utils import to_categorical

# Convert the target to categorical: target
target = to_categorical(target)

# Set up the model
model = tf.keras.Sequential()

# Add the first layer
model.add(tf.keras.layers.Dense(32, activation='relu', input_shape=(n_cols, )))

# Add the output layer
model.add(tf.keras.layers.Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(predictors, target, epochs=10);
Epoch 1/10
28/28 [==============================] - 0s 1ms/step - loss: 3.0240 - accuracy: 0.5623
Epoch 2/10
28/28 [==============================] - 0s 1ms/step - loss: 1.1196 - accuracy: 0.6072
Epoch 3/10
28/28 [==============================] - 0s 1ms/step - loss: 0.7049 - accuracy: 0.6700
Epoch 4/10
28/28 [==============================] - 0s 992us/step - loss: 0.6425 - accuracy: 0.6622
Epoch 5/10
28/28 [==============================] - 0s 1ms/step - loss: 0.6249 - accuracy: 0.6857
Epoch 6/10
28/28 [==============================] - 0s 1ms/step - loss: 0.6098 - accuracy: 0.6655
Epoch 7/10
28/28 [==============================] - 0s 997us/step - loss: 0.6163 - accuracy: 0.6835
Epoch 8/10
28/28 [==============================] - 0s 953us/step - loss: 0.6120 - accuracy: 0.6869
Epoch 9/10
28/28 [==============================] - 0s 979us/step - loss: 0.6062 - accuracy: 0.6813
Epoch 10/10
28/28 [==============================] - 0s 1ms/step - loss: 0.6040 - accuracy: 0.6981
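
One way to put the final accuracy in context is to compare it with always predicting the majority class. From the describe() output, the mean of survived is 0.383838, so a model that always predicts "did not survive" would already be right about 62% of the time. The short calculation below uses only numbers printed on this page.

```python
survival_rate = 0.383838   # mean of the survived column from df.describe()
final_accuracy = 0.6981    # accuracy reported for epoch 10 above

# Always guessing the more common class is correct this often.
baseline_accuracy = max(survival_rate, 1 - survival_rate)
improvement = final_accuracy - baseline_accuracy
print(round(baseline_accuracy, 4), round(improvement, 4))  # 0.6162 0.0819
```

After ten epochs the network is about eight percentage points better than that baseline on the training data.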

Using models

  • Using models
    • Save
    • Load
    • Make predictions
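
As a sketch of the save and load steps with tf.keras, the example below writes a model to disk, reads it back, and checks that both copies give the same predictions. The tiny untrained model, the random input, and the file name model_file.keras are placeholders chosen for this example, and the file formats accepted by model.save vary somewhat between TensorFlow versions.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# A tiny stand-in model; the architecture is a placeholder and it is left untrained.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])

# Save to a temporary location (hypothetical file name), then load it back.
path = os.path.join(tempfile.mkdtemp(), 'model_file.keras')
model.save(path)
reloaded = tf.keras.models.load_model(path)

# Both copies should produce the same predictions on the same input.
new_data = np.random.rand(5, 3).astype(np.float32)
original_preds = model.predict(new_data, verbose=0)
reloaded_preds = reloaded.predict(new_data, verbose=0)
print(np.allclose(original_preds, reloaded_preds, atol=1e-6))
```

The reloaded model carries both the architecture and the weights, so it can be used for prediction without rebuilding the layers in code.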

Making predictions

The trained network from your previous coding exercise is now stored as model. New data to make predictions on is stored in a NumPy array called pred_data. Use model to make predictions on your new data.

In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.

pred_data = pd.read_csv('./dataset/titanic_pred.csv').astype(np.float32).to_numpy()
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:, 1]

# Print predicted_prob_true
print(predicted_prob_true)
[0.22180589 0.42722526 0.8252238  0.5213572  0.22075608 0.19369191
 0.12364192 0.33542025 0.18006396 0.60253584 0.2387915  0.2789472
 0.19096325 0.55081165 0.20018886 0.17208369 0.28164333 0.5258215
 0.10675837 0.49150914 0.6866302  0.23700541 0.12893575 0.2803661
 0.64113325 0.18212865 0.5681544  0.6431076  0.18872282 0.6477654
 0.47363198 0.5719546  0.20094132 0.26802042 0.32747698 0.6857638
 0.29811734 0.19867247 0.57263356 0.47933838 0.30061284 0.38860667
 0.5223013  0.1648294  0.34351107 0.11793773 0.5328546  0.1723291
 0.50598216 0.7348816  0.56341636 0.02569531 0.4972519  0.5847254
 0.28134227 0.3443733  0.8771691  0.20426093 0.38939697 0.20094132
 0.18658745 0.35565624 0.2363476  0.56039864 0.3243851  0.16688626
 0.34187257 0.5749924  0.21157199 0.4853801  0.23887387 0.5775004
 0.17202988 0.10031664 0.4612051  0.37792704 0.32310244 0.3113476
 0.1973361  0.6601293  0.483489   0.17888792 0.32240736 0.2537471
 0.2399641  0.29062274 0.2910817  0.5475278  0.37357914 0.5207849
 0.17842494]
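
If hard class labels are needed instead of probabilities, one common convention is to threshold the survival probability at 0.5. The sketch below uses three made-up rows in the same two-column layout as predictions; the 0.5 cutoff is an assumption for this example, not something the exercise prescribes.

```python
import numpy as np

# Two-column softmax output for three hypothetical passengers:
# column 0 = P(did not survive), column 1 = P(survived).
predictions = np.array([[0.78, 0.22], [0.17, 0.83], [0.48, 0.52]])

predicted_prob_true = predictions[:, 1]
predicted_label = (predicted_prob_true > 0.5).astype(int)  # threshold at 0.5
same_as_argmax = predictions.argmax(axis=1)                # picks the larger column
print(predicted_label, same_as_argmax)  # [0 1 1] [0 1 1]
```

With two classes whose probabilities sum to 1, thresholding column 1 at 0.5 and taking the argmax across columns give the same labels, apart from exact ties.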