Fitting the Future with time series analysis
What lies ahead in this chapter is you predicting what lies ahead in your data. You'll learn how to use the elegant statsmodels package to fit ARMA, ARIMA and ARMAX models. Then you'll use your models to predict the uncertain future of stock prices! This is the summary of the lecture "ARIMA Models in Python", via DataCamp.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (10, 5)
plt.style.use('fivethirtyeight')
Fitting time series models
- Introduction to ARMAX models
  - Exogenous ARMA
  - Use external variables as well as the time series itself
  - ARMAX = ARMA + linear regression
- ARMAX equation
  - ARMA(1, 1) model: $$ y_t = a_1 y_{t-1} + m_1 \epsilon_{t-1} + \epsilon_t $$
  - ARMAX(1, 1) model: $$ y_t = x_1 z_t + a_1 y_{t-1} + m_1 \epsilon_{t-1} + \epsilon_t $$ where $z_t$ is the exogenous variable and $x_1$ is its coefficient
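To make the role of each term in the ARMAX(1, 1) equation concrete, here is a minimal NumPy sketch that simulates it directly. The helper `simulate_armax11`, the coefficient values and the step-shaped input `z` are hypothetical, chosen only for illustration. The recursion is assumed to start from zero, meaning y and the shock are 0 before the first step.

```python
import numpy as np

def simulate_armax11(z, x1, a1, m1, sigma=1.0, seed=0):
    """Simulate y_t = x1*z_t + a1*y_{t-1} + m1*eps_{t-1} + eps_t.

    Assumption: y_{-1} = eps_{-1} = 0, i.e. the process starts from rest.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    eps = rng.normal(0.0, sigma, size=len(z))   # shock terms epsilon_t
    y = np.zeros(len(z))
    for t in range(len(z)):
        y_prev = y[t - 1] if t > 0 else 0.0
        eps_prev = eps[t - 1] if t > 0 else 0.0
        # exogenous term + AR term + MA term + new shock
        y[t] = x1 * z[t] + a1 * y_prev + m1 * eps_prev + eps[t]
    return y, eps

# Hypothetical exogenous input: switches from 0 to 1 halfway through
z = np.r_[np.zeros(50), np.ones(50)]
y, eps = simulate_armax11(z, x1=2.0, a1=0.5, m1=0.3)
```

If you switch the noise off with `sigma=0.0`, y settles at x1 / (1 - a1) = 4 after the step in z. The exogenous effect is carried forward, and amplified, by the AR term.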
Fitting AR and MA models
In this exercise you will fit an AR and an MA model to some data. The data here has been generated using the arma_generate_sample() function we used before.
You know the real AR and MA parameters used to create this data, so it is a really good way to gain some confidence with ARMA models and check that you are doing it right. In the next exercise you'll move on to some real-world data with confidence.
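arma_generate_sample() expects lag-polynomial coefficients rather than the a and m of the equations above. Both lists include the zero-lag coefficient 1, and the AR coefficients are entered with their signs flipped. The sketch below spells out that recursion in plain NumPy so the sign convention is visible. It is the same filter that `scipy.signal.lfilter(ma, ar, eps)` computes with zero initial conditions. The helper name `arma_recursion` and the AR(2) coefficients are hypothetical and are not the parameters behind sample.csv.

```python
import numpy as np

def arma_recursion(ar, ma, eps):
    """y[t] = sum_k ma[k]*eps[t-k] - sum_{k>=1} ar[k]*y[t-k], assuming ar[0] == 1.

    Lag-polynomial convention (zero lag included), zero initial conditions.
    """
    y = np.zeros(len(eps))
    for t in range(len(eps)):
        ma_part = sum(ma[k] * eps[t - k] for k in range(len(ma)) if t - k >= 0)
        ar_part = sum(ar[k] * y[t - k] for k in range(1, len(ar)) if t - k >= 0)
        y[t] = ma_part - ar_part
    return y

# Hypothetical AR(2) with a_1 = 0.6 and a_2 = -0.2: note the sign flip in the AR list
ar_coefs = np.array([0.6, -0.2])
ar = np.r_[1, -ar_coefs]      # -> [1, -0.6, 0.2]
ma = np.array([1.0])          # zero-lag term only, so no MA terms
eps = np.random.default_rng(1).normal(size=200)
y = arma_recursion(ar, ma, eps)
```

If you forget the sign flip, you simulate an AR(2) with the opposite coefficients. A fitted AR(2) would then report 0.6 and -0.2 only when the data came from `ar = [1, -0.6, 0.2]`.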
# Load the simulated data
sample = pd.read_csv('./dataset/sample.csv', index_col=0)
sample.head()
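# Note: statsmodels.tsa.arima_model.ARMA was removed in statsmodels 0.13;
# on newer versions use statsmodels.tsa.arima.model.ARIMA with order=(p, 0, q) instead
from statsmodels.tsa.arima_model import ARMA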
# Instantiate an AR(2) model of timeseries_1
model = ARMA(sample['timeseries_1'], order=(2, 0))
# Fit the model
results = model.fit()
# Print summary
print(results.summary())
# Instantiate an MA(3) model of timeseries_2
model = ARMA(sample['timeseries_2'], order=(0, 3))
# Fit the model
results = model.fit()
# Print summary
print(results.summary())
# Load the yearly earthquake counts and drop the Year column
earthquake = pd.read_csv('./dataset/earthquakes.csv', index_col='date', parse_dates=True)
earthquake.drop(['Year'], axis=1, inplace=True)
earthquake.head()
# Instantiate an ARMA(3, 1) model
model = ARMA(earthquake['earthquakes_per_year'], order=(3, 1))
# Fit the model
results = model.fit()
# Print model fit summary
print(results.summary())
Fitting an ARMAX model
In this exercise you will fit an ARMAX model to a time series which represents the wait times at an accident and emergency room for urgent medical care.
The variable you would like to model is the wait time to be seen by a medical professional, wait_times_hrs. This may be related to an exogenous variable that you measured, nurse_count, which is the number of nurses on shift at any given time. These can be seen below.
hospital = pd.read_csv('./dataset/hospital.csv', index_col=0, parse_dates=True)
hospital.head()
hospital.plot(subplots=True);
This is a particularly interesting case of time series modeling because, if the number of nurses has an effect, you could change staffing levels to affect the wait times.
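The "ARMAX = ARMA + linear regression" view is what makes this question answerable. The coefficient on the exogenous variable estimates how the expected wait changes per extra nurse, holding last period's wait fixed. As a simplified sketch, assume an ARX(1) model (ARMAX without the MA term) with an intercept, which ordinary least squares can fit. The simulated nurse counts, coefficients and noise level are hypothetical and are not the hospital data. A model with MA terms, like the ARMA fits in this chapter, cannot be fitted by a single least-squares regression because the past shocks are not observed.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Hypothetical true values: 5 h baseline, each extra nurse lowers the wait by 0.3 h
true_c, true_x1, true_a1 = 5.0, -0.3, 0.6

nurses = rng.integers(2, 9, size=n).astype(float)   # hypothetical nurse counts, 2 to 8
wait = np.zeros(n)
for t in range(1, n):
    wait[t] = true_c + true_x1 * nurses[t] + true_a1 * wait[t - 1] + rng.normal(0.0, 0.1)

# ARX(1) as a linear regression: wait_t on [1, nurses_t, wait_{t-1}]
X = np.column_stack([np.ones(n - 1), nurses[1:], wait[:-1]])
coef, *_ = np.linalg.lstsq(X, wait[1:], rcond=None)
c_hat, x1_hat, a1_hat = coef

# A permanent extra nurse shifts the long-run mean wait by x1 / (1 - a1)
long_run_effect = x1_hat / (1 - a1_hat)
```

The long-run effect is x1 / (1 - a1) rather than x1 alone, because the AR term carries each change into later periods.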
# Instantiate an ARMAX model: ARMA(2, 1) plus nurse_count as an exogenous regressor
model = ARMA(hospital['wait_times_hrs'], order=(2, 1), exog=hospital['nurse_count'])
# Fit the model
results = model.fit()
# Print model fit summary
print(results.summary())
Generating one-step-ahead predictions
It is very hard to forecast stock prices. Classic economics actually tells us that this should be impossible because of market clearing.
Your task in this exercise is to attempt the impossible and predict the Amazon stock price anyway.
In this exercise you will generate one-step-ahead predictions for the stock price as well as the uncertainty of these predictions.
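In a one-step-ahead prediction, each forecast starts from the actual observed previous value, so errors never compound. Here is a minimal sketch for an AR(1) model with a known, hypothetical coefficient `a1` and shock standard deviation `sigma`. The mean prediction is `a1 * y[t-1]`. Because only one unknown shock enters each prediction, the 95% interval has the same half-width at every step. The multiplier 1.96 assumes normally distributed shocks. statsmodels computes its intervals from the fitted state-space model, so this only illustrates the mechanism.

```python
import numpy as np

def one_step_ahead_ar1(y, a1, sigma, z=1.96):
    """Predict y[t] from the observed y[t-1] for t = 1 .. len(y)-1."""
    y = np.asarray(y, dtype=float)
    mean = a1 * y[:-1]          # always uses the real previous observation
    half_width = z * sigma      # constant: a single unknown shock per step
    return mean, mean - half_width, mean + half_width

# Hypothetical observed values
y_obs = np.array([1.0, 0.8, 1.1, 0.5, 0.7])
mean, lower, upper = one_step_ahead_ar1(y_obs, a1=0.9, sigma=0.2)
```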
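amazon = pd.read_csv('./dataset/amazon_close.csv', parse_dates=True, index_col='date')
# Reverse the row order so the series runs forward in time
amazon = amazon.iloc[::-1]
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Fit a seasonal ARIMA(3, 1, 3)x(1, 0, 1, 7) model to 2018-01-01 through 2019-02-08
model = SARIMAX(amazon.loc['2018-01-01':'2019-02-08'],
                order=(3, 1, 3),
                seasonal_order=(1, 0, 1, 7),
                enforce_invertibility=False,
                enforce_stationarity=False,
                simple_differencing=False,
                measurement_error=False,
                trend='n')  # no deterministic trend
results = model.fit()
print(results.summary())
# Generate one-step-ahead predictions for the last 30 observations
one_step_forecast = results.get_prediction(start=-30)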
# Extract prediction mean
mean_forecast = one_step_forecast.predicted_mean
# Get confidence intervals of predictions
confidence_intervals = one_step_forecast.conf_int()
# Select lower and upper confidence limits
lower_limits = confidence_intervals.loc[:,'lower close']
upper_limits = confidence_intervals.loc[:,'upper close']
# Print best estimate predictions
print(mean_forecast.values)
Plotting one-step-ahead predictions
Now that you have your predictions on the Amazon stock, you should plot these predictions to see how you've done.
You made predictions over the latest 30 days of data available, always forecasting just one day ahead. By evaluating these predictions you can judge how the model performs in making predictions for just the next day, where you don't know the answer.
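One simple way to judge these predictions is the mean absolute error between the forecast and the observed closing price over the same dates. The sketch below is one such check. It uses pandas index alignment so that only dates present in both series are compared. The helper name `forecast_mae` and the toy numbers are hypothetical. With the objects in this exercise you could call it as `forecast_mae(amazon['close'], mean_forecast)`, provided both carry the same date index.

```python
import pandas as pd

def forecast_mae(observed: pd.Series, forecast: pd.Series) -> float:
    """Mean absolute error over the dates that appear in both series."""
    both = pd.concat({'observed': observed, 'forecast': forecast},
                     axis=1, join='inner').dropna()
    return float((both['observed'] - both['forecast']).abs().mean())

# Hypothetical data: no forecast exists for the first date
dates = pd.date_range('2019-01-01', periods=4, freq='D')
observed = pd.Series([10.0, 11.0, 12.0, 13.0], index=dates)
forecast = pd.Series([10.5, 12.0, 11.0], index=dates[1:])
mae = forecast_mae(observed, forecast)
```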
# Plot the observed prices
plt.plot(amazon.index, amazon['close'], label='observed');
# Plot your mean predictions
plt.plot(mean_forecast.index, mean_forecast, color='r', label='forecast');
# shade the area between your confidence limits
plt.fill_between(lower_limits.index, lower_limits, upper_limits, color='pink');
# Set labels, legends
plt.xlabel('Date');
plt.ylabel('Amazon Stock Price - Close USD');
plt.legend();
Generating dynamic forecasts
Now let's move a little further into the future, to dynamic predictions. What if you wanted to predict the Amazon stock price not just for tomorrow, but for next week or next month? This is where dynamic predictions come in.
Remember that in the video you learned that it is more difficult to make precise long-term forecasts because the shock terms add up: the further into the future the predictions go, the more uncertain they become. This is especially true with stock data, so you will likely find that your predictions in this exercise are not as precise as those in the last exercise.
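Why do the shocks add up? Take an AR(1) model with a known, hypothetical `a1` and `sigma`. A dynamic forecast feeds each prediction back in as the next input, so the h-step mean is a1**h * y_T. The forecast-error variance is sigma**2 * (1 + a1**2 + ... + a1**(2(h-1))), which gains one extra shock per step. The sketch assumes normal shocks (hence 1.96). The intervals statsmodels reports come from the fitted model, so treat this as an illustration of the mechanism only.

```python
import numpy as np

def dynamic_forecast_ar1(y_last, a1, sigma, steps, z=1.96):
    """h-step-ahead forecasts from a single origin, h = 1 .. steps."""
    h = np.arange(1, steps + 1)
    mean = y_last * a1 ** h          # each step reuses the previous forecast
    # Error variance sums one shock per step: sigma^2 * sum_{i<h} a1^(2i)
    var = sigma ** 2 * np.cumsum(a1 ** (2 * np.arange(steps)))
    half_width = z * np.sqrt(var)
    return mean, mean - half_width, mean + half_width

mean, lower, upper = dynamic_forecast_ar1(y_last=1.0, a1=0.9, sigma=0.2, steps=10)
width = upper - lower                # grows with the forecast horizon
```

With a1 = 1 (a random walk, a common rough model for stock prices), the variance grows linearly in h. This is one reason long stock forecasts fan out quickly.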
# Generate dynamic predictions for the last 30 observations
dynamic_forecast = results.get_prediction(start=-30, dynamic=True)
# Extract prediction mean
mean_forecast = dynamic_forecast.predicted_mean
# Get confidence intervals of predictions
confidence_intervals = dynamic_forecast.conf_int()
# Select lower and upper confidence limits
lower_limits = confidence_intervals.loc[:, 'lower close']
upper_limits = confidence_intervals.loc[:, 'upper close']
# Print best estimate predictions
print(mean_forecast.values)
Plotting dynamic forecasts
Time to plot your predictions. Remember that making dynamic predictions means that your model makes predictions with no corrections. Unlike one-step-ahead predictions, it never sees the observed values inside the forecast window and instead feeds its own earlier predictions back in. This is like making a forecast today for the next 30 days, then waiting to see what happens before comparing how good your predictions were.
# Plot the observed prices
plt.plot(amazon.index, amazon['close'], label='observed');
# Plot your mean forecast
plt.plot(mean_forecast.index, mean_forecast, label='forecast');
# Shade the area between your confidence limits
plt.fill_between(lower_limits.index, lower_limits, upper_limits, color='pink');
# set labels, legends
plt.xlabel('Date');
plt.ylabel('Amazon Stock Price - Close USD');
plt.legend();
Differencing and fitting ARMA
In this exercise you will fit an ARMA model to the Amazon stocks dataset. As you saw before, this is a non-stationary dataset. You will use differencing to make it stationary so that you can fit an ARMA model.
In the next section you'll make a forecast of the differences and use this to forecast the actual values.
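Differencing helps here for the following reason. If a price behaves like a random walk, y_t = y_{t-1} + eps_t, then its first difference is exactly the shock eps_t, which is stationary. Here is a minimal check with a simulated, hypothetical random walk. The pandas `diff()` call is the same one used on the Amazon data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
shocks = rng.normal(size=300)
walk = pd.Series(np.cumsum(shocks))   # random walk: its variance grows over time

diffed = walk.diff().dropna()          # first difference; the first row becomes NaN and is dropped
```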
# Take the first difference and drop the leading NaN row
amazon_diff = amazon.diff().dropna()
# Create ARMA(2, 2) model
arma = SARIMAX(amazon_diff, order=(2, 0, 2))
# Fit model
arma_results = arma.fit()
# Print fit summary
print(arma_results.summary())
Unrolling ARMA forecast
Now you will use the model that you trained in the previous exercise, arma, to forecast the absolute value of the Amazon stocks dataset. Remember that sometimes predicting the difference is enough (will the stock go up or down?), but sometimes the absolute value is key.
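Unrolling is the inverse of differencing. A cumulative sum of the differences restores levels only relative to a starting point, which is why the forecast code adds the last observed price, `amazon.iloc[-1, 0]`. A quick check on a small hypothetical series:

```python
import numpy as np

levels = np.array([100.0, 102.0, 101.0, 105.0, 104.0])   # hypothetical prices
diffs = np.diff(levels)                                  # [2, -1, 4, -1]

# Integrate the differences and add back the level they start from
rebuilt = levels[0] + np.cumsum(diffs)                   # recovers levels[1:]

# Same idea for a forecast: future differences stack on the last observed level
future_diffs = np.array([0.5, 0.5, -1.0])                # hypothetical forecast of differences
future_levels = levels[-1] + np.cumsum(future_diffs)
```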
# Forecast the next 10 differences
arma_diff_forecast = arma_results.get_forecast(steps=10).predicted_mean
# Integrate the difference forecast
arma_int_forecast = np.cumsum(arma_diff_forecast)
# Make absolute value forecast
arma_value_forecast = arma_int_forecast + amazon.iloc[-1, 0]
# Print forecast
print(arma_value_forecast)
Fitting an ARIMA model
In this exercise you'll learn how to be lazy in time series modeling. Instead of taking the difference, modeling the difference and then integrating, you're just going to let statsmodels do the hard work for you.
You'll repeat the same exercise that you did before, of forecasting the absolute values of the Amazon stocks dataset, but this time with an ARIMA model.
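For reference, ARIMA(2, 1, 2) is an ARMA(2, 2) model applied to the first difference of the series, written in the notation of the ARMA equations at the start of this chapter. The middle 1 in order=(2, 1, 2) tells SARIMAX to do the differencing, and the re-integration of the forecast, for you:

$$ \Delta y_t = a_1 \Delta y_{t-1} + a_2 \Delta y_{t-2} + m_1 \epsilon_{t-1} + m_2 \epsilon_{t-2} + \epsilon_t, \qquad \Delta y_t = y_t - y_{t-1} $$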
# Create ARIMA(2, 1, 2) model
arima = SARIMAX(amazon, order=(2, 1, 2))
# Fit ARIMA model
arima_results = arima.fit()
# Make ARIMA forecast of next 10 values
arima_value_forecast = arima_results.get_forecast(steps=10).predicted_mean
# Print forecast
print(arima_value_forecast)
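# Plot the last 100 observed closing prices
plt.plot(amazon.index[-100:], amazon.iloc[-100:]['close'], label='observed');
# Business-day dates for the forecast, starting the day after the last observation
rng = pd.bdate_range(start=amazon.index[-1] + pd.offsets.BDay(1), periods=len(arima_value_forecast))
# Plot your mean forecast
plt.plot(rng, arima_value_forecast.values, label='forecast');
# Shade the area between the confidence limits of the ARIMA forecast
arima_conf_int = arima_results.get_forecast(steps=10).conf_int()
plt.fill_between(rng, arima_conf_int.iloc[:, 0].values, arima_conf_int.iloc[:, 1].values, color='pink');
# Set labels, legends
plt.xlabel('Date');
plt.ylabel('Amazon Stock Price - Close USD');
plt.legend();
plt.savefig('../images/arima_forecast.png')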