import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (10, 5)

Introduction to Course

A "Thin" Application of Time Series

Google Trends allows users to see how often a term is searched for. We downloaded a file from Google Trends containing the frequency over time for the search word "diet". A first step when analyzing a time series is to visualize the data with a plot. You should be able to clearly see a gradual decrease in searches for "diet" throughout the calendar year, hitting a low around the December holidays, followed by a spike in searches around the new year as people make New Year's resolutions to lose weight.

Like many time series datasets you will be working with, the index of dates are strings and should be converted to a datetime index before plotting.

  • Preprocess
diet = pd.read_csv('./dataset/diet.csv', index_col=0)
6/26/2011 70
7/3/2011 71
7/10/2011 73
7/17/2011 74
7/24/2011 72
diet.index = pd.to_datetime(diet.index)

# Plot the entire time series diet and show gridlines
plt.title('Seasonal trend of "Diet" keywords');

# Slice the dataset to keep only 2012
diet2012 = diet[diet.index.year == 2012]

# Plot 2012 data
plt.title('2012 trend of "Diet" keywords');

Merging Time Series With Different Dates

Stock and bond markets in the U.S. are closed on different days. For example, although the bond market is closed on Columbus Day (around Oct 12) and Veterans Day (around Nov 11), the stock market is open on those days. One way to see the dates that the stock market is open and the bond market is closed is to convert both indexes of dates into sets and take the difference in sets.

The pandas .join() method is a convenient tool to merge the stock and bond DataFrames on dates when both markets are open. Stock prices and 10-year US Government bond yields is downloaded from FRED

  • Preprocess
stocks = pd.read_csv('./dataset/stocks.csv', index_col=0)
bonds = pd.read_csv('./dataset/bonds.csv', index_col=0)
stocks.index = pd.to_datetime(stocks.index)
bonds.index = pd.to_datetime(bonds.index)
set_stock_dates = set(stocks.index)
set_bond_dates = set(bonds.index)

# Take the difference between the sets and print
print(set_stock_dates - set_bond_dates)

# Merge stocks and bonds DataFrame using join()
stocks_and_bonds = stocks.join(bonds, how='inner')
{Timestamp('2016-11-11 00:00:00'), Timestamp('2013-11-11 00:00:00'), Timestamp('2010-11-11 00:00:00'), Timestamp('2007-11-12 00:00:00'), Timestamp('2011-10-10 00:00:00'), Timestamp('2009-10-12 00:00:00'), Timestamp('2009-11-11 00:00:00'), Timestamp('2017-06-09 00:00:00'), Timestamp('2010-10-11 00:00:00'), Timestamp('2012-10-08 00:00:00'), Timestamp('2011-11-11 00:00:00'), Timestamp('2007-10-08 00:00:00'), Timestamp('2014-10-13 00:00:00'), Timestamp('2008-11-11 00:00:00'), Timestamp('2015-11-11 00:00:00'), Timestamp('2012-11-12 00:00:00'), Timestamp('2008-10-13 00:00:00'), Timestamp('2013-10-14 00:00:00'), Timestamp('2016-10-10 00:00:00'), Timestamp('2015-10-12 00:00:00'), Timestamp('2014-11-11 00:00:00')}

Correlation of Two Time Series

Correlation of Stocks and Bonds

Investors are often interested in the correlation between the returns of two different assets for asset allocation and hedging purposes. In this exercise, you'll try to answer the question of whether stocks are positively or negatively correlated with bonds. Scatter plots are also useful for visualizing the correlation between the two variables.

Keep in mind that you should compute the correlations on the percentage changes rather than the levels.

returns = stocks_and_bonds.pct_change()

# Compute correlation using corr()
correlation = returns['SP500'].corr(returns['US10Y'])
print("Correlation of stocks and interest rates: ", correlation)

# Make scatter plot
plt.scatter(returns['SP500'], returns['US10Y']);
Correlation of stocks and interest rates:  0.4119448886249272

The positive correlation means that when interest rates go down, stock prices go down. For example, during crises like 9/11, investors sold stocks and moved their money to less risky bonds (this is sometimes referred to as a 'flight to quality'). During these periods, stocks drop and interest rates drop as well. Of course, there are times when the opposite relationship holds too.

Flying Saucers Aren't Correlated to Flying Markets

Two trending series may show a strong correlation even if they are completely unrelated. This is referred to as "spurious correlation". That's why when you look at the correlation of say, two stocks, you should look at the correlation of their returns and not their levels.

To illustrate this point, calculate the correlation between the levels of the stock market and the annual sightings of UFOs. Both of those time series have trended up over the last several decades, and the correlation of their levels is very high. Then calculate the correlation of their percent changes. This will be close to zero, since there is no relationship between those two series.

UFO data was downloaded from

  • Preprocess
DJI = pd.read_csv('./dataset/DJI.csv', index_col=0)
DJI.columns = ['DJI']
1941 110.96
1942 119.40
1943 135.89
1944 152.32
1945 192.91
UFO = pd.read_csv('./dataset/UFO.csv', index_col=0)
UFO.columns = ['UFO']
1941 1
1942 2
1943 9
1944 9
1945 9
UFO.index = pd.to_datetime(UFO.index, format="%Y")
DJI.index = pd.to_datetime(DJI.index, format="%Y")
levels = UFO.join(DJI, how='inner')
1941-01-01 1 110.96
1942-01-01 2 119.40
1943-01-01 9 135.89
1944-01-01 9 152.32
1945-01-01 9 192.91
plt.ylabel('Dow Jones Average/Number of Sightings')
correlation1 = levels['DJI'].corr(levels['UFO'])
print("Correlation of levels: ", correlation1)

# Compute correlation fo percent changes
changes = levels.pct_change()
correlation2 = changes['DJI'].corr(changes['UFO'])
print("Correlation of changes: ", correlation2)
Correlation of levels:  0.9399762210726428
Correlation of changes:  0.06026935462405373

Simple Linear Regression

  • What is a Regression?
    • Simple linear regression: $y_t = \alpha + \beta x_t + \epsilon_t$
  • Relationship between R-Squared and Correlation
    • $[corr(x, y)]^2 = R^2$
    • $sign(corr) = sign(\text{regression slope})$

Looking at a Regression's R-Squared

R-squared measures how closely the data fit the regression line, so the R-squared in a simple regression is related to the correlation between the two variables. In particular, the magnitude of the correlation is the square root of the R-squared and the sign of the correlation is the sign of the regression coefficient.

In this exercise, you will start using the statistical package statsmodels, which performs much of the statistical modeling and testing that is found in R and software packages like SAS and MATLAB.

You will take two series, x and y, compute their correlation, and then regress y on x using the function OLS(y,x) in the statsmodels.api library (note that the dependent, or right-hand side variable y is the first argument). Most linear regressions contain a constant term which is the intercept (the $\alpha$ in the regression $y_t = \alpha + \beta x_t + \epsilon_t$). To include a constant using the function OLS(), you need to add a column of 1's to the right hand side of the regression.

  • Preprocess
df_x = pd.read_csv('./dataset/x.csv', index_col=0, header=None)
df_y = pd.read_csv('./dataset/y.csv', index_col=0, header=None)

df_x.columns = ['x']
df_y.columns = ['y']

x = df_x.reset_index(drop=True)['x']
y = df_y.reset_index(drop=True)['y']
0     -0.835129
1     -0.061004
2     -0.194677
3     -2.461142
4      1.040073
995   -1.017080
996   -0.430943
997    1.989779
998   -1.171907
999   -1.565902
Name: y, Length: 1000, dtype: float64
import statsmodels.api as sm

# Compute correlation of x and y
correlation = x.corr(y)
print("The correlation between x and y is %4.2f" % (correlation))

# Convert the Series x to a DataFrame and name the column x
dfx = pd.DataFrame(x, columns=['x'])

# Add a constant to the DataFrame dfx
dfx1 = sm.add_constant(dfx)

# Regress y on dfx1
result = sm.OLS(y, dfx1).fit()

# Print out the results and look at the relationship between R-squared and the correlation above
The correlation between x and y is -0.90
                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.818
Model:                            OLS   Adj. R-squared:                  0.817
Method:                 Least Squares   F-statistic:                     4471.
Date:                Sun, 07 Jun 2020   Prob (F-statistic):               0.00
Time:                        20:39:54   Log-Likelihood:                -560.10
No. Observations:                1000   AIC:                             1124.
Df Residuals:                     998   BIC:                             1134.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
const         -0.0052      0.013     -0.391      0.696      -0.032       0.021
x             -0.9080      0.014    -66.869      0.000      -0.935      -0.881
Omnibus:                        0.048   Durbin-Watson:                   2.066
Prob(Omnibus):                  0.976   Jarque-Bera (JB):                0.103
Skew:                          -0.003   Prob(JB):                        0.950
Kurtosis:                       2.951   Cond. No.                         1.03

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


  • Correlation of a time series with a lagged copy of itself
  • Lag-one autocorrelation
  • Also called serial correlation

One puzzling anomaly with stocks is that investors tend to overreact to news. Following large jumps, either up or down, stock prices tend to reverse. This is described as mean reversion in stock prices: prices tend to bounce back, or revert, towards previous levels after large moves, which are observed over time horizons of about a week. A more mathematical way to describe mean reversion is to say that stock returns are negatively autocorrelated.

This simple idea is actually the basis for a popular hedge fund strategy. If you're curious to learn more about this hedge fund strategy (although it's not necessary reading for anything else later in the course), see here.

You'll look at the autocorrelation of weekly returns of MSFT stock from 2012 to 2017. You'll start with a DataFrame MSFT of daily prices. You should use the .resample() method to get weekly prices and then compute returns from prices. Use the pandas method .autocorr() to get the autocorrelation and show that the autocorrelation is negative. Note that the .autocorr() method only works on Series, not DataFrames (even DataFrames with one column), so you will have to select the column in the DataFrame.

  • Preprocess
MSFT = pd.read_csv('./dataset/MSFT.csv', index_col=0)
MSFT.index = pd.to_datetime(MSFT.index, format="%m/%d/%Y")
Adj Close
2012-08-06 26.107651
2012-08-07 26.377876
2012-08-08 26.438896
2012-08-09 26.587088
2012-08-10 26.517351
MSFT = MSFT.resample(rule='W').last()

# Compute the percentage change of prices
returns = MSFT.pct_change()

# Compute and print the autocorrelation of returns
autocorrelation = returns['Adj Close'].autocorr()
print('The auto correlation of weekly returns is %4.2f' % (autocorrelation))
The auto correlation of weekly returns is -0.16

Are Interest Rates Autocorrelated?

When you look at daily changes in interest rates, the autocorrelation is close to zero. However, if you resample the data and look at annual changes, the autocorrelation is negative. This implies that while short term changes in interest rates may be uncorrelated, long term changes in interest rates are negatively autocorrelated. A daily move up or down in interest rates is unlikely to tell you anything about interest rates tomorrow, but a move in interest rates over a year can tell you something about where interest rates are going over the next year. And this makes some economic sense: over long horizons, when interest rates go up, the economy tends to slow down, which consequently causes interest rates to fall, and vice versa.

  • Preprocess
daily_rates = pd.read_csv('./dataset/daily_rates.csv', index_col=0, parse_dates=['DATE'])
1962-01-02 4.06
1962-01-03 4.03
1962-01-04 3.99
1962-01-05 4.02
1962-01-08 4.03
daily_diff = daily_rates.diff()

# Compute and print the autocorrelation of daily changes
autocorrelation_daily = daily_diff['US10Y'].autocorr()
print("The autocorrelation of daily interest rate changes is %4.2f" %(autocorrelation_daily))

# Convert the daily data to annual data
yearly_rates = daily_rates.resample(rule='A').last()

# Repeat above for annual data
yearly_diff = yearly_rates.diff()
autocorrelation_yearly = yearly_diff['US10Y'].autocorr()
print("The autocorrelation of annual interest rate changes is %4.2f" %(autocorrelation_yearly))
The autocorrelation of daily interest rate changes is 0.07
The autocorrelation of annual interest rate changes is -0.22