Working with Time Series in Pandas
A summary of the lecture "Manipulating Time Series Data in Python", via DataCamp
- How to use dates & times with pandas
- Indexing & resampling time series
- Lags, changes, and returns for stock price series
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (10, 5)
How to use dates & times with pandas
- Date & time series functionality
  - At the root: data types for date & time information
    - Objects for points in time and periods
    - Attributes & methods reflect time-related details
  - Sequences of dates & periods
    - Series or DataFrame columns
    - Index: convert object into Time Series
  - Many Series/DataFrame methods rely on time information in the index to provide time-series functionality
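To make the list above concrete, here is a minimal sketch of the three building blocks it names: a point in time, a period, and a sequence of dates used as an index. The variable names (stamp, period, series) and the values are made up for illustration and are not part of the lecture.

```python
import pandas as pd

# A single point in time
stamp = pd.Timestamp('2017-01-01')
print(stamp.year, stamp.dayofweek, stamp.day_name())  # 2017 6 Sunday

# A span of time with a frequency (here: one calendar month)
period = pd.Period('2017-01', freq='M')
print(period.start_time, period.end_time)

# A sequence of dates used as the index of a Series
index = pd.date_range(start='2017-01-01', periods=3, freq='D')
series = pd.Series([1.0, 2.0, 3.0], index=index)
print(type(series.index).__name__)
```

Once the index is a DatetimeIndex, methods such as .asfreq(), .shift() and .resample() can use the dates it holds.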
Your first time series
You have learned in the video how to create a sequence of dates using pd.date_range(). You have also seen that each date in the resulting pd.DatetimeIndex is a pd.Timestamp with various attributes that you can access to obtain information about the date.
Now, you'll create a week of data, iterate over the result, and obtain the dayofweek and weekday_name for each date.
seven_days = pd.date_range(start='2017-1-1',periods=7)
# Iterate over the dates and print the number and name of the weekday
for day in seven_days:
    print(day.dayofweek, day.day_name())
Note: the weekday_name attribute has been deprecated since pandas 0.23.0. Use the .day_name() method instead.
Indexing & resampling time series
- Time series transformation
  - Basic time series transformations include:
    - Parsing string dates and converting them to datetime64
    - Selecting & slicing for specific subperiods
    - Setting & changing DatetimeIndex frequency
      - Upsampling: higher frequency implies new dates -> missing data
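The last point in the list, that upsampling creates new dates with missing data, is easiest to see on a tiny example. The sketch below uses a made-up quarterly series (not lecture data) and the month-start alias 'MS'. It first upsamples without filling, then with forward filling as one possible way to handle the gaps.

```python
import pandas as pd

# Hypothetical quarterly series: one value at the start of each quarter
quarterly = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range('2017-01-01', periods=4, freq='QS'),
)

# Upsample to month start: the new dates have no data, so they become NaN
monthly = quarterly.asfreq('MS')
print(monthly)

# Same upsampling, but carry the last known value forward
filled = quarterly.asfreq('MS', method='ffill')
print(filled)
```

Whether forward filling is appropriate depends on the data. For a price series it is often acceptable, while for a flow quantity it may not be.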
Create a time series of air quality data
You have seen in the video how to deal with dates that are not in the correct format, but instead are provided as string types, represented as dtype object in pandas.
We have prepared a data set with air quality data (ozone, pm25, and carbon monoxide for NYC, 2000-2017) for you to practice the use of pd.to_datetime().
data = pd.read_csv('./dataset/nyc.csv')
# Inspect data
print(data.info())
# Convert the date column to datetime64
data['date'] = pd.to_datetime(data['date'])
# Set date column as index
data.set_index('date', inplace=True)
# Inspect data
print(data.info())
# Plot data
data.plot(subplots=True);
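Since nyc.csv is not included here, the following sketch reproduces the same conversion on a few made-up rows held in memory. It also shows an alternative that is assumed to be equivalent for well-formed dates: letting pd.read_csv() parse the column and set the index in one step. The column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: dates arrive as plain strings
csv_text = "date,ozone\n2017-01-01,0.012\n2017-01-02,0.015\n2017-01-03,0.011\n"

# Route 1: read the strings, convert them, then set the index
two_step = pd.read_csv(io.StringIO(csv_text))
two_step['date'] = pd.to_datetime(two_step['date'])
two_step = two_step.set_index('date')

# Route 2: parse the dates and set the index while reading
one_step = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(type(two_step.index).__name__, type(one_step.index).__name__)
```

The two-step route is handy when the dates need a custom format string. The one-step route keeps the loading code short.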
yahoo = pd.read_csv('./dataset/yahoo.csv')
yahoo['date'] = pd.to_datetime(yahoo['date'])
yahoo.set_index('date', inplace=True)
yahoo.head()
prices = pd.DataFrame()
# Select data for each year and concatenate with prices here
for year in ['2013', '2014', '2015']:
    price_per_year = yahoo.loc[year, ['price']].reset_index(drop=True)
    price_per_year.rename(columns={'price':year}, inplace=True)
    prices = pd.concat([prices, price_per_year], axis=1)
# Plot prices
prices.plot();
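The loop above relies on partial string indexing: passing a string such as '2013' to .loc selects every row whose date falls in that year. Here is a small sketch on synthetic data. The demo frame and its price column are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering two full years
idx = pd.date_range('2013-01-01', '2014-12-31', freq='D')
demo = pd.DataFrame({'price': np.arange(len(idx), dtype=float)}, index=idx)

year_2013 = demo.loc['2013']                   # all rows in 2013
march_2014 = demo.loc['2014-03']               # all rows in March 2014
window = demo.loc['2013-12-30':'2014-01-02']   # slice, both ends included

# Dropping the date index lets the years line up by position (day 0, day 1, ...)
side_by_side = pd.concat(
    [demo.loc[year, ['price']].reset_index(drop=True).rename(columns={'price': year})
     for year in ['2013', '2014']],
    axis=1,
)
print(len(year_2013), len(march_2014), len(window), side_by_side.shape)
```

Without reset_index(drop=True), pd.concat would align the columns on their dates, and since the years do not overlap, each column would be mostly NaN.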
Set and change time series frequency
In the video, you have seen how to assign a frequency to a DatetimeIndex, and then change this frequency.
Now, you'll use data on the daily carbon monoxide concentration in NYC, LA and Chicago from 2005-17.
You'll set the frequency to calendar daily and then resample to monthly frequency, and visualize both series to see how the different frequencies affect the data.
co = pd.read_csv('./dataset/co_cities.csv')
co['date'] = pd.to_datetime(co['date'])
co.set_index('date', inplace=True)
co.head()
print(co.info())
# Set the frequency to calendar daily
co = co.asfreq('D')
# Plot the data
co.plot(subplots=True);
# Set frequency to monthly ('M' is the month-end alias; asfreq keeps only the observation on each month-end date)
co = co.asfreq('M')
# Plot the data
co.plot(subplots=True);
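Note that .asfreq() only picks the observations that fall on the new dates. It does not aggregate. The sketch below contrasts it with .resample().mean() on a made-up daily series. It uses the month-start alias 'MS' instead of the 'M' alias from the code above, purely as an assumption to keep the example running unchanged across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 0, 1, 2, ... for January to March 2017
idx = pd.date_range('2017-01-01', '2017-03-31', freq='D')
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# asfreq: keep only the value observed on the first day of each month
picked = daily.asfreq('MS')

# resample + mean: average all daily values within each month
averaged = daily.resample('MS').mean()

print(picked.tolist())    # [0.0, 31.0, 59.0]
print(averaged.tolist())  # [15.0, 44.5, 74.0]
```

Which one to use depends on the question: asfreq answers "what was the value on that date", while resample answers "what was the typical value over that period".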
Lags, changes, and returns for stock price series
- Basic time series calculations
  - Typical time series manipulations include:
    - Shift or lag values backward or forward in time
    - Get the difference in value for a given time period
    - Compute the percent change over any number of periods
  - pandas built-in methods rely on pd.DatetimeIndex
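As a quick illustration of the first two manipulations, the sketch below shifts a tiny made-up price series in both directions and computes the one-period difference. The series and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical prices on four consecutive business days
prices = pd.Series(
    [10.0, 12.0, 15.0, 11.0],
    index=pd.date_range('2017-01-02', periods=4, freq='B'),
)

# Positive periods: each date now shows the value from one period earlier
moved_forward = prices.shift(periods=1)

# Negative periods: each date now shows the value from one period later
moved_back = prices.shift(periods=-1)

# diff(1) is the same as subtracting the series shifted by one period
change = prices.diff(periods=1)

print(pd.DataFrame({'price': prices, 'shift(1)': moved_forward,
                    'shift(-1)': moved_back, 'diff(1)': change}))
```

Shifting never changes the index itself. It moves the values along it and leaves NaN where no value is available.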
Shifting stock prices across time
The first method to manipulate time series that you saw in the video was .shift(), which allows you to shift all values in a Series or DataFrame by a number of periods to a different time along the DatetimeIndex.
Let's use this to visually compare a stock price series for Google shifted 90 business days into both past and future.
google = pd.read_csv('./dataset/google.csv', parse_dates=['Date'], index_col='Date')
# Set data frequency to business daily
google = google.asfreq('B')
# Create 'lagged' (values moved 90 business days earlier) and 'shifted' (values moved 90 business days later)
google['lagged'] = google['Close'].shift(periods=-90)
google['shifted'] = google['Close'].shift(periods=90)
# Plot the google price series
google.plot();
plt.savefig('../images/google_lagged.png')
yahoo = yahoo.asfreq('B')
yahoo['shifted_30'] = yahoo['price'].shift(periods=30)
# Subtract shifted_30 from price
yahoo['change_30'] = yahoo['price'] - yahoo['shifted_30']
# Get the 30-day price difference
yahoo['diff_30'] = yahoo['price'].diff(periods=30)
# Inspect the last five rows of price
print(yahoo['price'].tail(5))
# Show the value_counts of the difference between change_30 and diff_30
print(yahoo['diff_30'].sub(yahoo['change_30']).value_counts())
google = pd.read_csv('./dataset/google.csv', parse_dates=['Date'], index_col='Date')
# Set data frequency to calendar daily
google = google.asfreq('D')
# Create daily_return
google['daily_return'] = google['Close'].pct_change(periods=1) * 100
# Create monthly_return
google['monthly_return'] = google['Close'].pct_change(periods=30) * 100
# Create annual_return
google['annual_return'] = google['Close'].pct_change(periods=360) * 100
# Plot the result
google.plot(subplots=True);
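To see what pct_change(periods=n) computes, here is a minimal sketch on a made-up series that grows by 10% per step. It compares the built-in result with the manual formula (price / price shifted by n periods - 1) * 100. In the code above, periods=30 and periods=360 on a calendar-daily index presumably serve as rough stand-ins for one month and one year.

```python
import pandas as pd

# Hypothetical prices growing 10% per business day
prices = pd.Series(
    [100.0, 110.0, 121.0, 133.1],
    index=pd.date_range('2017-01-02', periods=4, freq='B'),
)

one_period = prices.pct_change(periods=1) * 100   # return over 1 period, in percent
two_period = prices.pct_change(periods=2) * 100   # return over 2 periods, in percent

# Manual equivalent of the two-period return
manual_two = (prices / prices.shift(2) - 1) * 100

print(pd.DataFrame({'1-period %': one_period, '2-period %': two_period,
                    'manual 2-period %': manual_two}).round(2))
```

The first n rows of an n-period return are NaN because there is no earlier price to compare against.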