# Case study in time series analysis

This chapter gives you a chance to practice all the concepts covered in the course. You will visualize the unemployment rate in the US from 2000 to 2010. This is a summary of the lecture "Visualizing Time-Series data in Python" from DataCamp.

- Apply your knowledge to a new dataset
- Beyond summary statistics
- Decompose time series data
- Compute correlations between time series

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (10, 5)
plt.style.use('fivethirtyeight')
```

### Explore the Jobs dataset

In this exercise, you will explore the new `jobs` DataFrame, which contains the unemployment rate of different industries in the USA from 2000 to 2010. As you will see, the dataset contains time series for 16 industries across 122 timepoints (one per month for roughly ten years). The typical workflow of a data science project involves data cleaning and exploration, so we will begin by reading in the data and checking for missing values.

```
jobs = pd.read_csv('./dataset/ch5_employment.csv')
# Print first five lines of your DataFrame
print(jobs.head(5))
# Check the type of each column in your DataFrame
print(jobs.dtypes)
# Convert datestamp column to a datetime object
jobs['datestamp'] = pd.to_datetime(jobs['datestamp'])
# Set the datestamp column as the index of your DataFrame
jobs = jobs.set_index('datestamp')
# Check the number of missing values in each column
print(jobs.isnull().sum())
```
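The cleaning steps above can be tried without the course file. The following is a minimal sketch on a small in-memory CSV; the three rows and two industry columns are made-up stand-ins, not values from the real dataset. It also shows `parse_dates` and `index_col`, two `pd.read_csv` parameters that do the date conversion and index assignment in one call.

```python
import io
import pandas as pd

# Hypothetical stand-in for the course CSV: two industries, one missing value
csv_text = """datestamp,Agriculture,Construction
2000-01-01,10.3,9.7
2000-02-01,11.5,
2000-03-01,10.4,8.7
"""

# Step-by-step version, mirroring the exercise
toy = pd.read_csv(io.StringIO(csv_text))
toy['datestamp'] = pd.to_datetime(toy['datestamp'])
toy = toy.set_index('datestamp')

# Count the missing values in each column
missing = toy.isnull().sum()
print(missing)

# One-call alternative: parse the dates and set the index while reading
toy_direct = pd.read_csv(io.StringIO(csv_text),
                         parse_dates=['datestamp'],
                         index_col='datestamp')
print(type(toy_direct.index))
```

Either route ends with a `DatetimeIndex`, which the plotting and grouping steps later in this chapter rely on.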

### Describe time series data with boxplots

You should always explore the distribution of the variables, and because you are working with time series, you will explore their properties using boxplots and numerical summaries. As a reminder, you can plot data in a DataFrame as boxplots with the command:

```
df.boxplot(fontsize=6, vert=False)
```

Notice the introduction of the new parameter `vert`, which specifies whether to plot the boxplots horizontally or vertically; `vert=False` draws them horizontally.

```
jobs.boxplot(fontsize=8, vert=False);
# Generate numerical summaries
print(jobs.describe())
```
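To connect the two outputs: the box in a boxplot spans the `25%` and `75%` rows of `.describe()`, the line inside it is the `50%` row, and with matplotlib's default setting (`whis=1.5`) points farther than 1.5 times the interquartile range from the box are drawn as individual outliers. A small sketch on made-up numbers (the series `toy_rate` is hypothetical):

```python
import pandas as pd

# Hypothetical unemployment rates with one obvious outlier
rates = pd.Series([4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 15.0], name='toy_rate')

summary = rates.describe()
q1, q3 = summary['25%'], summary['75%']
iqr = q3 - q1

# Limits beyond which a point is drawn as an outlier (default whis=1.5)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = rates[(rates < lower_fence) | (rates > upper_fence)]
print(summary)
print(outliers)
```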

### Plot all the time series in your dataset

The `jobs` DataFrame contains 16 time series representing the unemployment rate of various industries between 2000 and 2010. This may seem like a large number of time series to visualize at the same time, but Chapter 4 introduced you to faceted plots. In this exercise, you will explore some of the time series in the `jobs` DataFrame and look to extract some meaningful information from these plots.

```
jobs_subset = jobs[['Finance', 'Information', 'Manufacturing', 'Construction']]
# Print the first 5 rows of jobs_subset
print(jobs_subset.head(5))
# Create a faceted graph with 2 rows and 2 columns
ax = jobs_subset.plot(subplots=True,
                      layout=(2, 2),
                      sharex=False,
                      sharey=False,
                      linewidth=0.7,
                      fontsize=8,
                      legend=False);
```
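With `legend=False`, the four panels carry no indication of which industry they show. One way to label them, sketched here on made-up random-walk data rather than the real `jobs` columns, is to loop over the array of axes that `.plot(subplots=True)` returns and give each panel a title. The `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd

# Hypothetical random-walk data standing in for four industry columns
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
toy = pd.DataFrame(
    rng.normal(size=(24, 4)).cumsum(axis=0) + 6,
    index=idx,
    columns=["Finance", "Information", "Manufacturing", "Construction"],
)

# With a layout, pandas returns a 2-D array of axes, one per column
axes = toy.plot(subplots=True, layout=(2, 2), sharex=False, sharey=False,
                linewidth=0.7, fontsize=8, legend=False)

# Label each panel with the column it shows
for ax, name in zip(axes.ravel(), toy.columns):
    ax.set_title(name, fontsize=10)
```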

### Annotate significant events in time series data

When plotting the `Finance`, `Information`, `Manufacturing` and `Construction` time series of the `jobs` DataFrame, you observed a distinct increase in unemployment rates during 2001 and 2008. In general, time series plots can be made even more informative if you include additional annotations that emphasize specific observations or events. This allows you to quickly highlight parts of the graph to viewers, and can help infer what may have caused a specific event.

Recall that you have already set the `datestamp` column as the index of the `jobs` DataFrame, so you are prepared to directly annotate your plots with vertical or horizontal lines.

```
ax = jobs.plot(colormap='Spectral', fontsize=6, linewidth=0.8);
# Set labels and legend
ax.set_xlabel('Date', fontsize=10);
ax.set_ylabel('Unemployment Rate', fontsize=10);
ax.set_title('Unemployment rate of U.S. workers by industry', fontsize=10);
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
# Annotate your plots with vertical lines
ax.axvline('2001-07-01', color='blue', linestyle='--', linewidth=0.8);
ax.axvline('2008-09-01', color='blue', linestyle='--', linewidth=0.8);
```
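Beyond single vertical lines, matplotlib also offers `axhline` for horizontal reference lines and `axvspan` for shading a whole period. The sketch below uses a made-up sine-shaped series, and the shaded dates are arbitrary illustration values, not official recession dates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly series standing in for one unemployment rate
idx = pd.date_range("2000-01-01", periods=120, freq="MS")
values = 5 + np.sin(np.arange(120) / 10.0)

fig, ax = plt.subplots()
ax.plot(idx, values, linewidth=0.8)

# Mark a single date with a dashed vertical line
ax.axvline(pd.Timestamp("2001-07-01"), color="blue", linestyle="--", linewidth=0.8)

# Shade a period of interest (arbitrary dates chosen for illustration)
ax.axvspan(pd.Timestamp("2008-01-01"), pd.Timestamp("2009-06-01"),
           color="red", alpha=0.2)

# Mark the overall mean with a dotted horizontal line
ax.axhline(values.mean(), color="green", linestyle=":", linewidth=0.8)
```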

### Plot monthly and yearly trends

As we saw in Chapter 2, when the index of a DataFrame is of the `datetime` type, it is possible to directly extract the day, month or year of each date in the index. As a reminder, you can extract the year of each date in the index using the `.index.year` attribute. You can then use the `.groupby()` and `.mean()` methods to compute the mean annual value of each time series in your DataFrame:

```
index_year = df.index.year
df_by_year = df.groupby(index_year).mean()
```

You will now apply what you have learned to display the aggregate mean values of each time series in the `jobs` DataFrame.

```
index_month = jobs.index.month
# Compute the mean unemployment rate for each month
jobs_by_month = jobs.groupby(index_month).mean()
# Plot the mean unemployment rate for each month
ax = jobs_by_month.plot(fontsize=8, linewidth=1);
# Set axis labels and legend
ax.set_xlabel('Month', fontsize=10);
ax.set_ylabel('Mean unemployment rate', fontsize=10);
ax.legend(bbox_to_anchor=(0.8, 0.6), fontsize=10);
```

```
index_year = jobs.index.year
# Compute the mean unemployment rate for each year
jobs_by_year = jobs.groupby(index_year).mean()
# Plot the mean unemployment rate for each year
ax = jobs_by_year.plot(fontsize=8, linewidth=1);
# Set axis labels and legend
ax.set_xlabel('Year', fontsize=10);
ax.set_ylabel('Mean unemployment rate', fontsize=10);
ax.legend(bbox_to_anchor=(0.1, 0.5), fontsize=10);
```
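The two groupings answer different questions: grouping by year averages the twelve months within each year, while grouping by month pools the same calendar month across all years. A tiny sketch on a made-up two-year series (the column `A` simply counts from 0 to 23) makes the difference easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical two-year monthly series: values 0, 1, ..., 23
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
toy = pd.DataFrame({"A": np.arange(24, dtype=float)}, index=idx)

# One row per year: the mean of that year's twelve months
by_year = toy.groupby(toy.index.year).mean()

# One row per calendar month: the mean of that month across both years
by_month = toy.groupby(toy.index.month).mean()

print(by_year)
print(by_month)
```

Here the year 2000 averages the values 0 to 11 and January averages the values 0 and 12, so the yearly view tracks the overall level while the monthly view exposes any repeating within-year pattern.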

### Apply time series decomposition to your dataset

You will now perform time series decomposition on multiple time series. You can achieve this by leveraging the Python dictionary to store the results of each time series decomposition.

In this exercise, you will initialize an empty dictionary with a set of curly braces, `{}`, then use a for loop to iterate through the columns of the DataFrame and apply time series decomposition to each time series. After each time series decomposition, you place the results in the dictionary by using the command `my_dict[key] = value`, where `my_dict` is your dictionary, `key` is the name of the column/time series, and `value` is the decomposition object of that time series.

```
import statsmodels.api as sm
# Initialize dictionary
jobs_decomp = {}
# Get the names of each time series in the DataFrame
jobs_names = jobs.columns
# Run time series decomposition on each time series of the DataFrame
for ts in jobs_names:
    ts_decomposition = sm.tsa.seasonal_decompose(jobs[ts])
    jobs_decomp[ts] = ts_decomposition
```
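To make the idea of a decomposition object less opaque, here is a simplified pandas-only sketch of a classical additive decomposition, stored per column in a dictionary in the same way as above. It is an illustration under stated assumptions, not the statsmodels implementation: `additive_decompose` is a hypothetical helper, the two columns are made-up data, and the period is assumed to be 12 months.

```python
import numpy as np
import pandas as pd

def additive_decompose(series, period=12):
    """Simplified classical additive decomposition (illustration only)."""
    # Trend: centered moving average; an even period uses half weights at both ends
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = series.rolling(len(weights), center=True).apply(
        lambda window: np.dot(window, weights), raw=True
    )
    # Seasonal: mean detrended value at each position in the cycle, centered on zero
    position = np.arange(len(series)) % period
    cycle_means = (series - trend).groupby(position).mean()
    cycle_means = cycle_means - cycle_means.mean()
    seasonal = pd.Series(cycle_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever the trend and seasonal parts do not explain
    resid = series - trend - seasonal
    return {"trend": trend, "seasonal": seasonal, "resid": resid}

# Hypothetical data: a linear trend plus a known zero-mean yearly pattern
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
pattern = np.array([1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5, 1.0, 0.5, -0.5, -1.0])
toy = pd.DataFrame({
    "A": 5 + 0.1 * t + np.tile(pattern, 4),
    "B": 8 - 0.05 * t + np.tile(-pattern, 4),
}, index=idx)

# Same dictionary pattern as in the exercise: one entry per column
toy_decomp = {}
for name in toy.columns:
    toy_decomp[name] = additive_decompose(toy[name])

print(toy_decomp["A"]["seasonal"].head(12))
```

Because the toy series is built from an exact linear trend and an exactly repeating pattern, the recovered seasonal component matches `pattern`; the trend is undefined for the first and last six months, where the centered window runs off the ends of the data.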

### Visualize the seasonality of multiple time series

You will now extract the `seasonal` component of each decomposition stored in `jobs_decomp` to visualize the seasonality in these time series. Note that before plotting, you will have to convert the dictionary of `seasonal` components into a DataFrame using the `pd.DataFrame.from_dict()` function.

```
# Extract the seasonal component of each decomposition
jobs_seasonal = {}
for ts in jobs_names:
    jobs_seasonal[ts] = jobs_decomp[ts].seasonal
# Create a DataFrame from the jobs_seasonal dictionary
seasonality_df = pd.DataFrame.from_dict(jobs_seasonal)
# Remove the label for the index
seasonality_df.index.name = None
# Create a faceted plot of the seasonality_df DataFrame
seasonality_df.plot(subplots=True,
                    layout=(4, 4),
                    sharey=False,
                    fontsize=2,
                    linewidth=0.3,
                    legend=False);
```
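As a small sketch of what the dictionary-to-DataFrame conversion does: each key becomes a column name, and the Series are aligned on their index. The values below are made up, and the deliberately shorter `Short` series is a hypothetical case showing that a date missing from one series becomes `NaN` after alignment.

```python
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="MS")

# Hypothetical seasonal components keyed by industry name
components = {
    "Finance": pd.Series([0.1, -0.2, 0.1], index=idx),
    "Mining": pd.Series([-0.3, 0.0, 0.3], index=idx),
    "Short": pd.Series([0.5, -0.5], index=idx[:2]),  # one date missing
}

# Keys become columns; rows are aligned on the shared datetime index
toy_seasonality = pd.DataFrame.from_dict(components)
print(toy_seasonality)
```

In the exercise every seasonal component comes from the same `jobs` index, so the alignment produces no gaps.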

### Correlations between multiple time series

In the previous exercise, you extracted the `seasonal` component of each time series in the `jobs` DataFrame and stored those results in a new DataFrame called `seasonality_df`. In the context of jobs data, it can be interesting to compare seasonality behavior, as this may help uncover which job industries are the most similar or the most different.

This can be achieved by using the `seasonality_df` DataFrame and computing the correlation between each time series in the dataset. In this exercise, you will leverage what you have learned in Chapter 4 to compute and create a clustermap visualization of the correlations between time series in the `seasonality_df` DataFrame.

```
seasonality_corr = seasonality_df.corr(method='spearman')
# Customize the clustermap of the seasonality_corr correlation matrix
fig = sns.clustermap(seasonality_corr,
                     annot=True,
                     annot_kws={"size": 4},
                     linewidths=.4,
                     figsize=(15, 10));
plt.setp(fig.ax_heatmap.yaxis.get_majorticklabels(), rotation=0);
plt.setp(fig.ax_heatmap.xaxis.get_majorticklabels(), rotation=90);
plt.savefig('../images/jobs_clustermap.png')
```
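The code above uses `method='spearman'`, which correlates the ranks of the values rather than the values themselves, so it measures how consistently two series move in the same direction, not how linear the relationship is. A brief sketch on made-up columns (`x`, `cubic` and `reversed` are hypothetical names) contrasts it with the default Pearson method:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a nonlinear but strictly increasing relationship,
# and a perfectly reversed one
x = np.linspace(-2, 2, 21)
toy = pd.DataFrame({"x": x, "cubic": x ** 3, "reversed": -x})

pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

# Spearman is the Pearson correlation of the ranks
rank_pearson = toy.rank().corr(method="pearson")

print(pearson.round(3))
print(spearman.round(3))
```

The `cubic` column rises whenever `x` rises, so its Spearman correlation with `x` is 1 even though the Pearson value is lower; the `reversed` column gives -1 under both methods.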