Plotting time-series
Time-series data is data recorded at successive points in time. Visualizing this type of data helps clarify trends and illuminates relationships between variables. This is a summary of the lecture "Introduction to Data Visualization with Matplotlib", via DataCamp.
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 5)
Read data with a time index
Pandas DataFrame objects can have an index that denotes time. This is useful because Matplotlib recognizes that these measurements represent time and labels the values on the axis accordingly.
In this exercise, you will read data from a CSV file called climate_change.csv that contains measurements of CO2 levels and temperatures made on the 6th of every month from 1958 until 2016. You will use the Pandas read_csv function.
To designate the index as a DatetimeIndex, you will use the parse_dates and index_col keyword arguments, both to parse this column as a variable that contains dates and to designate it as the index for this DataFrame.
climate_change = pd.read_csv("./dataset/climate_change.csv", parse_dates=["date"], index_col=["date"])
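To see what parse_dates and index_col do without needing the dataset file, here is a minimal sketch that reads a few rows from an in-memory string. The three rows and the names csv_text and sample are made up for illustration; only the column names follow the description above.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for climate_change.csv (values are made up)
csv_text = """date,co2,relative_temp
1958-03-06,315.7,0.10
1958-04-06,317.4,0.01
1958-05-06,317.5,0.08
"""

# parse_dates converts the "date" strings to timestamps;
# index_col moves that column into the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# The index should now be a DatetimeIndex rather than plain strings
print(type(sample.index).__name__)
print(list(sample.columns))
```

Checking the index type right after loading is a cheap way to catch a date column that was silently left as text.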
Plot time-series data
To plot time-series data, we use the plot method of the Axes object. The first argument to this method holds the values for the x-axis and the second argument holds the values for the y-axis.
This exercise provides data stored in a DataFrame called climate_change. This variable has a time index with the dates of measurements and two data columns: "co2" and "relative_temp".
In this case, the index of the DataFrame is used as the x-axis values, and we plot the values stored in the "relative_temp" column as the y-axis values. We also label the x-axis and y-axis.
fig, ax = plt.subplots()
# Add the time-series for "relative_temp" to the plot
ax.plot(climate_change.index, climate_change["relative_temp"]);
# Set the x-axis label
ax.set_xlabel("Time");
# Set the y-axis label
ax.set_ylabel("Relative temperature (Celsius)");
Using a time index to zoom in
When a time-series is represented with a time index, we can use this index for the x-axis when plotting. We can also select a range of dates to zoom in on a particular period within the time-series, using Pandas' indexing facilities. In this exercise, you will select a portion of a time-series dataset and plot that period.
fig, ax = plt.subplots()
# Create variable seventies with data from "1970-01-01" to "1979-12-31"
seventies = climate_change["1970-01-01":"1979-12-31"];
# Add the time-series for "co2" data from seventies to the plot
ax.plot(seventies.index, seventies["co2"]);
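The same kind of selection can also be written with .loc and partial date strings. The sketch below uses a made-up monthly DataFrame (the names demo and "value" are hypothetical) to show that year-only strings select the same rows as full dates.

```python
import pandas as pd

# Hypothetical monthly data from 1968 through 1981 (first day of each month)
idx = pd.date_range("1968-01-01", "1981-12-01", freq="MS")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Slicing a DatetimeIndex with strings includes both endpoints
seventies_full = demo.loc["1970-01-01":"1979-12-31"]

# Partial strings work too: "1970":"1979" covers the whole decade
seventies_short = demo.loc["1970":"1979"]

print(len(seventies_full), len(seventies_short))
print(seventies_short.index.min(), seventies_short.index.max())
```

Using .loc makes it explicit that the slice selects rows by label, which can be easier to read than bare square brackets on a DataFrame.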
Plotting two variables
If you want to plot two time-series variables that were recorded at the same times, you can add both of them to the same subplot.
If the variables have very different scales, you'll want to make sure that you plot them in different twin Axes objects. These objects can share one axis (for example, the time, or x-axis) while not sharing the other (the y-axis).
To create a twin Axes object that shares the x-axis, we use the twinx method.
In this exercise, you'll have access to a DataFrame called climate_change. This DataFrame was loaded with the "date" column set as a DatetimeIndex, and it has a column called "co2" with carbon dioxide measurements and a column called "relative_temp" with temperature measurements.
fig, ax = plt.subplots()
# Plot the CO2 variable in blue
ax.plot(climate_change.index, climate_change["co2"], color="blue");
# Create a twin Axes that shares the x-axis
ax2 = ax.twinx();
# Plot the relative temperature in red
ax2.plot(climate_change.index, climate_change["relative_temp"], color="red");
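As a quick check of what twinx does, the following sketch plots two made-up series with very different scales and compares the axis limits of the two Axes objects. The series large and small are hypothetical, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical series on very different scales
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
large = pd.Series(range(300, 324), index=dates)              # values in the hundreds
small = pd.Series([i / 100 for i in range(24)], index=dates)  # values below 1

fig, ax = plt.subplots()
ax.plot(large.index, large, color="blue")

# The twin Axes shares the x-axis but gets its own y-axis
ax2 = ax.twinx()
ax2.plot(small.index, small, color="red")

# x-limits are shared between the two Axes; y-limits are independent
print(ax.get_xlim() == ax2.get_xlim())
print(ax.get_ylim(), ax2.get_ylim())
```

Because each Axes autoscales its own y-axis, both lines fill the plot vertically instead of the smaller series being flattened near zero.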
Defining a function that plots time-series data
Once you realize that a particular section of code that you have written is useful, it is a good idea to define a function that saves that section of code for you, rather than copying it to other parts of your program where you would like to use this code.
Here, we will define a function that takes inputs such as a time variable and some other variable and plots them as x and y inputs. Then, it sets the labels on the x- and y-axis and sets the colors of the y-axis label, the y-axis ticks and the tick labels.
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Plot the inputs x,y in the provided color
    axes.plot(x, y, color=color)
    # Set the x-axis label
    axes.set_xlabel(xlabel)
    # Set the y-axis label
    axes.set_ylabel(ylabel, color=color)
    # Set the colors tick params for y-axis
    axes.tick_params('y', colors=color)
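One way to convince yourself that the function colors what it claims to is to call it on a small made-up series and inspect the resulting Axes. This is only a sketch: the data and the "Demo values" label are hypothetical, the function is repeated so the block runs on its own, and the Agg backend line is only needed when no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Same function as above, repeated so this sketch is self-contained
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params('y', colors=color)

# Hypothetical data: twelve monthly values
dates = pd.date_range("2000-01-01", periods=12, freq="MS")
values = pd.Series(range(12), index=dates)

fig, ax = plt.subplots()
plot_timeseries(ax, values.index, values, "blue", "Time", "Demo values")

# Inspect the line color, the y-axis label color and the label texts
line = ax.get_lines()[0]
print(line.get_color(), ax.yaxis.label.get_color())
print(ax.get_xlabel(), "|", ax.get_ylabel())
```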
Using a plotting function
Defining functions allows us to reuse the same code without having to repeat all of it. Programmers sometimes say "Don't repeat yourself".
In the previous exercise, you defined a function called plot_timeseries:
plot_timeseries(axes, x, y, color, xlabel, ylabel)
that takes an Axes object (as the argument axes), time-series data (as x and y arguments), the name of a color (as a string, provided as the color argument) and x-axis and y-axis labels (as xlabel and ylabel arguments).
fig, ax = plt.subplots()
# Plot the CO2 levels time-series in blue
plot_timeseries(ax, climate_change.index, climate_change["co2"], "blue", "Time (years)", "CO2 levels")
# Create a twin Axes object that shares the x-axis
ax2 = ax.twinx();
# Plot the relative temperature data in red
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"], "red", "Time (years)", "Relative temperature (Celsius)")
Annotating a plot of time-series data
Annotating a plot allows us to highlight interesting information in the plot. For example, in describing the climate change dataset, we might want to point to the date at which the relative temperature first exceeded 1 degree Celsius.
For this, we will use the annotate method of the Axes object. In this exercise, you will have the DataFrame called climate_change loaded into memory. Using the Axes methods, plot only the relative temperature column as a function of dates, and annotate the data.
fig, ax = plt.subplots()
# Plot the relative temperature data
ax.plot(climate_change.index, climate_change["relative_temp"]);
# Annotate the date at which temperatures exceeded 1 degree
ax.annotate('>1 degree', (pd.Timestamp('2015-10-06'), 1));
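The annotation date above is typed in by hand. As a sketch of how such a date could be looked up instead, the following uses a boolean mask on a small made-up series; the values and the names temps, above, first_date and xy are hypothetical and not taken from climate_change.csv.

```python
import pandas as pd

# Made-up monthly temperatures for illustration only
temps = pd.Series(
    [0.71, 0.78, 0.81, 1.06, 1.05],
    index=pd.to_datetime(
        ["2015-07-06", "2015-08-06", "2015-09-06", "2015-10-06", "2015-11-06"]
    ),
)

# Keep only the measurements above the threshold, in time order
above = temps[temps > 1]

# First date above the threshold (raises IndexError if there is none)
first_date = above.index[0]

# Coordinates in the form that annotate expects: (x, y)
xy = (first_date, 1)
print(xy)
```

The resulting tuple could then be passed as the second argument of ax.annotate in place of the hard-coded timestamp.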
Plotting time-series: putting it all together
In this exercise, you will plot two time-series with different scales in the same subplot, using twin Axes, and annotate the data from one of these series.
The CO2/temperatures data is provided as a DataFrame called climate_change. You should also use the function that we have defined before, called plot_timeseries, which takes an Axes object (as the axes argument), plots a time-series (provided as x and y arguments), sets the labels for the x-axis and y-axis, and sets the color for the data and for the y tick/axis labels:
plot_timeseries(axes, x, y, color, xlabel, ylabel)
Then, you will annotate with text an important time-point in the data: on 2015-10-06, when the temperature first rose to above 1 degree over the average.
fig, ax = plt.subplots()
# Plot the CO2 levels time-series in blue
plot_timeseries(ax, climate_change.index, climate_change['co2'], 'blue', "Time (years)", "CO2 levels")
# Create an Axes object that shares the x-axis
ax2 = ax.twinx()
# Plot the relative temperature data in red
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"], 'red', "Time (years)", "Relative temp (Celsius)")
# Annotate point with relative temperature >1 degree
ax2.annotate(">1 degree",
             xy=(pd.Timestamp('2015-10-06'), 1),
             xytext=(pd.Timestamp('2008-10-06'), -0.2),
             arrowprops=dict(arrowstyle='->', color='gray'));