Time series data is data recorded at successive points in time. Visualizing this type of data helps clarify trends and illuminates relationships between variables. This is a summary of the lecture "Introduction to Data Visualization with Matplotlib", via DataCamp.
- Plotting time-series data
- Plotting time-series with different variables
- Annotating time-series data
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 5)
Read data with a time index
Pandas DataFrame objects can have an index that denotes time. This is useful because Matplotlib recognizes that these measurements represent time and labels the values on the axis accordingly.
In this exercise, you will read data from a CSV file called climate_change.csv that contains measurements of CO2 levels and temperatures made on the 6th of every month from 1958 until 2016. You will use Pandas' read_csv function.
To designate the index as a DatetimeIndex, you will use the parse_dates and index_col keyword arguments, both to parse this column as a variable that contains dates and to designate it as the index for this DataFrame.
climate_change = pd.read_csv("./dataset/climate_change.csv", parse_dates=["date"], index_col=["date"])
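As a minimal sketch of what these two arguments do, the snippet below reads a tiny in-memory CSV instead of the real file. The three rows and their values are made up for illustration, and the name `sample` is hypothetical; only the column names mirror the dataset described above.

```python
import io

import pandas as pd

# Hypothetical three-row CSV standing in for climate_change.csv
# (the values are made up for illustration only).
csv_text = """date,co2,relative_temp
1958-03-06,315.7,0.10
1958-04-06,317.5,0.01
1958-05-06,317.5,0.08
"""

# parse_dates converts the "date" column to datetimes,
# index_col makes that column the index of the DataFrame.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# The index is a DatetimeIndex rather than a column of plain strings.
print(type(sample.index).__name__)
print(sample.columns.tolist())
```

Without parse_dates, the same call would leave the index as strings, and Matplotlib would treat the values as categories rather than as dates.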
Plot time-series data
To plot time-series data, we use the plot method of the Axes object. The first argument to this method is the values for the x-axis and the second argument is the values for the y-axis.
This exercise provides data stored in a DataFrame called climate_change. This variable has a time index with the dates of measurements and two data columns: "co2" and "relative_temp".
In this case, the index of the DataFrame is used as the x-axis values and we plot the values stored in the "relative_temp" column as the y-axis values. We also label the x-axis and y-axis.
fig, ax = plt.subplots()
# Add the time-series for "relative_temp" to the plot
ax.plot(climate_change.index, climate_change["relative_temp"])
# Set the x-axis label
ax.set_xlabel("Time")
# Set the y-axis label
ax.set_ylabel("Relative temperature (Celsius)");
Using a time index to zoom in
When a time-series is represented with a time index, we can use this index for the x-axis when plotting. We can also select a range of dates to zoom in on a particular period within the time-series using Pandas' indexing facilities. In this exercise, you will select a portion of a time-series dataset and plot that period.
fig, ax = plt.subplots()
# Create variable seventies with data from "1970-01-01" to "1979-12-31"
seventies = climate_change["1970-01-01":"1979-12-31"]
# Add the time-series for "co2" data from seventies to the plot
ax.plot(seventies.index, seventies["co2"]);
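The date-range selection also works with partial date strings, which can be handy when you only care about whole years. The following is a small sketch on a synthetic monthly series (made-up values, and the names `series` and `seventies` are only for illustration), using `.loc` to make the label-based slicing explicit.

```python
import pandas as pd

# Synthetic monthly series covering 1968 through 1982 (values are made up).
dates = pd.date_range("1968-01-01", periods=180, freq="MS")
series = pd.Series(range(180), index=dates)

# Partial string indexing: "1970":"1979" selects every row from the
# start of 1970 through the end of 1979, both ends inclusive.
seventies = series.loc["1970":"1979"]

print(seventies.index[0], seventies.index[-1], len(seventies))
```

Unlike positional slicing, label-based slices on a DatetimeIndex include the end point, so the whole of 1979 is part of the result.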
Plotting two variables
If you want to plot two time-series variables that were recorded at the same times, you can add both of them to the same subplot.
If the variables have very different scales, you'll want to make sure that you plot them in different twin Axes objects. These objects can share one axis (for example, the time, or x-axis) while not sharing the other (the y-axis).
To create a twin Axes object that shares the x-axis, we use the twinx method.
In this exercise, you'll have access to a DataFrame that has the climate_change data loaded into it. This DataFrame was loaded with the "date" column set as a DatetimeIndex, and it has a column called "co2" with carbon dioxide measurements and a column called "relative_temp" with temperature measurements.
fig, ax = plt.subplots()
# Plot the CO2 variable in blue
ax.plot(climate_change.index, climate_change["co2"], color="blue")
# Create a twin Axes that shares the x-axis
ax2 = ax.twinx()
# Plot the relative temperature in red
ax2.plot(climate_change.index, climate_change["relative_temp"], color="red");
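To see what "sharing one axis while not sharing the other" means in practice, here is a rough sketch with two synthetic series on very different scales. The data and the names `large` and `small` are made up; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two synthetic monthly series on very different scales (made-up values).
dates = pd.date_range("2000-01-01", periods=24, freq="MS")
large = pd.Series(range(300, 324), index=dates)               # values in the hundreds
small = pd.Series([i / 100 for i in range(24)], index=dates)  # values below 1

fig, ax = plt.subplots()
ax.plot(large.index, large, color="blue")

# The twin Axes shares the x-axis with ax but gets its own y-axis.
ax2 = ax.twinx()
ax2.plot(small.index, small, color="red")

# The x-limits match because the axis is shared; the y-limits are independent.
print(ax.get_xlim() == ax2.get_xlim())
print(ax.get_ylim(), ax2.get_ylim())
```

If both series were drawn on the same Axes, the smaller one would be squeezed into a nearly flat line near the bottom of the plot.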
Defining a function that plots time-series data
Once you realize that a particular section of code that you have written is useful, it is a good idea to define a function that saves that section of code for you, rather than copying it to other parts of your program where you would like to use this code.
Here, we will define a function that takes inputs such as a time variable and some other variable and plots them as x and y inputs. Then, it sets the labels on the x- and y-axis and sets the colors of the y-axis label, the y-axis ticks and the tick labels.
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    # Plot the inputs x,y in the provided color
    axes.plot(x, y, color=color)
    # Set the x-axis label
    axes.set_xlabel(xlabel)
    # Set the y-axis label
    axes.set_ylabel(ylabel, color=color)
    # Color the y-axis ticks and tick labels
    axes.tick_params('y', colors=color)
Using a plotting function
Defining functions allows us to reuse the same code without having to repeat all of it. Programmers sometimes say "Don't repeat yourself".
In the previous exercise, you defined a function called plot_timeseries(axes, x, y, color, xlabel, ylabel) that takes an Axes object (as the argument axes), time-series data (as the x and y arguments), the name of a color (as a string, provided as the color argument), and x-axis and y-axis labels (as the xlabel and ylabel arguments).
fig, ax = plt.subplots()
# Plot the CO2 levels time-series in blue
plot_timeseries(ax, climate_change.index, climate_change["co2"],
                "blue", "Time (years)", "CO2 levels")
# Create a twin Axes object that shares the x-axis
ax2 = ax.twinx()
# Plot the relative temperature data in red
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"],
                "red", "Time (years)", "Relative temperature (Celsius)")
Annotating a plot of time-series data
Annotating a plot allows us to highlight interesting information in the plot. For example, in describing the climate change dataset, we might want to point to the date at which the relative temperature first exceeded 1 degree Celsius.
For this, we will use the annotate method of the Axes object. In this exercise, you will have the DataFrame called climate_change loaded into memory. Using the Axes methods, plot only the relative temperature column as a function of dates, and annotate the data.
fig, ax = plt.subplots()
# Plot the relative temperature data
ax.plot(climate_change.index, climate_change["relative_temp"])
# Annotate the date at which temperatures exceeded 1 degree
ax.annotate('>1 degree', (pd.Timestamp('2015-10-06'), 1));
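The annotation date does not have to be typed in by hand. As a hedged sketch, the snippet below looks up the first date on which a synthetic series exceeds 1 and annotates that point. The values are made up and are not the climate data; `temps`, `first_date`, and `note` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic monthly series (made-up values) that first exceeds 1.0 in June.
dates = pd.date_range("2015-01-01", periods=12, freq="MS")
temps = pd.Series(
    [0.6, 0.7, 0.8, 0.9, 0.95, 1.05, 0.98, 1.1, 1.2, 1.15, 1.3, 1.25],
    index=dates,
)

# Keep only the rows above the threshold and take the earliest date.
# This assumes at least one value exceeds 1; otherwise index[0] raises IndexError.
first_date = temps[temps > 1].index[0]

fig, ax = plt.subplots()
ax.plot(temps.index, temps)
# Place the text at the data point where the threshold is first crossed.
note = ax.annotate(">1 degree", xy=(first_date, temps.loc[first_date]))

print(first_date)
```

Filtering with a boolean mask and taking the first index entry avoids hard-coding a date that would silently go stale if the data changed.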
Plotting time-series: putting it all together
In this exercise, you will plot two time-series with different scales in the same subplot, using twin Axes, and annotate the data from one of these series.
The CO2/temperatures data is provided as a DataFrame called climate_change. You should also use the function that we defined before, called plot_timeseries, which takes an Axes object (as the axes argument), plots a time-series (provided as the x and y arguments), sets the labels for the x-axis and y-axis, and sets the color for the data and for the y tick/axis labels:
plot_timeseries(axes, x, y, color, xlabel, ylabel)
Then, you will annotate with text an important time-point in the data: on 2015-10-06, when the temperature first rose to above 1 degree over the average.
fig, ax = plt.subplots()
# Plot the CO2 levels time-series in blue
plot_timeseries(ax, climate_change.index, climate_change['co2'],
                'blue', "Time (years)", "CO2 levels")
# Create an Axes object that shares the x-axis
ax2 = ax.twinx()
# Plot the relative temperature data in red
plot_timeseries(ax2, climate_change.index, climate_change["relative_temp"],
                'red', "Time (years)", "Relative temp (Celsius)")
# Annotate point with relative temperature >1 degree
ax2.annotate(">1 degree",
             xytext=(pd.Timestamp('2008-10-06'), -0.2),
             xy=(pd.Timestamp('2015-10-06'), 1),
             arrowprops=dict(arrowstyle='->', color='gray'));