- Introduction to relational plots and subplots
- Customizing scatter plots
- Introduction to line plots
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (10, 5)
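The `figure.figsize` setting above applies to figures that matplotlib creates on its own. As far as we know, `relplot()` builds its own figure and sizes it from its `height` and `aspect` parameters instead: each facet is `height` inches tall and `height * aspect` inches wide. A minimal sketch with made-up data, assuming a seaborn version in which `relplot()` returns a `FacetGrid`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

plt.rcParams["figure.figsize"] = (10, 5)  # default size for plain matplotlib figures

# Tiny made-up dataset with two categories, so col= produces two facets
demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 1, 3], "group": ["a", "a", "b", "b"]})

# relplot sizes its figure from height/aspect, not from rcParams["figure.figsize"]
g = sns.relplot(x="x", y="y", data=demo, kind="scatter", col="group", height=3, aspect=1.5)
fig_w, fig_h = g.axes[0, 0].get_figure().get_size_inches()
print(fig_w, fig_h)  # width = 2 facets * 3 * 1.5, height = 3
```

To change the size of a `relplot()` figure, pass `height` and `aspect` rather than editing `rcParams`.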
We've seen in prior exercises that students with more absences ("absences") tend to have lower final grades ("G3"). Does this relationship hold regardless of how much time students study each week?
To answer this, we'll look at the relationship between the number of absences that a student has in school and their final grade in the course, creating separate subplots based on each student's weekly study time ("study_time").
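Conceptually, faceting with `col=` splits the rows of the DataFrame by category and draws one scatter plot per subset, with shared axis limits. The sketch below reproduces that idea by hand with `groupby()` and `plt.subplots()`. The rows are made up and only stand in for `student_data`; this illustrates the idea, not how seaborn implements it internally.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for student_data (values are illustrative only)
demo = pd.DataFrame({
    "absences": [0, 4, 10, 2, 6, 1, 8],
    "G3": [15, 12, 8, 14, 10, 16, 9],
    "study_time": ["<2 hours", "<2 hours", "2 to 5 hours", "2 to 5 hours",
                   "5 to 10 hours", "5 to 10 hours", ">10 hours"],
})

# One (level, subset) pair per study_time category, in order of first appearance
groups = list(demo.groupby("study_time", sort=False))
fig, axes = plt.subplots(1, len(groups), sharex=True, sharey=True, squeeze=False)

for ax, (level, subset) in zip(axes[0], groups):
    ax.scatter(subset["absences"], subset["G3"])  # one scatter per study_time level
    ax.set_title(f"study_time = {level}")
    ax.set_xlabel("absences")
axes[0, 0].set_ylabel("G3")
```

`relplot(..., col="study_time")` does this bookkeeping for us and also handles labels and legends.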
student_data = pd.read_csv('./dataset/student-alcohol-consumption.csv', index_col=0)
student_data.head()
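| | school | sex | age | famsize | Pstatus | Medu | Fedu | traveltime | failures | schoolsup | ... | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | location | study_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | GT3 | A | 4 | 4 | 2 | 0 | yes | ... | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 | Urban | 2 to 5 hours |
| 1 | GP | F | 17 | GT3 | T | 1 | 1 | 1 | 0 | no | ... | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 | Urban | 2 to 5 hours |
| 2 | GP | F | 15 | LE3 | T | 1 | 1 | 1 | 3 | yes | ... | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 | Urban | 2 to 5 hours |
| 3 | GP | F | 15 | GT3 | T | 4 | 2 | 1 | 0 | no | ... | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 | Urban | 5 to 10 hours |
| 4 | GP | F | 16 | GT3 | T | 3 | 3 | 1 | 0 | no | ... | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 | Urban | 2 to 5 hours |

5 rows × 29 columns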
sns.relplot(x="absences", y="G3", data=student_data, kind='scatter');
sns.relplot(x="absences", y="G3", data=student_data, kind="scatter", col='study_time');
sns.relplot(x="absences", y="G3", data=student_data, kind="scatter", row="study_time");
Because these subplots had a large range of x values, it's easier to read them arranged in rows instead of columns.
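One way to see the difference between `col=` and `row=` is to inspect the `FacetGrid` that `relplot()` returns: its `axes` attribute is a 2-D array of subplots. A minimal sketch with made-up data that has four categories, like the four study-time levels (the labels `"a"` to `"d"` are placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import pandas as pd
import seaborn as sns

demo = pd.DataFrame({
    "absences": [0, 4, 10, 2, 6, 1, 8, 3],
    "G3": [15, 12, 8, 14, 10, 16, 9, 13],
    "study_time": ["a", "b", "c", "d"] * 2,  # placeholder labels for four levels
})

g_col = sns.relplot(x="absences", y="G3", data=demo, kind="scatter", col="study_time")
g_row = sns.relplot(x="absences", y="G3", data=demo, kind="scatter", row="study_time")

print(g_col.axes.shape)  # one row of four subplots
print(g_row.axes.shape)  # four rows of one subplot each
```

In the row layout every subplot gets the full figure width, which leaves more horizontal room for a wide range of x values.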
Let's continue looking at the student_data dataset of students in secondary school. Here, we want to answer the following question: does a student's first semester grade ("G1") tend to correlate with their final grade ("G3")?
There are many aspects of a student's life that could result in a higher or lower final grade in the class. For example, some students receive extra educational support from their school ("schoolsup") or from their family ("famsup"), which could result in higher grades. Let's try to control for these two factors by creating subplots based on whether the student received extra educational support from their school or family.
sns.relplot(x='G1', y='G3', data=student_data, kind='scatter');
sns.relplot(x="G1", y="G3", data=student_data, kind="scatter", col='schoolsup', col_order=['yes', 'no']);
sns.relplot(x="G1", y="G3", data=student_data, kind="scatter", col="schoolsup", row='famsup', col_order=["yes", "no"], row_order=['yes', 'no']);
It looks like the first semester grade does correlate with the final grade, regardless of what kind of support the student received.
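Combining `col=` and `row=` gives one subplot per combination of levels, and `col_order`/`row_order` fix which level comes first. A sketch with made-up rows that checks the panel titles; it assumes seaborn's default title template, which joins the row and column labels with `|`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import pandas as pd
import seaborn as sns

# Made-up rows covering all four schoolsup/famsup combinations
demo = pd.DataFrame({
    "G1": [5, 10, 15, 8, 12, 6, 14, 9],
    "G3": [6, 11, 15, 8, 13, 5, 14, 10],
    "schoolsup": ["yes", "no"] * 4,
    "famsup": ["yes", "yes", "no", "no"] * 2,
})

g = sns.relplot(x="G1", y="G3", data=demo, kind="scatter",
                col="schoolsup", row="famsup",
                col_order=["yes", "no"], row_order=["yes", "no"])

# Titles read row by row: famsup level first, then schoolsup level
for ax in g.axes.flat:
    print(ax.get_title())
```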
In this exercise, we'll explore Seaborn's mpg dataset, which contains one row per car model and includes information such as the year the car was made, the number of miles per gallon ("M.P.G.") it achieves, the power of its engine (measured in "horsepower"), and its country of origin.
What is the relationship between the power of a car's engine ("horsepower") and its fuel efficiency ("mpg")? And how does this relationship vary by the number of cylinders ("cylinders") the car has? Let's find out.
Let's continue to use relplot() instead of scatterplot() since it offers more flexibility; for example, it can also create subplots with col and row.
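The extra flexibility comes from the fact that `scatterplot()` is an axes-level function that draws onto a single matplotlib `Axes`, while `relplot()` is figure-level: it returns a `FacetGrid`, which is what lets it create subplots. A small check with made-up data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up horsepower/mpg pairs
demo = pd.DataFrame({"horsepower": [130, 165, 95, 88], "mpg": [18.0, 15.0, 24.0, 27.0]})

ax = sns.scatterplot(x="horsepower", y="mpg", data=demo)             # axes-level
g = sns.relplot(x="horsepower", y="mpg", data=demo, kind="scatter")  # figure-level

print(isinstance(ax, plt.Axes))      # scatterplot returns a matplotlib Axes
print(isinstance(g, sns.FacetGrid))  # relplot returns a FacetGrid
```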
mpg = pd.read_csv('./dataset/mpg.csv')
mpg.head()
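| | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | usa | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | usa | buick skylark 320 |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | usa | amc rebel sst |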
sns.relplot(x='horsepower', y='mpg', data=mpg, size='cylinders', kind='scatter');
sns.relplot(x="horsepower", y="mpg", data=mpg, kind="scatter", size="cylinders", hue='cylinders');
Cars with higher horsepower tend to get fewer miles per gallon. They also tend to have more cylinders.
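With a numeric column such as `cylinders`, `size=` scales the marker area and `hue=` picks colors from a continuous palette, so passing the same column to both encodes it twice. The sketch below inspects the drawn markers on made-up data. It assumes the scatter points are stored in a single matplotlib collection, and it picks the collection with the most points so that any empty helper artists a seaborn version might add for the legend are skipped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up cars: three cylinder counts, two cars each
demo = pd.DataFrame({
    "horsepower": [70, 90, 105, 120, 150, 170],
    "mpg": [33.0, 28.0, 22.0, 19.0, 15.0, 13.0],
    "cylinders": [4, 4, 6, 6, 8, 8],
})

g = sns.relplot(x="horsepower", y="mpg", data=demo, kind="scatter",
                size="cylinders", hue="cylinders")

ax = g.axes[0, 0]
# The data points live in the collection with the most offsets
points = max(ax.collections, key=lambda c: len(c.get_offsets()))
sizes = points.get_sizes()                           # marker area per point
colors = np.unique(points.get_facecolors(), axis=0)  # distinct RGBA fill colors

print(len(np.unique(sizes)), len(colors))  # one size and one color per cylinder count
```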
Let's continue exploring Seaborn's mpg dataset by looking at the relationship between how fast a car can accelerate ("acceleration") and its fuel efficiency ("mpg"). Do these properties vary by country of origin ("origin")?
Note that the "acceleration" variable is the time to accelerate from 0 to 60 miles per hour, in seconds. Higher values indicate slower acceleration.
sns.relplot(x='acceleration', y='mpg', data=mpg, kind='scatter', style='origin', hue='origin');
Cars from the USA tend to accelerate more quickly and get lower miles per gallon compared to cars from Europe and Japan.
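To back up a visual impression like this with numbers, one option is to summarize each group with pandas. A minimal sketch, assuming a DataFrame with the same `origin`, `acceleration`, and `mpg` columns; the rows below are made up, so the output says nothing about the real dataset (on the real data, replace `demo` with `mpg`):

```python
import pandas as pd

# Made-up rows with the same columns as the mpg dataset
demo = pd.DataFrame({
    "origin": ["usa", "usa", "europe", "europe", "japan", "japan"],
    "acceleration": [11.0, 12.0, 15.0, 16.0, 16.0, 17.0],
    "mpg": [15.0, 17.0, 26.0, 28.0, 30.0, 32.0],
})

# Median of each column per country of origin (lower acceleration = faster car)
summary = demo.groupby("origin")[["acceleration", "mpg"]].median()
print(summary)
```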
- What are line plots?
  - Two types of relational plots: scatter plots and line plots
  - Scatter plots: each plot point is an independent observation
  - Line plots: each plot point represents the same "thing", typically tracked over time
- Multiple observations per x-value
  - The line shows the mean of the y values at each x value (Seaborn's default)
  - The shaded region is the confidence interval for that mean
    - Assumes the dataset is a random sample
    - 95% confident that the population mean is within this interval
    - Indicates uncertainty in our estimate
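Seaborn's documentation describes its default confidence interval as a bootstrap estimate. The idea can be sketched with NumPy: resample the observations with replacement many times, take the mean of each resample, and use the 2.5th and 97.5th percentiles of those means as a 95% interval. This is a simplified percentile bootstrap for illustration, not Seaborn's exact code; `bootstrap_ci` is a hypothetical helper and the data are made up.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement), each the same size as the data
    resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
    boot_means = resamples.mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Made-up mpg values for cars from a single model year
year_mpg = [18.0, 15.0, 18.0, 16.0, 17.0, 24.0, 22.0, 26.0]
low, high = bootstrap_ci(year_mpg)
print(f"mean = {np.mean(year_mpg):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A wider interval means the mean at that x value is estimated less precisely, for example because there are few observations or they vary a lot.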
In this exercise, we'll continue to explore Seaborn's mpg dataset, which contains one row per car model and includes information such as the year the car was made, its fuel efficiency (measured in "miles per gallon" or "M.P.G."), and its country of origin (USA, Europe, or Japan).
How has the average miles per gallon achieved by these cars changed over time? Let's use line plots to find out!
sns.relplot(x='model_year', y='mpg', data=mpg, kind='line');
sns.relplot(x="model_year", y="mpg", data=mpg, kind="line", errorbar="sd");  # seaborn >= 0.12; older versions use ci="sd"
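Unlike the previous plot, whose shaded region is a confidence interval for the mean, this plot's shaded region shows the spread of miles per gallon for all the cars in each year: one standard deviation above and below the mean.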
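With `ci='sd'` (`errorbar='sd'` in newer Seaborn), the band spans one standard deviation below and above the mean at each x value. The same numbers can be computed directly with pandas. This sketch assumes Seaborn uses the sample standard deviation (pandas' default, `ddof=1`), and the rows are made up:

```python
import pandas as pd

# Made-up mpg values for two model years
demo = pd.DataFrame({
    "model_year": [70, 70, 71, 71, 71],
    "mpg": [10.0, 20.0, 20.0, 30.0, 40.0],
})

stats = demo.groupby("model_year")["mpg"].agg(["mean", "std"])  # std uses ddof=1
stats["lower"] = stats["mean"] - stats["std"]  # bottom edge of the band
stats["upper"] = stats["mean"] + stats["std"]  # top edge of the band
print(stats)
```

Unlike a confidence interval, this band does not shrink as more cars are added for a year; it describes the spread of the data, not the precision of the mean.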
sns.relplot(x='model_year', y='horsepower', data=mpg, kind='line', errorbar=None);  # seaborn >= 0.12; older versions use ci=None
sns.relplot(x="model_year", y="horsepower", data=mpg, kind="line", style='origin', hue='origin', errorbar=None);
sns.relplot(x="model_year", y="horsepower", data=mpg, kind="line", errorbar=None, style="origin", hue="origin", markers=True, dashes=False);
The first horsepower plot shows that average horsepower decreased over time. Now that we've added subgroups, we can see that this downward trend was more pronounced among cars from the USA.