- Classifying a time series
- The spectrogram
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (10, 5)
plt.style.use('fivethirtyeight')
```
In this exercise, you'll start with perhaps the simplest classification technique: averaging across dimensions of a dataset and visually inspecting the result.
You'll use the heartbeat data described in the last chapter. Some recordings are normal heartbeat activity, while others are abnormal activity. Let's see if you can spot the difference.
Two DataFrames, `normal` and `abnormal`, each with the shape of `(n_times_points, n_audio_files)` containing the audio for several heartbeats, are available in your workspace.
```python
def show_plot_and_make_titles():
    axs[0, 0].set(title="Normal Heartbeats")
    axs[0, 1].set(title="Abnormal Heartbeats")
    plt.tight_layout()
```
```python
normal = pd.read_csv('./dataset/normal_sound.csv', index_col=0)
abnormal = pd.read_csv('./dataset/abnormal_sound.csv', index_col=0)
sfreq = 2205
```
```python
fig, axs = plt.subplots(3, 2, figsize=(15, 7), sharex=True, sharey=True)

# Calculate the time array
time = np.arange(len(normal)) / sfreq

# Stack the normal/abnormal audio so you can loop and plot
stacked_audio = np.hstack([normal, abnormal]).T

# Loop through each audio file / ax object and plot
# .T.ravel() transposes the array, then unravels it into a 1-D vector for looping
for iaudio, ax in zip(stacked_audio, axs.T.ravel()):
    ax.plot(time, iaudio)
show_plot_and_make_titles()
```
While you should always start by visualizing your raw data, this is often uninformative when it comes to discriminating between two classes of data points. Data is usually noisy or exhibits complex patterns that aren't discoverable by the naked eye.
Another common technique to find simple differences between two sets of data is to average across multiple instances of the same class. This may remove noise and reveal underlying patterns (or, it may not).
In this exercise, you'll average across many instances of each class of heartbeat sound.
```python
# Average across the audio files of each DataFrame
mean_normal = np.mean(normal, axis=1)
mean_abnormal = np.mean(abnormal, axis=1)

# Plot each average over time
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3), sharey=True)
ax1.plot(time, mean_normal)
ax1.set(title='Normal Data')
ax2.plot(time, mean_abnormal)
ax2.set(title='Abnormal Data');
```
While eye-balling differences is a useful way to gain an intuition for the data, let's see if you can operationalize things with a model. In this exercise, you will use each repetition as a datapoint, and each moment in time as a feature to fit a classifier that attempts to predict abnormal vs. normal heartbeats using only the raw data.
```python
normal = pd.read_csv('./dataset/heart_normal.csv', index_col=0)
abnormal = pd.read_csv('./dataset/heart_abnormal.csv', index_col=0)
```
```python
# Split each class into train/test indices (.shape[1] is the number of audio files)
normal_train_idx = np.random.choice(normal.shape[1], size=22, replace=False).tolist()
normal_test_idx = list(set(np.arange(normal.shape[1]).tolist()) - set(normal_train_idx))
abnormal_train_idx = np.random.choice(abnormal.shape[1], size=20, replace=False).tolist()
abnormal_test_idx = list(set(np.arange(abnormal.shape[1]).tolist()) - set(abnormal_train_idx))

X_train = pd.concat([normal.iloc[:, normal_train_idx],
                     abnormal.iloc[:, abnormal_train_idx]], axis=1).to_numpy().T
X_test = pd.concat([normal.iloc[:, normal_test_idx],
                    abnormal.iloc[:, abnormal_test_idx]], axis=1).to_numpy().T
y_train = np.array(['normal'] * len(normal_train_idx) + ['abnormal'] * len(abnormal_train_idx))
y_test = np.array(['normal'] * len(normal_test_idx) + ['abnormal'] * len(abnormal_test_idx))
```
```python
from sklearn.svm import LinearSVC

# Initialize and fit the model
model = LinearSVC()
model.fit(X_train, y_train)

# Generate predictions and score them manually
predictions = model.predict(X_test)
print(sum(predictions == y_test.squeeze()) / len(y_test))
```
- The auditory envelope
- Smooth the data to calculate the auditory envelope
- Related to the total amount of audio energy present at each moment of time
- Smoothing over time
- Instead of averaging over all time, we can do a local average
- This is called smoothing your timeseries
- It removes short-term noise, while retaining the general pattern
- Auditory features: The Tempogram
- We can summarize more complex temporal information with timeseries-specific functions
- `librosa` is a great library for auditory and timeseries feature engineering
- We can calculate summary statistics of tempo in the same way that we can for the envelope
One of the ways you can improve the features available to your model is to remove some of the noise present in the data. In audio data, a common way to do this is to smooth the data and then rectify it so that the total amount of sound energy over time is more distinguishable. You'll do this in the current exercise.
```python
import librosa as lr

audio, sfreq = lr.load('./dataset/files/murmur__201108222238.wav')
time = np.arange(0, len(audio)) / sfreq
audio = pd.DataFrame(audio)
```
```python
# Rectify the audio signal
audio_rectified = audio.apply(np.abs)

# Plot the result
plt.plot(time[:2205], audio_rectified[:2205]);
```
```python
# Smooth by applying a rolling mean
audio_rectified_smooth = audio_rectified.rolling(50).mean()

# Plot the result
plt.plot(time[:2205], audio_rectified_smooth[:2205]);
```
```python
# Calculate stats (labels and model are assumed to be available from earlier)
means = np.mean(audio_rectified_smooth[:2205], axis=0)
stds = np.std(audio_rectified_smooth[:2205], axis=0)
maxs = np.max(audio_rectified_smooth[:2205], axis=0)

# Create the X and y arrays
X = np.column_stack([means, stds, maxs])
y = labels.reshape([-1, 1])

# Fit the model and score on testing data
from sklearn.model_selection import cross_val_score
percent_score = cross_val_score(model, X, y, cv=5)
print(np.mean(percent_score))
```
One benefit of cleaning up your data is that it lets you compute more sophisticated features. For example, the envelope calculation you performed is a common technique in computing tempo and rhythm features. In this exercise, you'll use `librosa` to compute some tempo and rhythm features for heartbeat data, and fit a model once more.
`librosa` functions tend to only operate on NumPy arrays instead of DataFrames, so we'll access our pandas data as a NumPy array with the `.values` attribute.
```python
audio = pd.read_csv('./dataset/heart_normal.csv', index_col=0)
```
```python
# Calculate the tempo of each audio column
tempos = []
for col, i_audio in audio.items():
    tempos.append(lr.beat.tempo(i_audio.values, sr=sfreq,
                                hop_length=2**6, aggregate=None))

# Convert the list to an array so you can manipulate it more easily
tempos = np.array(tempos)

# Calculate statistics of each tempo
tempos_mean = tempos.mean(axis=-1)
tempos_std = tempos.std(axis=-1)
tempos_max = tempos.max(axis=-1)
```
```python
# Create the X and y arrays
X = np.column_stack([means, stds, maxs, tempos_mean, tempos_std, tempos_max])
y = labels.reshape([-1, 1])

# Fit the model and score on testing data
percent_score = cross_val_score(model, X, y, cv=5)
print(np.mean(percent_score))
```
- Fourier transforms
- Timeseries data can be described as a combination of quickly-changing things and slowly-changing things
- At each moment in time, we can describe the relative presence of fast- and slow-moving components
- The simplest way to do this is called a Fourier Transform
- This converts a single time series into an array that describes the timeseries as a combination of oscillations
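To make this concrete, here is a minimal sketch using NumPy's FFT on an invented two-component signal; the frequencies and sampling rate are chosen purely for illustration:

```python
import numpy as np

# A toy signal: a slow 2 Hz wave plus a faster 20 Hz wave, sampled at 100 Hz
sfreq = 100
t = np.arange(0, 1, 1 / sfreq)
signal = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)

# The Fourier transform describes the signal as a combination of oscillations
fft_vals = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / sfreq)

# The two strongest components sit at exactly the frequencies we mixed in
top_two = np.argsort(np.abs(fft_vals))[-2:]
print(sorted(freqs[top_two]))  # → [2.0, 20.0]
```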
- Spectrograms: combinations of windowed Fourier transforms
- A spectrogram is a collection of windowed Fourier transforms over time
- Similar to how a rolling mean was calculated:
- Choose a window size and shape
- At a timepoint, calculate the FFT for that window
- Slide the window over by one
- Aggregate the results
- Called a Short-Time Fourier Transform (STFT)
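The steps above can be sketched as a bare-bones STFT in plain NumPy (in practice you would use `librosa.stft` or `scipy.signal.stft`; the window shape and sizes here are just for illustration):

```python
import numpy as np

def simple_stft(audio, n_fft=128, hop_length=16):
    # Choose a window size and shape
    window = np.hanning(n_fft)
    n_windows = (len(audio) - n_fft) // hop_length + 1
    spectra = []
    for i in range(n_windows):
        start = i * hop_length                # slide the window over
        segment = audio[start:start + n_fft] * window
        spectra.append(np.fft.rfft(segment))  # FFT for that window
    return np.array(spectra).T                # aggregate: (n_freqs, n_windows)

audio = np.random.randn(2205)                 # one second of noise at sfreq = 2205
spec = simple_stft(audio)
print(spec.shape)  # (65, 130)
```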
- Spectral feature engineering
- Each timeseries has a different spectral pattern
- We can calculate these spectral patterns by analyzing the spectrogram.
- For example, spectral bandwidth and spectral centroids describe where most of the energy is at each moment in time.
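As a hand-rolled illustration of those two features (the toy spectrogram values are invented here, and `librosa`'s implementations differ in details such as windowing and normalization):

```python
import numpy as np

# Toy magnitude spectrogram: 5 frequency bins x 3 time frames
freqs = np.array([0., 100., 200., 300., 400.])
S = np.array([[0., 0., 1.],
              [0., 1., 0.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])

# Spectral centroid: the energy-weighted mean frequency at each moment in time
centroid = (freqs[:, None] * S).sum(axis=0) / S.sum(axis=0)

# Spectral bandwidth: the energy-weighted spread of frequencies around the centroid
bandwidth = np.sqrt(((freqs[:, None] - centroid) ** 2 * S).sum(axis=0) / S.sum(axis=0))

print(centroid)   # → [200. 200. 200.]
print(bandwidth)  # → [  0. 100. 200.]
```

The energy stays centered at 200 Hz in every frame, but the bandwidth grows as the energy spreads across more distant frequency bins.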
Spectral engineering is one of the most common techniques in machine learning for time series data. The first step in this process is to calculate a spectrogram of the sound. This describes what spectral content (e.g., low and high pitches) is present in the sound over time. In this exercise, you'll calculate a spectrogram of a heartbeat audio file.
```python
audio, sfreq = lr.load('./dataset/files/murmur__201108222246.wav')
```
```python
from librosa.core import stft, amplitude_to_db

# Prepare the STFT
HOP_LENGTH = 2**4
# spec = stft(audio, hop_length=HOP_LENGTH, n_fft=2**7)

# For the test
spec = pd.read_csv('./dataset/spec.csv', index_col=0)
spec = spec.applymap(complex)
time = np.array(normal.index)
audio = pd.read_csv('./dataset/audio.csv', index_col=0).to_numpy().squeeze()
sfreq = 2205
```
```python
from librosa.display import specshow

# Convert into decibels
spec_db = amplitude_to_db(spec)

# Compare the raw audio to the spectrogram of the audio
fig, axs = plt.subplots(2, 1, figsize=(10, 10), sharex=True)
axs[0].plot(time, audio)
specshow(spec_db, sr=sfreq, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH);
```
```
UserWarning: amplitude_to_db was called on complex input so phase information will be discarded. To suppress this warning, call amplitude_to_db(np.abs(S)) instead.
```
As you can probably tell, there is a lot more information in a spectrogram compared to a raw audio file. By computing the spectral features, you have a much better idea of what's going on. As such, there are all kinds of spectral features that you can compute using the spectrogram as a base. In this exercise, you'll look at a few of these features.
```python
# Convert the spectrogram to magnitude values
spec = np.array(np.abs(spec))
```
```python
# Calculate the spectral bandwidth and centroid for the spectrogram
# ([0] pulls the single row out of the (1, n_frames) output)
bandwidths = lr.feature.spectral_bandwidth(S=spec)[0]
centroids = lr.feature.spectral_centroid(S=spec)[0]
```
```python
# times_spec (the time value of each spectrogram column) is assumed to be available
spec_db = amplitude_to_db(spec)

# Display these features on top of the spectrogram
fig, ax = plt.subplots(figsize=(10, 5))
specshow(spec_db, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH, ax=ax)
ax.plot(times_spec, centroids)
ax.fill_between(times_spec, centroids - bandwidths / 2,
                centroids + bandwidths / 2, alpha=.5)
ax.set(ylim=[None, 6000]);
plt.savefig('../images/spectrogram_centroid.png')
```