Putting it all together - Building a value-weighted index
A Summary of lecture "Manipulating Time Series Data in Python", via datacamp
- Select index components & import data
- Build a market-cap weighted index
- Evaluate index performance
- Index correlation & exporting to Excel
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.rcParams['figure.figsize'] = (10, 5)
Select index components & import data
- Market value-weighted index
- Composite performance of various stocks
- Components weighted by market capitalization
-
Share Price
*Number of Shares
=>Market Value
-
- Larger components get higher percentage weightings
- Key market indexes are value-weighted
- S&P500, NASDAQ, Wilshire 5000, Hang Seng, ...
- Build a cap-weighted Index
- Apply new skills to construct value-weighted index
- Select components from exchange listing data
- Get component number of shares and stock prices
- Calculate component weights
- Calculate index
- Evaluate performance of components and index
- Apply new skills to construct value-weighted index
Explore and clean company listing information
To get started with the construction of a market-value based index, you'll work with the combined listing info for the three largest US stock exchanges, the NYSE, the NASDAQ and the AMEX.
In this and the next exercise, you will calculate market-cap weights for these stocks.
listings_nyse = pd.read_excel('./dataset/listings.xlsx', sheet_name='nyse', na_values='n/a')
listings_amex = pd.read_excel('./dataset/listings.xlsx', sheet_name='amex', na_values='n/a')
listings_nasdaq = pd.read_excel('./dataset/listings.xlsx', sheet_name='nasdaq', na_values='n/a')
listings_nyse['Exchange'] = 'nyse'
listings_amex['Exchange'] = 'amex'
listings_nasdaq['Exchange'] = 'nasdaq'
listings = pd.concat([listings_amex, listings_nasdaq, listings_nyse], axis=0)
listings.reset_index(inplace=True)
listings.drop(['index'], axis=1, inplace=True)
listings['Market Capitalization'] /= 1e6
print(listings.info())
# Move 'stock symbol' into the index
listings.set_index('Stock Symbol', inplace=True)
# Drop rows with missing 'sector' data
listings.dropna(subset=['Sector'], inplace=True)
# Select companies with IPO Year befor 2019
listings = listings[listings['IPO Year'] < 2019]
# Inspect the new listings data
print(listings.info())
# Show the number of companies per sector
print(listings.groupby('Sector').size().sort_values(ascending=False))
Select and inspect index components
Now that you have imported and cleaned the listings data, you can proceed to select the index components as the largest company for each sector by market capitalization.
You'll also have the opportunity to take a closer look at the components, their last market value, and last price.
components = listings.groupby('Sector')['Market Capitalization'].nlargest(1)
# Print components, sorted by market cap
print(components.sort_values(ascending=False))
# Select stock symbols and print the result
tickers = components.index.get_level_values('Stock Symbol')
print(tickers)
# Print company name, market cap, and last price for each components
info_cols = ['Company Name', 'Market Capitalization', 'Last Sale']
print(listings.loc[tickers,info_cols].sort_values('Market Capitalization', ascending=False))
tickers = tickers.tolist()
print(tickers)
# Import prices and inspect result
stock_prices = pd.read_csv('./dataset/stocks_4.csv', parse_dates=['Date'], index_col='Date')
print(stock_prices.info())
# Calculate the returns
price_return = stock_prices.iloc[-1].div(stock_prices.iloc[0]).sub(1).mul(100)
# Plot horizontal bar chart of sorted price_return
price_return.sort_values().plot(kind='barh', title='Stock Price Returns');
Calculate number of shares outstanding
The next step towards building a value-weighted index is to calculate the number of shares for each index component.
The number of shares will allow you to calculate the total market capitalization for each component given the historical price series in the next exercise.
print(listings.info())
print(tickers)
# Select components and relevant columns from listings
components = listings[['Market Capitalization', 'Last Sale']].loc[tickers]
# Print the first rows of components
print(components.head(5))
# Calculate the number of shares here
no_shares = components['Market Capitalization'].div(components['Last Sale'])
# Print the sorted no_shares
print(no_shares.sort_values(ascending=False))
Create time series of market value
You can now use the number of shares to calculate the total market capitalization for each component and trading date from the historical price series.
The result will be the key input to construct the value-weighted stock index, which you will complete in the next exercise.
components['Number of Shares'] = no_shares
no_shares = components['Number of Shares']
print(no_shares.sort_values())
# Create the series of market cap per ticker
market_cap = stock_prices.mul(no_shares)
# Select first and last market cap here
first_value = market_cap.iloc[0]
last_value = market_cap.iloc[-1]
# Concatenate and plot first and last market cap here
pd.concat([first_value, last_value], axis=1).dropna().plot(kind='barh');
plt.savefig('../images/market_cap.png')
plt.show()
Calculate & plot the composite index
By now you have all ingredients that you need to calculate the aggregate stock performance for your group of companies.
Use the time series of market capitalization that you created in the last exercise to aggregate the market value for each period, and then normalize this series to convert it to an index.
market_cap_series = market_cap[pd.concat([first_value, last_value], axis=1).dropna().index.tolist()]
market_cap_series
raw_index = market_cap_series.sum(axis=1)
print(raw_index)
# Normalize the aggregate market cap here
index = raw_index.div(raw_index.iloc[0]).mul(100)
print(index)
# Plot the index here
index.plot(title='Market-Cap Weighted Index');
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100
print(index_return)
# Select the market capitalization
market_cap = components['Market Capitalization']
# Calculate the total market cap
total_market_cap = market_cap.sum()
# Calculate the component weights , and print the result
weights = market_cap.div(total_market_cap)
print(weights.sort_values())
# Calculate and plot the distribution by component
weights.mul(index_return).sort_values().plot(kind='barh');
Compare index performance against benchmark I
The next step in analyzing the performance of your index is to compare it against a benchmark.
In the video, we used the S&P 500 as benchmark. You can also use the Dow Jones Industrial Average, which contains the 30 largest stocks, and would also be a reasonable benchmark for the largest stocks from all sectors across the three exchanges.
djia = pd.read_csv('./dataset/djia2.csv', parse_dates=['DATE'], index_col='DATE')['DJIA']
data = index.to_frame('Index')
# Normalize djia series and add as new column to data
djia = djia.div(djia.iloc[0]).mul(100)
data['DJIA'] = djia
# Print total return for both index and djia
print((data.iloc[-1] / data.iloc[0] - 1) * 100)
# Plot both series
data.plot();
Compare index performance against benchmark II
The next step in analyzing the performance of your index is to compare it against a benchmark.
In the video, we have use the S&P 500 as benchmark. You can also use the Dow Jones Industrial Average, which contains the 30 largest stocks, and would also be a reasonable benchmark for the largest stocks from all sectors across the three exchanges.
print(data.info())
print(data.head(5))
# Create multi_period_return function here
def multi_period_return(r):
return (np.prod(r + 1) - 1) * 100
# Calculate rolling_return_360
rolling_return_360 = data.pct_change().rolling('360D').apply(multi_period_return)
# Plot rolling_return_360 here
rolling_return_360.plot(title='Rolling 360D Return');
plt.show()
stock_prices = pd.read_csv('./dataset/stocks_4.csv', parse_dates=['Date'], index_col='Date')
print(stock_prices.info())
# Calculate the dail returns
returns = stock_prices.pct_change()
# Calculate and print the pairwise correlations
correlations = returns.corr()
print(correlations)
# Plot a heatmap of daily return correlations
sns.heatmap(correlations, annot=True);
plt.title('Daily Return Correlations')
index = index.to_frame('Index')
print(index.info())
print(stock_prices.info())
# Join index to stock_prices, and inspect the result
data = stock_prices.join(index)
print(data.info())
# Create index & stock price returns
returns = data.pct_change()
# Export data and data as returns to excel
with pd.ExcelWriter('data.xls') as writer:
data.to_excel(writer, sheet_name='data')
returns.to_excel(writer, sheet_name='returns')