I am trying to use yfinance to download historical financial data. I created a function to grab the adjusted closing price for a list of tickers. However, the first item in the list (currently 'S&P 500') returns NaN values from 2000-01-03 to around 2015-04-30.
When I reduce tickers to just {"S&P 500": "^GSPC"}, for example, there are no issues with the output. Am I doing something wrong that causes the first item in the list to return no values?
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# 1. Define Parameters
tickers = {
    "S&P 500": "^GSPC",
    "Nasdaq": "^IXIC",
    "Financials": "XLF",
    "Technology": "XLK",
    "Healthcare": "XLV",
    "Energy": "XLE",
    "Airlines": "JETS"
}
start_date = "2000-01-01"
end_date = "2022-12-31"
# 2. Fetch and Prepare Data
def fetch_data(tickers, start, end):
    data = yf.download(list(tickers.values()), start=start, end=end)["Adj Close"]
    data.columns = tickers.keys()
    return data
market_data = fetch_data(tickers, start_date, end_date)
print(market_data)
market_data.to_csv("market_data.csv")
Output:
S&P 500 Nasdaq Financials Technology Healthcare \
Date
2000-01-03 NaN 13.962523 11.351282 41.629448 20.980349
2000-01-04 NaN 13.699696 10.855049 39.517487 20.504265
2000-01-05 NaN 14.061081 10.769754 38.930832 20.320322
2000-01-06 NaN 14.603155 11.242729 37.640156 20.385239
2000-01-07 NaN 14.759212 11.428815 38.297241 20.634104
There is nothing wrong with the data. The apparent problem is caused by data.columns = tickers.keys(). The thing to realize is that yf.download returns the columns sorted alphabetically by ticker symbol, not in the order the tickers were passed in:
data = (yf.download(list(tickers.values()),
start="2000-01-01",
end="2022-12-31")["Adj Close"]
.head()
)
Output:
Ticker JETS XLE XLF XLK XLV \
Date
2000-01-03 00:00:00+00:00 NaN 13.962521 11.351274 41.629471 20.980354
2000-01-04 00:00:00+00:00 NaN 13.699698 10.855047 39.517498 20.504265
2000-01-05 00:00:00+00:00 NaN 14.061077 10.769759 38.930836 20.320320
2000-01-06 00:00:00+00:00 NaN 14.603158 11.242724 37.640175 20.385246
2000-01-07 00:00:00+00:00 NaN 14.759203 11.428815 38.297245 20.634102
Ticker ^GSPC ^IXIC
Date
2000-01-03 00:00:00+00:00 1455.219971 4131.149902
2000-01-04 00:00:00+00:00 1399.420044 3901.689941
2000-01-05 00:00:00+00:00 1402.109985 3877.540039
2000-01-06 00:00:00+00:00 1403.449951 3727.129883
2000-01-07 00:00:00+00:00 1441.469971 3882.620117
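I.e., assigning tickers.keys() to data.columns overwrites the ticker symbols by position, so every column gets the wrong name: the JETS column ends up labeled 'S&P 500'. Use df.rename instead, passing the inverted dictionary (symbol to name), and reorder the columns if desired: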
data = data.rename(columns={v: k for k, v in tickers.items()})
data = data[tickers.keys()]
data.columns
Index(['S&P 500', 'Nasdaq', 'Financials', 'Technology', 'Healthcare', 'Energy',
'Airlines'],
dtype='object', name='Ticker')
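Putting the two steps together, here is a minimal sketch of a helper that labels columns by symbol instead of by position. The name label_columns is hypothetical, and the small frame is made-up stand-in data for what yf.download(...)["Adj Close"] returns (columns sorted alphabetically), so the example runs without a network call:

```python
import numpy as np
import pandas as pd

tickers = {
    "S&P 500": "^GSPC",
    "Nasdaq": "^IXIC",
    "Financials": "XLF",
    "Airlines": "JETS",
}


def label_columns(data, tickers):
    """Rename symbol columns to display names, matching by symbol, not position.

    Hypothetical helper for illustration; not part of yfinance or pandas.
    """
    # Invert the dict: {"^GSPC": "S&P 500", ...}
    symbol_to_name = {symbol: name for name, symbol in tickers.items()}
    renamed = data.rename(columns=symbol_to_name)
    # Restore the order in which the tickers were defined
    return renamed[list(tickers.keys())]


# Made-up stand-in for the downloaded frame: columns arrive sorted alphabetically
raw = pd.DataFrame(
    {
        "JETS": [np.nan, np.nan],
        "XLF": [11.35, 10.86],
        "^GSPC": [1455.22, 1399.42],
        "^IXIC": [4131.15, 3901.69],
    },
    index=pd.to_datetime(["2000-01-03", "2000-01-04"]),
)

labeled = label_columns(raw, tickers)
print(labeled)
```

In fetch_data, the same call would replace the data.columns = tickers.keys() line, e.g. return label_columns(data, tickers).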
So, why is there no initial data for JETS ('Airlines')? Because the ETF has an inception date of 28 April 2015. It didn't yet exist.
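To see where each column's data actually begins, one option is pandas' first_valid_index, applied per column. The sketch below uses a made-up frame with hypothetical column names OLD and NEW and arbitrary dates; with real data you would call it on the frame returned by your function:

```python
import numpy as np
import pandas as pd

# Made-up data: NEW has no values on the first two dates
prices = pd.DataFrame(
    {
        "OLD": [10.0, 10.5, 10.2, 10.8],
        "NEW": [np.nan, np.nan, 25.0, 25.3],
    },
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
)

# First date with a non-NaN value, per column
first_valid = prices.apply(pd.Series.first_valid_index)
print(first_valid)
```

A column whose first valid date is much later than the requested start date simply has no earlier history.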