python, pandas, yfinance

Yfinance returning NaN for first ticker in list


I am trying to use yfinance to download historical financial data. I created a function to grab the adjusted closing price for a list of tickers. However, the first item in the list (currently 'S&P 500') returns NaN values from 2000-01-03 to around 2015-04-30.

When I reduce tickers to a single entry, for example just "S&P 500": "^GSPC", the output is fine. Am I doing something wrong that causes the first item in the list not to return any values?

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# 1. Define Parameters
tickers = {
    "S&P 500": "^GSPC",
    "Nasdaq": "^IXIC",
    "Financials": "XLF",
    "Technology": "XLK",
    "Healthcare": "XLV",
    "Energy": "XLE",
    "Airlines": "JETS"
}
start_date = "2000-01-01"
end_date = "2022-12-31"

# 2. Fetch and Prepare Data
def fetch_data(tickers, start, end):
    data = yf.download(list(tickers.values()), start=start, end=end)["Adj Close"]
    data.columns = tickers.keys()
    return data

market_data = fetch_data(tickers, start_date, end_date)
print(market_data)
market_data.to_csv("market_data.csv")

Output:

           S&P 500     Nasdaq  Financials  Technology  Healthcare  \
Date                                                                   
2000-01-03        NaN  13.962523   11.351282   41.629448   20.980349   
2000-01-04        NaN  13.699696   10.855049   39.517487   20.504265   
2000-01-05        NaN  14.061081   10.769754   38.930832   20.320322   
2000-01-06        NaN  14.603155   11.242729   37.640156   20.385239   
2000-01-07        NaN  14.759212   11.428815   38.297241   20.634104  

Solution

  • There is nothing wrong with the data. The apparent problem is caused by data.columns = tickers.keys(). The thing to realize is that yf.download returns the columns sorted alphabetically by ticker symbol, not in the order of the list you pass in:

    data = (yf.download(list(tickers.values()), 
                        start="2000-01-01", 
                        end="2022-12-31")["Adj Close"]
            .head()
            )
    

    Output:

    Ticker                     JETS        XLE        XLF        XLK        XLV  \
    Date                                                                          
    2000-01-03 00:00:00+00:00   NaN  13.962521  11.351274  41.629471  20.980354   
    2000-01-04 00:00:00+00:00   NaN  13.699698  10.855047  39.517498  20.504265   
    2000-01-05 00:00:00+00:00   NaN  14.061077  10.769759  38.930836  20.320320   
    2000-01-06 00:00:00+00:00   NaN  14.603158  11.242724  37.640175  20.385246   
    2000-01-07 00:00:00+00:00   NaN  14.759203  11.428815  38.297245  20.634102   
    
    Ticker                           ^GSPC        ^IXIC  
    Date                                                 
    2000-01-03 00:00:00+00:00  1455.219971  4131.149902  
    2000-01-04 00:00:00+00:00  1399.420044  3901.689941  
    2000-01-05 00:00:00+00:00  1402.109985  3877.540039  
    2000-01-06 00:00:00+00:00  1403.449951  3727.129883  
    2000-01-07 00:00:00+00:00  1441.469971  3882.620117  
    

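    I.e., data.columns = tickers.keys() assigns the names by position, so the first key ('S&P 500') lands on the first sorted column (JETS), 'Nasdaq' lands on XLE, and so on. Use df.rename instead, passing the inverted dictionary, and reorder the columns to match the dictionary if desired: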

    # map each symbol back to its display name, then restore the dictionary's order
    data = data.rename(columns={v: k for k, v in tickers.items()})
    data = data[list(tickers)]
    
    data.columns
    
    Index(['S&P 500', 'Nasdaq', 'Financials', 'Technology', 'Healthcare', 'Energy',
           'Airlines'],
          dtype='object', name='Ticker')
    

    So why is there no early data for JETS ('Airlines')? Because the ETF's inception date is 28 April 2015; before that it didn't exist, so there are no prices to return.
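Here's an offline sketch of the mix-up and the fix, so it can be checked without calling Yahoo. The DataFrame is a made-up stand-in for yf.download(...)["Adj Close"]: its columns are sorted by symbol like the real download, but its values are just column positions, not prices. relabel and source_symbol are hypothetical helper names. With real data the input would come from something like yf.download(list(tickers.values()), start=start, end=end, auto_adjust=False)["Adj Close"]; as far as I know, recent yfinance versions default to auto_adjust=True, which drops the "Adj Close" column, hence the explicit flag.

```python
import pandas as pd

# Same mapping as in the question: display name -> Yahoo Finance symbol.
tickers = {
    "S&P 500": "^GSPC",
    "Nasdaq": "^IXIC",
    "Financials": "XLF",
    "Technology": "XLK",
    "Healthcare": "XLV",
    "Energy": "XLE",
    "Airlines": "JETS",
}

# Stand-in for the downloaded "Adj Close" frame: columns sorted by symbol,
# one row whose values are each column's position (placeholders, not prices).
symbols = sorted(tickers.values())
fake = pd.DataFrame([list(range(len(symbols)))], columns=symbols)


def relabel(adj_close: pd.DataFrame, tickers: dict) -> pd.DataFrame:
    """Rename symbol columns to display names and order them like `tickers`."""
    renamed = adj_close.rename(columns={sym: name for name, sym in tickers.items()})
    return renamed[list(tickers)]


def source_symbol(df: pd.DataFrame, name: str) -> str:
    """Recover which symbol's placeholder value ended up under `name`."""
    return symbols[int(df[name].iloc[0])]


# Buggy: positional assignment pairs the k-th name with the k-th *sorted* symbol.
buggy = fake.copy()
buggy.columns = list(tickers.keys())

# Fixed: rename by symbol, independent of the order the columns come back in.
fixed = relabel(fake, tickers)

for name in tickers:
    print(f"{name:<11} buggy -> {source_symbol(buggy, name):<6} fixed -> {source_symbol(fixed, name)}")
```

The buggy mapping matches the question's output: the column labeled "Nasdaq" actually holds XLE, which is why its first value (about 13.96 on 2000-01-03) equals the XLE column in the raw download.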
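To confirm that leading NaNs come from a late listing (as with JETS) rather than from a download problem, you can look up the first non-NaN date of each column with Series.first_valid_index. The sketch below uses a small synthetic table (invented dates and values, not real quotes) in which "Airlines" only starts on the fourth row; trimming to the latest start date is one option if every series must cover the same period, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder data: "Airlines" has no values for the first three
# business days, mimicking a fund that was listed partway through the range.
dates = pd.date_range("2000-01-03", periods=6, freq="B")
prices = pd.DataFrame(
    {
        "S&P 500": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "Airlines": [np.nan, np.nan, np.nan, 20.0, 20.5, 21.0],
    },
    index=dates,
)

# First date with a non-NaN value in each column
# (missing if a column is entirely NaN).
first_dates = pd.Series(
    {name: col.first_valid_index() for name, col in prices.items()}
)
print(first_dates)

# Optional: keep only the period in which every column has started trading.
common_start = first_dates.max()
aligned = prices.loc[common_start:]
print(aligned)
```

On the real data this would show JETS starting in late April 2015 while the other columns start at the beginning of the requested range.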