python, pandas, dataframe, yfinance

How to Drop Level 0 Column in Pandas DataFrame returned by yfinance.download


I'm using yfinance.download to get data for a bunch of stocks, and I can work with them through the returned pandas DataFrame.

I want to drop specific stocks from my dataframe, but I am not able to...

Minimal code to reproduce:

import yfinance as yf

# several stocks for testing purposes, get dataframe
tickers = ['AAPL', 'TSLA', 'AMZN', 'GOOGL', 'MSFT', 'META', 'NVDA', 'PYPL', 'ADBE', 'NFLX']
data = yf.download(tickers, period="1y", interval="1wk", group_by='ticker')

print(data.columns.levels[0])

# trying to remove the first one of these level 0 columns...
data = data.drop(columns=data.columns.levels[0][0], axis=1, level=0, inplace=False)

print(data.columns.levels[0])

In the console output, AAPL appears in data.columns.levels[0] both before and after the drop. I'm not sure why it isn't being deleted.
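The behavior described above can be reproduced offline, since yf.download needs network access. This is a minimal sketch that assumes a synthetic frame built with pd.MultiIndex.from_product to mimic the group_by='ticker' layout (ticker on level 0, price field on level 1). The tickers, fields, dates, and values are placeholders, not real yfinance output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for yf.download(tickers, ..., group_by='ticker'):
# level 0 holds the ticker, level 1 the price field. Values are placeholders.
tickers = ["AAPL", "TSLA", "AMZN"]
fields = ["Open", "High", "Low", "Close", "Volume"]
columns = pd.MultiIndex.from_product([tickers, fields])
index = pd.date_range("2024-01-01", periods=4, freq="W")
values = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), len(columns))
data = pd.DataFrame(values, index=index, columns=columns)

# Drop every column whose level-0 label is AAPL (columns= already selects the column axis).
dropped = data.drop(columns="AAPL", level=0)

# The columns actually in use no longer include AAPL...
print(dropped.columns.get_level_values(0).unique().tolist())  # ['TSLA', 'AMZN']

# ...but the level's stored set of possible values still does.
print(dropped.columns.levels[0].tolist())  # ['AAPL', 'AMZN', 'TSLA']
```

So the drop itself succeeds. Only the levels attribute, which stores every value a level can take rather than the values currently referenced by the columns, still lists AAPL.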

I've tried playing around with inplace=True as well and not assigning to data, but I still get the same issue.

Let me know if there is anything else I can provide. Thanks in advance.


Solution

  • When you drop columns, pandas doesn't automatically clean up unused levels in the MultiIndex. The levels still contain all original values even if they're no longer used.

    If you print data.columns, you can see that "AAPL" has in fact been removed. To update the FrozenList returned by data.columns.levels, you need to remove the unused levels.

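    import yfinance as yf

    tickers = ['AAPL', 'TSLA', 'AMZN', 'GOOGL', 'MSFT', 'META', 'NVDA', 'PYPL', 'ADBE', 'NFLX']

    data = yf.download(tickers, period="1y", interval="1wk", group_by='ticker')

    # I have changed the code here for readability.
    # columns= already targets the column axis, so axis=1 is not needed.
    data = data.drop(columns="AAPL", level=0)

    # Rebuild the MultiIndex so that levels[0] only lists tickers still present.
    data.columns = data.columns.remove_unused_levels()

    print(data.columns.levels[0])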

    See the pandas documentation for pandas.MultiIndex.remove_unused_levels.
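The fix can be checked without downloading anything. This sketch again assumes a synthetic stand-in frame with tickers on level 0 (placeholder tickers and zero values, not yfinance data). It drops two tickers at once by passing a list to columns=, then calls remove_unused_levels. The final line shows get_level_values(0).unique(), which reports only the values the columns actually use; if you only need to read the remaining tickers, that route needs no cleanup step.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the yfinance frame: ticker on level 0, field on level 1.
tickers = ["AAPL", "TSLA", "AMZN"]
fields = ["Open", "Close"]
columns = pd.MultiIndex.from_product([tickers, fields])
data = pd.DataFrame(np.zeros((3, len(columns))), columns=columns)

# Drop several tickers at once by passing a list of level-0 labels.
data = data.drop(columns=["AAPL", "TSLA"], level=0)

# Build a new MultiIndex whose levels only hold values still in use.
data.columns = data.columns.remove_unused_levels()
print(data.columns.levels[0].tolist())  # ['AMZN']

# Reading the in-use values directly works with or without the cleanup above.
print(data.columns.get_level_values(0).unique().tolist())  # ['AMZN']
```

remove_unused_levels returns a new MultiIndex rather than modifying the existing one, which is why its result is assigned back to data.columns.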