Tags: python, python-3.x, pandas, dataframe, yfinance

How to deal with multi-level column names downloaded with yfinance


I have a list of tickers (tickerStrings) that I have to download all at once. When I read the saved CSV file back with pandas' read_csv, the resulting DataFrame does not have the same structure as the one I get directly from yfinance.

I usually access my data by ticker like this: data['AAPL'] or data['AAPL'].Close, but when I read the data from the CSV file it does not let me do that.

if path.exists(data_file):
    data = pd.read_csv(data_file, low_memory=False)
    data = pd.DataFrame(data)
    print(data.head())
else:
    data = yf.download(tickerStrings, group_by="Ticker", period=prd, interval=intv)
    data.to_csv(data_file)

Here's the print output:

                  Unnamed: 0                 OLN               OLN.1               OLN.2               OLN.3  ...                 W.1                 W.2                 W.3                 W.4     W.5
0                        NaN                Open                High                 Low               Close  ...                High                 Low               Close           Adj Close  Volume
1                   Datetime                 NaN                 NaN                 NaN                 NaN  ...                 NaN                 NaN                 NaN                 NaN     NaN
2  2020-06-25 09:30:00-04:00    11.1899995803833  11.220000267028809  11.010000228881836  11.079999923706055  ...   201.2899932861328   197.3000030517578  197.36000061035156  197.36000061035156  112156
3  2020-06-25 09:45:00-04:00  11.130000114440918  11.260000228881836  11.100000381469727   11.15999984741211  ...  200.48570251464844  196.47999572753906  199.74000549316406  199.74000549316406   83943
4  2020-06-25 10:00:00-04:00  11.170000076293945  11.220000267028809  11.119999885559082  11.170000076293945  ...  200.49000549316406  198.19000244140625   200.4149932861328   200.4149932861328   88771

The error I get when trying to access the data:

Traceback (most recent call last):
File "getdata.py", line 49, in processData
    avg = data[x].Close.mean()
AttributeError: 'Series' object has no attribute 'Close'

Solution

  • When working with financial data for multiple tickers using yfinance and pandas, the process breaks down into a few key steps: downloading the data, organizing it in a structured format, and accessing it by ticker. The answer below is organized into those segments.

    Downloading Data for Multiple Tickers

    Direct Download and DataFrame Creation

    Multi-Ticker, Structured DataFrame Approach

    Handling CSV Files with Multi-Level Column Names

    To read a CSV file that has been saved with yfinance data (which often includes multi-level column headers), adjustments are necessary to ensure the DataFrame is accessible in the desired format.
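    As a minimal sketch of that adjustment, the example below reads a small in-memory CSV laid out like a file written by to_csv from a two-level (ticker, field) DataFrame. The CSV text and its values are made up for illustration. Passing header=[0, 1] tells read_csv to rebuild both column levels, and index_col=0 restores the date index; with the defaults, only the first row is used as the header, which is where names such as OLN.1 and OLN.2 in the question's output come from.

```python
import io

import pandas as pd

# Stand-in for a file saved with DataFrame.to_csv() from a two-level
# (ticker, field) frame. The values are illustrative, not real prices.
csv_text = """\
,AAPL,AAPL,MSFT,MSFT
,Open,Close,Open,Close
Date,,,,
2020-06-25,10.0,11.0,20.0,21.0
2020-06-26,12.0,13.0,22.0,23.0
"""

# header=[0, 1] rebuilds the two column levels, index_col=0 restores the
# date index, and parse_dates=True converts that index to datetimes.
data = pd.read_csv(
    io.StringIO(csv_text), header=[0, 1], index_col=0, parse_dates=True
)

# Access by ticker works again, as it does on the frame from yfinance.
print(data["AAPL"].Close.mean())
```

    With a real file, the io.StringIO object would be replaced by the file path (data_file in the question).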

    Flattening Multi-Level Columns for Easier Access

    Depending on the initial structure of the DataFrame, multi-level columns may need to be flattened to a single level, adding clarity and simplicity to the dataset.
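    One possible way to flatten, sketched here on a small made-up frame, is to take each ticker's sub-frame, tag it with a ticker column, and concatenate the pieces into a long, single-level table. The frame contents and the column name ticker are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative two-level (ticker, field) frame; the numbers are made up.
columns = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["Open", "Close"]])
index = pd.DatetimeIndex(["2020-06-25", "2020-06-26"], name="Date")
data = pd.DataFrame(
    [[10.0, 11.0, 20.0, 21.0], [12.0, 13.0, 22.0, 23.0]],
    index=index,
    columns=columns,
)

# data[t] selects one ticker's columns and drops the top level;
# assign() adds the ticker symbol so the rows stay identifiable.
tickers = data.columns.get_level_values(0).unique()
flat = pd.concat([data[t].assign(ticker=t) for t in tickers])

print(flat)
print(flat.loc[flat["ticker"] == "AAPL", "Close"].mean())
```

    The result has one row per date and ticker, with plain Open and Close columns plus the ticker label.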

    Individual Ticker File Management

    For those preferring to manage each ticker's data separately, downloading and saving each ticker's data to individual files can be a straightforward approach.
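    The saving half of that approach might look like the sketch below. The helper save_per_ticker is hypothetical, and the frames are small stand-ins built by hand so the example runs offline; in practice each frame would come from a per-ticker download.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_per_ticker(frames, out_dir):
    """Write each ticker's frame to <out_dir>/<ticker>.csv (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for ticker, frame in frames.items():
        path = out_dir / f"{ticker}.csv"
        frame.to_csv(path)  # single-level columns give a single header row
        paths.append(path)
    return paths


# Stand-in data with made-up values; real frames would be downloaded.
index = pd.DatetimeIndex(["2020-06-25", "2020-06-26"], name="Date")
frames = {
    "AAPL": pd.DataFrame({"Open": [10.0, 12.0], "Close": [11.0, 13.0]}, index=index),
    "MSFT": pd.DataFrame({"Open": [20.0, 22.0], "Close": [21.0, 23.0]}, index=index),
}

out_dir = tempfile.mkdtemp()
paths = save_per_ticker(frames, out_dir)
print([p.name for p in paths])
```

    Because each file holds a single ticker, it has only one header row and reads back with a plain read_csv call.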

    Consolidating Multiple Ticker Files into a Single DataFrame

    If data for each ticker is stored in separate files, combining these into a single DataFrame can be accomplished through file reading and concatenation.
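    A minimal sketch of that consolidation follows. It assumes one CSV per ticker, named after the ticker symbol and containing a Date column; the files here are written to a temporary directory with made-up values so the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Setup for the example only: write two small per-ticker files.
folder = Path(tempfile.mkdtemp())
index = pd.DatetimeIndex(["2020-06-25", "2020-06-26"], name="Date")
for ticker, close in {"AAPL": [11.0, 13.0], "MSFT": [21.0, 23.0]}.items():
    pd.DataFrame({"Close": close}, index=index).to_csv(folder / f"{ticker}.csv")

# Read every file, label its rows with the ticker taken from the
# file name, and stack the frames into one long DataFrame.
frames = []
for path in sorted(folder.glob("*.csv")):
    frame = pd.read_csv(path, index_col="Date", parse_dates=True)
    frame["ticker"] = path.stem
    frames.append(frame)
combined = pd.concat(frames)

print(combined.groupby("ticker")["Close"].mean())
```

    Grouping by the ticker column then gives per-ticker statistics without needing multi-level columns.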

    This structured approach ensures that regardless of the initial data format or how it's stored, you can effectively organize and access financial data for multiple tickers using yfinance and pandas.


    Overview of Data Representations

    This section showcases examples of financial data represented in both multi-level and single-level column formats. These representations are crucial for understanding different data structures and their implications for data analysis in financial contexts.

    Multi-Level Column Data

    Multi-level column data can be complex but allows for the organization of related data under broader categories. This structure is especially useful for datasets where each entity (e.g., a stock ticker) has multiple attributes (e.g., Open, High, Low, Close prices).

    Example: DataFrame with Multi-Level Columns

    Below is a sample DataFrame showcasing multi-level column data for two stock tickers, AAPL and MSFT. Each ticker has multiple attributes, such as Open, High, Low, Close, Adjusted Close, and Volume.

                    AAPL                                                    MSFT                                
                    Open      High       Low     Close Adj Close     Volume Open High Low Close Adj Close Volume
    Date                                                                                                        
    1980-12-12  0.513393  0.515625  0.513393  0.513393  0.405683  117258400  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-15  0.488839  0.488839  0.486607  0.486607  0.384517   43971200  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-16  0.453125  0.453125  0.450893  0.450893  0.356296   26432000  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-17  0.462054  0.464286  0.462054  0.462054  0.365115   21610400  NaN  NaN NaN   NaN       NaN    NaN
    1980-12-18  0.475446  0.477679  0.475446  0.475446  0.375698   18362400  NaN  NaN NaN   NaN       NaN    NaN
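    A frame shaped like the one above can be queried in a few equivalent ways. The sketch below uses a small made-up frame with the same (ticker, field) layout to show selection by ticker, selection by a (ticker, field) tuple, and a cross-section that pulls one field for all tickers.

```python
import pandas as pd

# Illustrative frame with the same (ticker, field) column layout;
# the numbers are made up.
columns = pd.MultiIndex.from_product([["AAPL", "MSFT"], ["Open", "Close"]])
index = pd.DatetimeIndex(["2020-06-25", "2020-06-26"], name="Date")
data = pd.DataFrame(
    [[10.0, 11.0, 20.0, 21.0], [12.0, 13.0, 22.0, 23.0]],
    index=index,
    columns=columns,
)

# All fields for one ticker: a DataFrame with single-level columns.
aapl = data["AAPL"]

# One field for one ticker: a Series, selected with a tuple key.
aapl_close = data[("AAPL", "Close")]

# One field across every ticker: cross-section on the second column level.
closes = data.xs("Close", axis=1, level=1)

print(aapl_close.mean())
print(closes)
```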
    

    Example: CSV Format of Multi-Level Columns

    Representing the above DataFrame in CSV format poses a unique challenge, as shown below. The two column levels are written as two header rows, followed by a row that holds only the index name (Date), and then the data rows.

    ,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
    ,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
    Date,,,,,,,,,,,,
    1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
    1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
    1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,
    

    Single-Level Column Data

    For datasets where each entity shares a uniform set of attributes, single-level column data structures are ideal. This simpler format facilitates easier data manipulation and analysis, making it a common choice for many applications.

    Example: DataFrame with Single-Level Columns

    Below is a sample DataFrame displaying single-level column data for the MSFT stock ticker. It includes attributes such as Open, High, Low, Close, Adjusted Close, and Volume, alongside the ticker symbol for each entry. This format is straightforward, enabling direct access to each attribute of the stock data.

                    Open      High       Low     Close  Adj Close      Volume ticker
    Date                                                                            
    1986-03-13  0.088542  0.101562  0.088542  0.097222   0.062205  1031788800   MSFT
    1986-03-14  0.097222  0.102431  0.097222  0.100694   0.064427   308160000   MSFT
    1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065537   133171200   MSFT
    1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063871    67766400   MSFT
    1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062760    47894400   MSFT
    

    Example: CSV Format of Single-Level Columns

    When single-level column data is exported to a CSV format, it results in a straightforward, easily readable file. Each row corresponds to a specific date, and each column header directly represents an attribute of the stock data. This simplicity enhances the CSV's usability for both humans and software applications.

    Date,Open,High,Low,Close,Adj Close,Volume,ticker
    1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
    1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
    1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
    1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
    1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT
    

    This section exemplifies how single-level column data is organized, providing an intuitive and accessible way to work with financial datasets. Whether in DataFrame or CSV format, single-level data structures support efficient data processing and analysis tasks.