pythonpandasloops

Iterating over columns in multiple dataframes and conditionally performing an operation


I have three time series dataframes

df_list=[px, SC, SMA]

Each dataframe has identical columns and identical dates. I also have a list of securities (condensed for the purpose of the example):

securities3=['SPX Index','BIL US Equity']

I'm trying to do something like the below, which does not have the correct syntax:

for df in df_list:
    for col in df.columns:       
        if col==securities3:
            ***create new dataframe here***

In words, I want to iterate over each dataframe in df_list, within each column of df, when the column matches the components of securities3, I want a new dataframe to be formed with those three columns (one column from each dataframe, matching securities3 list.

To provide more detail, please see below for sample data:

import pandas as pd

px_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
        'SPX Index': [58.63, 21.25, 19.17, 18.8],
        'BIL US Equity': [35,105,27,98]}

SC_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
        'SPX Index': [20.50, 6, 82, 74.6],
        'BIL US Equity': [74,62,8,99]}

SMA_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
        'SPX Index': [2, 95.3, 39, 68.27],
        'BIL US Equity': [58,37,74,11]}

px = pd.DataFrame(px_data)
SC = pd.DataFrame(SC_data)
SMA = pd.DataFrame(SMA_data)

The target output is:

SPX_data = {'Date': ['8/11/18', '8/12/18', '8/13/18', '8/14/18'],
     'SPX Index': [58.63, 21.25, 19.17, 18.8],
     'SPX Index': [20.50, 6, 82, 74.6],
     'SPX Index': [2, 95.3, 39, 68.27]}
    
SPX = pd.DataFrame(SPX_data)

I'm trying to create the SPX dataframe (for example) in a loop, because the list (securities3) can change. Also, as a point of reference, because this is part of a larger process, the columns in the dataframes will always be identical to the strings (and positions) within the list, as they are in this example.

Thanks for the help.


Solution

  • Assuming all dataframes have the same dates in the same order, you can concatenate the dates + the columns with values on axis=1:

    date = df_list[0]["Date"]
    out = [pd.concat([date] + [df[sec] for df in df_list], axis=1) for sec in securities3]
    
    for df in out:
        print(df)
    
          Date  SPX Index  SPX Index  SPX Index
    0  8/11/18      58.63       20.5       2.00
    1  8/12/18      21.25        6.0      95.30
    2  8/13/18      19.17       82.0      39.00
    3  8/14/18      18.80       74.6      68.27
          Date  BIL US Equity  BIL US Equity  BIL US Equity
    0  8/11/18             35             74             58
    1  8/12/18            105             62             37
    2  8/13/18             27              8             74
    3  8/14/18             98             99             11
    

    Update:

    If 'Date' is the index, remove the date from the concat call.

    If you want each dataframe in a variable, remove the outer loop in the list comprehension.

    for sec in securities3:
        df = pd.concat([df[sec] for df in df_list], axis=1)