Tags: python, pandas, attribution

AttributeError: 'tuple' object has no attribute 'loc' when filtering on pandas dataframe


Given the following DataFrame -

| json_path | Reporting Group | Entity/Grouping | Entity ID | Adjusted Value (Today, No Div, USD) | Adjusted TWR (Current Quarter, No Div, USD) | Adjusted TWR (YTD, No Div, USD) | Annualized Adjusted TWR (Since Inception, No Div, USD) | Adjusted Value (No Div, USD) | TWR Audit Note |
|---|---|---|---|---|---|---|---|---|---|
| data.attributes.total.children.[0].children.[0].children.[0] | Barrack Family | William and Rupert Trust | 9957007 | -1.44 | | | | -1.44 | |
| data.attributes.total.children.[0].children.[0].children.[0].children.[0] | Barrack Family | Cash | - | -1.44 | | | | -1.44 | |
| data.attributes.total.children.[0].children.[0].children.[1] | Barrack Family | Gratia Holdings No. 2 LLC | 8413655 | 55491732.66 | -0.971018847 | -0.971018847 | 11.52490309 | 55491732.66 | |
| data.attributes.total.children.[0].children.[0].children.[1].children.[0] | Barrack Family | Investment Grade Fixed Income | - | 18469768.6 | | | | 18469768.6 | |
| data.attributes.total.children.[0].children.[0].children.[1].children.[1] | Barrack Family | High Yield Fixed Income | - | 3668982.44 | -0.205356545 | -0.205356545 | 4.441190127 | 3668982.44 | |

The following code should select the rows whose Entity/Grouping value is not 'Cash' and that have a blank value in at least one of these columns: Adjusted TWR (Current Quarter, No Div, USD), Adjusted TWR (YTD, No Div, USD) or Annualized Adjusted TWR (Since Inception, No Div, USD).

Code: the following code is meant to achieve this -

def twr_exceptions_logic():
    perf_asset_class_df = databases_creation()

    m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
    m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
                              'Adjusted TWR (YTD, No Div, USD)',
                              'Annualized Adjusted TWR (Since Inception, No Div, USD)']].eq('').any(1)
    perf_asset_class_df.loc[m1&m2]
    
    return perf_asset_class_df

Error: being still relatively new to Python, I am unsure why this AttributeError is being thrown -

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in <module>
     48     writer.save()
     49 
---> 50 xlsx_writer()

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2689024934.py in xlsx_writer()
      1 # Function that writes Exceptions Report and API Response as a consolidated .xlsx file.
      2 def xlsx_writer():
----> 3     reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df = twr_exceptions_logic()
      4 
      5 #   Creating and defining filename for exceptions report

C:\Users\WILLIA~1.FOR\AppData\Local\Temp/ipykernel_18756/2834095962.py in twr_exceptions_logic()
      2     perf_asset_class_df = databases_creation()
      3 
----> 4     m1 = perf_asset_class_df.loc[(perf_asset_class_df['Entity/Grouping']!= 'Cash')]
      5     m2 = perf_asset_class_df[['Adjusted TWR (Current Quarter, No Div, USD)',
      6                               'Adjusted TWR (YTD, No Div, USD)',

AttributeError: 'tuple' object has no attribute 'loc'

Help: I have done some research on this AttributeError and am finding conflicting information as to how it relates to my particular issue. It looks as if perf_asset_class_df is being returned as a tuple from the databases_creation() function. However, it is definitely a pandas DataFrame: the only thing databases_creation() does is take a DataFrame named df and filter it to create a DataFrame called perf_asset_class_df. Am I missing something?

perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]

databases_creation() function -

def databases_creation():
    df = data_cleansing()

    unknown_df = df[df['Entity/Grouping'].str.contains('Unknown')==True]

    perf_asset_class_df = df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]
    perf_asset_class_df = pd.DataFrame(perf_asset_class_df)
    
    perf_entity_df = df[df['json_path'].str.count(r'\.children').eq(3)]
    perf_entity_group_df = df[df['json_path'].str.count(r'\.children').eq(2)]

    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df
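A quick, self-contained way to test the suspicion above: in Python, a return statement that lists several comma-separated values returns a single tuple, and a tuple has no .loc attribute, even if every element inside it is a DataFrame. The function make_frames() and its data below are hypothetical stand-ins for databases_creation(), kept to two DataFrames for brevity:

```python
import pandas as pd


def make_frames():
    """Hypothetical stand-in for databases_creation(): returns two DataFrames."""
    df = pd.DataFrame({"Entity/Grouping": ["Cash", "High Yield Fixed Income"]})
    cash_df = df[df["Entity/Grouping"] == "Cash"]
    other_df = df[df["Entity/Grouping"] != "Cash"]
    # A comma-separated return packs both DataFrames into one tuple.
    return cash_df, other_df


result = make_frames()
print(type(result))            # <class 'tuple'>
print(type(result[1]))         # <class 'pandas.core.frame.DataFrame'>
print(hasattr(result, "loc"))  # False

try:
    # result.loc is looked up first, so this fails before any filtering happens.
    result.loc[:]
except AttributeError as exc:
    print(exc)                 # 'tuple' object has no attribute 'loc'
```

Printing type(databases_creation()) in the real notebook should show the same thing.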

Does anyone have any suggestions?


Solution

    return reporting_group_df, unknown_df, perf_asset_class_df, perf_entity_df, perf_entity_group_df

This line returns a tuple of DataFrames. When your code calls databases_creation(), it saves that entire tuple as perf_asset_class_df, which is why the next line fails with 'tuple' object has no attribute 'loc'. To get just the DataFrame you're interested in, unpack the tuple when you call the function:

    _, _, perf_asset_class_df, _, _ = databases_creation()

This unpacks the tuple, assigning each element to the corresponding variable. By convention, _ is used for the values you don't care about, but any other variable name would work.
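Putting the fix together, twr_exceptions_logic() might look something like the sketch below. Besides unpacking the tuple, it addresses two things in the original function that would surface once the AttributeError is gone: m1 was built with .loc, so it was a filtered DataFrame rather than a boolean mask, and the result of perf_asset_class_df.loc[m1&m2] was never assigned, so the unfiltered DataFrame was returned. It also treats both empty strings and NaN as blank; that is an assumption, since which one appears depends on what data_cleansing() produces. The databases_creation() stub is hypothetical: its rows are taken from the table in the question, except that one blank is written as NaN to exercise both cases.

```python
import numpy as np
import pandas as pd

TWR_COLS = [
    "Adjusted TWR (Current Quarter, No Div, USD)",
    "Adjusted TWR (YTD, No Div, USD)",
    "Annualized Adjusted TWR (Since Inception, No Div, USD)",
]


def databases_creation():
    """Hypothetical stub: returns five DataFrames, perf_asset_class_df third."""
    perf_asset_class_df = pd.DataFrame({
        "Entity/Grouping": ["Cash", "Investment Grade Fixed Income",
                            "High Yield Fixed Income"],
        TWR_COLS[0]: ["", "", -0.205356545],
        TWR_COLS[1]: ["", np.nan, -0.205356545],  # NaN as a second kind of blank
        TWR_COLS[2]: ["", "", 4.441190127],
    })
    empty = pd.DataFrame()
    return empty, empty, perf_asset_class_df, empty, empty


def twr_exceptions_logic():
    # Unpack the tuple; only the third DataFrame is needed here.
    _, _, perf_asset_class_df, _, _ = databases_creation()

    # m1: boolean Series, True where the row is not 'Cash' (no .loc here).
    m1 = perf_asset_class_df["Entity/Grouping"] != "Cash"

    # m2: True where at least one TWR column is blank ('' or NaN).
    twr = perf_asset_class_df[TWR_COLS]
    m2 = (twr.eq("") | twr.isna()).any(axis=1)

    # Return the filtered rows instead of discarding them.
    return perf_asset_class_df.loc[m1 & m2]


exceptions_df = twr_exceptions_logic()
print(exceptions_df["Entity/Grouping"].tolist())
# ['Investment Grade Fixed Income']
```

Note that xlsx_writer() in the traceback unpacks five values from twr_exceptions_logic(), while this version returns a single DataFrame, so the caller and the function need to agree (for example exceptions_df = twr_exceptions_logic()). If unpacking five DataFrames by position feels fragile, one alternative is to have databases_creation() return a dict keyed by name; that is a design suggestion, not something from the original code.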