pythondataframelistobject

Python Unable to Map Dataframe Columns by Name to List of Objects


I have a dataframe like the table shown below.

id shop var1 var2
1 a a b
2 b b c

I would like to populate a list of objects using just the id and shop columns, however the columns in the table may not always be in the order shown, so I would like to reference them by name opposed to index which is shown below. I've searched online but can't find a solution.

class Test:
  def __init__(self, id, shop):
    self.id = id
    self.shop = shop

def test_list(df:pd.DataFrame)->list:
    return list(map(lambda x:Test(id=x[0],shop=x[1]),df.values.tolist()))

Solution

  • Just iterate over the rows of your dataframe (I assume it's a Pandas dataframe) without converting to a list before, so you can still reference your columns by name:

    import pandas as pd
    
    df = pd.DataFrame([{"id": 1, "shop": "a", "var1": "a", "var2": "b"},
                       {"id": 2, "shop": "b", "var1": "b", "var2": "c"}])
    
    class Test:
      def __init__(self, id, shop):
        self.id = id
        self.shop = shop
    
    def test_list(df: pd.DataFrame) -> list:
        return [Test(id=row["id"], shop=row["shop"]) for _, row in df.iterrows()]
    
    result = test_list(df)
    
    assert len(result) == 2
    assert result[0].id == 1 and result[0].shop == "a"
    assert result[1].id == 2 and result[1].shop == "b"
    

    An alternative would be: (1) only keep the columns of the parameters you are interested in, (2) convert the result to a list of dictionaries, (3) use the dictionaries as named parameters in your Test instance creation. This could look as follows:

    def test_list(df: pd.DataFrame) -> list:
        return [Test(**row) for row in df.filter(["id", "shop"]).to_dict("records")]
    
    result = test_list(df)
    
    assert len(result) == 2
    assert result[0].id == 1 and result[0].shop == "a"
    assert result[1].id == 2 and result[1].shop == "b"
    

    You might want to time both approaches to see which is faster (if that's critical for you).