I have a dataframe like the table shown below.
id | shop | var1 | var2 |
---|---|---|---|
1 | a | a | b |
2 | b | b | c |
I would like to populate a list of objects using only the id and shop columns. However, the columns in the table may not always be in the order shown, so I would like to reference them by name rather than by index, which is what my current code (shown below) does. I've searched online but can't find a solution.
class Test:
    def __init__(self, id, shop):
        self.id = id
        self.shop = shop

def test_list(df: pd.DataFrame) -> list:
    return list(map(lambda x: Test(id=x[0], shop=x[1]), df.values.tolist()))
Just iterate over the rows of your dataframe (I assume it's a pandas DataFrame) without first converting it to a list, so you can still reference the columns by name:
import pandas as pd

df = pd.DataFrame([{"id": 1, "shop": "a", "var1": "a", "var2": "b"},
                   {"id": 2, "shop": "b", "var1": "b", "var2": "c"}])

class Test:
    def __init__(self, id, shop):
        self.id = id
        self.shop = shop

def test_list(df: pd.DataFrame) -> list:
    # Each row is a Series, so columns can be looked up by name
    return [Test(id=row["id"], shop=row["shop"]) for _, row in df.iterrows()]

result = test_list(df)
assert len(result) == 2
assert result[0].id == 1 and result[0].shop == "a"
assert result[1].id == 2 and result[1].shop == "b"
An alternative would be to (1) keep only the columns that correspond to the parameters you are interested in, (2) convert the result to a list of dictionaries, and (3) pass each dictionary as keyword arguments when creating a Test instance. This could look as follows:
def test_list(df: pd.DataFrame) -> list:
    # Keep only the needed columns, then unpack each record as keyword arguments
    return [Test(**row) for row in df.filter(["id", "shop"]).to_dict("records")]

result = test_list(df)
assert len(result) == 2
assert result[0].id == 1 and result[0].shop == "a"
assert result[1].id == 2 and result[1].shop == "b"
You might want to time both approaches to see which is faster (if that's critical for you).
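If you do want to compare them, a rough sketch with the standard-library timeit module could look like the following. The function names build_with_iterrows and build_with_records, the frame size, and the repeat count are my own placeholders for illustration, and the columns are deliberately shuffled to mimic your "order may vary" case. Actual timings will depend on your data and pandas version, so treat the numbers as indicative only.

```python
import timeit

import pandas as pd


class Test:
    def __init__(self, id, shop):
        self.id = id
        self.shop = shop


def build_with_iterrows(df: pd.DataFrame) -> list:
    # First approach: iterate over rows and look columns up by name
    return [Test(id=row["id"], shop=row["shop"]) for _, row in df.iterrows()]


def build_with_records(df: pd.DataFrame) -> list:
    # Second approach: keep the two columns, unpack each record as kwargs
    return [Test(**row) for row in df.filter(["id", "shop"]).to_dict("records")]


# Hypothetical test frame; size and column order are arbitrary assumptions
n_rows = 2_000
df = pd.DataFrame({
    "var1": ["a"] * n_rows,
    "shop": ["a", "b"] * (n_rows // 2),
    "id": range(n_rows),
    "var2": ["b"] * n_rows,
})

repeats = 5
for func in (build_with_iterrows, build_with_records):
    seconds = timeit.timeit(lambda: func(df), number=repeats)
    print(f"{func.__name__}: {seconds / repeats:.4f} s per call")
```

Both functions only reference the columns by name, so the shuffled column order does not affect the result.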