python, pandas, csv, filtering, dask

Efficiently filtering comma-separated strings in pandas/dask


I have some data which has the following shape (with header):

Name,Signal,Date
MyName,"1,2,3,4,5,6,7,8,9,10",19-04-2024
MyName,"1,2,3,4,5,6,7,8,9,10",19-04-2024

I'm interested in filtering the rows based on the sum of the array in "Signal". So I tried the following:

import pandas as pd
import numpy as np

df = pd.read_csv("my_csv.csv", dtype={"Signal": "string"},
                 parse_dates=["Date"], dayfirst=True)

for i in df["Signal"]:
    t = np.array([int(x) for x in i.split(",")])
    if t.sum() == 100:
        # etc.

This approach raises some issues, though:

  1. How can I record the index of the current row so that I can then filter/drop it from my dataframe?
  2. Can this operation be sped up or done more efficiently? I was thinking of allocating a 2D numpy array and parsing the numbers into it so that allocation happens only once, but I'm not sure this would make a difference.
  3. When using dask, which lacks a global row index, is there a more efficient way to filter the rows without first loading all the data into numpy arrays?

Solution

  • Having read in your dataframe, you can then convert the Signal values into lists of integers using this code:

    df['Signal'] = df['Signal'].apply(lambda s: list(map(int, s.split(','))))
    

    Output:

         Name                           Signal        Date
    0  MyName  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  19-04-2024
    1  MyName  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  19-04-2024
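
    (As an aside on your second question, about speed: if every row has the same number of comma-separated values, as in the sample, you can avoid the per-row Python lists entirely by letting pandas split the string column into an integer frame and summing across it. A rough sketch, working on the raw string column before the conversion above; the name raw is just a placeholder:

    import pandas as pd

    raw = pd.read_csv("my_csv.csv", dtype={"Signal": "string"})

    # Split each string into one column per number, cast to int, and
    # sum across each row; no per-row Python lists are built.
    sums = raw["Signal"].str.split(",", expand=True).astype(int).sum(axis=1)
    raw_filtered = raw[sums == 100]

    This will fail with a cast error if the rows have ragged lengths, since the missing cells come back as NaN.)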
    

    Then you can play with your values to filter the dataframe. For example, to filter on sums:

    sums = df['Signal'].apply(sum)
    # 0    55
    # 1    55
    # Name: Signal, dtype: int64
    
    mask = sums == 100
    # 0    False
    # 1    False
    # Name: Signal, dtype: bool
    
    df_filtered = df[mask]
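
    As for recording row indices (your first question): the boolean mask already carries the row labels, so there is no need to track them in a loop. For example (matching_idx and df_dropped are just illustrative names):

    # Row labels where the condition holds
    matching_idx = df.index[mask]

    # Or drop the matching rows instead of keeping them
    df_dropped = df.drop(matching_idx)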
    

    If you want to use the sums more than once, it's probably best to save them in the dataframe, e.g.

    df['sum'] = df['Signal'].apply(sum)
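
    On the dask side (your third question): you don't need a global row index to filter, because a boolean mask computed partition by partition lines up with the rows just as it does in pandas. A minimal sketch, assuming the same file layout (dd.read_csv, Series.map_partitions and boolean indexing are standard dask APIs; ddf and filtered are placeholder names):

    import dask.dataframe as dd

    ddf = dd.read_csv("my_csv.csv", dtype={"Signal": "string"})

    # Sum each row's signal lazily, one pandas partition at a time;
    # meta describes the (name, dtype) of the resulting series.
    sums = ddf["Signal"].map_partitions(
        lambda s: s.str.split(",", expand=True).astype(int).sum(axis=1),
        meta=(None, "int64"),
    )

    # Boolean indexing needs no global index; nothing is read until compute()
    filtered = ddf[sums == 100].compute()

    Because evaluation is lazy, the CSV is streamed through in chunks and only the matching rows are materialised at the end.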