I have some data which has the following shape (with header)
Name,Signal,Date
MyName,"1,2,3,4,5,6,7,8,9,10",19-04-2024
MyName,"1,2,3,4,5,6,7,8,9,10",19-04-2024
I'm interested in filtering the rows based on the sum of the array in "Signal". So I tried the following :
df = read_csv("my_csv.csv", dtype={"Signal" : "string"}, parse_dates=True)
for i in df["Signal"]:
t = np.array([int(x) for x in i.split(",")])
if t.sum() == 100:
#etc
This approach raises some issues though :
Having read in your dataframe, you can then convert the Signal
values into a list of integers using this code:
df['Signal'] = df['Signal'].apply(lambda s:list(map(int, s.split(','))))
Output:
Name Signal Date
0 MyName [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 19-04-2024
1 MyName [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 19-04-2024
Then you can play with your values to filter the dataframe. For example, to filter on sums:
sums = df['Signal'].apply(sum)
# 0 55
# 1 55
# Name: Signal, dtype: int64
mask = sums == 100
# 0 False
# 1 False
# Name: Signal, dtype: bool
df_filtered = df[mask]
If you want to use the sums more than once, it's probably best to save them in the dataframe e.g.
df['sum'] = df['Signal'].apply(sum)