I'm trying to do what I thought would be a simple task. I have a large dataset whose columns are created from various if conditions in Python, such as:
```python
for index, row in df.iterrows():
    if int(row['Fog_Event']) >= 4:
        df.at[index, 'Fog_Event_determine'] = 'Fog'
    elif int(row['Fog_Event']) == 3:
        df.at[index, 'Fog_Event_determine'] = 'Dust'
    elif int(row['Fog_Event']) == 2:
        df.at[index, 'Fog_Event_determine'] = 'Dust'
    else:
        df.at[index, 'Fog_Event_determine'] = 'Background'
    continue
```
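As an aside, the same if/elif chain can be written without `iterrows()`. The sketch below uses `numpy.select`, which evaluates a list of boolean conditions per row and takes the first match, with `default` playing the role of `else`. The sample data is hypothetical, and it assumes `Fog_Event` holds integer-like values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({"Fog_Event": [5, 3, 2, 0, 4, 1]})

fog_event = df["Fog_Event"].astype(int)  # mirrors int(row['Fog_Event']) in the loop
conditions = [
    fog_event >= 4,          # if branch: Fog
    fog_event.isin([2, 3]),  # both elif branches map to Dust
]
choices = ["Fog", "Dust"]

# First matching condition wins per row; rows matching nothing get the default.
df["Fog_Event_determine"] = np.select(conditions, choices, default="Background")
print(df)
```

Because the conditions are checked in order, a value of 4 or more is labelled `Fog` even though later conditions are never reached for it, exactly as with `if`/`elif`.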
This works exactly as intended, but it causes some issues in the final analysis of the data. To fix them, I need to add a running threshold value that depends on the results of the previous rows: if row 1 has `Fog_Event >= 4`, then row 2's threshold should go up by 1, and so on down the DataFrame.
I tried this:
```python
df['Running_threshold'] = 0
for index, row in df.iterrows():
    if int(row['Fog_Event']) >= 4:
        df.loc[index[+1], 'Running_threshold'] = 1
    else:
        continue
```
But this only adds a 1 to the second row of the index, which makes sense on closer inspection. How can I get Python to add +1 to every row after a row where `Fog_Event >= 4`, so that the threshold keeps accumulating?
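For reference, the loop approach can be made to work. `index[+1]` subscripts the current row label itself rather than moving to the next row, and even pointing at the next row would only ever write a 1, not a running total. A minimal sketch, using hypothetical sample data, is to carry a counter through the rows and write its value before updating it:

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({"Fog_Event": [9, 3, 2, 9, 7, 0]})

running = 0
values = []
for fog_event in df["Fog_Event"]:
    # Each row records how many qualifying rows came strictly before it...
    values.append(running)
    # ...and only then does this row's event raise the threshold for later rows.
    if int(fog_event) >= 4:
        running += 1

df["Running_threshold"] = values
print(df)
```

Writing the value before incrementing is what makes the increase show up on the *following* row rather than the current one.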
Thank you.
Use `np.where()` instead, as it's vectorized and more efficient than looping. Then take the `cumsum()` of the per-row flag to get the running total:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fog_Event": np.random.randint(0, 10, 20)})
df = df.assign(Fog_Event_Determine=np.where(df.Fog_Event >= 4, "fog", np.where(df.Fog_Event >= 2, "dust", "background")),
               Running_threshold=np.where(df.Fog_Event.shift() >= 4, 1, 0)  # 1 if the previous row was fog
               ).assign(Running_threshold=lambda dfa: dfa.Running_threshold.cumsum())  # running total
```
Example output (the input is random, so yours will differ):

```
Fog_Event  Fog_Event_Determine  Running_threshold
        9                  fog                  0
        3                 dust                  1
        2                 dust                  1
        9                  fog                  1
        7                  fog                  2
        0           background                  3
        4                  fog                  3
        7                  fog                  4
        6                  fog                  5
        9                  fog                  6
        1           background                  7
        6                  fog                  7
        7                  fog                  8
        8                  fog                  9
        6                  fog                 10
        9                  fog                 11
        6                  fog                 12
        2                 dust                 13
        7                  fog                 13
        8                  fog                 14
```
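To see what the `shift()`/`cumsum()` combination does step by step, here is a sketch that keeps the intermediate columns (the helper column names `prev_event` and `prev_was_fog` are made up for illustration). It uses the first six `Fog_Event` values from the output above instead of random data, so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Fixed values (the first six rows of the example output) for a reproducible result.
df = pd.DataFrame({"Fog_Event": [9, 3, 2, 9, 7, 0]})

# shift() moves each value down one row; the first row becomes NaN, and NaN >= 4 is False.
df["prev_event"] = df["Fog_Event"].shift()
# 1 where the previous row was a fog event, 0 otherwise.
df["prev_was_fog"] = np.where(df["prev_event"] >= 4, 1, 0)
# cumsum() turns the per-row 0/1 flags into a running total.
df["Running_threshold"] = df["prev_was_fog"].cumsum()
print(df)
```

The `Running_threshold` column matches the first six rows of the output above. An equivalent shorter form is `(df["Fog_Event"] >= 4).shift(fill_value=False).cumsum()`, assuming a pandas version whose `shift` accepts `fill_value`.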