Full question-
Search each column in a DataFrame to determine when the first instance of a value greater than a value stored in the last row of each column in the DataFrame and output the index
Ex. of df.head():
Well A1 A2 A3 A4
Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 305.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
Ex. of df.tail():
Well A1 A2 A3 A4
Temperature
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
I want to use the value of 4*std in A1 (570.37) and search starting from the top of the column for the first value greater than (570.37) in A1 and output the Temperature. I need to repeat this for all columns.
I would like the output as a new dataframe like this example below...I'm lost on how to construct this?
Well Temp
A1 26.0
A2 27.6
A3 26.8
... ...
H12 27.2
I would appreciate any help!
I believe you need if greater value for each column always exist:
print (df)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 632.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
df1 = df.iloc[:-3].gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df1)
Well Temp
0 A1 25.4
1 A2 25.2
2 A3 94.79
3 A4 94.79
Details:
print (df.iloc[:-3].gt(df.iloc[-1]))
A1 A2 A3 A4
Well Temperature
25.0 False False False False
25.2 False True False False
25.4 True False False False
25.6 False False False False
25.8 False False False False
94.79 False False True True
94.99 False False True True
print (df.iloc[:-3].gt(df.iloc[-1]).idxmax())
A1 25.4
A2 25.2
A3 94.79
A4 94.79
dtype: object
If is possible some value is not greater one possible solution is add new row to the end with NaN
index:
print (df)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
mean 280.759521 283.417750 201.471571 519.939366
std 72.404373 69.023406 45.447202 58.150127
4*std 570.377014 559.511373 383.260378 752.539875
df1 = df.iloc[:-3].append((df.iloc[-1] + 1).rename(np.nan))
print (df1)
A1 A2 A3 A4
Well Temperature
25.0 371.335253 360.026443 253.228769 593.436104
25.2 331.957145 332.224668 233.607595 561.057715
25.4 3005.472591 303.777874 213.500582 535.310186
25.6 285.713623 274.069361 202.024427 515.261876
25.8 252.716374 254.610848 181.719415 488.988468
94.79 -441.775980 -664.549239 1060.674188 1158.481056
94.99 -492.189733 -709.521424 1029.628209 1087.625128
NaN 571.377014 560.511373 384.260378 753.539875
df2 = df1.gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
print (df2)
Well Temp
0 A1 25.4
1 A2 NaN
2 A3 94.79
3 A4 94.79
print (df1.gt(df.iloc[-1]))
A1 A2 A3 A4
Well Temperature
25.0 False False False False
25.2 False False False False
25.4 True False False False
25.6 False False False False
25.8 False False False False
94.79 False False True True
94.99 False False True True
NaN True True True True