I am processing files that contain data in an unusable format. After processing one of the files I am left with a dataframe and a singular value.
The dataframe looks like this:
df = pd.DataFrame({
'A': ['foo', 'foo', 'foo', 'fizz', 'fizz', 'fizz', 'fizz'],
'B': ['bar', 'bar', 'bar', 'buzz', 'buzz', 'buzz', 'baz'],
'C': [10,10,10,10,10,10,10]
})
val = 20.0
The value is not supposed to be apart of my dataframe, but needs to be included in my TOTAL calculation. This is how I extract it from the file after I read it into a dataframe (it returns a string so I am casting to float):
if len(df.loc[df['ID'].eq("Settle"), 'C'].values) > 0:
temp = df.loc[df['ID'].eq("Settle"), 'C']
if temp.values[0].isnumeric():
num = float(temp.values[0])
else:
num = 0.0
else:
num = 0.0
now I need to do groupby.agg()
with the following condition:
A == 'foo'
I need to add val to the sumThis is the basic code I would use to get the sums for each value in A
, but I cannot figure out how factor in my condition. I'm assuming I can use np.where
or lambda
but unsure how to use that with .agg()
and achieve my output.
out = df.groupby(['A'], sort=False, as_index=False).agg({"C":"sum"})
Expected Output:
A C
foo 50
fizz 40
You could just append a row with the extra value (e.g. with concat
) prior to performing the aggregation:
out = (pd.concat([df, pd.DataFrame([{'A': 'foo', 'C': 20}])])
.groupby('A', as_index=False, sort=False)['C'].sum()
)
Output:
A C
0 foo 50
1 fizz 40
If you want to add different values to several groups:
extra = [('foo', 20), ('fizz', 100)]
out = (pd.concat([df, pd.DataFrame(extra, columns=['A', 'C'])])
.groupby('A', as_index=False, sort=False)['C'].sum()
)