I want to create a new row via row calculations. My DataFrame looks like this:
Para 01.21 02.21 03.21
<48h 22 25 38
>48h 0 1 3
I want to calculate the percentage across each row such that:
Goal
Para 01.21 02.21 03.21
<48h 22 25 38
>48h 0 1 3
Perc. 0% 4% 8%
I had some ideas about appending a new DataFrame, but nothing was really successful. Thank you for your help!
Assuming you are happy to modify the input DataFrame in place, this will produce the exact output you've requested:
df.loc['Perc.'] = (
    df.loc['>48h'] / df.loc['<48h']
).apply(lambda x: f"{x:.0%}")
This assumes that the first column in your printed output is the index of the DataFrame.
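For anyone who wants to try this end to end, here is a minimal sketch. The way the DataFrame is built is my assumption (Para as the index, the month labels as column names), since the question only shows the printed table.

```python
import pandas as pd

# Assumed reconstruction of the question's table: "Para" is the index,
# and the month labels are the column names.
df = pd.DataFrame(
    {"01.21": [22, 0], "02.21": [25, 1], "03.21": [38, 3]},
    index=pd.Index(["<48h", ">48h"], name="Para"),
)

# Divide the ">48h" row by the "<48h" row, then format each ratio
# as a percentage string with no decimal places.
df.loc["Perc."] = (
    df.loc[">48h"] / df.loc["<48h"]
).apply(lambda x: f"{x:.0%}")

print(df)
```

Because the new row holds strings, the columns end up with object dtype after this assignment.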
To unpack this a bit:
1. df.loc['Perc.'] = will create a new row populated with the value on the right-hand side of the =.
2. df.loc['>48h'] / df.loc['<48h'] computes the desired ratio, as a float.
3. The .apply call calls a function on each element of the new row, to convert the raw ratio to a string formatted in the manner you requested. To unpack this a little further:
   - f"{x}" converts x to a string (it is essentially equivalent to str(x)).
   - f"{x:.0%}" enhances the above by adding the .0% format specifier, which will display the value as a percentage with 0 decimal places (i.e. multiply by 100, round to 0 decimal places, and then add a % sign). There are various other format specifiers available for floating point data.

Note that if you want to do any further processing on the resulting value, I would recommend not doing step 3 - once you've converted your data into a string, it can't be e.g. multiplied by other data.
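To make the format specifiers concrete, here is a small plain-Python sketch using the 03.21 ratio from the question. The variable names ratio and formatted are just for illustration.

```python
ratio = 3 / 38  # ">48h" divided by "<48h" for the 03.21 column

print(f"{ratio}")      # plain conversion, same text as str(ratio)
print(f"{ratio:.0%}")  # 8%
print(f"{ratio:.1%}")  # 7.9%
print(f"{ratio:.3f}")  # 0.079

# Once formatted, the value is a str and no longer behaves like a number.
formatted = f"{ratio:.0%}"
try:
    formatted * 1.5
except TypeError as exc:
    print(f"TypeError: {exc}")
```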
If you want the best of both worlds, keeping the internal values as float ratios but displaying them as percentages, you can do this:
df.loc['Perc.'] = (
    df.loc['>48h'] / df.loc['<48h']
)
display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
print(display_str)
This will modify df to add the Perc. row, but its values will stay as the actual float ratios. Then display_str will be an appropriately formatted string representation of the whole DataFrame, including the percentage.
Note that formatters accepts a dict keyed by column names, not row labels, so you must first transpose your data (df.T). As a result, the output will be transposed:
Para <48h >48h Perc.
0 1.21 22.0 0.0 0%
1 2.21 25.0 1.0 4%
2 3.21 38.0 3.0 8%
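Here is a self-contained sketch of this second approach. It assumes Para is the index and the month labels are the column names, so the printed layout may differ slightly from the output shown above. The name per_thousand is only there to show that arithmetic still works.

```python
import pandas as pd

# Assumed reconstruction of the question's table.
df = pd.DataFrame(
    {"01.21": [22, 0], "02.21": [25, 1], "03.21": [38, 3]},
    index=pd.Index(["<48h", ">48h"], name="Para"),
)

# Keep the ratio numeric so it can still be used in later calculations.
df.loc["Perc."] = df.loc[">48h"] / df.loc["<48h"]

# formatters is keyed by column name, so transpose to make "Perc." a column.
display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
print(display_str)

# The stored values are still floats, so further arithmetic is possible.
per_thousand = df.loc["Perc."] * 1000
```

The formatting only exists in display_str; df itself is left with plain floats.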
If you're working inside a Jupyter notebook, you can also use the df.style.format method to achieve something similar; calling
df.T.style.format({"Perc.": "{:.0%}"})
will return a pandas.io.formats.style.Styler object, but if you're in a Jupyter notebook, this will be rendered like a DataFrame with the relevant column formatted as above. Again though, a dict passed this way is keyed by column name, so you'll need to transpose.