python pandas dataframe rows

How to create a new row from a calculation across other rows in a pandas DataFrame?


I want to create a new row via row calculations. My DataFrame looks like this:

Para   01.21   02.21   03.21
<48h      22      25      38
>48h       0       1       3

I want to calculate, for each column, the >48h value as a percentage of the <48h value, such that:

Goal

Para   01.21   02.21   03.21
<48h      22      25      38
>48h       0       1       3
Perc.     0%      4%      8%

I had some ideas involving appending a new DataFrame, but nothing really successful. Thank you for your help!


Solution

  • Assuming you are happy to modify the input DataFrame in place, this will produce the exact output you've requested:

    df.loc['Perc.'] = (
        df.loc['>48h'] / df.loc['<48h']
    ).apply(lambda x: f"{x:.0%}")
    

    This assumes that the first column in your printed output is the index of the DataFrame.
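If Para is currently an ordinary column rather than the index, it has to be moved into the index before the .loc lookups will work. A minimal sketch, assuming the question's table is rebuilt by hand (the `raw` DataFrame below is illustrative, not the asker's actual loading code):

```python
import pandas as pd

# Assumption: the question's table, rebuilt by hand with "Para" as a regular column.
raw = pd.DataFrame(
    {
        "Para": ["<48h", ">48h"],
        "01.21": [22, 0],
        "02.21": [25, 1],
        "03.21": [38, 3],
    }
)

# Move "Para" into the index so that rows can be selected by label with .loc.
df = raw.set_index("Para")

# Same assignment as above: ratio of the two rows, formatted as percentage strings.
df.loc["Perc."] = (df.loc[">48h"] / df.loc["<48h"]).apply(lambda x: f"{x:.0%}")

print(df)
```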

    To unpack this a bit:

    1. The assignment df.loc['Perc.'] = will create a new row populated with the value on the right-hand side of the =.
    2. df.loc['>48h'] / df.loc['<48h'] divides the two rows element-wise, giving the desired ratio for each column as a float.
    3. The .apply call calls a function on each element of the new row, to convert the raw percentage number to a string formatted in the manner you requested. To unpack this a little further:
      • f"{x}" converts x to a string (it is essentially equivalent to str(x))
      • f"{x:.0%}" enhances the above by adding the .0% format specifier, which will display the value as a percentage with 0 decimal places (i.e. multiply by 100, round to 0 decimal places, and then add a % sign). There are various other format specifiers available for floating point data.
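To make the format specifiers concrete, here is a small sketch in plain Python using the ratio from the question's 03.21 column (3 / 38). The variable name `x` is only for illustration:

```python
x = 3 / 38  # the raw ratio for the 03.21 column in the question

print(f"{x}")      # plain conversion, same text as str(x)
print(f"{x:.0%}")  # 8%    (percentage, 0 decimal places)
print(f"{x:.1%}")  # 7.9%  (percentage, 1 decimal place)
print(f"{x:.3f}")  # 0.079 (fixed-point, 3 decimal places, no % scaling)
```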

    Note that if you want to do any further processing on the resulting values, I would recommend skipping step 3: once the data has been converted to strings, you can no longer do arithmetic on it (e.g. multiply it by other data).

    If you want the best of both worlds, keeping the internal values as float ratios but displaying them as percentages, you can do this:

    df.loc['Perc.'] = (
        df.loc['>48h'] / df.loc['<48h']
    )
    
    display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
    print(display_str)
    

    This will modify df to add the Perc. row, but its values will stay as the actual float ratio. Then display_str will be an appropriately-formatted string representation of the whole dataframe, including the percentage.

    Note that formatters accepts a dict keyed by column names, not row labels, so you must first transpose your data (df.T). As a result, the output will be transposed too:

       Para  <48h  >48h Perc.
    0  1.21  22.0   0.0    0%
    1  2.21  25.0   1.0    4%
    2  3.21  38.0   3.0    8%
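For reference, here is a self-contained sketch of this approach, assuming the question's table is rebuilt by hand with Para as the index. The exact printed layout may differ from the output above depending on how your DataFrame was constructed:

```python
import pandas as pd

# Assumption: the question's table, rebuilt by hand with "Para" as the index.
df = pd.DataFrame(
    {"01.21": [22, 0], "02.21": [25, 1], "03.21": [38, 3]},
    index=pd.Index(["<48h", ">48h"], name="Para"),
)

# Keep the new row as raw float ratios so it stays usable in later calculations.
df.loc["Perc."] = df.loc[">48h"] / df.loc["<48h"]

# formatters is keyed by column name, so transpose to turn the rows into columns.
display_str = df.T.to_string(formatters={"Perc.": lambda x: f"{x:.0%}"})
print(display_str)

# The stored values are still floats, so arithmetic keeps working.
print(df.loc["Perc."].max())
```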
    

    If you're working inside a Jupyter notebook, you can also use the df.style.format method to achieve something similar; calling

    df.T.style.format({"Perc.": "{:.0%}"})
    

    will return a pandas.io.formats.style.Styler object, but if you're in a Jupyter notebook, this will be rendered like a DataFrame with the relevant column formatted as above. Again though, this only works on a column-by-column basis, so you'll need to transpose.