pandas dataframe pandas-styles

How to display a random sample from a styled DataFrame?


I often want to view a random sample of k rows from a DataFrame rather than just the head/tail, for which I would use df.sample(k).

When I chain on .style to this sample, the styler will only see the k selected rows, and the resulting colour-mapping will be inaccurate as it only considers the sample.

How can I shuffle, sample, and style a DataFrame, whilst ensuring the styler uses all of the data?

Example

import pandas as pd
import numpy as np

#Data for testing
df = pd.DataFrame({
    'device_id': np.random.randint(200, 800, size=1000),
    'normalised_score': np.random.uniform(0, 2, size=1000),
    'severity_level': np.random.randint(-3, 4, size=1000),
})

#Inaccurate styling if I chain .style onto a sampled DataFrame:
df.sample(5).style.background_gradient(subset='severity_level', cmap='RdYlGn')

I am using a colourmap that roughly goes red-white-green over the range of severity_level (-3, -2, -1, 0, +1, +2, +3). A value of 0 should therefore display as white, but it gets coloured red in the sample below:

[Image: a styled five-row sample in which a severity_level of 0 is shaded red]

The colouring should consider all severity_level values, even though I only display a few randomly-selected rows.


Solution

  • The .background_gradient method accepts vmin and vmax arguments that define the range of the gradient. When they are left unspecified, the minimum and maximum are taken from the data being styled (or from gmap, if one is given), but you can also set them explicitly.

    To get the correct gradient colours on a sample, call .sample on the DataFrame as before, then pass the min/max of the original DataFrame's 'severity_level' column to .background_gradient as vmin and vmax.

    k = 5
    (
        df
        .sample(n=k)
        .style
        .background_gradient(
            subset='severity_level',
            cmap='RdYlGn',
            vmin=df['severity_level'].min(),
            vmax=df['severity_level'].max()
        )
    )
    

    [Image: styled DataFrame]
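
To see why fixing vmin and vmax matters, here is a rough sketch of the arithmetic. It assumes the gradient places each value linearly between vmin and vmax, which is my understanding of how background_gradient behaves with its default settings. The helper gradient_position and the list sample_values are made up for illustration and are not part of pandas.

```python
def gradient_position(value, vmin, vmax):
    """Fraction of the way along the colourmap (0 = low end, 1 = high end).

    Hypothetical helper: assumes plain linear normalisation between vmin and vmax.
    """
    if vmax == vmin:
        return 0.0  # degenerate range; arbitrary choice for this sketch
    return (value - vmin) / (vmax - vmin)


# Made-up sample whose smallest severity_level happens to be 0
sample_values = [0, 1, 3, 2, 3]

# Range taken from the sample only (what happens when vmin/vmax are omitted)
pos_sample = gradient_position(0, min(sample_values), max(sample_values))

# Range taken from the full column, which spans -3 to 3 in the question
pos_full = gradient_position(0, -3, 3)

print(pos_sample)  # 0.0 -> low (red) end of RdYlGn
print(pos_full)    # 0.5 -> midpoint of the colourmap
```

With the sample-only range, 0 is the minimum and lands on the red end; with the full range it lands on the midpoint, which is what the question expects.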