[SOLVED] How to display a random sample from a styled DataFrame?

I often want to view a random sample of k rows from a DataFrame rather than just the head/tail, for which I would use df.sample(frac=1.0).iloc[:k].

When I chain on .style to this sample, the styler will only see the k selected rows, and the resulting colour-mapping will be inaccurate as it only considers the sample.

How can I shuffle, sample, and style a DataFrame, whilst ensuring the styler uses all of the data?

Example

import pandas as pd
import numpy as np

#Data for testing
df = pd.DataFrame({
    'device_id': np.random.randint(200, 800, size=1000),
    'normalised_score': np.random.uniform(0, 2, size=1000),
    'severity_level': np.random.randint(-3, 4, size=1000),
})

#Inaccurate styling if I chain .style onto a sampled DataFrame:
df.sample(frac=1.0).iloc[:5].style.background_gradient(subset='severity_level', cmap='RdYlGn')

I am using a colourmap that roughly goes red-white-green over the range of severity_level (-3, -2, -1, 0, +1, +2, +3). A value of 0 should therefore display as white, but in a small sample it can end up coloured red, because the gradient is scaled to the sampled rows only.

The colouring should consider all severity_level values, even though I only display a few randomly-selected rows.
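To make the cause concrete: by default, background_gradient scales the colourmap between the minimum and maximum of the data the styler was given. Below is a minimal sketch comparing the range of the full column with the range of a 5-row sample. The seeded generator and random_state are my own additions for reproducibility, not part of the example above.

```python
import numpy as np
import pandas as pd

# Seeded only so the comparison is repeatable (an assumption of this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({'severity_level': rng.integers(-3, 4, size=1000)})

# A 5-row random sample, as in the question
sample = df.sample(n=5, random_state=0)

# Range the gradient would be scaled to with all of the data
full_range = (int(df['severity_level'].min()), int(df['severity_level'].max()))

# Range the gradient is actually scaled to when styling only the sample
sample_range = (int(sample['severity_level'].min()), int(sample['severity_level'].max()))

print('full range:  ', full_range)
print('sample range:', sample_range)
```

Whenever the sample range is narrower than the full range, the same value maps to a different colour in the sample than it would in the full table.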

Solution

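Pipe df into the styler first, and then chain on .hide (available from pandas 1.4) to hide all but a random subset of k rows: .hide(df.sample(frac=1.0).index[k:]). Because the styler is built from the full DataFrame, the gradient is computed over every row, and hiding only affects what is displayed.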

.hide doesn't take lambda functions, so you can't shuffle before .style and then access the shuffled DataFrame later in the chain.

#... data from OP
k = 5  # number of rows to display

(
    df
    .style
    .background_gradient(subset='severity_level', cmap='RdYlGn')

    # Shuffle the index and hide every row after the first k labels,
    # leaving k randomly chosen rows visible
    .hide(df.sample(frac=1.0).index[k:])
)

Before: a value of 0 gets coloured red because the styler only gets part of the data.

After: the styler now uses all values of severity_level, irrespective of the sample displayed.