pythondataframeformatjupyter

Style a dataframe in notebook with indentation


I have a table, which I have read from a database. I want to display the table in a jupyter notebook, showing the elements of totals and subtotals indented. I want the end result to look like this

enter image description here

The data will come from the database in a format like this

from pandas import MultiIndex, DataFrame, Series
index = MultiIndex.from_tuples(
        [
            ('total', 'subtotal', 'a'),
            ('total', 'subtotal', 'b'),
            ('total', 'subtotal', 'c'),
            ('total', 'subtotal', 'd'),
            ('total', 'foo', 'foo'),
            ('total', 'bar', 'bar')],
            names=['module_1', 'module_2', 'module_3'])

musd = [
    106.564338488296,
    60.5686589737704,
    311.695156173571,
    -90.3794796906721,
    29.6147863260859,
   -49.0048344046974]

# This is how the data as it is loaded from the database
df = Series(musd, index=index, name='musd').to_frame()

I try to make a new dataframe from a dictionary where all subtotals and totals are added as separate rows, and the column names have the proper indentation.

# Template indicating how to sum and the indentation
template = DataFrame(
    [
        ("a", "module_3", 4),
        ("b", "module_3", 4),
        ("c", "module_3", 4),
        ("d", "module_3", 4),
        ("subtotal", "module_2", 2),
        ("foo", "module_3", 2),
        ("bar", "module_3", 2),
        ("total", "module_1", 0)], columns=["name", "sum_over", "indent"])

# Create dataframe where names are indentated, using space, and with the proper sums
d = {f"{' '*indent}{name}": sum(df.query(f"{sum_over} == '{name}'")["musd"])
     for name, sum_over, indent in template.values}

# Create dataframe from dictionary
df_formated = DataFrame(d, index=[0]).T.reset_index().set_axis(['', 'musd'], axis=1)

# I try to display the dataframe using the 'style' property of the dataframe
df_formated.style.format(precision=1)

Which produces this

enter image description here

There is a lot of stuff missing here, lines, bold and italics etc. I hope to figure that out later. For now my biggest issue is that the columns are not indented. I have tried to add the indentation at the end of the string as well. But that does not work

And for the record, I don't necessarily have to use ´dataframe.style´, any other solution is also welcome.


Solution

  • Quick fix

    Inspired by this answer to a loosely related question, a bit of a hack would be using non-breaking spaces (e.g.  ) in combination with left-alignment for indentation:

    import pandas as pd
    
    df = pd.DataFrame(list(
        {4 * " " + "a": 106.6,
         4 * " " + "b": 60.6,
         2 * " " + "subtotal": 388.4,
         0 * " " + "total": 369.1}.items()), columns=["", "musd"])
    df_style = df.style.set_properties(subset=[""], **{"text-align": "left"})
    df_style
    

    This produces:

    indentation result

    CSS solution

    However, a better idea is probably: use CSS everywhere for styling your table. This has the benefit of keeping content and style completely separate, unlike in the approach above.

    import pandas as pd
    
    data = {"a": 106.6, "b": 60.6, "subtotal": 388.4, "total": 369.1}
    df = pd.DataFrame(list(data.items()), columns=["", "musd"])
    
    # Left-align column ""
    props = {"text-align": "left"}
    df_style = df.style.format(precision=1).set_properties(subset=[""], **props)
    
    # Make rows at index 0 and 1 italic, add indent via CSS
    props = {"text-indent": "2em", "font-style": "italic"}
    df_style = df_style.set_properties(subset=pd.IndexSlice[0:1, :], **props)
    
    # Add indent for row at index 2
    props = {"text-indent": "1em"}
    df_style = df_style.set_properties(subset=pd.IndexSlice[2, :], **props)
    
    # Add border around row at index 3, make content bold
    props = {"border-top": "1px solid black", "border-bottom": "3px double black",
             "font-weight": "bold"}
    df_style = df_style.set_properties(subset=pd.IndexSlice[3, :], **props)
    
    df_style
    

    This produces:

    CSS result

    If you would furthermore like to hide the column of indexes (0, 1, 2, …), you could add df_style.hide(axis="index") somewhere before rendering the table.

    Note that this answer is still a bit sloppy, in that the indents are added to the complete rows rather than to their entries in the first column only. It does not really matter in the given context, since the indents have no effect on the second column, which is right-aligned. In a different context, we could fix this by splitting the styles once more, using e.g. pd.IndexSlice[0:1, ""] for applying text-indent: 2em to column "" in rows 0+1 only and keeping pd.IndexSlice[0:1, :] for applying font-style: italic to the complete rows 0+1.