Tags: python, pandas, jupyter-notebook, python-xarray

Pandas: Limiting __repr__ and _repr_html_ behaviour with Xarray


I am storing large xarray DataArrays in my dataframe, but any time I display the dataframe in Jupyter or the terminal it takes way too long (11 seconds for a 10-row dataframe). I imagine it has something to do with how pandas builds the repr for the individual cells: maybe it renders the whole xarray object in there and only truncates the display after the fact? Who knows?

Is there some pandas setting that will limit this behavior?

Here's the code:




import pandas as pd
import numpy as np
import xarray as xr


df = pd.DataFrame({'xarrays':[xr.DataArray(np.random.randn(50,50)) 
                                for  _ in range(10)], # 10 50x50 xarrays
                    'other_stuff':np.arange(10)})

The attached image shows the time for displaying the whole frame, the xarray series, and a normal series. Here is the quick breakdown:

Display type                            Time
Whole df                                11 s
Xarray series                           6 s
Normal series                           0 s
Directly displaying df repr_html        0.2 s


I expected the xarray rows to be displayed in abbreviated/truncated form without much fuss. Instead, it takes way too long just to display the frame.
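For anyone who wants to reproduce this kind of breakdown, here is a minimal sketch of how such timings could be measured outside Jupyter. It is not the code behind the numbers above: the `time_call` helper is hypothetical, and the frame uses plain NumPy arrays as a stand-in for the xarray objects so that it runs without xarray installed, so the absolute timings will differ.

```python
import time

import numpy as np
import pandas as pd


def time_call(fn):
    """Hypothetical helper: run fn() once, return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


# Stand-in frame (assumption): NumPy arrays instead of xr.DataArray objects.
df = pd.DataFrame({
    "arrays": [np.random.randn(50, 50) for _ in range(10)],
    "other_stuff": np.arange(10),
})

# Time the plain-text repr, the HTML repr used by Jupyter, and a normal column.
timings = {
    "repr(df)": time_call(lambda: repr(df))[0],
    "df._repr_html_()": time_call(df._repr_html_)[0],
    "repr(df['other_stuff'])": time_call(lambda: repr(df["other_stuff"]))[0],
}

for label, seconds in timings.items():
    print(f"{label:<28}{seconds:.3f} s")
```

Swapping the stand-in column for the `xr.DataArray` list from the question should show whether the slowdown comes from the object column rather than from the frame as a whole.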


Solution

  • Solved it! Apparently the option I was looking for was pandas' display.pprint_nest_depth. After limiting that to 1, things sped up considerably, but I'm not yet sure of the implications of that.

    # ------- Same dataframe as before ----------------
    
    import pandas as pd
    import numpy as np
    import xarray as xr
    
    
    df = pd.DataFrame({'xarrays':[xr.DataArray(np.random.randn(50,50)) 
                                    for  _ in range(10)], # 10 50x50 xarrays
                        'other_stuff':np.arange(10)})
    
    # ------- Experimenting with pprint settings ----------------
    
    import IPython.display
    
    # NOTE: apparently my computer has sped up a bit, so the default display time has dropped from 11 seconds to 5 seconds,
    # but that is still way too slow
    
    pd.set_option('display.pprint_nest_depth',3) # (default)
    
    IPython.display.display(df) # 5.4 seconds 
    IPython.display.display(df.xarrays) # 2.7 seconds (default)
    
    pd.set_option('display.pprint_nest_depth',2) 
    IPython.display.display(df)# also 5.4 seconds
    IPython.display.display(df.xarrays)# also 2.7 seconds
    
    pd.set_option('display.pprint_nest_depth',1) 
    
    # SUCCESS!
    IPython.display.display(df)# 0.2 seconds
    IPython.display.display(df.xarrays)# 0.1 seconds
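
Since the wider implications of changing display.pprint_nest_depth globally are unclear, one option is to lower it only while the frame is being rendered. Below is a minimal sketch using `pd.option_context`, which restores the previous value when the block exits. The frame again uses NumPy arrays as a stand-in for the xarray column (an assumption, so it runs without xarray), and the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

OPTION = "display.pprint_nest_depth"

# Stand-in frame (assumption): NumPy arrays instead of xr.DataArray objects.
df = pd.DataFrame({
    "arrays": [np.random.randn(50, 50) for _ in range(10)],
    "other_stuff": np.arange(10),
})

depth_before = pd.get_option(OPTION)

# Lower the nesting depth only for this rendering call.
with pd.option_context(OPTION, 1):
    depth_inside = pd.get_option(OPTION)
    text = repr(df)

# The previous setting is back in place once the block exits.
depth_after = pd.get_option(OPTION)

print(text)
print("depth inside block:", depth_inside)
print("restored afterwards:", depth_after == depth_before)
```

In a notebook, the same `with` block could wrap `IPython.display.display(df)` instead of `repr(df)`, leaving the setting untouched for every other frame in the session.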