pythonpandasjupyter-console

pandas display: truncate column display rather than wrapping


With lengthy column names, DataFrames will display in a very messy form seemingly no matter what options are set.

Info: I'm in Jupyter QtConsole, pandas 0.20.1, with the following relevant options specified at startup:

pd.set_option('display.max_colwidth', 20)
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_rows', 25)

Question: how can I truncate the DataFrame if necessary rather than wrapping the columns to the next line, while keeping expand_frame_repr=False?

Here's an example. Again, the issue doesn't depend on the number of columns but length of the columns.

This will not cause an issue:

df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['col' + str(i) for i in range(1000)])

As the output is perfectly readable and looks like: enter image description here

The same DataFrame with long column names causes the issue I'm talking about:

df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['very_long_col_name_' 
                           + str(i) for i in range(1000)])

enter image description here

Is there any way to conform the second output to be like the first that I'm missing? (Through specifying an option, not through using .iloc every time I want to view.)


Solution

  • Looks like it will need an enhancement. The relevant code in the repr function appears to be here:

        max_rows = get_option("display.max_rows")
        max_cols = get_option("display.max_columns")
        show_dimensions = get_option("display.show_dimensions")
        if get_option("display.expand_frame_repr"):
            width, _ = console.get_console_size()
        else:
            width = None
        self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
                       line_width=width, show_dimensions=show_dimensions)
    

    So either you pass expand_frame_repr=True and it wraps on the line width, or you pass expand_frame_repr=False and it shouldn't. But it looks like there is a bug in the code (this should be pandas 0.20.3 iirc):

    in pd.io.formats.format.DataFrameFormatter:

    def _chk_truncate(self):
        """
        Checks whether the frame should be truncated. If so, slices
        the frame up.
        """
        from pandas.core.reshape.concat import concat
    
        # Column of which first element is used to determine width of a dot col
        self.tr_size_col = -1
    
        # Cut the data to the information actually printed
        max_cols = self.max_cols
        max_rows = self.max_rows
    
        if max_cols == 0 or max_rows == 0:  # assume we are in the terminal
                                            # (why else = 0)
            (w, h) = get_terminal_size()
            self.w = w
            self.h = h
            if self.max_rows == 0:
                dot_row = 1
                prompt_row = 1
                if self.show_dimensions:
                    show_dimension_rows = 3
                n_add_rows = (self.header + dot_row + show_dimension_rows +
                              prompt_row)
                # rows available to fill with actual data
                max_rows_adj = self.h - n_add_rows
                self.max_rows_adj = max_rows_adj
    
            # Format only rows and columns that could potentially fit the
            # screen
            if max_cols == 0 and len(self.frame.columns) > w:
                max_cols = w
            if max_rows == 0 and len(self.frame) > h:
                max_rows = h
    

    Looks like it intended to do what you wanted, but was unfinished. It's checking max_cols against the number of columns, not the total width of the columns.

    So you could either create a show_df function that would calculate the correct number of columns and show it in an option_context like pi2Squared's answer, or fix it here (and maybe submit a patch if you need it distributed).