With lengthy column names, DataFrames will display in a very messy form seemingly no matter what options are set.
Info: I'm in Jupyter QtConsole, pandas 0.20.1, with the following relevant options specified at startup:
pd.set_option('display.max_colwidth', 20)
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_rows', 25)
Question: how can I truncate the DataFrame if necessary, rather than wrapping the columns to the next line, while keeping expand_frame_repr=False?
Here's an example. Again, the issue doesn't depend on the number of columns but on the length of the column names.
This will not cause an issue:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['col' + str(i) for i in range(1000)])
The output is perfectly readable.
The same DataFrame with long column names causes the issue I'm talking about:
df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['very_long_col_name_' + str(i) for i in range(1000)])
Is there any way to make the second output look like the first that I'm missing? (Through specifying an option, not through using .iloc every time I want to view.)
Looks like it will need an enhancement. The relevant code in the repr function appears to be here:
max_rows = get_option("display.max_rows")
max_cols = get_option("display.max_columns")
show_dimensions = get_option("display.show_dimensions")
if get_option("display.expand_frame_repr"):
    width, _ = console.get_console_size()
else:
    width = None
self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
               line_width=width, show_dimensions=show_dimensions)
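The effect of the line_width argument in that call can be checked directly with DataFrame.to_string. This is only a small sketch: the 5 x 12 frame and the width of 80 are arbitrary values picked for illustration, not anything pandas detects.

```python
import numpy as np
import pandas as pd

# Small frame with long headers (sizes chosen only for illustration).
df = pd.DataFrame(np.random.randn(5, 12),
                  columns=['very_long_col_name_' + str(i) for i in range(12)])

# With expand_frame_repr=True the repr passes the console width (80 assumed here).
wrapped = df.to_string(line_width=80)
# With expand_frame_repr=False the repr passes line_width=None.
unwrapped = df.to_string(line_width=None)

# Wrapping splits the columns into several blocks, so there are more lines.
print(len(wrapped.splitlines()), len(unwrapped.splitlines()))
# Without wrapping, every row stays on a single, very long line.
print(max(len(line) for line in unwrapped.splitlines()))
```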
So either you pass expand_frame_repr=True and it wraps on the line width, or you pass expand_frame_repr=False and it shouldn't wrap. But it looks like there is a bug in the code (this should be pandas 0.20.3, iirc), in pd.io.formats.format.DataFrameFormatter:
def _chk_truncate(self):
    """
    Checks whether the frame should be truncated. If so, slices
    the frame up.
    """
    from pandas.core.reshape.concat import concat

    # Column of which first element is used to determine width of a dot col
    self.tr_size_col = -1

    # Cut the data to the information actually printed
    max_cols = self.max_cols
    max_rows = self.max_rows

    if max_cols == 0 or max_rows == 0:  # assume we are in the terminal
        # (why else = 0)
        (w, h) = get_terminal_size()
        self.w = w
        self.h = h
        if self.max_rows == 0:
            dot_row = 1
            prompt_row = 1
            if self.show_dimensions:
                show_dimension_rows = 3
            n_add_rows = (self.header + dot_row + show_dimension_rows +
                          prompt_row)
            # rows available to fill with actual data
            max_rows_adj = self.h - n_add_rows
            self.max_rows_adj = max_rows_adj

        # Format only rows and columns that could potentially fit the
        # screen
        if max_cols == 0 and len(self.frame.columns) > w:
            max_cols = w
        if max_rows == 0 and len(self.frame) > h:
            max_rows = h
Looks like it was intended to do what you want, but was left unfinished. It compares the number of columns against the terminal width w, not the total rendered width of the columns.
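To see why a cap on the column count says little about the printed width, here is a rough back-of-the-envelope sketch. The helper names, the width of 80 and the two-character separator are assumptions made for illustration, and the width figure is only a lower bound computed from the header text (cell values can be wider).

```python
w = 80  # assumed terminal width in characters

short_names = ['col' + str(i) for i in range(1000)]
long_names = ['very_long_col_name_' + str(i) for i in range(1000)]


def capped_by_count(names, w):
    """Mimic the quoted check: cap the *number* of columns at w."""
    return w if len(names) > w else len(names)


def header_width(names, n, sep=2):
    """Lower bound on the characters needed to print the first n headers."""
    return sum(len(name) + sep for name in names[:n])


for names in (short_names, long_names):
    n = capped_by_count(names, w)
    # The cap is the same for both frames, but the width needed is not.
    print(n, header_width(names, n))
# prints "80 550" and then "80 1830"
```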
So you could either create a show_df function that calculates the correct number of columns and shows the frame in an option_context, like pi2Squared's answer, or fix it here (and maybe submit a patch if you need it distributed).
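A minimal sketch of what such a show_df helper might look like follows. The helper names (rendered_width, fit_max_columns) and the default of 25 rows are my own choices, not pandas API. It finds the column count by actually rendering with to_string and measuring the result, which is simple but not fast for frames where many columns fit. The repr may also pick slightly different rows than to_string, so the width is an estimate.

```python
import shutil

import numpy as np
import pandas as pd


def rendered_width(df, max_rows, max_cols):
    """Width in characters of the widest line of the truncated rendering."""
    text = df.to_string(max_rows=max_rows, max_cols=max_cols)
    return max(len(line) for line in text.splitlines())


def fit_max_columns(df, width, max_rows=25):
    """Largest max_cols (at least 1) whose rendering fits in `width`."""
    best = 1
    for n in range(1, len(df.columns) + 1):
        if rendered_width(df, max_rows, n) > width:
            break  # stop at the first column count that is too wide
        best = n
    return best


def show_df(df, width=None, max_rows=25):
    """Print df truncated to the columns that fit, without wrapping."""
    if width is None:
        # Falls back to 80 columns when no terminal is attached.
        width = shutil.get_terminal_size().columns
    n = fit_max_columns(df, width, max_rows)
    with pd.option_context('display.max_columns', n,
                           'display.max_rows', max_rows,
                           'display.expand_frame_repr', False):
        print(df)


df = pd.DataFrame(np.random.randn(100, 100),
                  columns=['very_long_col_name_' + str(i) for i in range(100)])
show_df(df, width=80)
```

Passing width explicitly is useful in Jupyter QtConsole, where the detected terminal size may not match the visible window.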