I am trying to format the index names so that they are escaped for LaTeX when using .to_latex(). Using .format_index() works only for the index values, not for the index names.
Here is a Minimal, Reproducible Example.
import pandas as pd
import numpy as np
import pylatex as pl
dict1 = {
    'employee_w': ['John_Smith','John_Smith','John_Smith', 'Marc_Jones','Marc_Jones', 'Tony_Jeff', 'Maria_Mora','Maria_Mora'],
    'customer&client': ['company_1','company_2','company_3','company_4','company_5','company_6','company_7','company_8'],
    'calendar_week': [18,18,19,21,21,22,23,23],
    'sales': [5,5,5,5,5,5,5,5],
}
df1 = pd.DataFrame(data = dict1)
ptable = pd.pivot_table(
    df1,
    values='sales',
    index=['employee_w','customer&client'],
    columns=['calendar_week'],
    aggfunc=np.sum
)
mystyler = ptable.style
mystyler.format(na_rep='-', precision=0, escape="latex")
mystyler.format_index(escape="latex", axis=0)
mystyler.format_index(escape="latex", axis=1)
latex_code1 = mystyler.to_latex(
    column_format='|c|c|c|c|c|c|c|',
    multirow_align="t",
    multicol_align="r",
    clines="all;data",
    hrules=True,
)
# latex_code1 = latex_code1.replace("employee_w", "employee")
# latex_code1 = latex_code1.replace("customer&client", "customer and client")
# latex_code1 = latex_code1.replace("calendar_week", "week")
doc = pl.Document(geometry_options=['a4paper'], document_options=["portrait"], textcomp = None)
doc.packages.append(pl.Package('newtxtext,newtxmath'))
doc.packages.append(pl.Package('textcomp'))
doc.packages.append(pl.Package('booktabs'))
doc.packages.append(pl.Package('xcolor',options= pl.NoEscape('table')))
doc.packages.append(pl.Package('multirow'))
doc.append(pl.NoEscape(latex_code1))
doc.generate_pdf('file1.pdf', clean_tex=False, silent=True)
When I replace the names manually with .replace(), as in the commented lines, it works and gives the desired result.
But I'm dealing with hundreds of tables with unknown index/column names.
The goal is to generate PDF files automatically with PyLaTeX, so an HTML-based option does not help me.
Thanks in advance!
I coded all the Styler.to_latex features, and I'm afraid the index names are currently not formatted, which also means that they are not escaped. So there is no direct function to do what you want. (By the way, it's great to see an example where many of the features, including the hrules table styles definition, are being used.) I have just created an issue about this on the pandas GitHub.
However, the code itself contains an _escape_latex(s) function in pandas/io/formats/style_render.py:
def _escape_latex(s):
    r"""
    Replace the characters ``&``, ``%``, ``$``, ``#``, ``_``, ``{``, ``}``,
    ``~``, ``^``, and ``\`` in the string with LaTeX-safe sequences.

    Use this if you need to display text that might contain such characters in LaTeX.

    Parameters
    ----------
    s : str
        Input to be escaped

    Return
    ------
    str :
        Escaped string
    """
    return (
        s.replace("\\", "ab2§=§8yz")  # rare string for final conversion: avoid \\ clash
        .replace("ab2§=§8yz ", "ab2§=§8yz\\space ")  # since \backslash gobbles spaces
        .replace("&", "\\&")
        .replace("%", "\\%")
        .replace("$", "\\$")
        .replace("#", "\\#")
        .replace("_", "\\_")
        .replace("{", "\\{")
        .replace("}", "\\}")
        .replace("~ ", "~\\space ")  # since \textasciitilde gobbles spaces
        .replace("~", "\\textasciitilde ")
        .replace("^ ", "^\\space ")  # since \textasciicircum gobbles spaces
        .replace("^", "\\textasciicircum ")
        .replace("ab2§=§8yz", "\\textbackslash ")
    )
So your best bet is to reformat the input dataframe and escape the index name before you do any styling to it:
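from pandas.io.formats.style_render import _escape_latex  # private helper, may change between pandas versions
df.index.name = _escape_latex(df.index.name)  # for a MultiIndex, escape each entry of df.index.names instead
# then continue with your previous styling code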
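Since the pivot table in the question has a two-level index and a named column axis, the same idea can be applied to every axis name at once. The following is only a minimal sketch: `escape_name` and `escape_axis_names` are hypothetical helper names, and the escaping is a small single-pass stand-in for the private `_escape_latex` (it uses `\textbackslash{}`, `\textasciitilde{}` and `\textasciicircum{}` so that a following space is not gobbled), not the pandas implementation itself.

```python
import re

# LaTeX special characters and the replacements used in this sketch.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
# One character class matching any of the keys above.
_PATTERN = re.compile("[" + re.escape("".join(_LATEX_SPECIALS)) + "]")


def escape_name(name):
    """Escape one axis name; None and non-string names are returned unchanged."""
    if not isinstance(name, str):
        return name
    # Single pass, so braces introduced by a replacement are never re-escaped.
    return _PATTERN.sub(lambda m: _LATEX_SPECIALS[m.group(0)], name)


def escape_axis_names(df):
    """Escape all index and column level names of a DataFrame in place."""
    # .names exists on both a plain Index (length 1) and a MultiIndex.
    df.index.names = [escape_name(n) for n in df.index.names]
    df.columns.names = [escape_name(n) for n in df.columns.names]
    return df
```

In the question's example this would be called as `escape_axis_names(ptable)` right before `mystyler = ptable.style`, which removes the need for per-table `.replace()` calls when the names are not known in advance.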