Tags: pandas, dataframe, text, geany, non-printing-characters

Text file converted from PDF opens with perfect alignment in a text editor, but cannot be loaded into a DataFrame


I cannot figure out how generic text editors like Geany or the new default GNOME Text Editor parse my text files: the column alignment is perfect. Running cat -nA shows the correct delimiters, but of course they appear as mysterious non-printing characters.
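One way to find out what those characters actually are is to ask Python for their Unicode names. This is a minimal sketch, not a diagnosis of this particular file: the function name nonprinting_report and the sample string are hypothetical, and it simply tallies every character outside printable ASCII.

```python
import unicodedata
from collections import Counter


def nonprinting_report(text: str) -> Counter:
    """Count every character that is not printable ASCII (or a newline),
    keyed by its code point and Unicode name."""
    counts: Counter = Counter()
    for ch in text:
        # Printable ASCII (space through '~') and newlines are unremarkable.
        if " " <= ch <= "~" or ch == "\n":
            continue
        # Control characters such as TAB have no Unicode name, hence the default.
        name = unicodedata.name(ch, "<unnamed>")
        counts[f"U+{ord(ch):04X} {name}"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: fields padded with no-break spaces and tabs.
    sample = "Name\u00a0\u00a0\u00a0Age\tCity\nAlice\u00a0\u00a030\tParis\n"
    for label, n in nonprinting_report(sample).most_common():
        print(f"{n:5d}  {label}")
```

For the real file, something like nonprinting_report(open("table.txt", encoding="utf-8").read()) would work, where table.txt is a placeholder name and UTF-8 is an assumed encoding.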

The closest I have gotten is with pd.read_fwf, but I would like to use pd.read_csv if I could figure out the right parameter combination.

If anyone has a suggestion for telling pandas to split columns on those non-printing characters, it would be greatly appreciated.
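A minimal pd.read_csv sketch, assuming (hypothetically) that the mystery characters are no-break spaces (U+00A0) mixed with ordinary spaces, and that columns are separated by runs of at least two of them. The sample text is made up; for a real file you would pass the path instead of io.StringIO.

```python
import io

import pandas as pd

# Hypothetical extract: columns padded with runs of ordinary and
# no-break spaces (U+00A0), as PDF-to-text converters often emit.
raw = (
    "Name\u00a0\u00a0\u00a0Age\u00a0\u00a0City\n"
    "Alice\u00a0\u00a030\u00a0\u00a0\u00a0New York\n"
    "Bob\u00a0 \u00a0\u00a041\u00a0\u00a0\u00a0Paris\n"
)

# Treat any run of two or more spaces / no-break spaces as one delimiter.
# A regular-expression separator requires the Python parsing engine.
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"[ \u00a0]{2,}",
    engine="python",
)
print(df)
```

If cat -A instead shows a single control character between fields (for example ^I, a tab), passing that character directly, as in sep="\t", is enough and the default C engine can be used. The {2,} threshold is a design choice: it keeps single spaces inside values such as "New York", but it would merge two columns whose gap is only one character wide; in that case pd.read_fwf with explicit colspecs is the safer route.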


Solution

  • I found the definitive answer on Stack Overflow. You need to read the whole discussion.

    Non-printing characters in cat -v
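To map what cat -v (or cat -A) prints back to the underlying bytes, a small decoder helps. In that notation, ^X stands for a control byte (X minus 64), ^? for DEL, and an M- prefix marks a byte with the high bit set. For example, a UTF-8 no-break space (bytes 0xC2 0xA0) is printed as "M-BM- ". This is a minimal sketch under stated assumptions: decode_cat_v is a hypothetical name, it expects one line of output without the trailing $ that cat -A appends, and because cat -v does not escape a literal ^ or M- in the original text, decoding such text is ambiguous.

```python
def decode_cat_v(text: str) -> bytes:
    """Convert one line of `cat -v` notation back into raw bytes.

    Hypothetical helper: assumes ASCII input without cat -A's trailing '$'.
    """
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        high = 0
        # "M-" marks a byte with the high bit (0x80) set.
        if text.startswith("M-", i) and i + 2 < n:
            high = 0x80
            i += 2
        ch = text[i]
        if ch == "^" and i + 1 < n:
            nxt = text[i + 1]
            if nxt == "?":  # ^? is DEL (0x7F)
                out.append(high | 0x7F)
                i += 2
                continue
            if "@" <= nxt <= "_":  # ^@ .. ^_ are control bytes 0x00 .. 0x1F
                out.append(high | (ord(nxt) - 64))
                i += 2
                continue
        if ord(ch) > 0x7F:
            raise ValueError(f"unexpected non-ASCII character {ch!r}")
        out.append(high | ord(ch))
        i += 1
    return bytes(out)


if __name__ == "__main__":
    decoded = decode_cat_v("PriceM-BM- 5").decode("utf-8")
    print(ascii(decoded))  # prints 'Price\xa05'
```

Once the bytes are known, decoding them as UTF-8 (as above) names the character that needs to go into a read_csv separator.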