python pandas text-manipulation

How to save pandas text manipulation as CSV in the correct format


I have a *.txt file with numbers, and I want to eliminate the spaces. The raw data looks like this:

12 12345 1234
23 23456 234

If I use the following

data=data[0].str.replace(" ","")
data.update('\'' + data + '\',')

I get

 '1234123451234',
 '2323456234',

which I want. But if I save it to csv with

data.to_csv("/Users/k/file.txt", header=None, index=None, mode='a')

I get as file values:

 "'1234123451234',"
 "'2323456234',"

If I use quoting=csv.QUOTE_NONE or quoting=3 (same thing)

data.to_csv("Users/k/file.txt", header=None, index=None, quoting=3, escapechar="\\", mode='a')

The file looks like:

 '1234123451234'\,
 '2323456234'\,

Just using space or nothing as escapechar does not work.

If I just remove the spaces without adding quotes or commas and then save via:

data.to_csv("Users/k/file.txt", header=None, index=None, mode='a', quoting=1, sep=",")

I get:

 "1234123451234"
 "2323456234"

missing the comma.

Adding only the comma and saving as above gets me

 "1234123451234,"
 "2323456234,"

wrong place :-)

As you can see, I am getting mad over missing my target by inches, while it is most likely super easy. I probably will switch to regex :-)


Solution

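  • The output is expected. You added the ' and the trailing comma to the strings yourself, so they are part of the data; because each value now contains the separator (a comma), to_csv additionally wraps it in double quotes. There's no need to add quotation marks manually at all, just set the right options for to_csv: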

    >>> import pandas as pd
    >>> df = pd.DataFrame(["12 12345 1234", "23 23456 234"])
    >>> df
                   0
    0  12 12345 1234
    1   23 23456 234
    
    >>> df[0] = df[0].str.replace(" ", "")
    >>> df
                 0
    0  12123451234
    1   2323456234
    
    >>> # quoting=1 is csv.QUOTE_ALL; pandas older than 1.5 spells the last argument line_terminator
    >>> df.to_csv("output.csv", quoting=1, quotechar="'", lineterminator=',\n')
    
    # output.csv
    '','0',
    '0','12123451234',
    '1','2323456234',
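
    The output above still carries the index column and the header row, which the question did not ask for. As a rough sketch of what the three options do on their own, here they are applied with Python's built-in csv module, which is my understanding of what to_csv relies on for writing. The names raw_lines and format_rows are made up for this illustration and are not part of pandas.

```python
import csv
import io

# Sample data copied from the question.
raw_lines = ["12 12345 1234", "23 23456 234"]


def format_rows(lines):
    """Remove spaces and return each value as 'value', on its own line.

    Hypothetical helper for illustration only.
    """
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_ALL,   # same value as quoting=1
        quotechar="'",           # wrap fields in single quotes
        lineterminator=",\n",    # end every row with a comma and a newline
    )
    for line in lines:
        # One field per row, so no separator is ever written between fields.
        writer.writerow([line.replace(" ", "")])
    return buffer.getvalue()


result = format_rows(raw_lines)
print(result, end="")
# '12123451234',
# '2323456234',
```

    With pandas itself, passing header=False and index=False to to_csv next to the options shown earlier should give the same two lines; I have only checked the csv-module version here.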