I am currently trying to read in a .csv
with the following structure:
samplecsv = """ 'column A', 'column b', 'column c',
'valueA', 'valueb', 'valuec,d',
'valueA', 'valueb', 'valuecd',
'valueA', 'valueb', 'valuecd',
'valueA', 'valueb', 'valuec,d'
"""
Note that I don't include it as a string but I stored the data in the same structure in a .txt
file.
I want python to ignore the comma in 'valuec,d'
for the separation, for this value to be stored in a single column later. I found out that you can provide a to then be ignored quotechar as argument, yet I can’t get it to run with single quote. I have tried:
import pandas as pd
DF = pd.read_csv(r'Myfilepath', sep = ',', quotechar =''')
As well as:
DF = pd.read_csv(r'Myfilepath', sep = ',', quotechar ='''')
I also tried:
DF = pd.read_csv(r'Myfilepath', sep = ',', quotechar ="'")
All 3 versions gave me SyntaxError: incomplete input
. What am I doing wrong?
Note: The question marked as duplicate only explains the difference between single and double quotes but not how to exclude them in pd.read_csv
or why this doesn't work in this case.
I don't get any errors with this data. The fields contain extra whitespace though, which requires the skipinitialspace
parameter. The following code:
import pandas as pd
samplecsv = """ 'column A', 'column b', 'column c',
'valueA', 'valueb', 'valuec,d',
'valueA', 'valueb', 'valuecd',
'valueA', 'valueb', 'valuecd',
'valueA', 'valueb', 'valuec,d'
"""
from io import StringIO
df=pd.read_csv(StringIO(csv), sep = ',', quotechar ="'",skipinitialspace=True)
df
Produces
column A column b column c Unnamed: 3
0 valueA valueb valuec,d NaN
1 valueA valueb valuecd NaN
2 valueA valueb valuecd NaN
3 valueA valueb valuec,d NaN
The final empty column is sue to the trailing ,
in most lines. It can be removed with dropna
:
>>> df.dropna(axis=1,how='all')
column A column b column c
0 valueA valueb valuec,d
1 valueA valueb valuecd
2 valueA valueb valuecd
3 valueA valueb valuec,d