I would like to import CSV files with pandas. Normally my data is given in the form:
a,b,c,d
a1,b1,c1,d1
a2,b2,c2,d2
where a,b,c,d is the header. I can easily use pandas.read_csv here. However, now I have data stored like this:
"a;b;c;d"
"a1;\"b1\";\"c1\";\"d1\""
"a2;\"b2\";\"c2\";\"d2\""
How can I clean this up in the most efficient way? How can I remove the quotes around each entire row so that the columns can be detected? And how can I then remove all the remaining " characters?
Thanks a lot for any help!!
I am not sure what to do.
Here is one option with read_csv (and I'm sure we can make it better):
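import pandas as pd

df = (
    # A multi-character sep is a regex: lines are split on ";" and quotes are kept
    pd.read_csv("input.csv", sep=r";|;\\?", engine="python")
    # Strip the leftover quote characters from the header names
    .pipe(lambda df_: df_.set_axis(df_.columns.str.strip('"'), axis=1))
    # Remove the remaining backslashes and quotes from the values
    .replace(r'[\\"]', "", regex=True)
)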
Output:

print(df)

    a   b   c   d
0  a1  b1  c1  d1
1  a2  b2  c2  d2
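
As a possible alternative, here is a minimal sketch that undoes the two layers of quoting in two passes instead of using regex replacements. It assumes that every row is wrapped in one pair of double quotes and that inner quotes are escaped with a backslash, exactly as in the sample from the question. The helper name read_wrapped_csv and the inline sample string are made up for illustration; for a real file you would pass an open file handle instead.

```python
import csv
import io

import pandas as pd


def read_wrapped_csv(stream):
    """Parse rows that are wrapped in quotes as a whole (hypothetical helper)."""
    # Pass 1: treat each line as a single quoted field, so the csv module
    # removes the outer quotes and turns \" back into a plain ".
    outer = csv.reader(stream, quotechar='"', escapechar="\\", doublequote=False)
    lines = [row[0] for row in outer if row]
    # Pass 2: the unwrapped lines are ordinary ;-separated CSV with quoted values.
    return pd.read_csv(io.StringIO("\n".join(lines)), sep=";")


# Sample data copied from the question (raw string keeps the backslashes).
raw = r'''"a;b;c;d"
"a1;\"b1\";\"c1\";\"d1\""
"a2;\"b2\";\"c2\";\"d2\""
'''

df = read_wrapped_csv(io.StringIO(raw))
print(df)
```

Compared with deleting every backslash and quote afterwards, this keeps any quote characters that are genuinely part of a value, at the cost of reading the data twice. For a file on disk, something like open("input.csv", newline="") should work as the stream.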