I'm using the pandas library to convert CSVs to other data types. My CSV has the fields quoted, like this:
"Version", "Date", "Code", "Description", "Tracking Number", "Author"
"0.1", "22AUG2022", , "Initial Draft", "NR", "Sarah Marshall"
"0.2", "23SEP2022", "D", "Update 1", "N-1234", "Bill Walter"
"0.3", "09MAY2023", "A\, C", "New update.", "N-1235", "George Orwell"
The problem is that when I read the CSV with pandas.read_csv('myfile.csv')
, the quotes are included in the values:
Version "Date" "Code" "Description" "Tracking Number" "Author"
0 0.1 "22AUG2022" "Initial Draft" "NR" "Sarah Marshall"
1 0.2 "23SEP2022" "D" "Update 1" "N-1234" "Bill Walter"
2 0.3 "09MAY2023" "A, C" "New update." "N-1235" "George Orwell"
So these quotes are included when converting to HTML:
<table>
<thead>
<tr style="text-align: right">
<th></th>
<th>Version</th>
<th>"Date"</th>
<th>"Code"</th>
<th>"Description"</th>
<th>"Tracking Number"</th>
<th>"Author"</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.1</td>
<td>"22AUG2022"</td>
<td></td>
<td>"Initial Draft"</td>
<td>"NR"</td>
<td>"Sarah Marshall"</td>
</tr>
...
I've tried quoting=csv.QUOTE_NONE
, but it didn't fix it--in fact, it actually added quotes to the Version
column.
I found this question--the answers essentially says to strip out any quotes in post processing.
I can of course loop through the CSV rows and strip out the leading/trailing quotes for each of the values, but since quoted values are common with CSV I'd expect that there would be a parameter or easy way to enable/disable the quotes in the rendered output but I can't find something like that.
Is there a way to accomplish this?
The leading spaces in your input seem to be throwing Pandas off (and some other CSV processors I can think of).
Try the skipinitialspace=True option to make Pandas ignore every space between a comma and a quote char:
import pandas as pd
df = pd.read_csv("input.csv", skipinitialspace=True)
print(df)
Version Date ... Tracking Number Author
0 0.1 22AUG2022 ... NR Sarah Marshall
1 0.2 23SEP2022 ... N-1234 Bill Walter
2 0.3 09MAY2023 ... N-1235 George Orwell
[3 rows x 6 columns]
Expounding a bit, you can see the difference, where quotes are present, between the default of not skipping the initial space and skipping:
from io import StringIO
data = """
Col1
"foo"
"foo"
" foo"
""".strip()
print("no skip")
df = pd.read_csv(StringIO(data))
for _, x in df.iterrows():
print(repr(x["Col1"]))
print()
print("skip initial")
df = pd.read_csv(StringIO(data), skipinitialspace=True)
for _, x in df.iterrows():
print(repr(x["Col1"]))
no skip
'foo'
' "foo"'
' foo'
skip initial
'foo'
'foo'
' foo'
"foo"
doesn't matter as it doesn't have any space "foo"
is up for interpretation as the space precedes the (default) quote char" foo"
also doesn't matter as the space is encoded inside the (default) quote char