I have a simple CSV file like this:
SellerProductID;ProductTextLong
1000;"a ""good"" Product"
This is my attempt to read it with Apache Commons CSV:
try (Reader reader = new StringReader(content)) {
    CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('"').withQuote('"');
    CSVParser records = format.parse(reader);
    System.out.println(records.iterator().next());
}
That fails with:
Exception in thread "main" java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:145)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:171)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.next(CSVParser.java:137)
Caused by: java.io.IOException: (startline 2) EOF reached before encapsulated token finished
at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:288)
at org.apache.commons.csv.Lexer.nextToken(Lexer.java:158)
at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:674)
at org.apache.commons.csv.CSVParser$CSVRecordIterator.getNextRecord(CSVParser.java:142)
... 3 more
Other CSV tools (e.g. Google Sheets) can load the CSV just fine.
It works if I use a different quote or escape character, but unfortunately the customer's CSV format is fixed.
How do I configure Apache Commons CSV to accept the same character for both escape and quote? Or is there a way to transform the stream and replace the quote characters on the fly (the files are gigantic)?
The underlying problem is that " is the quote character here, not the escape character.
From Wikipedia:
Embedded double quote characters may then be represented by a pair of consecutive double quotes, or by prefixing a double quote with an escape character such as a backslash.
So in this case, "" is just two quote characters next to each other, while the escape character is a different character used to escape quotes, line breaks, or separators.
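To make the quote-doubling rule concrete, here is a minimal sketch of how a reader might decode a single quoted field. The class QuoteDoublingDemo and the method unquote are hypothetical names used only for illustration. They are not part of Apache Commons CSV, and the library's real lexer is more involved.

```java
class QuoteDoublingDemo {

    // Hypothetical helper: decodes one quoted CSV field, treating a pair of
    // consecutive quotes ("") as a single literal quote. No escape character
    // is involved at any point.
    static String unquote(String field) {
        int end = field.length() - 1; // index of the closing quote, if any
        if (end < 1 || field.charAt(0) != '"' || field.charAt(end) != '"') {
            return field; // not a quoted field, return it unchanged
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < end; i++) {
            char c = field.charAt(i);
            if (c == '"' && i + 1 < end && field.charAt(i + 1) == '"') {
                out.append('"'); // a doubled quote yields one literal quote
                i++;             // skip the second quote of the pair
            } else {
                out.append(c);   // any other character is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The field from the question: "a ""good"" Product"
        String raw = "\"a \"\"good\"\" Product\"";
        System.out.println(unquote(raw)); // prints: a "good" Product
    }
}
```

The point of the sketch is that the doubled quote is resolved by the quoting logic alone, which is why a separate escape character does not need to be set to ".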
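This fixes it. Note that withEscape() is passed a different character here; '/' is only a placeholder, because the example data doesn't show what the escape character actually is. CSVFormat.DEFAULT defines no escape character, so if the file has none, the withEscape() call can simply be dropped: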
try (Reader reader = new StringReader(content)) {
    CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withHeader().withEscape('/').withQuote('"');
    CSVParser records = format.parse(reader);
    System.out.println(records.iterator().next());
}