I have a simple CSV file that uses | (pipe) as the quote character. After upgrading my Rails app from Ruby 1.9.2 to 1.9.3, I'm getting a "CSV::MalformedCSVError: Missing or stray quote in line 1" error.
If I pop open vim and replace the | with double quotes, single quotes or even "=", the file parses fine, but | and * both cause the error. Does anyone have any thoughts on what might be causing this? Here's a simple one-liner that reproduces the error:
@csv = CSV.read("public/sample_file.csv", {quote_char: '|', headers: false})
I've also reproduced this in Ruby 2.0, and in plain irb without loading Rails.
Edit: here are some sample lines from the CSV
|076N102 |,|CARD |,| 1|,|NEW|,|PCS |
|07-1801 |,|BASE |,| 18|,|NEW|,|PCS |
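A self-contained version of the repro, with no file and no Rails, might look like the sketch below. It feeds the two sample rows to CSV.parse with the same options as the one-liner; the variable names are illustrative. Per the report above, 1.9.3 and 2.0 should land in the rescue branch, while a Ruby whose csv library escapes the quote character correctly should return the rows.

```ruby
require "csv"

# The two sample rows from above, with | as the quote character.
# Plain string literals (no <<~ heredoc) so this also runs on 1.9.3.
data = "|076N102 |,|CARD |,| 1|,|NEW|,|PCS |\n" \
       "|07-1801 |,|BASE |,| 18|,|NEW|,|PCS |\n"

begin
  rows = CSV.parse(data, quote_char: "|", headers: false)
  # On a fixed Ruby:
  # [["076N102 ", "CARD ", " 1", "NEW", "PCS "], ["07-1801 ", "BASE ", " 18", "NEW", "PCS "]]
  p rows
rescue CSV::MalformedCSVError => e
  # Per the report above, the affected versions (1.9.3, 2.0) end up here.
  puts "parse failed: #{e.message}"
end
```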
I think you've just discovered a bug in Ruby's CSV module. From csv.rb (line 1587):
@re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
This Regexp is used to escape characters that conflict with special regular-expression symbols, including your pipe character |.
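To see what that pattern actually matches, the sketch below rebuilds it in plain Ruby (dropping the .encode(@encoding) call) and applies it the way an escaping helper would, prefixing each match with a backslash. The names RE_CHARS_193, escape_193 and stray_quote_re are hypothetical, and the [^q]q[^q] stray-quote pattern is an assumed, simplified shape for illustration, not copied from csv.rb.

```ruby
# Rebuilt from the 1.9.3 csv.rb line quoted above, minus .encode(@encoding).
RE_CHARS_193 = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]"}/

# Hypothetical helper: backslash-escape every match of the pattern.
def escape_193(str)
  str.gsub(RE_CHARS_193) { |c| "\\" + c }
end

# [-] is a bracket expression of its own, so the pattern needs two characters:
# a hyphen followed by a special character. A lone | or * never matches.
p escape_193("|")     # => "|"
p escape_193("*")     # => "*"
p escape_193("-|")    # => "\\-|"
p Regexp.escape("|")  # => "\\|"  (a correctly escaped pipe, for comparison)

# Assumed, simplified stray-quote check of the form [^q]q[^q], built from the
# escaped quote character q. Illustrative only; not taken from csv.rb.
def stray_quote_re(q)
  Regexp.new("[^#{q}]#{q}[^#{q}]")
end

field = "CARD "  # a field value that contains no quote character at all
p stray_quote_re(escape_193("|")) =~ field    # => 0   (middle | acts as alternation)
p stray_quote_re(escape_193("*")) =~ field    # => 0   (* acts as a quantifier)
p stray_quote_re(Regexp.escape("|")) =~ field # => nil
p stray_quote_re(escape_193('"')) =~ field    # => nil (" is not special)
```

If the real check has a similar shape, this would also account for why ", ' and = work while | and * fail: the first three have no special meaning outside a bracket expression, so leaving them unescaped is harmless.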
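I don't see any reason for the prepended [-]. Because it is a bracket expression of its own, the pattern only matches a special character when a hyphen comes right before it, so a lone | is never escaped. If you remove it, your example starts to work: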
Edit: inside a bracket expression ([...]) the hyphen has to be escaped only when it is not the leading character, so I had to update the fixed Regexp (line 1587):
@re_chars = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
CSV.read('sample.csv', {quote_char: '|'})
# [["076N102 ",
# "CARD ",
# " 1", "NEW", "PCS "],
# ["07-1801 ",
# "BASE ",
# " 18", "NEW", "PCS "]]
Since most regex engines, Ruby's included, don't support lookbehind with quantifiers, I had to express the left-bracket condition as a negative lookbehind instead. As a side effect, it also matches hyphens that have a closing ] after them but no opening [ before them. If you find a better solution, please leave a comment.
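One way to check that trade-off is to run the replacement regexp through the same kind of escaping helper. RE_CHARS_FIXED and escape_fixed are hypothetical names, and .encode(@encoding) is dropped. The last line uses Ruby's built-in Regexp.escape for comparison: it escapes every metacharacter, hyphen and brackets included, without any lookarounds. Whether it is a safe drop-in for csv.rb, which builds its pattern with .encode(@encoding), is untested here.

```ruby
# The replacement regexp from the edit above, minus .encode(@encoding).
RE_CHARS_FIXED = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]"}/

# Hypothetical helper: backslash-escape every match of the pattern.
def escape_fixed(str)
  str.gsub(RE_CHARS_FIXED) { |c| "\\" + c }
end

p escape_fixed("|")      # => "\\|"     a lone pipe is now escaped
p escape_fixed("*")      # => "\\*"
p escape_fixed("[a-z]")  # => "[a\\-z]" hyphen between [ and ] is escaped
p escape_fixed("[-a]")   # => "[-a]"    hyphen right after [ is left alone
p escape_fixed("a-b]")   # => "a\\-b]"  also escaped: only a later ] is required

# Built-in alternative: escapes every metacharacter, hyphen and brackets included.
p Regexp.escape("a-b]")  # => "a\\-b\\]"
```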
I'd be glad to hear any comments before filing a bug report at ruby-lang.org.