I'm using ssconvert in Gnumeric to convert a bunch of ODS files to CSV files with the command:
ssconvert -O 'separator=; quoting-mode=never' "f.ods" "f.txt";
which works out great ... most of the time. Sometimes, there are cells where the user has punched in a newline character within the cell (in OpenOffice and LibreOffice on Mac, you achieve this by pressing cmd+enter). This results in the subsequently created CSV file getting an extra row, so instead of
This is some text. Here comes a newline that should be ignored;Some data;Some more data
I get
This is some text. Here comes a newline
that should be ignored;Some data;Some more data
Is it possible in the conversion process to replace all these newline characters within cells with something else, for example a *?
Or can I somehow make the conversion ignore all the newline characters within cells?
Here's your problem:
ssconvert -O 'separator=; quoting-mode=never' "f.ods" "f.txt";
By preventing ssconvert from quoting where necessary, you're shooting yourself in the foot here, and your problem is not limited to newlines. For example, this spreadsheet:
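example.ods (row 1 holds A1, B1 and C1; row 2 holds A2;XX, a cell containing B2 and YY on separate lines, and C2)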
is converted by your ssconvert command to this:
example.txt
A1;B1;C1
A2;XX;B2
YY;C2
Good luck untangling that.
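To make that concrete, here is what a CSV parser makes of that output. This uses Python's standard csv module purely as an illustration (ssconvert doesn't involve Python at all), with the contents of example.txt pasted in as a string:

```python
import csv
import io

# Contents of example.txt (ssconvert with quoting-mode=never), copied from above.
unquoted = "A1;B1;C1\nA2;XX;B2\nYY;C2\n"

# Parse with ';' as the separator and show how many fields each row has.
rows = list(csv.reader(io.StringIO(unquoted), delimiter=";"))
for row in rows:
    print(len(row), row)
```

The spreadsheet has two rows of three cells, but the parser sees three rows: `A2`, `XX`, `B2` (the right number of fields, with the wrong contents) followed by a stray `YY`, `C2`. Nothing in the text tells you which of the separators and line breaks were data.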
Rather than attempting to undo the mess after conversion (which is going to be impossible to do reliably) or somehow pre-processing your source ODS file before conversion (which is insane: if you're converting to CSV, it's presumably because you want to avoid messing with ODS documents), you need to use a CSV dialect that doesn't have this kind of fundamental flaw.
That means you need your data to be quoted. It turns out that on its default setting, ssconvert quotes the cell containing a newline but isn't intelligent enough to quote cells containing the separator:
$ ssconvert -O 'separator=;' example.ods example-2.txt
$ cat example-2.txt
A1;B1;C1
A2;XX;"B2
YY";C2
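Running that through a parser shows why this is only half a fix. Again this is a Python csv sketch of my own, with the contents of example-2.txt pasted in as a string:

```python
import csv
import io

# Contents of example-2.txt (ssconvert's default quoting), copied from above.
default_quoted = 'A1;B1;C1\nA2;XX;"B2\nYY";C2\n'

rows = list(csv.reader(io.StringIO(default_quoted), delimiter=";"))
for row in rows:
    print(len(row), row)
```

The quoted newline is now read back correctly as part of a single cell (`'B2\nYY'`), but the unquoted `A2;XX` is still split in two, so the second row comes back with four fields instead of three.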
... so you're going to need to quote everything:
$ ssconvert -O 'separator=; quoting-mode=always' example.ods example-3.txt
$ cat example-3.txt
"A1";"B1";"C1"
"A2;XX";"B2
YY";"C2"
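Once everything is quoted, your original request (replacing in-cell newlines with `*`) becomes safe to do, because a real CSV parser can tell which line breaks are data. A minimal sketch, assuming Python's csv module; `flatten_newlines` is a hypothetical helper name, and it re-quotes only where needed, so a cell like `A2;XX` stays quoted:

```python
import csv
import io


def flatten_newlines(text: str, replacement: str = "*") -> str:
    """Parse ';'-separated quoted CSV and replace line breaks inside cells.

    Hypothetical helper for illustration. Output uses csv.QUOTE_MINIMAL,
    so only cells that still need quoting (e.g. ones containing ';') are quoted.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for row in reader:
        # Normalise \r\n and \r to \n first, then replace every in-cell line break.
        writer.writerow(
            [cell.replace("\r\n", "\n").replace("\r", "\n").replace("\n", replacement)
             for cell in row]
        )
    return out.getvalue()


# Contents of example-3.txt (quoting-mode=always), copied from above.
always_quoted = '"A1";"B1";"C1"\n"A2;XX";"B2\nYY";"C2"\n'
print(flatten_newlines(always_quoted), end="")
```

For a real file, read it with `open("example-3.txt", newline="", encoding="utf-8").read()` (UTF-8 is an assumption about your ssconvert output) and pass the result in. Note that the output still has to quote `A2;XX`; there is no way to write that cell unquoted without recreating the original problem.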
There's no reliable way around this with CSV; any solution you come up with other than quoting your data properly is going to come back and bite you at some point, because unquoted CSV is fundamentally broken as a data format.
To reiterate: Do not attempt to work around this fundamental flaw in unquoted CSV. Even if you think you've worked around all the problems you created for yourself by using an ambiguous data format, at some point a circumstance you didn't anticipate will come along, and you will repent at your leisure.