I'm working on a Java program that imports a CSV file into a PostgreSQL database. Some lines in the CSV file contain escaped commas, and the comma is also the delimiter. Importing the same file with MySQL works fine, but with PostgreSQL I get an error saying there are more tokens in the data than there are columns to import into (org.postgresql.util.PSQLException: ERROR: extra data after last expected column).
How can I get COPY to import these lines with the escaped commas treated as data instead of as delimiters?
My COPY command:
"COPY temporary_%s (%s) FROM STDIN with csv";
And for comparison, the MySQL version that works:
"LOAD DATA LOCAL INFILE ? "
+ "INTO TABLE temporary_%s "
+ "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
+ "LINES TERMINATED BY '\n' "
+ "(%s)";
I've tried appending something like ESCAPE '\\' to the PostgreSQL statement, but it doesn't appear to do anything. Solutions I've seen suggest wrapping every field in quotes, but I don't have control over the contents of the CSV file, and it's quite large (up to ~8 GB), so I'd prefer not to pre-process it if that's not necessary.
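To be concrete, the ESCAPE variant I tried looks roughly like this (same table/column placeholders as above; the backslash is doubled only because of Java string escaping):
"COPY temporary_%s (%s) FROM STDIN with csv ESCAPE '\\'";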
And the Java:
// Unwrap to the driver-level connection that CopyManager requires
BaseConnection physicalConnection =
        (BaseConnection) connection.unwrap(Class.forName("org.postgresql.core.BaseConnection"));
CopyManager copyManager = new CopyManager(physicalConnection);
// Stream the local file into the COPY ... FROM STDIN statement
FileReader fileReader = new FileReader(fileName);
copyManager.copyIn(String.format(PROCEDURE_POSTGRES, tableName, columns), fileReader);
And a sample line of data with this problem (note the \, in the first field):
0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000,1545585645000,0.3.0,1,NULL,NULL
Edit to answer questions: the data type for these columns is varchar(255). What I'd like to see is the escaped commas being inserted as data, e.g. in this case 0.2,4 should be the value of the first column.
Full error message (with names scrubbed for privacy):
2025-02-04 12:00:40 2025-02-04 17:00:40.374 UTC [89725] STATEMENT: COPY table_name (***, ***, ***, ***, ***, ***, ***, ***, ***) FROM STDIN with csv"
2025-02-04 12:03:48 2025-02-04 17:03:48.733 UTC [89890] ERROR: extra data after last expected column
2025-02-04 12:03:48 2025-02-04 17:03:48.733 UTC [89890] CONTEXT: COPY table_name, line 10: " 0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000,1..."
Also, I found a solution(?) where I don't use CSV. I don't know why this is working, but it is: all the lines are being imported and they look correct. This might turn out to be all I need (doing more testing):
"COPY temporary_%s (%s) FROM STDIN DELIMITER ','";
From here, COPY:
https://www.postgresql.org/docs/current/sql-copy.html
File Formats
Text Format
Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.
Therefore, using psql:
\! cat backslash.csv
0.2\,4,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1541592473000
.5,000088797e02a729c72ff07bb9c5a84ef27e6b7c,1545585645000
create table csv_import (fld1 varchar, fld2 varchar, fld3 varchar);
\copy csv_import from backslash.csv delimiter ','
COPY 2
select * from csv_import ;
 fld1  |                   fld2                   |     fld3
-------+------------------------------------------+---------------
 0.2,4 | 000088797e02a729c72ff07bb9c5a84ef27e6b7c | 1541592473000
 .5    | 000088797e02a729c72ff07bb9c5a84ef27e6b7c | 1545585645000
\copy being a client-side (psql) instance of the COPY command, per:
https://www.postgresql.org/docs/current/app-psql.html
Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
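Applied to the Java code in the question, CopyManager.copyIn is the programmatic equivalent of such a frontend copy: the JDBC driver reads the local file and streams it to the server. A minimal sketch of the text-format version, assuming the same temporary_%s/%s placeholders as in the question (the class and method names here are just for illustration):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CsvLoader {
    // Text format (no CSV keyword) honors backslash escapes such as \, in the data;
    // the delimiter must be given explicitly because text format defaults to tab.
    private static final String PROCEDURE_POSTGRES =
            "COPY temporary_%s (%s) FROM STDIN DELIMITER ','";

    static long load(Connection connection, String tableName, String columns, String fileName)
            throws Exception {
        BaseConnection physicalConnection = connection.unwrap(BaseConnection.class);
        CopyManager copyManager = new CopyManager(physicalConnection);
        try (Reader reader = new FileReader(fileName)) {
            // Streams the client-side file to the server, like psql's \copy
            return copyManager.copyIn(String.format(PROCEDURE_POSTGRES, tableName, columns), reader);
        }
    }
}

As for the earlier attempt: in CSV format the ESCAPE option only controls how the QUOTE character is escaped inside quoted fields, so appending ESCAPE '\\' to the WITH CSV form has no effect on an unquoted \, which is consistent with what you observed.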