I have some very large CSV files and I am working with Postgres 9.4. I am using Postgres's COPY command to import them into my database efficiently. This works fine:
cursor = conn.cursor()
copy_str = "COPY mytable(presentation_code,quantity,"
copy_str += "processing_date,price_per_unit) FROM STDIN "
copy_str += "WITH DELIMITER AS ','"
file_obj = open(filename)
cursor.copy_expert(copy_str, file_obj)
try:
    conn.commit()
except Exception as err:
    print 'EXCEPTION:', err
The issue is that I have to run some transformation steps on the data files before they are suitable for COPY. These include reordering the fields, converting strings to floats, transforming dates so they are suitable for Postgres, and calculating some values (though I could possibly skip this last step):
for row in reader:
    presentation_code = row[0].strip()
    quantity = int(row[1])
    period = row[9]
    processing_date = period[:4] + '-' + period[4:] + '-01'
    if row[4]:
        price_per_unit = actual_cost / float(row[4])
    else:
        price_per_unit = 0
    output = [presentation_code, quantity, processing_date, price_per_unit]
    writer.writerow(output)
Is there any Postgres tool that would let me do these two steps (transformation, then COPY) in a single command? Or is this just a necessary step?
You have to write your transformation as a program that reads each row from its standard input and writes the transformed row to its standard output, then use that program as a filter for the COPY command itself. See the manual page for COPY for the details (look at the PROGRAM clause): https://www.postgresql.org/docs/current/sql-copy.html.
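For example, the transformation loop from the question could be packaged as a stand-alone filter along these lines (a sketch only: the script name and the column that actual_cost comes from are assumptions, since they are not shown in the question):

# transform_filter.py -- hypothetical filter: reads the raw CSV rows from
# standard input and writes rows matching the COPY column list to standard output.
import csv
import sys

reader = csv.reader(sys.stdin)
writer = csv.writer(sys.stdout)

for row in reader:
    presentation_code = row[0].strip()
    quantity = int(row[1])
    period = row[9]
    processing_date = period[:4] + '-' + period[4:] + '-01'
    actual_cost = float(row[3])   # assumed column for the actual cost
    if row[4]:
        price_per_unit = actual_cost / float(row[4])
    else:
        price_per_unit = 0
    writer.writerow([presentation_code, quantity, processing_date, price_per_unit])

The COPY command would then use something like FROM PROGRAM 'python /path/to/transform_filter.py < /path/to/raw.csv' in place of FROM STDIN (the paths are placeholders).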
Note that the program will be run as the "postgres" operating-system user. If you want to run it from a client application instead, you can use the \copy command of psql, which runs the program on the client side: https://www.postgresql.org/docs/current/app-psql.html.
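If the data file and the transformation script live on the client, the same idea looks roughly like this in psql (placeholder names again, and the script runs on the machine where psql runs):

\copy mytable(presentation_code,quantity,processing_date,price_per_unit) from program 'python transform_filter.py < raw.csv' with (format csv)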