I have a DataFrame that holds datetime values and NumPy arrays (and possibly other datatypes in the future). I want to to_sql it to PostgreSQL, where the datetimes should become (timestamp without time zone) columns and the arrays (bytea) columns, but I have no idea what to put for the dtype argument.
Is there a way to build the dtype dictionary dynamically, based on each DataFrame column's datatype?
How the table looks:
CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
time timestamp without time zone,
u bytea,
v bytea,
w bytea,
spd bytea,
dir bytea,
temp bytea
);
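(Side note, for illustration only: each bytea column is intended to hold a pickled NumPy array, and pickle.dumps already produces the bytes object that psycopg2 can adapt to bytea. The array values below are made up.)

```python
import pickle

import numpy as np

# A pickled NumPy array is a plain bytes object,
# which psycopg2 can send to a bytea column as-is.
arr = np.array([1.0, 2.0, 3.0])
payload = pickle.dumps(arr)
print(type(payload))  # <class 'bytes'>

# The round-trip back from bytea is lossless.
restored = pickle.loads(payload)
print(np.array_equal(restored, arr))  # True
```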
My code so far (after help from user rftr):
import datetime
import pickle

import numpy as np
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()
for row in Data2_mcw_conv.index:
    for col in Data2_mcw_conv.columns:
        value = Data2_mcw_conv[col][row]
        try:
            if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:
                dtypedict.update({col: BYTEA})
                value = Data2_mcw_conv[col].loc[row]
                print('before: ')
                print(value.flags)
                print('---------------------')
                value = value.copy(order='C')
                print('after: ')
                print(value.flags)
                print('=====================')
                value = pickle.dumps(value)
        except:
            if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
                dtypedict.update({col: TIMESTAMP})
        Data2_mcw_conv[col][row] = value

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)
However, I get this error:
Traceback (most recent call last):
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
    postgres_conv()
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
    Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
  File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
    sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
    table.insert(chunksize, method=method)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
    self._handle_dbapi_exception(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
    raise exception
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
    self.dialect.do_executemany(
  File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
    parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous
The value.flags output before and after value = value.copy(order='C'):
before:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
---------------------
after:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
=====================
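For what it's worth, the same before/after flag flip can be reproduced with a minimal Fortran-ordered array (a standalone sketch, not the actual dataframe data):

```python
import numpy as np

# A Fortran-ordered (column-major) array is F_CONTIGUOUS but not C_CONTIGUOUS.
a = np.asfortranarray(np.arange(6).reshape(2, 3))
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])  # False True

# copy(order='C') produces a C-contiguous copy that owns its own data.
b = a.copy(order='C')
print(b.flags['C_CONTIGUOUS'], b.flags['OWNDATA'])  # True True
```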
Why is this error occurring, and how can I solve it?
Follow the second code snippet of this answer and customize the dtypes to their PostgreSQL
equivalents from here. So in your case, e.g.:
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

def sqlcol(dfparam):
    # ...
    if "datetime" in str(j):
        dtypedict.update({i: TIMESTAMP})
    if "object" in str(j):  # depending on what your other columns' datatypes are
        dtypedict.update({i: BYTEA})
    # ...
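Filling in the elided parts, a full sqlcol could look roughly like this (a sketch based on the snippet above; the column names and test frame are made up for illustration):

```python
import numpy as np
import pandas as pd
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

def sqlcol(dfparam):
    """Map each column's pandas dtype to a PostgreSQL type for to_sql(dtype=...)."""
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "datetime" in str(j):
            dtypedict.update({i: TIMESTAMP})
        if "object" in str(j):  # array columns show up as object dtype
            dtypedict.update({i: BYTEA})
    return dtypedict

# Hypothetical frame: one datetime column, one column of NumPy arrays.
df = pd.DataFrame({"time": pd.to_datetime(["2021-01-01"]),
                   "u": [np.zeros(3)]})
dtypedict = sqlcol(df)
```

The returned dict can then be passed straight through, e.g. df.to_sql(..., dtype=dtypedict).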
Notes:
- TIMESTAMP is without time zone by default.
- Array columns are stored with dtype object in pandas, and so are several other datatypes. You should take this into consideration if you add further data in the future.
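To see the second note in action (hypothetical columns, purely for illustration), a column holding NumPy arrays reports dtype object, which is what routes arrays into the BYTEA branch above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime(["2021-01-01", "2021-01-02"]),
                   "u": [np.zeros(3), np.ones(3)]})
print(df["time"].dtype)  # datetime64[ns]
print(df["u"].dtype)     # object
```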