pythonpandasoracle-databasesqlalchemypandas-to-sql

How to tweak Pandas method to_sql, so that my data are formatted with right no. of decimals?


There's an Oracle table which I need to fill with weather data. This is what the table looks like:

"IDGRID" NUMBER(12,0) NOT NULL,
"DAY" DATE CONSTRAINT NOT NULL,
"TEMPERATURE_MAX" NUMBER(3,1) NOT NULL,
"TEMPERATURE_MIN" NUMBER(3,1) NOT NULL,
"TEMPERATURE_AVG" NUMBER(3,1) NOT NULL,
"TEMPERATURE_DEW" NUMBER(3,1) NOT NULL,
"VAPOURPRESSURE" NUMBER(4,2) NOT NULL,
"WINDSPEED" NUMBER(5,1) NOT NULL,
"PRECIPITATION" NUMBER(4,1) NOT NULL,
"RADIATION" NUMBER(6,0) NOT NULL,
"SNOWDEPTH" NUMBER(6,0), 
"SNOWWEQ" NUMBER(6,2), 
"CLOUDCOVER" NUMBER(4,3) NOT NULL 

I'm developing python code to carry out this task. I have the data in a Pandas dataframe df2, with all the columns having the same name as the table fields. In order to arrange for the right precision I added this statement, before I invoke the pandas method "to_sql":

df2 = df2.round({"TEMPERATURE_MAX": 1, "TEMPERATURE_MIN": 1, "TEMPERATURE_AVG": 1, 
    "TEMPERATURE_DEW": 1, "VAPOURPRESSURE": 2, "WINDSPEED": 1, "PRECIPITATION": 1, 
    "RADIATION": 0, "SNOWDEPTH": 0, "SNOWWEQ": 2, "CLOUDCOVER": 3}
)
df2.to_sql(name='MY_WEATHER_TABLE', con=engine, chunksize=1000, if_exists='append')

For your information: I use an sqlalchemy engine to connect to the Oracle database. Unfortunately, it seems that the rounding of the data is not having the expected result. My question is: what can I do in my Python code to overcome this problem?

Exception: (cx_Oracle.DatabaseError) ORA-01438: value larger than specified precision allowed for this column [SQL: INSERT INTO "WEATHER_ERA5_GRID_TEST" ("IDGRID", "VAPOURPRESSURE", "TEMPERATURE_AVG", "TEMPERATURE_DEW", "TEMPERATURE_MAX", "TEMPERATURE_MIN", "CLOUDCOVER", "SNOWWEQ", "SNOWDEPTH", "PRECIPITATION", "RADIATION", "WINDSPEED", "DAY") VALUES (:IDGRID, :VAPOURPRESSURE, :TEMPERATURE_AVG, :TEMPERATURE_DEW, :TEMPERATURE_MAX, :TEMPERATURE_MIN, :CLOUDCOVER, :SNOWWEQ, :SNOWDEPTH, :PRECIPITATION, :RADIATION, :WINDSPEED, :DAY)] [parameters: [{'IDGRID': 904491077, 'VAPOURPRESSURE': 11.729999542236328, 'TEMPERATURE_AVG': -17.899999618530273, 'TEMPERATURE_DEW': -20.700000762939453, 'TEMPERATURE_MAX': -17.799999237060547, 'TEMPERATURE_MIN': -25.100000381469727, 'CLOUDCOVER': 0.9700000286102295, 'SNOWWEQ': 8.399999618530273, 'SNOWDEPTH': 55.0, 'PRECIPITATION': 1.399999976158142, 'RADIATION': 379.0, 'WINDSPEED': 4.599999904632568, 'DAY': datetime.date(2020, 1, 1)}, etc.


Solution

  • It appeared that there was a mistake in my code for calculating the vapour pressure, leading to values greater than 99.99 and these did not fit in the column defined as NUMBER(4,2). The error message from Oracle put me off because of the word precision used in it. The great number of decimals used in the INSERT-statement did therefore not really cause the problem. It means that it was not necessary to convert any column of the dataframe to type decimal. The solution was to correct the mistake in the calculation. Thanks anyway, Christopher Jones!