First SO Question. I hope this is descriptive enough.
Pandas 0.25, Oracle 11g
I have a dataframe read from a csv. It contains a mix of numeric, string and date data.
I force data types within the dataframe using `.astype(str)`, `.astype(int)` and `.to_datetime`.
I then create a dtype dictionary to select the data types I want.
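For context, here is a minimal sketch of the setup described above. The column names and sample data are hypothetical, and I'm importing the SQLAlchemy types explicitly (`NUMBER` comes from the Oracle dialect; the others from `sqlalchemy.types`):

```python
import pandas as pd
from sqlalchemy.types import INTEGER, VARCHAR, DATE
from sqlalchemy.dialects.oracle import NUMBER

# Hypothetical data standing in for the CSV
df = pd.DataFrame({
    "key": [1, 2, 3],
    "pref1": [10, None, 30],          # numeric column with some nulls -> NaN -> float64
    "comment": [None, None, None],    # all-null string column
    "created": ["2019-01-01", "2019-02-01", None],
})

# Force the dataframe dtypes
df["key"] = df["key"].astype(int)
df["comment"] = df["comment"].astype(str)   # beware: this turns None into the string 'None'
df["created"] = pd.to_datetime(df["created"])

# dtype dictionary passed to to_sql to select the Oracle column types
dtype = {
    "key": NUMBER,
    "pref1": INTEGER,
    "comment": VARCHAR(300),
    "created": DATE,
}

# df.to_sql("my_table", engine, if_exists="append", dtype=dtype, index=False)
```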
When there are some nulls in the numeric columns, `types.NUMBER` and `types.INTEGER` create a `FLOAT` in the Oracle table. It should be `NUMBER(38,0)`, especially if I use `types.INTEGER`. The key column, which is defined as `types.NUMBER` and contains all non-null integers, is created as a `NUMBER(38,0)` as expected.
When there are columns that are entirely null, but have had `.astype(str)` applied and a dtype of `types.VARCHAR(300)`, the columns are also created as `FLOAT` in Oracle.
I need to use `if_exists='append'` in `to_sql` as the table collects history, so I can't wait for the VARCHAR columns to receive data. Though I have been using `if_exists='replace'` during testing to ensure the table is dropped and recreated.
Is there a way to resolve these issues, where nulls in the data cause the wrong data type to be selected? I shouldn't need to use a blank (i.e. '') for strings and 0 for integers; I need nulls to come through as nulls.
Nulls in date columns work, even when the entire column is null, and create a `DATE` in Oracle as requested.
EDIT: The string-to-VARCHAR issue was actually caused by a trapped and incorrectly handled exception. The numbers were still an issue that had to be handled separately; I will add an answer with the solution.
The issue with numeric fields containing a mix of null and non-null values was due to pandas using NaN for null, and numpy treating NaN as a float. `.astype(int)` doesn't handle NaNs and actually raises an exception on them (which a try block had caught and handled incorrectly in my case).
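You can reproduce the exception with a small example (the values here are illustrative):

```python
import pandas as pd
import numpy as np

# Nulls come through as NaN, so the column dtype is float64
s = pd.Series([10, np.nan, 30])

try:
    s.astype(int)               # raises: NaN cannot be represented as an int
except ValueError as exc:
    print(type(exc).__name__)   # ValueError
```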
The solution is: `df['pref1'] = df['pref1'].astype('Int64')`
Note that `'Int64'` needs the capitalised 'I'; `.astype('int64')` also doesn't work.
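`'Int64'` (capital I) is pandas' nullable integer extension dtype, which stores integers alongside a missing-value marker instead of falling back to float. A quick demonstration with hypothetical values:

```python
import pandas as pd

s = pd.Series([10, None, 30])   # float64 with NaN at this point
s = s.astype("Int64")           # capital I: pandas nullable integer dtype

print(s.dtype)                  # Int64
print(s.isna().sum())           # 1 -- the null survives as <NA>, values stay integers
```

With the column held as `Int64`, the null reaches Oracle as a null and the `types.INTEGER`/`types.NUMBER` mapping produces `NUMBER(38,0)` rather than `FLOAT`.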