python pandas dataframe

Solving incompatible dtype warning for pandas DataFrame when setting new column iteratively


Setting the value of a new dataframe column:

df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

now (as of pandas version 2.1.0) gives a warning:

FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '       metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.

The pandas documentation discusses how the problem can be solved for a Series, but it is not clear how to do this when a new DataFrame column is filled in iteratively (the line above is called in a loop over metrics, and it's the final metric that gives the warning). How can this be done?


Solution

  • I had the same problem. My understanding is that the first time you assign to source_data_url, the column does not yet exist, so pandas creates it and fills every element with NaN. That makes pandas treat the column's dtype as float64, so assigning a string to it afterwards triggers this warning.

    My solution was to create the column with some default value, e.g. empty string, before adding values to it:

    df["source_data_url"] = ""

    Initializing it with None also seems to work:

    df["source_data_url"] = None
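
For completeness, here is a minimal sketch of the whole loop with the column created up front. The Metric class, the sample rows, and the example.com URLs are made up for illustration, since the question does not show the real data; only the df.loc line and the column initialization come from the posts above.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Metric:
    # Hypothetical stand-in for the question's metric objects.
    label: str
    source_data_url: str


# Hypothetical sample data; "metric_4" deliberately has no matching Metric.
df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3", "metric_4"]})
metrics = [
    Metric("metric_1", "https://example.com/metric_1"),
    Metric("metric_2", "https://example.com/metric_2"),
    Metric("metric_3", "https://example.com/metric_3"),
]

# Create the column with a string default before the loop, so it never
# starts out as an all-NaN float64 column.
df["source_data_url"] = ""

for metric in metrics:
    # Same assignment as in the question; the values are now written
    # into a column that already holds strings.
    df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url
```

Rows that no metric matches keep the default value, so choose "" or None depending on how missing URLs should look downstream.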