
Pandas dt accessor or groupby function returns decimal numbers instead of integers in index labels when some series values are NA


We're trying to count dates grouped by month, but the index labels come back as decimals instead of integers whenever the series contains any NaT / NA values.

Simplified reproducible example:

import pandas as pd

df = pd.DataFrame({"A": ["2025-07-24","2025-07-24","2025-07-24"], "B": ["2025-07-24","2025-07-24",pd.NA]}, dtype="datetime64[ns]")

df['values'] = [1,2,3]

a_df = df.groupby([df["A"].dt.month])["values"].count()
b_df = df.groupby([df["B"].dt.month])["values"].count()

print(a_df)
print(b_df)

So the index value for a_df is "7" and the index value for b_df is "7.0", with an undesired ".0" suffix.

What's causing this and what's a good way to make values return as integers, or at least return consistently?


Solution

  • Cause of the issue:

    df["A"] has no missing values, so dt.month returns integers.

    df["B"], however, contains a missing value (NaT). dt.month returns NaN for that row, and because NaN is a float, the whole result is upcast to float64. The group key 7 therefore becomes 7.0.
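To see the dtype change directly, here is a minimal sketch. The series names s_full and s_missing are made up for illustration, and the exact integer dtype (int32 or int64) may depend on your pandas version:

```python
import pandas as pd

# Hypothetical example series: one complete, one with a missing date (NaT).
s_full = pd.Series(pd.to_datetime(["2025-07-24", "2025-07-24", "2025-07-24"]))
s_missing = pd.Series(pd.to_datetime(["2025-07-24", "2025-07-24", None]))

month_full = s_full.dt.month
month_missing = s_missing.dt.month

# No missing values: an integer dtype.
print(month_full.dtype)

# One NaT: the month for that row is NaN, so the result is float64.
print(month_missing.dtype)
```

Checking .dtype on the grouping key like this is a quick way to tell whether the labels will come out as 7 or 7.0.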

    There are several options to solve it.

    Option 1:

    Cast the index to int after the groupby (this works because groupby drops the NaN key by default)

    b_df = df.groupby([df["B"].dt.month])["values"].count()
    b_df.index = b_df.index.astype(int)
    

    Option 2:

    Drop the missing months with .dropna(), cast to int, and group only the remaining rows

    b_month = df["B"].dt.month.dropna().astype(int)
    b_df = df.loc[b_month.index].groupby(b_month)["values"].count()
    

    Option 3:

    Cast the month to the nullable integer dtype 'Int64' before grouping

    b_df = df.groupby([df["B"].dt.month.astype('Int64')])["values"].count()
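The reason for 'Int64' (capital I) rather than plain int can be sketched as follows. The variable names are illustrative only, and this assumes a reasonably recent pandas version with nullable integer support:

```python
import pandas as pd

# Hypothetical data mirroring column "B": two dates and one missing value.
dates = pd.Series(pd.to_datetime(["2025-07-24", "2025-07-24", None]))
values = pd.Series([1, 2, 3])

months = dates.dt.month  # float64, because the NaT row becomes NaN

# A plain NumPy int cannot represent NaN, so this cast fails.
try:
    months.astype(int)
    plain_int_cast_ok = True
except ValueError:
    plain_int_cast_ok = False

# The nullable 'Int64' dtype keeps the missing row as <NA>
# and stores the remaining months as integers.
months_nullable = months.astype("Int64")

counts = values.groupby(months_nullable).count()
print(counts.index.tolist())
```

Because the missing key is dropped by groupby by default, only the integer label for July remains in the result.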
    

    Option 4:

    Cast the index to the nullable 'Int64' dtype after grouping

    b_df.index = b_df.index.astype("Int64")
    

    Output:

    A
    7    3
    Name: values, dtype: int64
    B
    7    2
    Name: values, dtype: int64