We're trying to count dates per month with groupby, and the resulting index values come back as floats instead of integers whenever the series contains any NaT/NA values.
Simplified reproducible example:
import pandas as pd
df = pd.DataFrame({"A": ["2025-07-24","2025-07-24","2025-07-24"], "B": ["2025-07-24","2025-07-24",pd.NA]}, dtype="datetime64[ns]")
df['values'] = [1,2,3]
a_df = df.groupby([df["A"].dt.month])["values"].count()
b_df = df.groupby([df["B"].dt.month])["values"].count()
print(a_df)
print(b_df)
So the index value for a_df is "7", but the index value for b_df is "7.0", with an unwanted ".0" suffix.
What's causing this and what's a good way to make values return as integers, or at least return consistently?
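The cause: df["A"] has no missing values, so dt.month returns an integer dtype. df["B"], however, contains a NaT, which dt.month turns into NaN. Because NumPy integer arrays cannot hold NaN, the whole month Series is upcast to float64. groupby then drops the NaN key (dropna=True by default), leaving a float index whose only value is 7.0.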
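To see the dtype difference directly, here is a small sketch that rebuilds the question's frame with pd.to_datetime (where None becomes NaT) and prints the dtype of each dt.month result. The exact integer width for column A (int32 or int64) depends on the pandas version, so it is only checked to be some integer dtype.

```python
import pandas as pd

# Same data as the question; None in column B becomes NaT.
df = pd.DataFrame({
    "A": pd.to_datetime(["2025-07-24", "2025-07-24", "2025-07-24"]),
    "B": pd.to_datetime(["2025-07-24", "2025-07-24", None]),
    "values": [1, 2, 3],
})

month_a = df["A"].dt.month  # no NaT: an integer dtype (width varies by pandas version)
month_b = df["B"].dt.month  # NaT becomes NaN, so the result is float64

print(month_a.dtype)
print(month_b.dtype)
print(month_b.tolist())
```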
There are several options to solve it.
Option 1: Cast the index to int after the groupby (safe here because the NaN group has already been dropped)
b_df = df.groupby([df["B"].dt.month])["values"].count()
b_df.index = b_df.index.astype(int)
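A self-contained sketch of Option 1, using a made-up three-row frame with only column B. It also shows the assumption Option 1 relies on: with the default dropna=True the NaN month is gone before the cast, whereas with dropna=False the NaN label stays in the index and a plain int cast raises (caught here as TypeError or ValueError, since the exception type has varied across pandas versions).

```python
import pandas as pd

df = pd.DataFrame({
    "B": pd.to_datetime(["2025-07-24", "2025-07-24", None]),
    "values": [1, 2, 3],
})

# Default dropna=True: the NaN month is discarded, so the float index
# holds only whole numbers (here just 7.0) and an int cast is safe.
b_df = df.groupby(df["B"].dt.month)["values"].count()
b_df.index = b_df.index.astype(int)
print(b_df)

# With dropna=False the NaT row forms its own NaN group, and the same
# cast fails because NaN has no plain-integer representation.
b_all = df.groupby(df["B"].dt.month, dropna=False)["values"].count()
try:
    b_all.index.astype(int)
    cast_failed = False
except (TypeError, ValueError):
    cast_failed = True
print("int cast with a NaN group failed:", cast_failed)
```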
Option 2: Use .dropna() on dt.month before grouping
b_month = df["B"].dt.month.dropna().astype(int)
b_df = df.loc[b_month.index].groupby(b_month)["values"].count()
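A runnable version of Option 2 on the same made-up frame, with comments on why each step is there. The index keeps the name "B" because dt.month preserves the Series name, which matches the output shown for b_df.

```python
import pandas as pd

df = pd.DataFrame({
    "B": pd.to_datetime(["2025-07-24", "2025-07-24", None]),
    "values": [1, 2, 3],
})

# Remove the NaN months first, then cast the remaining ones to int.
b_month = df["B"].dt.month.dropna().astype(int)

# df.loc keeps only rows that still have a month, so the frame and the
# grouping key share exactly the same index labels.
b_df = df.loc[b_month.index].groupby(b_month)["values"].count()
print(b_df)
```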
Option 3: Convert the month to the nullable Int64 dtype before grouping
b_df = df.groupby([df["B"].dt.month.astype('Int64')])["values"].count()
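What Option 3 changes is the key itself: pandas' nullable Int64 dtype stores the missing month as <NA> rather than NaN, so the known months never become floats. A minimal sketch on the same made-up frame; the result index is only checked to be an integer dtype, since whether it stays nullable Int64 may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "B": pd.to_datetime(["2025-07-24", "2025-07-24", None]),
    "values": [1, 2, 3],
})

# Nullable Int64 keeps 7 as an integer and marks the NaT month as <NA>.
month_b = df["B"].dt.month.astype("Int64")
print(month_b)

# <NA> keys are dropped by default (dropna=True), just like NaN keys.
b_df = df.groupby(month_b)["values"].count()
print(b_df)
```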
Option 4: Convert the index to the nullable Int64 dtype after grouping (unlike Option 1, this also works if the NaN group is kept with dropna=False)
b_df.index = b_df.index.astype("Int64")
Output (with any of the options applied):
A
7 3
Name: values, dtype: int64
B
7 2
Name: values, dtype: int64
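Option 4 is mainly useful when you want to keep the rows with missing dates as their own group. A minimal sketch, assuming pandas 2.x (where an Index can hold the nullable Int64 dtype), on the same made-up three-row frame: the NaN label becomes <NA> and the month 7 stays an integer.

```python
import pandas as pd

df = pd.DataFrame({
    "B": pd.to_datetime(["2025-07-24", "2025-07-24", None]),
    "values": [1, 2, 3],
})

# Keep the NaT rows as their own group: the float index is [7.0, NaN].
b_df = df.groupby(df["B"].dt.month, dropna=False)["values"].count()

# Nullable Int64 can represent the NaN label as <NA>, so this cast
# succeeds where astype(int) would fail.
b_df.index = b_df.index.astype("Int64")
print(b_df)
```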