
Different Data Name Output

I want to find the most common ages among the diabetes cases in this dataframe. The expected output looks like this:

25    14
31    13
41    13
29    13
43    11
22    11
28    10
33    10
38    10
36    10
Name: age, dtype: int64

However, when I run this command:

(data_clean['age'].where(data_clean['class'] == 'Diabetes')).value_counts().head(10)

the output looks like this instead:

25.0    14
31.0    13
41.0    13
29.0    13
43.0    11
22.0    11
28.0    10
33.0    10
38.0    10
36.0    10
Name: count, dtype: int64

Here's the csv file I used in this case: CSV file link

The index of the result is float, but I expected it to be integer. The Series name is also count, but I expected it to be age. Do you have any suggestions? I appreciate any help you can give me. Thank you.
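
For anyone who wants to reproduce this without the CSV, here is a minimal sketch on made-up data. The small frame below is a hypothetical stand-in for data_clean, not the linked file. It shows the index dtype with where and with plain boolean indexing, and sets the Series name explicitly with rename. As far as I can tell, the count name comes from pandas 2.0 and later, where value_counts names its result count and keeps the original column name as the index name.

```python
import pandas as pd

# Hypothetical stand-in for data_clean; the values are made up.
data_clean = pd.DataFrame({
    "age": [25, 25, 31, 40, 52],
    "class": ["Diabetes", "Diabetes", "Diabetes", "Normal", "Normal"],
})

mask = data_clean["class"] == "Diabetes"

# where() keeps every row and fills the non-matching ones with NaN,
# which forces the int64 column to float64.
masked = data_clean["age"].where(mask)
via_where = masked.value_counts()

# Boolean indexing drops the non-matching rows, so the dtype stays int64.
via_loc = data_clean.loc[mask, "age"].value_counts()

# rename() sets the Series name explicitly, whatever value_counts chose.
named = via_loc.rename("age")

print(via_where.index.dtype)  # float index, as in the question
print(via_loc.index.dtype)    # integer index
print(named.name)
```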


  • Don't use where: it replaces the non-Diabetes rows with NaN, which forces the column to float. Instead, use boolean indexing to select only the matching rows:

    out = (data_clean
            .loc[data_clean['class'] == 'Diabetes', 'age']