I want to find the most frequent ages among the rows classified as Diabetes in this dataframe. The expected output looks like this:
age
25 14
31 13
41 13
29 13
43 11
22 11
28 10
33 10
38 10
36 10
Name: age, dtype: int64
However, when I run this command:
(data_clean['age'].where(data_clean['class'] == 'Diabetes')).value_counts().head(10)
The output produced is like this:
age
25.0 14
31.0 13
41.0 13
29.0 13
43.0 11
22.0 11
28.0 10
33.0 10
38.0 10
36.0 10
Name: count, dtype: int64
Here's the csv file I used in this case: CSV file link
The index of the resulting output is float, but I expected it to be integer. The output name is also count, but I expected it to be age. Do you have any suggestions? I appreciate any help you can give me. Thank you.
Don't use where: it keeps every row and replaces the non-Diabetes ages with NaN, which forces the column (and therefore the value_counts index) to float. Instead, use boolean indexing to select only the matching rows:
out = (data_clean
       .loc[data_clean['class'] == 'Diabetes', 'age']
       .value_counts().head(10)
      )
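To see the difference side by side, here is a minimal sketch on a small made-up frame (the values below are hypothetical stand-ins, not the data from the linked CSV). It also covers the name: since pandas 2.0, value_counts names its result count and moves the column name to the index name, so I am assuming that is the version in use; chaining rename('age') should give the name from the expected output on either version.

```python
import pandas as pd

# Hypothetical stand-in for data_clean; the real CSV is not reproduced here.
data_clean = pd.DataFrame({
    "age": [25, 25, 31, 40, 31, 25],
    "class": ["Diabetes", "Diabetes", "Diabetes", "Normal", "Normal", "Diabetes"],
})

is_diabetes = data_clean["class"] == "Diabetes"

# where() keeps every row and fills the non-matching ones with NaN,
# which forces the integer column to float64.
masked = data_clean["age"].where(is_diabetes)

# Boolean indexing drops the non-matching rows, so the dtype stays integer.
selected = data_clean.loc[is_diabetes, "age"]

# The index of the counts inherits the integer dtype of the selected values.
counts = selected.value_counts().head(10)

# Assumption: the result is named "count" on pandas >= 2.0 and "age" before that;
# rename() sets the Series name to "age" in both cases.
counts_named = counts.rename("age")

print(masked.dtype, selected.dtype)
print(counts_named)
```

The rename only changes the Series name shown in the last line of the printout; the counts themselves are untouched.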