
Pandas: get the value count of a specific data type


How can I retrieve the value count of a particular data type? I tried several ways of indexing by label, but they all ended in a KeyError.

To get the result, I ended up creating a new DataFrame with the data type names in a column. There should be a more efficient way.

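import pandas as pd

df = pd.DataFrame(
    [['Bob', 20, 2], ['Alice', 19, 3], ['Joshua', 22, 1]],
    columns=['Name', 'Age', 'Marks']
)
# Count how many columns have each dtype
strdtypes = df.dtypes.value_counts()

# Convert the dtype objects in the index to their string names
strIndex = strdtypes.keys().tolist()
strIndex = [str(d) for d in strIndex]

# Workaround: build a second DataFrame so the dtype name can be filtered as a column
df1 = pd.DataFrame({'datatype': strIndex, 'valuecount': strdtypes})
count = df1[df1['datatype'] == 'int64']['valuecount']
print("count int64 ", count)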

Solution

  • Your issue is that when you do:

    counts = df.dtypes.value_counts()
    

    The index of counts consists of dtype objects, not strings, so looking up a string label such as 'int64' raises a KeyError. To access the values by name, convert the index to strings, which each dtype exposes through its name property. For example:

    import pandas as pd

    df = pd.DataFrame(
        [['Bob', 20, 2],['Alice', 19, 3],['Joshua', 22, 1]],
        columns = ['Name', 'Age', 'Marks']
    )
    counts = df.dtypes.value_counts()
    print(counts)
    # int64     2
    # object    1
    # Name: count, dtype: int64
    
    print(counts['int64'], counts['object'])
    # KeyError: 'int64'
    
    counts.index = [dt.name for dt in counts.index]
    print(counts['int64'], counts['object'])
    # 2 1
    

    Alternatively, as pointed out by @mozway in the comments, you can convert the dtype values to strings with astype(str) before counting:

    counts = df.dtypes.astype(str).value_counts()
    print(counts['int64'], counts['object'])
    # 2 1
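
    If the dtype you ask for might not occur in the frame at all, indexing with counts['float64'] would still raise a KeyError. A minimal sketch of one way around that, assuming a hypothetical helper named count_dtype (not part of pandas) that wraps the astype(str) approach and uses Series.get with a default of 0. The select_dtypes line is a cross-check that counts matching columns directly:

```python
import pandas as pd

df = pd.DataFrame(
    [['Bob', 20, 2], ['Alice', 19, 3], ['Joshua', 22, 1]],
    columns=['Name', 'Age', 'Marks'],
)

def count_dtype(frame, dtype_name):
    """Hypothetical helper: number of columns in `frame` whose dtype is named `dtype_name`."""
    counts = frame.dtypes.astype(str).value_counts()
    # Series.get returns the default instead of raising KeyError for a missing label
    return int(counts.get(dtype_name, 0))

int_count = count_dtype(df, 'int64')        # dtype present in the frame
missing_count = count_dtype(df, 'float64')  # dtype absent from the frame

# Cross-check: select the int64 columns and count them
selected_count = df.select_dtypes(include='int64').shape[1]

print(int_count, missing_count, selected_count)
# 2 0 2
```

    The helper is convenient when you query many dtype names. For a single query, the select_dtypes line alone is enough.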