python dataframe binning bins

How to define a function that checks any DataFrame for an Age column and returns bins?


I am trying to define a function that will take any dataframe with an 'Age' column, bin the ages, and return how many Xs are in each age category.

Consider the following:

def age_range():
        x = input("Enter Dataframe Name: ")
        df = x
        df['Age']
        bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
        labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s','100s']
        pd.df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
        return print("Age Ranges:", result)

I keep getting a TypeError: string indices must be integers.

I thought that df['Age'] would return a one-column Series on which the binning and labelling would work. But it isn't working for me.


Solution

  • The problem lies here:

    x = input("Enter Dataframe Name: ")  # input() always returns a str
    df = x  # so df is a str too, not a DataFrame
    df['Age']  # indexing a str with a str key raises TypeError: string indices must be integers
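The error can be reproduced without pandas at all. Here is a minimal sketch, where the variable `name` is a stand-in (an assumption for illustration) for whatever string `input()` would return:

```python
# Stand-in for the value returned by input("Enter Dataframe Name: ")
name = "df"

try:
    name['Age']  # indexing a str with a str key, as df['Age'] did in the question
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

Typing the name of a variable at the prompt only produces the text of that name, not the DataFrame object bound to it.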
    

    This resolves the problem: pass the DataFrame itself to the function instead of asking for its name.

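    import pandas as pd

    def age_range(df):
        # 11 bin edges define 10 intervals, so exactly 10 labels are needed
        bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
        labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
        # right=False makes each interval left-closed: [0, 10), [10, 20), ...
        result = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
        return result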
    

    For example, you can run it like this:

    import random

    df = pd.DataFrame({'Age': [random.randint(1, 99) for _ in range(500)]})
    df["AgeRange"] = age_range(df)
    

    or, to keep the result in a separate DataFrame:

    df = pd.DataFrame({'Age': [random.randint(1, 99) for _ in range(500)]})
    AgeRangeDf = pd.DataFrame({"Age_Range": age_range(df)})
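
The question also asks how many rows fall into each age category. One possible way to get that, sketched here under the assumption that the same bins and labels are wanted, is to call `value_counts` on the binned column. The helper name `age_range_counts` is hypothetical and not part of the original answer:

```python
import random

import pandas as pd


def age_range_counts(df):
    """Return the number of rows in each age group (hypothetical helper)."""
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    groups = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    # sort=False keeps the groups in bin order instead of ordering by frequency
    return groups.value_counts(sort=False)


df = pd.DataFrame({'Age': [random.randint(1, 99) for _ in range(500)]})
counts = age_range_counts(df)
print("Age Ranges:")
print(counts)
```

Because `right=False` leaves the last interval open on the right, an age of exactly 100 would fall outside every bin and would not be counted.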