python pandas dataframe count

How to count the number of times a value has appeared so far in a Python pandas dataset?


I have two datasets from an animal shelter that include Animal IDs and dates. One set is for intakes, one is for outcomes. Some Animal IDs are duplicated, typically because the same animal has entered/exited the shelter multiple times. I want to add a column indicating how many times that animal has entered/exited so far (the dataframes are organized by date). How would I do this?

Example intake data:

Animal ID  Intake Date
1          12/07/2017 02:07:00 PM
2          12/08/2017 01:10:00 PM
1          01/06/2018 12:03:00 PM
3          01/07/2018 01:10:00 PM
2          01/08/2018 04:15:00 PM

What I want is to add this column to each dataframe:

Animal ID  Intake Date             Shelter Visits
1          12/07/2017 02:07:00 PM  1
2          12/08/2017 01:10:00 PM  1
1          01/06/2018 12:03:00 PM  2
3          01/07/2018 01:10:00 PM  1
2          01/08/2018 04:15:00 PM  2

The goal is to make the join conditions unambiguous when I combine the datasets: each row should be matched not only on its Animal ID, but also on which visit it represents.


Solution

  • This should work:

    import pandas as pd
    
    df = pd.DataFrame({
        'Animal ID': [1, 2, 1, 3, 2],
        'Intake Date': [
            '12/07/2017 02:07:00 PM',
            '12/08/2017 01:10:00 PM',
            '01/06/2018 12:03:00 PM',
            '01/07/2018 01:10:00 PM',
            '01/08/2018 04:15:00 PM'
        ]
    })
    
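    # Parse the strings as datetimes so the sort is chronological, not alphabetical
    df['Intake Date'] = pd.to_datetime(df['Intake Date'])
    # Put each animal's rows together, earliest intake first
    df = df.sort_values(by=['Animal ID', 'Intake Date'])
    
    # cumcount() numbers the rows within each group starting at 0, so add 1
    df['Shelter Visits'] = df.groupby('Animal ID').cumcount() + 1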
    

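    The code:

    1. parses the dates and sorts by Animal ID and Intake Date, so that each animal's rows are in chronological order before counting (this also reorders the rows; df.sort_index() restores the original order)
    2. uses groupby('Animal ID').cumcount() to number the rows within each animal's group: 0 for the first row, 1 for the second, and so on
    3. adds 1, because cumcount() starts at zero and the visit count should start at 1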
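
The question mentions using the visit number as a join key, so here is a minimal sketch of that step. It rests on several assumptions: the helper name add_visit_number is made up for illustration, the outcome rows and the "Outcome Date" column name are invented sample data (the question only shows intakes), and every outcome is assumed to belong to the intake with the same visit number. It sorts by date only, which keeps the rows in their original chronological order while still giving the per-animal count.

```python
import pandas as pd


def add_visit_number(df, date_col, id_col="Animal ID", out_col="Shelter Visits"):
    """Return a copy of df with a 1-based, per-animal visit counter (hypothetical helper)."""
    out = df.copy()
    # Explicit format matching the sample data, e.g. "12/07/2017 02:07:00 PM"
    out[date_col] = pd.to_datetime(out[date_col], format="%m/%d/%Y %I:%M:%S %p")
    # Sorting by date alone is enough: cumcount() follows the row order within each group
    out = out.sort_values(date_col, kind="mergesort")
    out[out_col] = out.groupby(id_col).cumcount() + 1
    return out


intakes = pd.DataFrame({
    "Animal ID": [1, 2, 1, 3, 2],
    "Intake Date": [
        "12/07/2017 02:07:00 PM",
        "12/08/2017 01:10:00 PM",
        "01/06/2018 12:03:00 PM",
        "01/07/2018 01:10:00 PM",
        "01/08/2018 04:15:00 PM",
    ],
})

# Invented outcome rows, for illustration only (not from the question)
outcomes = pd.DataFrame({
    "Animal ID": [1, 2, 1],
    "Outcome Date": [
        "12/20/2017 10:00:00 AM",
        "12/15/2017 09:30:00 AM",
        "01/20/2018 11:00:00 AM",
    ],
})

intakes = add_visit_number(intakes, "Intake Date")
outcomes = add_visit_number(outcomes, "Outcome Date")

# Match on both the animal and the visit number; a left join keeps
# intakes that have no recorded outcome yet (their Outcome Date is NaT)
visits = intakes.merge(outcomes, on=["Animal ID", "Shelter Visits"], how="left")
print(visits)
```

If the two datasets do not line up one-to-one (for example, an intake recorded before the outcome data begins), the visit numbers will drift apart, so it is worth checking that each matched Outcome Date falls after its Intake Date.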