python pandas datetime list-comparison

What is the most efficient way to count the number of events occurring within each time frame in Python?


I am trying to run a simple count function that checks a dataframe of event times (specifically surgeries) against another dataframe of shift time frames and returns a list of how many events occur during each shift. These CSVs have thousands of rows, though, so while my current setup works, it takes forever. This is what I have:

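# One counter per shift interval [DateTime[shift], DateTime[shift+1])
numSurgeries = [0 for shift in range(len(df.DateTime) - 1)]

for i in range(len(OR['PATIENT_IN_ROOM_DTTM'])):
    # Stop one row short of the end so that shift+1 stays in range
    for shift in range(len(df.DateTime) - 1):
        if OR['PATIENT_IN_ROOM_DTTM'][i] >= df.DateTime[shift] and OR['PATIENT_IN_ROOM_DTTM'][i] < df.DateTime[shift+1]:
            numSurgeries[shift] += 1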

So it loops through each event and checks to see which shift time frame it is in, then increments the count for that time frame. Logical, works, but definitely not efficient.

EDIT:

Example of OR data file

Example of df data file


Solution

  • Without example data, it's not entirely clear what you want, but this should help you vectorise the comparison over events, leaving only a loop over shifts:

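    import numpy as np

    # Each shift is the half-open interval [DateTime[shift], DateTime[shift+1]),
    # so the last row only serves as an end boundary.
    numSurgeries = {shift: np.sum((OR['PATIENT_IN_ROOM_DTTM'] >= df.DateTime[shift]) &
                                  (OR['PATIENT_IN_ROOM_DTTM'] < df.DateTime[shift+1]))
                    for shift in range(len(df.DateTime) - 1)}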
    

    The output is a dictionary mapping each integer shift index to the number of surgeries that started during that shift.
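
If looping over shifts is still too slow, one further option (a sketch, not part of the original answer) is to remove the Python-level loop entirely with `np.searchsorted` and `np.bincount`. It assumes the shift start times can be sorted and that each shift runs from one start time up to, but not including, the next. The function name `count_events_per_shift` and the sample timestamps are made up for illustration.

```python
import numpy as np
import pandas as pd


def count_events_per_shift(event_times, shift_starts):
    """Count events in each half-open interval [start[i], start[i+1]).

    Hypothetical helper: assumes shift_starts are the interval boundaries,
    so n boundaries yield n - 1 shifts.
    """
    # Convert both inputs to sorted/plain datetime64 arrays.
    edges = np.sort(pd.to_datetime(pd.Series(shift_starts)).to_numpy())
    events = pd.to_datetime(pd.Series(event_times)).to_numpy()
    n_shifts = max(len(edges) - 1, 0)

    # side="right" returns the position of the first boundary strictly after
    # the event; subtracting 1 gives the shift whose start is <= the event.
    idx = np.searchsorted(edges, events, side="right") - 1

    # Discard events before the first boundary or at/after the last one.
    valid = (idx >= 0) & (idx < n_shifts)

    # Tally how many events landed in each shift index.
    return np.bincount(idx[valid], minlength=n_shifts)


# Made-up sample data, only to show the call.
sample_shifts = ["2020-01-01 07:00", "2020-01-01 15:00", "2020-01-01 23:00"]
sample_events = ["2020-01-01 07:00", "2020-01-01 09:30",
                 "2020-01-01 15:00", "2020-01-01 23:00", "2020-01-01 06:00"]
counts = count_events_per_shift(sample_events, sample_shifts)
```

With the question's data, the call would presumably look like `count_events_per_shift(OR['PATIENT_IN_ROOM_DTTM'], df.DateTime)`. The binary search costs roughly O(n log m) for n events and m shifts, instead of the n × m comparisons made by the nested loops.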