python pandas

Extract the max of a cumulative sum from dataframe column


I have a dataframe column containing values of 0 and 1. Values of 0 indicate a piece of equipment is offline, while 1 indicates the equipment is running. To calculate the days online between outages, I used:

df['col2'] = df['col1'].groupby(df['col1'].eq(0).cumsum()).cumcount()

df['col2'] contains a cumulative total of the days online between outages.

Example:

[Image: Dataframe]

I need to extract the date and the cumulative total before each of the outages into a separate dataframe. From the example above I would want:

Date DaysOnline
6/2/2025 4
6/10/2025 5

I am obviously a novice at Python and Stack Overflow.


Solution

  • To extract the date and the cumulative total right before each outage (i.e., where col1 changes from 1 to 0), select the rows where col1 == 1 and the next row's col1 == 0, then take their Date and col2 values. If the data ends while the equipment is still online, the last row is included as well. See demo: https://www.online-python.com/zmTHRWUfKr

    import pandas as pd
    data = {
        "Date": [
            "5/29/2025", "5/31/2025", "6/1/2025", "6/2/2025",
            "6/3/2025", "6/4/2025", "6/5/2025", "6/6/2025",
            "6/7/2025", "6/8/2025", "6/9/2025", "6/10/2025"
        ],
        "col1": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
        "col2": [1, 2, 3, 4, 0, 0, 0, 1, 2, 3, 4, 5]
    }
    df = pd.DataFrame(data)
    df["Date"] = pd.to_datetime(df["Date"])
    
    # Find where a 1 is followed by a 0 (transition point)
    transition_idx = df[(df["col1"] == 1) & (df["col1"].shift(-1) == 0)].index
    
    result = df.loc[transition_idx, ["Date", "col2"]].rename(columns={"col2": "DaysOnline"})
    
    # If the last row is 1, include it as well
    if df["col1"].iloc[-1] == 1:
        result = pd.concat([
            result,
            df.iloc[[-1]][["Date", "col2"]].rename(columns={"col2": "DaysOnline"})
        ])
    
    print(result)
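
    A possible alternative, sketched here under the assumption that col1 contains only 0 and 1 and that the rows are already sorted by date: group the online rows by outage-delimited block and take the last date and the row count of each block. This does not need col2 at all. The names block_id, online and summary are illustrative and not part of the original question. Note that it counts rows, not calendar days, so a missing date in the data is not counted.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "5/29/2025", "5/31/2025", "6/1/2025", "6/2/2025",
        "6/3/2025", "6/4/2025", "6/5/2025", "6/6/2025",
        "6/7/2025", "6/8/2025", "6/9/2025", "6/10/2025",
    ]),
    "col1": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
})

# Label blocks: the id increases by one at every offline (0) row,
# so all online rows between two outages share the same id.
block_id = df["col1"].eq(0).cumsum()

# Keep only the online rows, then summarise each block:
# the last date of the run and the number of online rows in it.
online = df[df["col1"].eq(1)]
summary = (
    online.groupby(block_id[online.index])
    .agg(Date=("Date", "last"), DaysOnline=("col1", "size"))
    .reset_index(drop=True)
)

print(summary)
```

    A run that is still online at the end of the data is reported as well, which matches the 6/10/2025 row in the expected output.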