pythonpandastime-seriesstackedbarseries

how to create a stacked bar chart indicating time spent on nest per day


I have some data of an owl being present in the nest box. In a previous question you helped me visualize when the owl is in the box:

owl in box

In addition I created a plot of the hours per day spent in the box with the code below (probably this can be done more efficiently):

import pandas as pd
import matplotlib.pyplot as plt

# raw data indicating time spent in box (each row represents start and end time)
time = pd.DatetimeIndex(["2021-12-01 18:08","2021-12-01 18:11",
                         "2021-12-02 05:27","2021-12-02 05:29",
                         "2021-12-02 22:40","2021-12-02 22:43",
                         "2021-12-03 19:24","2021-12-03 19:27",
                         "2021-12-06 18:04","2021-12-06 18:06",
                         "2021-12-07 05:28","2021-12-07 05:30",
                         "2021-12-10 03:05","2021-12-10 03:10",
                         "2021-12-10 07:11","2021-12-10 07:13",
                         "2021-12-10 20:40","2021-12-10 20:41",
                         "2021-12-12 19:42","2021-12-12 19:45",
                         "2021-12-13 04:13","2021-12-13 04:17",
                         "2021-12-15 04:28","2021-12-15 04:30",
                         "2021-12-15 05:21","2021-12-15 05:25",
                         "2021-12-15 17:40","2021-12-15 17:44",
                         "2021-12-15 22:31","2021-12-15 22:37",
                         "2021-12-16 04:24","2021-12-16 04:28",
                         "2021-12-16 19:58","2021-12-16 20:09",
                         "2021-12-17 17:42","2021-12-17 18:04",
                         "2021-12-17 22:19","2021-12-17 22:26",
                         "2021-12-18 05:41","2021-12-18 05:44",
                         "2021-12-19 07:40","2021-12-19 16:55",
                         "2021-12-19 20:39","2021-12-19 20:52",
                         "2021-12-19 21:56","2021-12-19 23:17",
                         "2021-12-21 04:53","2021-12-21 04:59",
                         "2021-12-21 05:37","2021-12-21 05:39",
                         "2021-12-22 08:06","2021-12-22 17:22",
                         "2021-12-22 20:04","2021-12-22 21:24",
                         "2021-12-22 21:44","2021-12-22 22:47",
                         "2021-12-23 02:20","2021-12-23 06:17",
                         "2021-12-23 08:07","2021-12-23 16:54",
                         "2021-12-23 19:36","2021-12-23 23:59:59",
                         "2021-12-24 00:00","2021-12-24 00:28",
                         "2021-12-24 07:53","2021-12-24 17:00",
                             ])

# create dataframe with column indicating presence (1) or absence (0)
time_df = pd.DataFrame(data={'present':[1,0]*int(len(time)/2)}, index=time)

# calculate interval length and add to time_df
time_df['interval'] = time_df.index.to_series().diff().astype('timedelta64[m]')

# add column with day to time_df
time_df['day'] = time.day

#select only intervals where owl is present 
timeinbox = time_df.iloc[1::2, :]
interval = timeinbox.interval
day = timeinbox.day

# sum multiple intervals per day
interval_tot = [interval[0]]
day_tot = [day[0]]
for i in range(1, len(day)):
    if day[i] == day[i-1]:
        interval_tot[-1] +=interval[i]
    else:
        day_tot.append(day[i])
        interval_tot.append(interval[i])

# recalculate to hours        
for i in range(len(interval_tot)):
    interval_tot[i] = interval_tot[i]/(60)

plt.figure(figsize=(15, 5))       
plt.grid(zorder=0)
plt.bar(day_tot, interval_tot, color='g', zorder=3) 

plt.xlim([1,31])
plt.xlabel('day in December')
plt.ylabel('hours per day in nest box')
plt.xticks(np.arange(1,31,1))
plt.ylim([0, 24])

Now I would like to combine all data in one plot by making a stacked bar chart, where each day is represented by a bar and each bar indicating for each of the 24*60 minutes whether the owl is present or not. Is this possible from the current data structure?


Solution

  • The data seems to have been created manually, so I have changed the format of the data presented. The approach I took was to create the time spent and the time not spent, with a continuous index of 1 minute intervals with the start and end time as the difference time and a flag of 1. Now to create non-stay time, I will create a time series index of start and end date + 1 at 1 minute intervals. Update the original data frame with the newly created index. This is the data for the graph. In the graph, based on the data frame extracted in days, create a color list with red for stay and green for non-stay. Then, in a bar graph, stack the height one. It may be necessary to consider grouping the data into hourly units.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from datetime import timedelta
    import io
    
    data = '''
    start_time,end_time
    "2021-12-01 18:08","2021-12-01 18:11"
    "2021-12-02 05:27","2021-12-02 05:29"
    "2021-12-02 22:40","2021-12-02 22:43"
    "2021-12-03 19:24","2021-12-03 19:27"
    "2021-12-06 18:04","2021-12-06 18:06"
    "2021-12-07 05:28","2021-12-07 05:30"
    "2021-12-10 03:05","2021-12-10 03:10"
    "2021-12-10 07:11","2021-12-10 07:13"
    "2021-12-10 20:40","2021-12-10 20:41"
    "2021-12-12 19:42","2021-12-12 19:45"
    "2021-12-13 04:13","2021-12-13 04:17"
    "2021-12-15 04:28","2021-12-15 04:30"
    "2021-12-15 05:21","2021-12-15 05:25"
    "2021-12-15 17:40","2021-12-15 17:44"
    "2021-12-15 22:31","2021-12-15 22:37"
    "2021-12-16 04:24","2021-12-16 04:28"
    "2021-12-16 19:58","2021-12-16 20:09"
    "2021-12-17 17:42","2021-12-17 18:04"
    "2021-12-17 22:19","2021-12-17 22:26"
    "2021-12-18 05:41","2021-12-18 05:44"
    "2021-12-19 07:40","2021-12-19 16:55"
    "2021-12-19 20:39","2021-12-19 20:52"
    "2021-12-19 21:56","2021-12-19 23:17"
    "2021-12-21 04:53","2021-12-21 04:59"
    "2021-12-21 05:37","2021-12-21 05:39"
    "2021-12-22 08:06","2021-12-22 17:22"
    "2021-12-22 20:04","2021-12-22 21:24"
    "2021-12-22 21:44","2021-12-22 22:47"
    "2021-12-23 02:20","2021-12-23 06:17"
    "2021-12-23 08:07","2021-12-23 16:54"
    "2021-12-23 19:36","2021-12-24 00:00"
    "2021-12-24 00:00","2021-12-24 00:28"
    "2021-12-24 07:53","2021-12-24 17:00"
    '''
    df = pd.read_csv(io.StringIO(data), sep=',')
    df['start_time'] = pd.to_datetime(df['start_time'])
    df['end_time'] = pd.to_datetime(df['end_time'])
    
    time_df = pd.DataFrame()
    for idx, row in df.iterrows():
        rng = pd.date_range(row['start_time'], row['end_time']-timedelta(minutes=1), freq='1min')
        tmp = pd.DataFrame({'present':[1]*len(rng)}, index=rng)
        time_df = time_df.append(tmp)
    date_add = pd.date_range(time_df.index[0].date(), time_df.index[-1].date()+timedelta(days=1), freq='1min')
    time_df = time_df.reindex(date_add, fill_value=0)
    time_df['day'] = time_df.index.day
    
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(figsize=(8,15))
    ax.set_yticks(np.arange(0,1500,60))
    ax.set_ylim(0,1440)
    ax.set_xticks(np.arange(1,25,1))
    
    days = time_df['day'].unique()
    for d in days:
        #if d == 1:
        day_df = time_df.query('day == @d')
        colors = [ 'r' if p == 1 else 'g' for p in day_df['present']]
        for i in range(len(day_df)): 
            ax.bar(d, height=1, width=0.5, bottom=i+1, color=colors[i])
    
    plt.show()
    

    enter image description here