matplotlib, time-series, timeserieschart

Plotting a time-series graph from an RTTM file in Python


I have this pandas DataFrame:

0 music 0.00 9.02 
1 female 9.02 152.70 
2 music 152.70 155.12 
3 female 155.12 206.82 
4 noEnergy 206.82 208.10 

Each row is basically an ID, TYPE, START, END. All these events are in sequence, so they cannot overlap. My goal is a graph that shows the sequence of events, each drawn over its duration, like this: [image: desired chart]

Basically I want to "interpolate" the "music" type from 0 to 9.02, "female" from 9.02 to 152.70, and so on.

Using this code:

    # Read data from the RTTM file into a data frame
    import matplotlib.pyplot as plt
    import pandas as pd

    df1 = pd.read_csv('file.rttm', sep=' ', names=['type', 'start', 'end'])
    df1.plot(x='start', y='type', kind='scatter', rot='vertical')
    plt.show()

[image: resulting scatter plot]

This only shows the start position in time.

If I plot using this:

    df1['duration'] = (df1['end'] - df1['start'])
    plt.plot(df1['start'], df1['type'])
    plt.show()

[image: resulting line plot]

Once again, that is not what I would like to visualize. Any suggestions on the right way to visualize it? Thanks.


Solution

  • From my understanding, you're trying to display each dataframe entry as a horizontal segment from START value to END value, on the correct Y level depending on the type. Here is a solution using matplotlib.

    First, here is a sample of the fake data I've created (and used) based on the short sample you provided.

        START   END  TYPE
    22    726   728  music
    6     195   214  music
    15    453   464  female
    47   1478  1506  noEnergy
    9     304   318  female
    20    599   610  female
    23    738   747  music
    6     219   237  female
    31    947   954  music
    17    570   595  noEnergy

    The solution uses the fact that you can plot a batch of segments at once using the following plt.plot signature (see the matplotlib documentation for pyplot.plot):

    plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
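
    As a concrete illustration of that signature, here is a minimal plain-Python sketch (no plotting) of the flat argument list for the two "music" rows from the question. The helper name segment_args is hypothetical, and Y level 0 is an arbitrary choice.

```python
def segment_args(intervals, y_level):
    """Flatten (start, end) pairs into [x0, x1], [y, y], ... for plt.plot(*args)."""
    args = []
    for start, end in intervals:
        args.append([start, end])          # x coordinates of one segment
        args.append([y_level, y_level])    # constant y, so the segment is horizontal
    return args


# The two "music" rows from the question's data
music = [(0.00, 9.02), (152.70, 155.12)]
args = segment_args(music, y_level=0)
print(args)
# [[0.0, 9.02], [0, 0], [152.7, 155.12], [0, 0]]
```

    Passing this list as plt.plot(*args) would draw one horizontal line per x, y pair.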
    

    So, for each type, we build a list of X, Y pairs of the form [START value, END value], [Y_type, Y_type] and unpack it into plt.plot. This plots all the segments for a given TYPE in one call.

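    import numpy as np
    import matplotlib.pyplot as plt

    # df is the DataFrame shown above, with START, END and TYPE columns
    plt.figure(figsize=(16, 6))

    dfgroups = df.groupby("TYPE")

    for i, (gtype, group) in enumerate(dfgroups):
        # One (x, y) pair per row: x = [START, END], y = [i, i].
        # dtype=float keeps fractional START/END values from being truncated.
        segments_data = np.full((len(group), 2, 2), i, dtype=float)
        segments_data[:, 0, :] = group[["START", "END"]].to_numpy()
        # Flatten to x1, y1, x2, y2, ... as expected by plt.plot
        segments_data = segments_data.reshape((-1, 2))

        plt.plot(*segments_data, marker="|", c="k")

    # Label each Y level with its type
    plt.yticks(range(dfgroups.ngroups), list(dfgroups.groups.keys()))
    plt.show()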
    

    Here is the result on my fake test data:

    [image: plot of the fake test data]
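
    An alternative sketch, not the method used above: Axes.hlines draws the same kind of horizontal segments directly from lists of y, xmin and xmax values. It is applied here to the five rows from the question. The names events, type_levels and draw_timeline are hypothetical, and matplotlib is imported inside the drawing function so that the level-mapping helper can be run on its own.

```python
# Rows copied from the question: (type, start, end)
events = [
    ("music", 0.00, 9.02),
    ("female", 9.02, 152.70),
    ("music", 152.70, 155.12),
    ("female", 155.12, 206.82),
    ("noEnergy", 206.82, 208.10),
]


def type_levels(events):
    """Map each distinct type to an integer Y level, in sorted order."""
    names = sorted({name for name, _, _ in events})
    return {name: level for level, name in enumerate(names)}


def draw_timeline(events):
    """Draw one horizontal segment per event and return the Axes."""
    import matplotlib.pyplot as plt  # lazy import: only needed for drawing

    levels = type_levels(events)
    fig, ax = plt.subplots(figsize=(16, 6))
    ax.hlines(
        y=[levels[name] for name, _, _ in events],
        xmin=[start for _, start, _ in events],
        xmax=[end for _, _, end in events],
        colors="k",
    )
    # Put the type names on the Y axis instead of the integer levels
    ax.set_yticks(list(levels.values()))
    ax.set_yticklabels(list(levels.keys()))
    return ax


print(type_levels(events))
# {'female': 0, 'music': 1, 'noEnergy': 2}
```

    Calling draw_timeline(events) and then matplotlib.pyplot.show() should display the timeline; with a DataFrame, the rows could be passed as df1[['type', 'start', 'end']].itertuples(index=False, name=None) wrapped in list().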