I have this pandas DataFrame:
0 music 0.00 9.02
1 female 9.02 152.70
2 music 152.70 155.12
3 female 155.12 206.82
4 noEnergy 206.82 208.10
The columns are basically ID, TYPE, START, END. All these events are sequential, so they cannot overlap. My goal is to obtain a graph that shows the sequence of events over their durations, like this:
Basically, I want to "interpolate" each type over its interval: "music" from 0 to 9.02, "female" from 9.02 to 152.70, and so on.
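To make the setup concrete, here is a minimal sketch that rebuilds the five rows above as a DataFrame. The column names ID, TYPE, START and END are assumed from the description (the printed frame has no header). It computes each event's duration and checks the no-overlap claim, i.e. that every event starts exactly where the previous one ends.

```python
import pandas as pd

# The five events from the question; column names are assumed from the text
df = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "TYPE": ["music", "female", "music", "female", "noEnergy"],
        "START": [0.00, 9.02, 152.70, 155.12, 206.82],
        "END": [9.02, 152.70, 155.12, 206.82, 208.10],
    }
)

# Duration of each event
df["DURATION"] = df["END"] - df["START"]

# Events are sequential: every START equals the previous row's END
contiguous = bool(
    (df["START"].iloc[1:].to_numpy() == df["END"].iloc[:-1].to_numpy()).all()
)

print(df)
print("contiguous:", contiguous)
```

Because the events are contiguous, each one can be drawn as an independent horizontal segment from START to END without any risk of two segments overlapping.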
Using this code:
```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Read data from the RTTM file into a data frame
df1 = pd.read_csv('file.rttm', sep=' ', names=['type', 'start', 'end'])
df1.plot(x='start', y='type', kind='scatter', rot='vertical')
plt.show()
```
This only shows the start positions in time.
If I plot it this way instead:
```python
df1['duration'] = df1['end'] - df1['start']
plt.plot(df1['start'], df1['type'])
plt.show()
```
That is again not what I want to visualize. Any suggestions on the right way to visualize it? Thx
From my understanding, you're trying to display each DataFrame entry as a horizontal segment from its START value to its END value, at the Y level that corresponds to its TYPE. Here is a solution using matplotlib.
First, here is a sample of the fake data I've created (and used) based on the short sample you provided.
|    | START | END  | TYPE     |
|---:|------:|-----:|----------|
| 22 | 726   | 728  | music    |
| 6  | 195   | 214  | music    |
| 15 | 453   | 464  | female   |
| 47 | 1478  | 1506 | noEnergy |
| 9  | 304   | 318  | female   |
| 20 | 599   | 610  | female   |
| 23 | 738   | 747  | music    |
| 6  | 219   | 237  | female   |
| 31 | 947   | 954  | music    |
| 17 | 570   | 595  | noEnergy |
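If you want to run the plotting code on the same sample, the table can be rebuilt as a DataFrame. This is a sketch: the meaning of the unnamed first column is not stated, so it is kept here as the index (which is an assumption; note that the value 6 appears twice).

```python
import pandas as pd

# Rows copied from the sample table; the unnamed first column becomes the index
df = pd.DataFrame(
    {
        "START": [726, 195, 453, 1478, 304, 599, 738, 219, 947, 570],
        "END": [728, 214, 464, 1506, 318, 610, 747, 237, 954, 595],
        "TYPE": ["music", "music", "female", "noEnergy", "female",
                 "female", "music", "female", "music", "noEnergy"],
    },
    index=[22, 6, 15, 47, 9, 20, 23, 6, 31, 17],
)

# Number of events per type (one y level will be drawn per type)
print(df.groupby("TYPE").size())
```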
The solution uses the fact that `plt.plot` can draw a batch of segments in a single call, through the following signature (see the `matplotlib.pyplot.plot` documentation):

```
plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)
```
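A tiny illustration of that signature, using the first two events from the question with hand-picked y levels (0 for "music", 1 for "female", chosen only for this example). Each (x, y) pair becomes its own line, and the keyword arguments apply to all of them.

```python
import matplotlib.pyplot as plt

# Two horizontal segments drawn by one call: x1, y1, x2, y2 (no fmt strings)
# "music" from 0 to 9.02 at y=0, "female" from 9.02 to 152.70 at y=1
lines = plt.plot([0.00, 9.02], [0, 0], [9.02, 152.70], [1, 1], marker="|", c="k")

print(len(lines))  # -> 2, one Line2D per (x, y) pair
# plt.show()  # uncomment to display the figure
```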
So, for each type, we want to create a list of X, Y pairs (`[START value, END value]`, `[Y_type, Y_type]`) and unpack everything into the `plt.plot` call. This plots all the segments for a given TYPE at once.
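To see what gets unpacked, here is a sketch for a single hypothetical TYPE group (two "female" events from the question, with an arbitrary y level of 1). It builds the alternating `[START, END]`, `[y, y]` list in plain Python and compares it with the NumPy `np.full` + `reshape` construction; `dtype=float` is passed explicitly because an integer fill value would otherwise give an integer array and truncate times such as 9.02.

```python
import numpy as np
import pandas as pd

# One TYPE group with two events (values from the question)
group = pd.DataFrame({"START": [9.02, 155.12], "END": [152.70, 206.82]})
y_level = 1  # y position assigned to this TYPE (arbitrary for the example)

# Plain-Python version: alternate [START, END] (x) and [y, y] (y) lists
args = []
for start, end in zip(group["START"], group["END"]):
    args.append([start, end])
    args.append([y_level, y_level])

# NumPy version: one 2x2 block per event, then flatten to x1, y1, x2, y2 rows
seg = np.full((len(group), 2, 2), y_level, dtype=float)
seg[:, 0, :] = group[["START", "END"]].to_numpy()
seg = seg.reshape((-1, 2))

print(args)
print(seg)

# Without dtype=float, np.full infers an integer dtype from the fill value
print(np.full((1, 2), 1).dtype)
```

Either `plt.plot(*args)` or `plt.plot(*seg)` then draws the two segments.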
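```python
import numpy as np
import matplotlib.pyplot as plt

# df must have TYPE, START and END columns (as in the sample above)
plt.figure(figsize=(16, 6))
dfgroups = df.groupby("TYPE")
type_names = []
for i, (gtype, group) in enumerate(dfgroups):
    type_names.append(gtype)
    # One 2x2 block per event: row 0 = [START, END] (x), row 1 = [i, i] (y).
    # dtype=float keeps fractional times such as 9.02 from being truncated.
    segments_data = np.full((len(group), 2, 2), i, dtype=float)
    segments_data[:, 0, :] = group[["START", "END"]].to_numpy()
    # Flatten to x1, y1, x2, y2, ... rows so plt.plot draws every segment
    segments_data = segments_data.reshape((-1, 2))
    plt.plot(*segments_data, marker="|", c="k")
# Label each y level with its event type
plt.yticks(range(len(type_names)), type_names)
plt.show()
```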
Here is the result on my fake test data:
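As a swapped-in alternative (not the approach used above), matplotlib's `Axes.hlines` draws one horizontal line per (y, xmin, xmax) triple, so the grouping and reshaping can be skipped. This is a sketch assuming the same TYPE/START/END columns; the helper name `plot_events_hlines` and the line width are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_events_hlines(df):
    """Draw each event as a horizontal line from START to END on its TYPE's row."""
    types = sorted(df["TYPE"].unique())
    level = {t: i for i, t in enumerate(types)}  # y position for each type
    fig, ax = plt.subplots(figsize=(16, 6))
    # One line per row: y = level of its TYPE, spanning START..END
    ax.hlines(
        y=df["TYPE"].map(level),
        xmin=df["START"],
        xmax=df["END"],
        colors="k",
        linewidth=4,
    )
    ax.set_yticks(range(len(types)))
    ax.set_yticklabels(types)
    ax.set_xlabel("time")
    return fig, ax


# The five events from the question (column names assumed)
events = pd.DataFrame(
    {
        "TYPE": ["music", "female", "music", "female", "noEnergy"],
        "START": [0.00, 9.02, 152.70, 155.12, 206.82],
        "END": [9.02, 152.70, 155.12, 206.82, 208.10],
    }
)
fig, ax = plot_events_hlines(events)
# plt.show()  # uncomment to open the figure window
```

All segments end up in a single `LineCollection`, which tends to stay fast even with many events.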