python string transcription

Split transcript into transcripts for different speakers


I have a transcript with different speakers, for instance (new.txt):

spk_0: Default transcript, containing many sentences. Such as this. 
spk_1: Blablabla
spk_2: Blablablaba fjdslf 

I want to create a separate string for each speaker that contains only the text said by that speaker, so for instance:

new_spk_0 = "Default transcript, containing many sentences. Such as this."
new_spk_1 = "Blablabla"

How could I go about doing this?


Solution

  • Fixed it using the method provided in: Reading only the words of a specific speaker and adding those words to a list

    Here, a regex match at the beginning of each line identifies which speaker is talking, and the text is then collected into key-value pairs in a dictionary, one entry per speaker.
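A minimal sketch of that approach, not the exact code from the linked answer. It assumes every speaker turn starts with a label of the form spk_<number> followed by a colon, as in the sample above. The names SPEAKER_LINE and split_by_speaker are made up for illustration, and treating unlabelled lines as a continuation of the previous speaker is an assumption.

```python
import re
from collections import defaultdict

# Assumed label format: "spk_<n>:" at the start of a line.
SPEAKER_LINE = re.compile(r"^(spk_\d+):\s*(.*)$")


def split_by_speaker(lines):
    """Return a dict mapping each speaker label to all of that speaker's text."""
    parts = defaultdict(list)
    current = None  # label of the speaker whose turn we are in
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = SPEAKER_LINE.match(line)
        if match:
            current, text = match.groups()
        else:
            # Assumption: a line without a label continues the previous turn.
            text = line
        if current is not None and text:
            parts[current].append(text)
    # Join the pieces of each speaker's text with single spaces.
    return {speaker: " ".join(chunks) for speaker, chunks in parts.items()}


sample = [
    "spk_0: Default transcript, containing many sentences. Such as this. ",
    "spk_1: Blablabla",
    "spk_2: Blablablaba fjdslf ",
]
transcripts = split_by_speaker(sample)
new_spk_0 = transcripts["spk_0"]
new_spk_1 = transcripts["spk_1"]
```

To run it on the file from the question, pass the open file object instead of the list, for example split_by_speaker(open("new.txt", encoding="utf-8")). A dictionary is used instead of one variable per speaker so the code does not need to know in advance how many speakers there are.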