
Python: Parsing complex text file by delimiter


I'm quite new to Python and more used to Java. I'm trying to parse a text file output by Praat. It is always in the same format and generally looks like this, although the real files contain a few more features:

-- Voice report for 53. Sound T1_1001501_vowels --
Date: Tue Aug  7 12:15:41 2018

Time range of SELECTION
    From 0 to 0.696562 seconds (duration: 0.696562 seconds)
Pitch:
   Median pitch: 212.598 Hz
   Mean pitch: 211.571 Hz
   Standard deviation: 23.891 Hz
   Minimum pitch: 171.685 Hz
   Maximum pitch: 265.678 Hz
Pulses:
   Number of pulses: 126
   Number of periods: 113
   Mean period: 4.751119E-3 seconds
   Standard deviation of period: 0.539182E-3 seconds
Voicing:
   Fraction of locally unvoiced frames: 5.970%   (12 / 201)
   Number of voice breaks: 1
   Degree of voice breaks: 2.692%   (0.018751 seconds / 0.696562 seconds)

I would like to output something that looks like this:

0.696562,212.598,211.571,23.891,171.685,265.678,126,113,4.751119E-3,0.539182E-3,5.970,1,2.692

So essentially, from each line I want the number that sits between the colon and the next whitespace (without any % sign), and I want to print those numbers as a single comma-separated string. I know this might be a stupid question, but I just can't figure out how to do it in Python; any help would be much appreciated!
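A minimal sketch of that rule, using a regular expression instead of slicing on the delimiter: for every indented line it captures the first number that follows a colon, including E-notation values such as 4.751119E-3, and the % sign is left out simply because it is not part of the pattern. It assumes, as in the sample report, that only the measurement lines are indented; extract_values and NUMBER_AFTER_COLON are illustrative names, not part of any library.

```python
import re

# The first number after a colon: optional sign, digits, an optional
# decimal part and an optional exponent such as "E-3".
NUMBER_AFTER_COLON = re.compile(r':\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')


def extract_values(text):
    """Return the first number after a colon on each indented line of text."""
    values = []
    for line in text.splitlines():
        # In the sample report only measurement lines are indented;
        # lines such as "Pitch:" or "Date: ..." start in column 0.
        if not line[:1].isspace():
            continue
        match = NUMBER_AFTER_COLON.search(line)
        if match:
            values.append(match.group(1))
    return values


if __name__ == '__main__':
    sample = (
        "Pitch:\n"
        "   Median pitch: 212.598 Hz\n"
        "   Mean period: 4.751119E-3 seconds\n"
        "   Fraction of locally unvoiced frames: 5.970%   (12 / 201)\n"
    )
    print(','.join(extract_values(sample)))  # prints 212.598,4.751119E-3,5.970
```

Applied to the full report above (for example, the text returned by open('t2_5.txt').read()), ','.join(extract_values(text)) gives exactly the comma-separated line shown in the question.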


Solution

  • Thank you for the help, everyone! I actually came up with this solution:

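    import csv
    import os
    
    input_path = 't2_5.txt'
    input_name = os.path.splitext(input_path)[0]  # 't2_5'
    
    def parse(filepath):
        data = []
        with open(filepath, 'r') as file:
            # Skip the three header lines (title, date, blank line)
            for _ in range(3):
                file.readline()
            for line in file:
                # Measurement lines are indented; section titles are not
                if line.startswith(' ') and ':' in line:
                    # First whitespace-separated token after the first colon
                    tokens = line.split(':', 1)[1].split()
                    if tokens:
                        # Drop a trailing percent sign: "5.970%" -> "5.970"
                        data.append(tokens[0].rstrip('%'))
        # Python 3's csv module needs a text-mode file opened with newline='';
        # the original 'wb' mode only works on Python 2
        with open(input_name + '_output.csv', 'w', newline='') as csvfile:
            wr = csv.writer(csvfile)
            wr.writerow(data)
    
    parse(input_path)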
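The solution above depends on two layout details: the measurement lines start with a space, and there are exactly three header lines. A sketch of a variant that relies only on the colon delimiter, assuming Python 3: it keeps the first token after the first colon whenever float() accepts it (after stripping a trailing %), so "Date: Tue ..." and section titles such as "Pitch:" drop out on their own. numeric_values is a hypothetical name chosen for this example.

```python
import io


def numeric_values(lines):
    """Collect the first token after the first colon when it parses as a number.

    Lines like "Date: Tue Aug ..." and "Pitch:" are skipped because the token
    after the colon is either missing or not numeric.
    """
    values = []
    for line in lines:
        _, sep, rest = line.partition(':')
        tokens = rest.split()
        if not sep or not tokens:
            continue  # no colon, or nothing after it
        token = tokens[0].rstrip('%')  # "5.970%" -> "5.970"
        try:
            float(token)  # accepts "126", "212.598" and "4.751119E-3"
        except ValueError:
            continue
        values.append(token)  # keep the original spelling, not the float
    return values


if __name__ == '__main__':
    report = io.StringIO(
        "Date: Tue Aug  7 12:15:41 2018\n"
        "Pitch:\n"
        "   Median pitch: 212.598 Hz\n"
    )
    print(numeric_values(report))  # prints ['212.598']
```

Because it accepts any iterable of lines, an open file can be passed directly, and the result can go straight to csv.writer(...).writerow(). The trade-off: a measurement that is missing or printed as text in some report is dropped silently instead of leaving an empty cell, so rows from different files would no longer line up column by column.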