python string web-scraping format

Formatting a Raw String in Python


I have a raw string in Python that is retrieved via an IMAP library.

It looks like this:

Season: Winter 2017-18
Activity: Basketball - Boys JV
*DATE: 02/13/2018 * - ( previously 02/06/2018 )
Event type: Game
Home/Host: Clear Lake
Opponent: Webster City
*START TIME: 6:15PM CST* - ( previously 4:30PM CST )
Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA

What would be the best way to scrape the data that comes after each label (for example, DATE:)? The text after DATE: would be assigned to a variable such as date, so that print(date) outputs 02/13/2018 * - ( previously 02/06/2018 ).

I tried the code below, but it printed one character per line. Thanks!

for line in message:
     if "DATE:" in line:
          print line

Solution

  • You can use regular expressions and a dictionary:

    import re
    s = """
    Season: Winter 2017-18
    Activity: Basketball - Boys JV
    *DATE: 02/13/2018 * - ( previously 02/06/2018 )
    Event type: Game
    Home/Host: Clear Lake
    Opponent: Webster City
    *START TIME: 6:15PM CST* - ( previously 4:30PM CST )
    Location: Clear Lake High School, 125 N. 20th Street, Clear Lake, IA
    """
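    final_dict = {}
    for line in s.splitlines():
        # Raw-string pattern avoids the invalid '\:' escape; maxsplit=1 keeps any later ': ' in the value.
        parts = re.split(r':\s', line.strip(), maxsplit=1)
        if len(parts) == 2:  # skip blank lines and lines without a "label: value" pair
            label, value = parts
            final_dict[label.lstrip('*')] = value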
    

    Output:

    {'Home/Host': 'Clear Lake', 'Season': 'Winter 2017-18', 'START TIME': '6:15PM CST* - ( previously 4:30PM CST )', 'Location': 'Clear Lake High School, 125 N. 20th Street, Clear Lake, IA', 'Activity': 'Basketball - Boys JV', 'DATE': '02/13/2018 * - ( previously 02/06/2018 )', 'Event type': 'Game', 'Opponent': 'Webster City'}
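
  • As an alternative to the regular-expression approach, here is a minimal sketch that uses only string methods. It also shows why the loop in the question printed one character per line: iterating over a string directly yields characters, whereas splitlines() yields whole lines. The function name parse_labels and the shortened sample message are illustrative assumptions, and the sketch assumes message is already a decoded str (if the IMAP library returns bytes, decode it first).

```python
def parse_labels(message):
    """Map each label to the text after its first ': ' separator."""
    fields = {}
    # splitlines() yields whole lines; "for line in message" would
    # yield single characters instead.
    for line in message.splitlines():
        label, sep, value = line.strip().partition(": ")
        if sep:  # skip blank lines and lines without a "label: value" pair
            fields[label.lstrip("*")] = value
    return fields


# Hypothetical sample: a shortened version of the message in the question.
message = (
    "Season: Winter 2017-18\r\n"
    "*DATE: 02/13/2018 * - ( previously 02/06/2018 )\r\n"
    "*START TIME: 6:15PM CST* - ( previously 4:30PM CST )\r\n"
)
fields = parse_labels(message)
date = fields["DATE"]
print(date)  # 02/13/2018 * - ( previously 02/06/2018 )
```

    Because partition splits only on the first ": ", a value that itself contains a colon followed by a space is kept intact.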