python python-3.x regex

Pattern match multiple values in a sentence


I have a sentence that has a specific format.

<subject> <action> <object> @ <price> ... // The sentence can continue

and I want to extract these values out of the sentence.

Constraints:

Example:

Hi there, Bob sold apples @2.0 dollars each

Desired Output:

Subject: Bob
Action: sold
Object: apples
Price: 2.0

Currently, I do it the naive way by:

#!/usr/bin/env python3

sentence = "Hi there, alice sold apples @2.0 dollars each"

sentence = sentence.lower()

if 'alice' in sentence or 'bob' in sentence:

    s_list = sentence.split(" ")
    s_idx = -1

    if 'bob' in sentence:
        s_idx = s_list.index('bob')
    elif 'alice' in sentence:
        s_idx = s_list.index('alice')

    if s_idx > -1:
        Subject = s_list[s_idx]
        Action = s_list[s_idx+1]
        Object = s_list[s_idx+2]  # more if/else to validate Object constraints
        Price = s_list[s_idx+3]   # more if/else to extract 2.0 if we get @2.0

        print("Subject: {}, Action: {}, Object: {}, Price: {}".format(Subject, Action, Object, Price))

How can I do this better, possibly using re?


Solution

  • You could use a regex with a named capturing group for each element:

    import re
    
    sentence = "Hi there, alice sold apples @2.0 dollars each"
    
    values = re.search(r'(?P<subject>bob|alice)\s+(?P<action>bought|sold)\s+(?P<object>[A-Za-z]{1,7})\s+@\s*(?P<price>\d+(?:\.\d+)?)', sentence)
    if values:
        Subject = values['subject']
        Action = values['action']
        Object = values['object']
        Price = values['price']
        print("Subject: {}, Action: {}, Object: {}, Price: {}".format(Subject, Action, Object, Price))   
    

    This will output

    Subject: alice, Action: sold, Object: apples, Price: 2.0
    

    Note that the pattern is case-sensitive as written, so it would not match Bob in the question's example sentence. You may want to pass the re.I flag to re.search so that bob or Bob (or Sold or sold, etc.) is matched; in that case you can replace [A-Za-z] in the object capture group with [a-z].
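
As a rough sketch of the case-insensitive variant described in the note, the pattern can be compiled once with re.IGNORECASE (the long name for re.I) and the groups read back with groupdict(). The parse_sentence helper, the PATTERN name, the use of re.VERBOSE and the conversion of the price to float are illustrative assumptions, not part of the original answer; the allowed names, actions and object length are simply carried over from the regex above.

```python
import re

# Constraints are assumed from the pattern in the answer: subject is bob or
# alice, action is bought or sold, object is 1 to 7 letters.
PATTERN = re.compile(
    r"""
    (?P<subject>bob|alice)\s+        # who
    (?P<action>bought|sold)\s+       # what they did
    (?P<object>[a-z]{1,7})\s+        # item name, 1 to 7 letters
    @\s*(?P<price>\d+(?:\.\d+)?)     # price after the at-sign, optional decimals
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_sentence(sentence):
    """Return the four fields as a dict, or None if the sentence does not match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    fields = match.groupdict()
    # Hypothetical convenience step: turn the captured price text into a number.
    fields["price"] = float(fields["price"])
    return fields


print(parse_sentence("Hi there, Bob sold apples @2.0 dollars each"))
```

Because the match is case-insensitive, the captured text keeps whatever casing the sentence used (Bob stays Bob); call .lower() on the fields if a normalised form is needed.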