I'm trying to extract some parameters from a list whose structure and length are variable. Basically, these parameters are the departure and arrival addresses for a route. The list is built from a sentence in natural language, so it does not follow any particular template:
1st example : ['go', 'Buzenval', 'from', 'Chatelet']
2nd example : ['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval']
3rd example : ['go', 'from', '33', 'street', 'Republique', 'to', '12','street','Napoleon']
I already managed to create another list that is pretty much similar, for each case, except the departure and arrival are replaced by the actual words 'departure' and 'arrival'. With the examples above I obtain:
1st example : ['go', 'arrival', 'from', 'departure']
2nd example : ['How', 'go', 'arrival', 'from', 'departure']
3rd example : ['go', 'from', 'departure', 'to', 'arrival']
Now that I have these two kinds of lists, I would like to identify the departure and the arrival:
1st example : departure = ['Chatelet'], arrival = ['Buzenval']
2nd example : departure = ['Buzenval'], arrival = ['street','Saint','Augustin']
3rd example : departure = ['33','street','Republique'], arrival = ['12','street','Napoleon']
Basically, the parameters are everything that differs between the two lists, but I need to identify which one is the departure and which one is the arrival. I think a regex could help me with this one, but I don't know how.
Thanks for your help!
I found a way that solves your three examples. The one thing you should change is the variable names; I didn't know what to call them. (This is the old version, which is slow and hard to understand. The later one is better.)
def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    extracted = [[] for _ in modes]
    j = 0
    for i, mode in enumerate(modes):
        if mode.lower() in keywords:
            if mode.lower() != names[j].lower():
                while mode.lower() != names[j].lower():
                    extracted[i - 1].append(names[j])
                    j += 1
            else:
                extracted[i].append(names[j])
                j += 1
        else:
            if names[j].lower() not in keywords:
                while j < len(names) and names[j].lower() not in keywords:
                    extracted[i].append(names[j])
                    j += 1
    extracted = dict(zip(modes, extracted))
    return extracted["arrival"], extracted["departure"]
I found another way to do it that may be easier to understand. It is also ten times faster than the first, so you probably want to use it.
def partition(l, word):  # Helper to split a list or tuple at a specific element
    i = l.index(word)
    return l[:i], l[i + 1:]

def extract_places(names, modes):
    keywords = set(modes).intersection(names)
    mapped = [(modes, names)]
    for word in keywords:
        new_mapped = []
        for mode, name in mapped:
            if word in mode:
                m1, m2 = partition(mode, word)
                n1, n2 = partition(name, word)
                if m1:
                    new_mapped.append((m1, n1))
                if m2:
                    new_mapped.append((m2, n2))
            else:
                new_mapped.append((mode, name))
        mapped = new_mapped
    mapped = {m[0]: n for m, n in mapped}
    return mapped['arrival'], mapped['departure']
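To make the splitting easier to follow, here is a small trace of what happens to the template list after each keyword. This is only a sketch: `trace_splits` is a hypothetical helper that is not part of the solution, and it takes the shared keywords in template order (a list instead of a set) so that the printed trace is reproducible.

```python
def partition(seq, word):
    """Split seq at the first occurrence of word, dropping word itself."""
    i = seq.index(word)
    return seq[:i], seq[i + 1:]


def trace_splits(names, modes):
    """Return [(keyword, pairs), ...] showing the pairs left after each split.

    Hypothetical helper for illustration only. Each pair is
    (template part, sentence part).
    """
    # Assumption: keywords are visited in template order, not set order.
    keywords = [word for word in modes if word in names]
    mapped = [(modes, names)]
    steps = []
    for word in keywords:
        new_mapped = []
        for mode, name in mapped:
            if word in mode:
                m1, m2 = partition(mode, word)
                n1, n2 = partition(name, word)
                # Keep only the halves whose template part is non-empty.
                new_mapped.extend(p for p in ((m1, n1), (m2, n2)) if p[0])
            else:
                new_mapped.append((mode, name))
        mapped = new_mapped
        steps.append((word, mapped))
    return steps


names = ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon']
modes = ['go', 'from', 'departure', 'to', 'arrival']
for word, pairs in trace_splits(names, modes):
    print(word, [template for template, _ in pairs])
# go [['from', 'departure', 'to', 'arrival']]
# from [['departure', 'to', 'arrival']]
# to [['departure'], ['arrival']]
```

Once every shared keyword has been used as a split point, each remaining template part is a single placeholder, and the sentence part paired with it is the address.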
Both versions behave exactly the same:
for example in ((['go', 'Buzenval', 'from', 'Chatelet'],
                 ['go', 'arrival', 'from', 'departure']
                 ),
                (['How', 'go', 'street', 'Saint', 'Augustin', 'from', 'Buzenval'],
                 ['How', 'go', 'arrival', 'from', 'departure']
                 ),
                (['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
                 ['go', 'from', 'departure', 'to', 'arrival']
                 )):
    print(extract_places(*example))
This prints, for both versions:
(['Buzenval'], ['Chatelet'])
(['street', 'Saint', 'Augustin'], ['Buzenval'])
(['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
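Since the question asks whether a regex could help: one possible sketch is to turn the template list into a pattern, with a named group for each placeholder, and match it against the sentence joined with spaces. The function name `extract_places_regex` and the `PLACEHOLDERS` tuple are my own assumptions for illustration. It also assumes that both 'arrival' and 'departure' appear exactly once in the template and that no address contains one of the keywords.

```python
import re

# Assumption: these two words are the only placeholders in the template.
PLACEHOLDERS = ("arrival", "departure")


def extract_places_regex(names, modes):
    """Hypothetical regex variant; returns (arrival, departure) as word lists."""
    parts = []
    for mode in modes:
        if mode in PLACEHOLDERS:
            # Named, non-greedy group that captures one or more words.
            parts.append(rf"(?P<{mode}>.+?)")
        else:
            # Literal keyword, escaped in case it contains regex metacharacters.
            parts.append(re.escape(mode))
    # Words in the joined sentence are separated by whitespace.
    pattern = r"\s+".join(parts)
    match = re.fullmatch(pattern, " ".join(names))
    if match is None:
        raise ValueError("sentence does not fit the template")
    return match.group("arrival").split(), match.group("departure").split()


print(extract_places_regex(
    ['go', 'from', '33', 'street', 'Republique', 'to', '12', 'street', 'Napoleon'],
    ['go', 'from', 'departure', 'to', 'arrival'],
))
# (['12', 'street', 'Napoleon'], ['33', 'street', 'Republique'])
```

It returns the same (arrival, departure) order as `extract_places`, so it can be dropped into the test loop above for comparison.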