
Python: non-greedy before or after


I ran a few tests to help me understand non-greedy matching in Python, but they left me more confused than before. Thanks for the help!

import re

lan = 'From 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc fff@ddd eee'
print(re.findall(r'\S+@\S+?', lan))          # 1
print(re.findall(r'\S+@\S+', lan))           # 2
print(re.findall(r'\S+?@\S+?', lan))         # 3
print(re.findall(r'\S+?@\S+', lan))          # 4

Result:

['000@hhhaaa@stephen.marquard@uct.ac.za@bbb@c', 'fff@d']                   # 1
['000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc', 'fff@ddd']               # 2
['000@h', 'hhaaa@s', 'tephen.marquard@u', 'ct.ac.za@b', 'bb@c', 'fff@d']   # 3
['000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc', 'fff@ddd']               # 4

Questions:

  1. Why does the result show only one d here (@d)?
  2. This one is normal and very clear.
  3. Very confusing; I don't even know how to ask about the logic behind it, especially compared with 1.
  4. It seems to be the same as 2, so why is the ? before @ so 'weak'?

Solution

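    1. Why does the result show only one d here (@d)?

    Because +? only has to match one character, and a lazy quantifier never takes more than the rest of the pattern forces it to. Nothing after the final \S+? in pattern 1 requires more, so it stops after the first d.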
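    A minimal sketch of that point, reusing the fff@ddd part of the question's string. The lookahead (?= ) is an addition of this sketch, used only to show what makes a lazy quantifier grow:

```python
import re

s = 'fff@ddd eee'

# Lazy \S+? after '@': nothing later in the pattern forces it to grow,
# so it stops after a single character.
lazy = re.findall(r'\S+@\S+?', s)

# Greedy \S+ after '@': takes every non-space character it can.
greedy = re.findall(r'\S+@\S+', s)

# The lazy quantifier grows only when the rest of the pattern demands it:
# the lookahead (?= ) requires a space next, so \S+? must expand to 'ddd'.
forced = re.findall(r'\S+@\S+?(?= )', s)

print(lazy)    # ['fff@d']
print(greedy)  # ['fff@ddd']
print(forced)  # ['fff@ddd']
```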

    2. This one is normal and very clear.
    3. Very confusing; I don't even know how to ask about the logic behind it, especially compared with 1.

    Again, +? matches as few characters as it has to, while + matches as many characters as it can; that is exactly the difference between non-greedy and greedy matching.
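    To see the difference on the question's own string, compare only the part before the @. This is a small illustration using re.search, not one of the four original tests:

```python
import re

lan = 'From 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc fff@ddd eee'

# Greedy: \S+ first runs to the end of the non-space run, then backs off
# one character at a time until an '@' can follow, i.e. the LAST '@'.
greedy = re.search(r'\S+@', lan).group()

# Lazy: \S+? starts with one character and grows one at a time until an
# '@' can follow, i.e. it stops at the FIRST '@'.
lazy = re.search(r'\S+?@', lan).group()

print(greedy)  # 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@
print(lazy)    # 000@
```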

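    Take \S+?@\S+? matching From 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc. No match can start inside From, because no @ appears before the space. Starting at 000, the first \S+? grows one character at a time only until an @ comes next, and the second \S+? takes a single character, so the first match is 000@h. The search then resumes right after that h, so the leftover characters start the next match: hhaaa@s, then tephen.marquard@u, then ct.ac.za@b, then bb@c. In pattern 1, by contrast, the greedy \S+ before @ swallows everything up to the last @, so the whole run is consumed in one match.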
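    One way to watch this is to print where each match of pattern 3 starts and ends. A sketch using re.finditer, which reports match positions that findall hides:

```python
import re

lan = 'From 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc fff@ddd eee'

# finditer yields the matches left to right. After each match the engine
# resumes scanning at that match's end, so the characters a lazy match
# leaves behind become the start of the next match.
matches = [(m.start(), m.end(), m.group())
           for m in re.finditer(r'\S+?@\S+?', lan)]

for start, end, text in matches:
    print(start, end, text)
```

    Each of the first five matches starts exactly where the previous one ended, which is why hhhaaa, stephen, uct and bbb are cut between neighboring results.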

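    4. It seems to be the same as 2, so why is the ? before @ so 'weak'?

    It isn't weak; the two patterns just cover the same characters. In pattern 4 the lazy \S+? does stop at the first @ (right after 000), but the greedy \S+ that follows then takes every remaining non-space character, including the later @ signs, up to ccc. In pattern 2 the greedy \S+ runs to the last @ and the second \S+ takes ccc. Either way the match starts at the leftmost possible position and ends at the end of the non-space run, so the results are identical; only the @ used as the split point differs.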
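    Adding capturing groups around each side of the @ makes the different split points visible while the overall matches stay the same. The groups are an addition for illustration; the question's patterns have none:

```python
import re

lan = 'From 000@hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc fff@ddd eee'

# Patterns 2 and 4 with a group on each side of '@', so findall returns
# (before, after) pairs showing which '@' each pattern split on.
split2 = re.findall(r'(\S+)@(\S+)', lan)    # greedy before '@'
split4 = re.findall(r'(\S+?)@(\S+)', lan)   # lazy before '@'

# Without groups, the overall matched text is identical for both patterns.
whole2 = [m.group() for m in re.finditer(r'\S+@\S+', lan)]
whole4 = [m.group() for m in re.finditer(r'\S+?@\S+', lan)]

print(split2[0])         # ('000@hhhaaa@stephen.marquard@uct.ac.za@bbb', 'ccc')
print(split4[0])         # ('000', 'hhhaaa@stephen.marquard@uct.ac.za@bbb@ccc')
print(whole2 == whole4)  # True
```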


    Since email addresses can't contain spaces, why bother with non-greedy matching anyway? You could use something as simple as \S+@\S+.
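    A quick sketch of the plain greedy pattern on a single line. Both sample strings below are made up for illustration:

```python
import re

# Hypothetical sample lines, invented for this example.
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
header = 'To: bob@example.com, alice@example.org'

found = re.findall(r'\S+@\S+', line)
print(found)   # ['stephen.marquard@uct.ac.za']

# Caveat: \S+ also swallows punctuation glued to the address,
# such as the comma after the first address here.
listed = re.findall(r'\S+@\S+', header)
print(listed)  # ['bob@example.com,', 'alice@example.org']
```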