I'm trying to parse a To email header with a regex. If there are no <> characters, I want the whole string; otherwise I want what is inside the <> pair.
import re
re_destinatario = re.compile(r'^.*?<?(?P<to>.*)>?')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.groups())
    print(m.group('to'))
But the regex is wrong; it captures the whole string in both cases:
('XKYDF/ABC (Caixa Corporativa)',)
XKYDF/ABC (Caixa Corporativa)
('Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>',)
Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>
What am I missing?
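With both of your inputs, the to group captures the whole string because ^.*? is lazy and <? is optional. At position 0 the engine first lets .*? match nothing and <? match nothing, then the greedy (?P<to>.*) consumes the rest of the string and >? matches nothing. That first attempt already succeeds, so the engine never goes looking for the <.

You may use this regex instead: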
<?(?P<to>[^<>]+)>?$
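To see the difference, you can run both patterns on the second header. This is a small illustrative check, not part of the original answer; the variable names old, new and header are made up for the example.

```python
import re

# Pattern from the question and pattern from this answer
old = re.compile(r'^.*?<?(?P<to>.*)>?')
new = re.compile(r'<?(?P<to>[^<>]+)>?$')

header = 'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'

m_old = old.search(header)
# At position 0 the lazy ^.*? and the optional <? both match empty,
# so the greedy (?P<to>.*) takes everything up to the end of the string.
print(m_old.start('to'), m_old.group('to'))

m_new = new.search(header)
# [^<>]+ cannot run across "<" or ">", and $ requires the match to reach
# the end of the string, so the leftmost successful match starts at "<".
print(m_new.start(), m_new.group('to'))
```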
- <? : Match an optional <
- (?P<to>[^<>]+) : Named capture group "to" that matches one or more characters that are neither < nor >
- >? : Match an optional >
- $ : End of the string

Code:
import re
re_destinatario = re.compile(r'<?(?P<to>[^<>]+)>?$')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.group('to'))
Output:
XKYDF/ABC (Caixa Corporativa)
fulano.tal@atlanticabeans.com
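Two caveats, offered as a sketch rather than a definitive fix. Because the pattern is anchored with $, whitespace after the closing > gets captured instead of the address. And re.search returns None for an empty header, so m.group('to') would raise AttributeError. The helper below (extract_to is a hypothetical name, not from the question) strips the header first and returns None when nothing matches. It still returns only the last <...> pair, so a header listing several recipients would need different handling.

```python
import re
from typing import Optional

# Same pattern as the answer above
RE_TO = re.compile(r'<?(?P<to>[^<>]+)>?$')

# Without stripping, the trailing space after ">" is what gets captured:
print(repr(RE_TO.search('Name <a@b.com> ').group('to')))  # ' '


def extract_to(header: str) -> Optional[str]:
    """Hypothetical helper: return the text inside <...> if present,
    otherwise the whole header. Surrounding whitespace is stripped first,
    and None is returned when nothing matches (e.g. an empty header)."""
    m = RE_TO.search(header.strip())
    return m.group('to') if m else None


print(extract_to('Fulano de Tal <fulano.tal@atlanticabeans.com>  '))  # fulano.tal@atlanticabeans.com
print(extract_to('XKYDF/ABC (Caixa Corporativa)'))  # XKYDF/ABC (Caixa Corporativa)
print(extract_to(''))  # None
```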