Tags: python, regex, email-headers, email-address

Regex to catch email addresses in email header


I'm trying to parse a To email header with a regex. If there are no <> characters, I want the whole string; otherwise, I want what is inside the <> pair.

import re
re_destinatario = re.compile(r'^.*?<?(?P<to>.*)>?')
addresses = [
    'XKYDF/ABC (Caixa Corporativa)',
    'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'
]
for address in addresses:
    m = re_destinatario.search(address)
    print(m.groups())
    print(m.group('to'))

But the regex is wrong; it captures the whole string in both cases:

('XKYDF/ABC (Caixa Corporativa)',)
XKYDF/ABC (Caixa Corporativa)
('Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>',)
Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>

What am I missing?


Solution

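  • In your pattern, the lazy .*? and the optional <? can both match nothing, so the greedy .* inside the group consumes the entire string, angle brackets included. You may use this regex instead, which forbids < and > inside the group and anchors the match at the end of the string: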

    <?(?P<to>[^<>]+)>?$
    

    Code:

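    import re
    # Optional '<', then one or more characters that are not angle brackets,
    # then an optional '>' anchored at the end of the string.
    re_destinatario = re.compile(r'<?(?P<to>[^<>]+)>?$')
    addresses = [
        'XKYDF/ABC (Caixa Corporativa)',
        'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'
    ]
    for address in addresses:
        # search() moves the start position forward until the match reaches '$'
        m = re_destinatario.search(address)
        print(m.group('to'))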
    

    Output:

    XKYDF/ABC (Caixa Corporativa)
    fulano.tal@atlanticabeans.com
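
To see the difference between the two patterns side by side, here is a minimal sketch. The helper name `extract_to`, the `strip()` call, and the `None` return for unmatched input are assumptions added for illustration; they are not part of the original question or answer.

```python
import re

# Pattern from the question: the lazy prefix and the optional '<' can both
# match nothing, so the greedy group swallows the entire string.
original = re.compile(r'^.*?<?(?P<to>.*)>?')

# Pattern from the answer: the group cannot contain '<' or '>', and '$'
# forces the match to end at the end of the string.
fixed = re.compile(r'<?(?P<to>[^<>]+)>?$')


def extract_to(header):
    """Return the text inside <...> if present, else the whole string.

    Hypothetical helper. Surrounding whitespace is stripped first (an
    assumption), and None is returned when nothing matches, for example
    on an empty string.
    """
    m = fixed.search(header.strip())
    return m.group('to') if m else None


sample = 'Fulano de Tal | Atlantica Beans <fulano.tal@atlanticabeans.com>'

print(original.search(sample).group('to'))  # whole string, brackets included
print(extract_to(sample))                   # only the address
print(extract_to('XKYDF/ABC (Caixa Corporativa)'))
print(extract_to(''))                       # None: nothing to capture
```

Checking for `None` before calling `group()` avoids an `AttributeError` on headers the pattern cannot match, which the loop in the answer would otherwise raise.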