Tags: perl, email, encoding, special-characters

How to encode special characters in e-mail addresses


E-mail addresses do not consist only of these parts:

localpart@domain.tld

The complete string on the next line (including the part between the quotation marks, the quotation marks themselves, and the angle brackets) is also a valid address:

"John Doe" <localpart@domain.tld>

When I replace "John Doe" with my own name, I get an address that I can enter in my e-mail client without any complaints (note the »ö« in my last name, which is a non-ASCII character):

"Hubert Schölnast" <localpart@domain.tld>

So it seems (to a user of a standard e-mail client like Thunderbird) as if special characters in the quoted section were OK.

But when I check this complete e-mail address in a Perl script with the CPAN module Email::Valid, I get an error saying that the address does not match the rules of RFC 822, and the module's documentation says that RFC 822 does not allow any non-ASCII character in any part of an e-mail address. (When I omit the letter ö or replace it with an ASCII letter, the check says that the address is valid.)

So obviously any e-mail client must encode the e-mail address before it sends a message to an SMTP server, and must decode it when it receives a new message and displays header information to the user. But I can't find out how this is done, and I really did my best at googling.

I need this encoding algorithm because I want to write a Perl script that accepts any valid e-mail address (including ones with special characters in the quoted section) and then sends e-mails to those addresses.


Solution

  • Perl core has Encode.pm:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    
    my $from_header = decode_utf8 q{From: "Hubert Schölnast" <localpart@domain.tld>};
    print encode('MIME_Header', $from_header);
    
    1;
    __END__
    From: "=?UTF-8?B?SHViZXJ0IFNjaMO2bG5hc3Q=?=" <localpart@domain.tld>
    

    RFC 822 and RFC 2822 impose many requirements that make e-mail hard to deal with.

    RFC 2822 also requires that no line in a message be longer than 998 characters (excluding the CRLF), and recommends at most 78. Longer header lines must be folded: split across multiple lines, with each continuation line starting with whitespace.

    This means we have to watch the line length whenever we modify a header, because encoding special characters and prepending the header label both make the line longer.
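
The folding rule described above can be sketched with core Perl only. This is a minimal illustration, not a complete RFC 2822 implementation: `fold_header` is a hypothetical helper name, it breaks only at single spaces, and it cannot shorten a single word that is itself longer than the limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# fold_header is a hypothetical helper, not part of any module.
# It breaks a header at spaces so that no physical line exceeds $limit
# characters. Each continuation line starts with the space it was split at.
sub fold_header {
    my ($header, $limit) = @_;
    $limit //= 78;    # recommended maximum; the hard maximum is 998

    my @lines;
    my $current = '';
    for my $word (split / /, $header) {
        if ($current eq '') {
            $current = $word;
        }
        elsif (length($current) + 1 + length($word) <= $limit) {
            $current .= " $word";
        }
        else {
            push @lines, $current;
            $current = " $word";    # leading space marks a continuation line
        }
    }
    push @lines, $current if $current ne '';

    return join "\r\n", @lines;     # CRLF is the line ending used on the wire
}

# Example: a To: header with eight made-up recipients, 154 characters long.
my $to = 'To: ' . join(', ', map { "user$_\@example.com" } 1 .. 8);
print fold_header($to), "\r\n";
```

Because the split happens only at spaces, an encoded-word such as `=?UTF-8?B?...?=` is never broken in the middle. Removing every CRLF that is followed by a space restores the original header.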


    Edit

    As of Encode.pm version 2.80, MIME-Header encoding was rewritten to comply with RFC 2047, so the original code that I posted above can no longer be used.

    See: https://metacpan.org/pod/Encode::MIME::Header#BUGS

    The most straightforward alternative is to use both Email::MIME and Email::Address::XS, although these packages are not in core:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use open qw/:std :encoding(UTF-8)/;
    
    use Email::Address::XS;
    use Email::MIME::Header::AddressList;
    
    my $address = Email::Address::XS->new('Hubert Schölnast' => 'localpart@domain.tld');
    my $addr_list = Email::MIME::Header::AddressList->new($address);
    
    print $addr_list->as_mime_string;
    
    1;
    __END__
    =?UTF-8?B?SHViZXJ0IFNjaMO2bG5hc3Q=?= <localpart@domain.tld>
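
If installing non-core modules is not an option, the encoded-word shown in the output above can also be built by hand: it is the UTF-8 bytes of the name, Base64-encoded and wrapped in `=?UTF-8?B?` and `?=`. The following is a minimal sketch using only the core modules Encode and MIME::Base64. `encode_phrase` is a hypothetical helper name, and the sketch assumes the name is short enough to fit in one encoded-word (RFC 2047 limits each to 75 characters); a real implementation would also leave pure-ASCII names alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this source file is saved as UTF-8

use Encode qw(encode decode);
use MIME::Base64 qw(encode_base64);

# encode_phrase is a hypothetical helper, not part of any module.
# It wraps a display name in a single RFC 2047 "B" (Base64) encoded-word.
sub encode_phrase {
    my ($phrase) = @_;
    my $bytes = encode('UTF-8', $phrase);    # characters -> UTF-8 bytes
    # The second argument '' suppresses the newline encode_base64 appends.
    return '=?UTF-8?B?' . encode_base64($bytes, '') . '?=';
}

# RFC 2047 forbids encoded-words inside a quoted-string, so the
# encoded display name is written without quotation marks.
my $header = 'From: ' . encode_phrase('Hubert Schölnast') . ' <localpart@domain.tld>';
print "$header\n";
# From: =?UTF-8?B?SHViZXJ0IFNjaMO2bG5hc3Q=?= <localpart@domain.tld>

# Decoding a received header goes the other way and yields a character
# string, which has to be encoded again before it is printed.
my $decoded = decode('MIME-Header', $header);
print encode('UTF-8', $decoded), "\n";
```

The decoding direction uses `decode('MIME-Header', ...)` from Encode, which is what a client would do before displaying the header to the user.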