Tags: php, utf-8, imap, decode

imap_utf8 not always converting utf-8 text


I have the following simple script, running under PHP 8.3.6

<?php
$original = '"=?utf-8?Q?part1=40part2.com?=" <part1@part2.com>' ;
$converted = imap_utf8($original) ;
printf("Original: %s\nConverted: %s\n", $original, $converted) ;

When this is executed, the result is that the $converted text is exactly equal to the original text.

I get values like this (especially in the To field) when using imap_search and other functions that return headers. I expect this to be widespread; I am just getting into initial testing with PHP IMAP. Note in particular the embedded double quotes, which may be (part of) the problem.

What is the appropriate way to decode a value like the above?


Solution

  • The imap_utf8 function is designed to convert MIME-encoded text (like =?charset?encoding?encoded-text?=) to UTF-8.
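    To make the encoded-word format concrete, here is a small sketch that splits one encoded-word into its charset, encoding and text fields and decodes the text. The helper name split_encoded_word is hypothetical; it handles only a single, unquoted encoded-word and does no charset conversion.

```php
<?php
// Hypothetical helper for illustration: splits one RFC 2047 encoded-word
// (=?charset?encoding?encoded-text?=) into its fields and decodes the text.
// Handles a single encoded-word only; no charset conversion is done.
function split_encoded_word(string $word): ?array
{
    if (!preg_match('/^=\?([^?]+)\?([QqBb])\?([^?]*)\?=$/', $word, $m)) {
        return null; // not a well-formed encoded-word on its own
    }
    [, $charset, $encoding, $text] = $m;
    if (strtoupper($encoding) === 'Q') {
        // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte
        $decoded = quoted_printable_decode(str_replace('_', ' ', $text));
    } else {
        // B encoding: plain base64
        $decoded = base64_decode($text, true);
        if ($decoded === false) {
            return null;
        }
    }
    return ['charset' => $charset, 'encoding' => strtoupper($encoding), 'text' => $decoded];
}

print_r(split_encoded_word('=?utf-8?Q?part1=40part2.com?='));
```

    Here "=40" is the hex code of "@", so the text field decodes to part1@part2.com.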

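    The issue you have encountered is most likely caused by the input string itself. The encoded-word is wrapped in double quotes (U+0022, or possibly "fancy quotes" such as U+201C), so the header is not correctly formatted according to the RFC standards: RFC 2047 states that an encoded-word must not appear within a quoted-string, which is presumably why imap_utf8 leaves it unchanged.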
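    If you want to detect such headers before deciding how to decode them, a rough check is possible with a regular expression. The sketch below uses a hypothetical helper, has_quoted_encoded_word; it is not a real RFC 5322 parser (it ignores escaped quotes and comments, and can misfire when several quoted strings surround an unquoted encoded-word).

```php
<?php
// Hypothetical diagnostic helper: reports whether an RFC 2047 encoded-word
// appears between double quotes, i.e. inside a quoted-string.
// Rough regex check only, not a full RFC 5322 address parser.
function has_quoted_encoded_word(string $header): bool
{
    return preg_match('/"[^"]*=\?[^?"]+\?[QqBb]\?[^?"]*\?=[^"]*"/', $header) === 1;
}

var_dump(has_quoted_encoded_word('"=?utf-8?Q?part1=40part2.com?=" <part1@part2.com>')); // bool(true)
var_dump(has_quoted_encoded_word('=?utf-8?Q?part1=40part2.com?= <part1@part2.com>'));   // bool(false)
```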

    One possible cause is improper encoding by the sending client.

    One workaround is to write a function that decodes the header with imap_mime_header_decode, which in practice decodes each encoded-word it finds even inside quotes, and then converts each decoded part to UTF-8 with mb_convert_encoding.

    Note that mb_convert_encoding is only needed for parts whose charset is not UTF-8; it converts that text from its original charset to UTF-8.
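    As an illustration of that conversion step, consider a hypothetical encoded-word declared as ISO-8859-1 with the Q-encoded text "Caf=E9". Decoding the Q encoding yields raw Latin-1 bytes, which are not valid UTF-8 until mb_convert_encoding converts them.

```php
<?php
// Hypothetical input: the text of an ISO-8859-1 encoded-word, "Caf=E9".
$raw = quoted_printable_decode('Caf=E9');                 // "Caf\xE9" in ISO-8859-1
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');  // "Café" in UTF-8

var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false): a lone 0xE9 byte is not valid UTF-8
var_dump($utf8 === "Caf\u{00E9}");           // bool(true)
```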

    The function is:

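    function custom_imap_utf8_decode(string $mime_encoded_text): string {
        $decoded_elements = imap_mime_header_decode($mime_encoded_text);
        if ($decoded_elements === false) {
            return $mime_encoded_text; // decoding failed: return the input unchanged
        }
        $decoded_string = '';
    
        foreach ($decoded_elements as $element) {
            // Unencoded parts are reported with charset "default"; compare case-insensitively
            $charset = strtolower($element->charset);
            if ($charset !== 'utf-8' && $charset !== 'default') {
                try {
                    // Convert the text to UTF-8 from its declared charset
                    $decoded_string .= mb_convert_encoding($element->text, 'UTF-8', $element->charset);
                } catch (ValueError $e) {
                    $decoded_string .= $element->text; // charset unknown to mbstring: keep raw text
                }
            } else {
                $decoded_string .= $element->text;
            }
        }
        return $decoded_string;
    }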
    

    A complete working example:

    <?php
    function custom_imap_utf8_decode(string $mime_encoded_text): string {
        $decoded_elements = imap_mime_header_decode($mime_encoded_text);
        if ($decoded_elements === false) {
            return $mime_encoded_text; // decoding failed: return the input unchanged
        }
        $decoded_string = '';
    
        foreach ($decoded_elements as $element) {
            // Unencoded parts are reported with charset "default"; compare case-insensitively
            $charset = strtolower($element->charset);
            if ($charset !== 'utf-8' && $charset !== 'default') {
                try {
                    // Convert the text to UTF-8 from its declared charset
                    $decoded_string .= mb_convert_encoding($element->text, 'UTF-8', $element->charset);
                } catch (ValueError $e) {
                    $decoded_string .= $element->text; // charset unknown to mbstring: keep raw text
                }
            } else {
                $decoded_string .= $element->text;
            }
        }
        return $decoded_string;
    }
    
    $original = '"=?utf-8?Q?part1=40part2.com?=" <part1@part2.com>';
    $converted = custom_imap_utf8_decode($original);
    
    printf("Original: %s\nConverted: %s\n", $original, $converted);
    ?>
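    If the imap extension is not available (as of PHP 8.4 it is no longer bundled with PHP and has moved to PECL), a pure-PHP fallback is possible. The sketch below uses a hypothetical function, decode_encoded_words, that replaces every encoded-word wherever it appears, including inside quotes. It is a simplification under stated assumptions: whitespace between adjacent encoded-words is kept (RFC 2047 says it should be dropped), RFC 2231 language suffixes are not handled, and words with an unknown charset are left as-is.

```php
<?php
// Hypothetical fallback that does not need ext/imap: replaces every
// RFC 2047 encoded-word, wherever it appears (including inside quotes),
// with its UTF-8 decoding. Needs the mbstring extension.
function decode_encoded_words(string $header): string
{
    $pattern = '/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/';
    $result = preg_replace_callback($pattern, function (array $m): string {
        [$whole, $charset, $encoding, $text] = $m;
        if (strtoupper($encoding) === 'Q') {
            // Q encoding: "_" is a space, "=XX" is a hex-encoded byte
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        } else {
            $bytes = base64_decode($text, true);
            if ($bytes === false) {
                return $whole; // malformed base64: keep the original text
            }
        }
        try {
            return mb_convert_encoding($bytes, 'UTF-8', $charset);
        } catch (ValueError $e) {
            return $whole; // charset not known to mbstring: keep the original text
        }
    }, $header);
    return $result ?? $header; // preg error: return the input unchanged
}

$original = '"=?utf-8?Q?part1=40part2.com?=" <part1@part2.com>';
printf("Original: %s\nConverted: %s\n", $original, decode_encoded_words($original));
// Converted: "part1@part2.com" <part1@part2.com>
```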
    

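    With this function the encoded-word is decoded even though it sits inside double quotes, so the converted value should be:

    "part1@part2.com" <part1@part2.com>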