perl, encoding, mojolicious

koi8-r text is shown incorrectly


Here is the code that tries to output text in koi8-r single-byte encoding to the browser. I saved it in Emacs in koi8-r:

#!/usr/bin/perl
use Mojolicious::Lite;
no utf8;

get '/' => sub {
    my $c = shift;

    $c->res->headers->content_type('text/plain; charset=KOI8-R');

    # Respond with plain text
    $c->render(text => "Hello, World! Текст на русском однобитный.");
};

app->start;

It works, but instead of "Hello, World! Текст на русском однобитный." in Firefox I see:

Hello, World! ц╢ц┘ц▀ц⌠ц■ ц▌ц│ ц▓ц∙ц⌠ц⌠ц▀ц▐ц█ ц▐ц└ц▌ц▐ц┌ц┴ц■ц▌ц≥ц┼.

After I click View - "Repair text encoding", I get:

Hello, World! ôÅËÓÔ ÎÁ ÒÕÓÓËÏÍ ÏÄÎÏÂÉÔÎÙÊ.

It seems that "Текст на русском однобитный" is being interpreted as Windows-1252 instead of koi8-r, and that's why I see "ôÅËÓÔ ÎÁ ÒÕÓÓËÏÍ ÏÄÎÏÂÉÔÎÙÊ".

What is wrong? Maybe the render function can't work with a single-byte encoding?


Solution

  • I can see a use for this if some consumer can only handle a particular encoding. However, if the consumer can handle UTF-8, go with that.

    This works by sending raw octets with data:

    use Mojolicious::Lite;
    use utf8;
    
    use Encode qw(encode);
    
    get '/' => sub {
        my $c = shift;
    
        $c->res->headers->content_type('text/plain; charset=KOI8-R');
    
        my $message = "Hello, World! Текст на русском однобитный.";
    
        $c->render(data => encode('KOI8-R', $message) );
    };
    
    app->start;
    

    First, you don't have to store your program in the encoding that you want to emit. It's easier to make the source UTF-8 and then emit something else. If the source is not UTF-8, you have to wrangle every string yourself, and that's going to cause problems. @JosefZ already explained that your setup leads to the equivalent of this chain:

    'Текст на русском однобитный'
        .encode( 'koi8_r')
        .decode( 'latin1')
        .encode( 'utf_8')
        .decode( 'koi8_r')
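
    The same chain can be reproduced in Perl with the Encode module. This is only a sketch, assuming the source is saved as UTF-8 and using the single word "Текст" as sample input; the variable names are made up for illustration:

    ```perl
    use strict;
    use warnings;
    use utf8;                        # this sketch is saved as UTF-8
    use Encode qw(encode decode);

    my $text = 'Текст';

    # 1. The bytes that a KOI8-R source file contains for the literal
    my $koi8_bytes = encode('KOI8-R', $text);

    # 2. Without a decoding step, each byte is taken as a Latin-1 code point
    my $as_latin1 = decode('ISO-8859-1', $koi8_bytes);

    # 3. render(text => ...) encodes the string as UTF-8
    my $utf8_bytes = encode('UTF-8', $as_latin1);

    # 4. The browser decodes the body as KOI8-R because of the charset header
    my $shown = decode('KOI8-R', $utf8_bytes);

    binmode STDOUT, ':encoding(UTF-8)';
    print "as Latin-1: $as_latin1\n";   # ôÅËÓÔ
    print "in browser: $shown\n";       # ц╢ц┘ц▀ц⌠ц■
    ```

    The two printed strings match the start of what Firefox showed after and before "Repair text encoding", respectively.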
    

    There is an encoding pragma that tells perl to assume your source is some other encoding, but quoting from its docs:

    This pragma dates from the days when UTF-8-enabled editors were uncommon. But that was long ago, and the need for it is greatly diminished.

    That's perl telling you not to do what you are trying to do by saving the source as KOI8-R (or as anything that is neither ASCII nor UTF-8).

    Second, Perl stores its strings in its internal Perl format, which is something like UTF-8. You don't have to know anything about that. However, Encode knows what to do with that when you want to encode it to something else.
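
    A small sketch of the difference between a character string and its encoded octets, assuming a UTF-8 source file and the sample word "Текст" (the variable names are illustrative only):

    ```perl
    use strict;
    use warnings;
    use utf8;                        # the source file itself is UTF-8
    use Encode qw(encode);

    my $message = 'Текст';           # a character string

    # Encode turns characters into octets in the requested encoding
    my $koi8 = encode('KOI8-R', $message);   # one octet per Cyrillic letter
    my $utf8 = encode('UTF-8',  $message);   # two octets per Cyrillic letter

    printf "characters:    %d\n", length $message;      # 5
    printf "KOI8-R octets: %d\n", length $koi8;         # 5
    printf "UTF-8 octets:  %d\n", length $utf8;         # 10
    printf "KOI8-R hex:    %s\n", unpack 'H*', $koi8;   # f4c5cbd3d4
    ```

    The program never needs to look at the internal representation of $message; it only chooses an encoding at the point where octets leave the program.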

    Third, if you want to emit bytes (which is the result of encode), use data. The text option expects characters and encodes them again (to UTF-8 by default), and if you are trying to control the bytes, you don't want anything else sticking its fingers in your pie.
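
    To see the difference on the wire, here is a sketch that uses Test::Mojo to fetch the raw response bodies. The /data and /text routes are hypothetical and exist only for this comparison; the sketch assumes the renderer's default UTF-8 encoding:

    ```perl
    use Mojolicious::Lite;
    use utf8;
    use Encode qw(encode);
    use Test::Mojo;

    # Hypothetical route: octets are passed through untouched
    get '/data' => sub {
        my $c = shift;
        $c->res->headers->content_type('text/plain; charset=KOI8-R');
        $c->render(data => encode('KOI8-R', 'Текст'));
    };

    # Hypothetical route: characters are encoded by the renderer
    get '/text' => sub {
        my $c = shift;
        $c->res->headers->content_type('text/plain; charset=KOI8-R');
        $c->render(text => 'Текст');
    };

    # Request both routes in-process and look at the raw bodies
    my $t = Test::Mojo->new;
    my $data_body = $t->ua->get('/data')->result->body;
    my $text_body = $t->ua->get('/text')->result->body;

    printf "data: %s\n", unpack 'H*', $data_body;   # f4c5cbd3d4 (KOI8-R)
    printf "text: %s\n", unpack 'H*', $text_body;   # d0a2d0b5d0bad181d182 (UTF-8)
    ```

    Only the data response matches the charset announced in the header; the text response carries UTF-8 octets under a KOI8-R label.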

    Or, you can tell Mojo to render text in some other encoding. This also works and may be useful if everything you are going to send will be in an encoding other than UTF-8.

    use Mojolicious::Lite;
    use utf8;
    
    use Encode qw(decode encode);
    
    get '/' => sub {
        my $c = shift;
    
        $c->app->renderer->encoding('koi8-r');
        $c->res->headers->content_type('text/plain; charset=KOI8-R');
    
        my $message = "Text, World! Текст на русском однобитный.";
    
        $c->render(text => $message);
    };
    
    app->start;
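
    If every response should use the same encoding, the renderer setting could also be made once at startup rather than inside each action. This is a sketch of that variation, not something the example above requires; it keeps the explicit Content-Type header and uses a shortened sample message:

    ```perl
    use Mojolicious::Lite;
    use utf8;

    # Set the renderer encoding once for the whole application
    app->renderer->encoding('koi8-r');

    get '/' => sub {
        my $c = shift;

        # Announce the same encoding to the client
        $c->res->headers->content_type('text/plain; charset=KOI8-R');

        # Characters in, KOI8-R octets out (encoded by the renderer)
        $c->render(text => 'Hello, World! Текст на русском.');
    };

    app->start;
    ```

    Because the renderer setting is shared by the whole application, changing it inside an action affects later requests too, so setting it in one place makes that explicit.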