In my terminal these are equally wide:
ヌー平行
parallel
æøåüäöûß
I have managed to get Perl to give the length 8 for the last 2 lines, but it reports the length of the first line as 4. Is there a way for me to determine that the width of ヌ is twice that of ø?
You can use Text::CharWidth's mbswidth. It uses POSIX's wcwidth.
use v5.14;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';
use Encode qw( encode_utf8 );
use Text::CharWidth qw( mbswidth );
use Unicode::Normalize qw( NFC NFD );
my @tests = (
   [ "ASCII",     "parallel",      8 ],
   [ "NFC",       NFC("æøåüäöûß"), 8 ],
   [ "NFD",       NFD("æøåüäöûß"), 8 ],
   [ "EastAsian", "ヌー平行",      8 ],
);
for ( @tests ) {
   my ( $name, $s, $expect ) = @$_;
   my $length = length( $s );
   my $got = mbswidth( encode_utf8( $s ) );
   printf "%-9s length=%2d expect=%d got=%d\n",
      $name, $length, $expect, $got;
}
ASCII     length= 8 expect=8 got=8
NFC       length= 8 expect=8 got=8
NFD       length=13 expect=8 got=8
EastAsian length= 4 expect=8 got=8
Note that mbswidth expects a string encoded using the locale's encoding, which I assumed was UTF-8 in two places in the above program.
If you want to know the number of columns a string should take according to Unicode, this is covered by Unicode Standard Annex #11. Note that the answer may depend on whether one is in an East Asian context or not. For example, U+03A6 GREEK CAPITAL LETTER PHI ("Φ") takes up two columns in an East Asian context, while it takes up only one otherwise.
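As a rough illustration of the Annex #11 approach, the sketch below sums column widths using Perl's built-in East_Asian_Width property (\p{Ea=...}) rather than the locale's wcwidth. The helper name uax11_width and its $east_asian flag are made up for this example, and treating marks and format characters as zero-width is a simplifying assumption. A terminal may well disagree with the result, so take it as an approximation only.

```perl
use v5.14;
use warnings;
use Unicode::Normalize qw( NFC );

# Hypothetical helper (not part of any module): approximate the number of
# columns a decoded string occupies, based on East_Asian_Width (UAX #11).
# $east_asian selects how Ambiguous-width characters are counted.
sub uax11_width {
    my ( $s, $east_asian ) = @_;
    my $width = 0;
    for my $ch ( split //, NFC( $s ) ) {
        # Assumption: combining marks and format characters take no column.
        next if $ch =~ /[\p{Mn}\p{Me}\p{Cf}]/;
        if    ( $ch =~ /[\p{Ea=W}\p{Ea=F}]/ )      { $width += 2 }  # Wide, Fullwidth
        elsif ( $east_asian && $ch =~ /\p{Ea=A}/ ) { $width += 2 }  # Ambiguous
        else                                       { $width += 1 }
    }
    return $width;
}

say uax11_width( "\x{30CC}\x{30FC}\x{5E73}\x{884C}" );  # ヌー平行
say uax11_width( "\x{3A6}", 0 );                        # PHI, non-East-Asian context
say uax11_width( "\x{3A6}", 1 );                        # PHI, East Asian context
```

Unlike mbswidth, this takes a decoded character string, so no encode_utf8 step is needed. The second argument makes the East Asian context an explicit choice instead of something inferred from the locale.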