Tags: unicode, ascii, double-byte

Convert double-byte numbers and spaces in filenames to ASCII


Given a directory of filenames consisting of double-byte/full-width numbers and spaces (along with some half-width numbers and underscores), how can I convert all of the numbers and spaces to single-byte characters?

For example, this filename consists of a double-byte number, followed by a double-byte space, followed by some single-byte characters:

２　2_3.ext

and I'd like to change it to all single-byte like so:

2 2_3.ext

I've tried using convmv to convert from UTF-8 to ASCII, but it prints the following message for every file:

"ascii doesn't cover all needed characters for: filename"


Solution

  • Thanks for your quick replies, bmargulies and bobince. I found a Perl module, Unicode::Japanese, that helped get the job done. Here is a bash script I made (with help from this example) to convert filenames in the current directory from full-width to half-width characters:

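    #!/bin/bash
    # Rename each file in the current directory, converting full-width
    # characters in its name to half-width ones via Unicode::Japanese.
    for file in *; do
        # printf with a quoted argument keeps spaces and leading dashes intact
        newfile=$(printf '%s\n' "$file" | perl -MUnicode::Japanese -e 'print Unicode::Japanese->new(<>)->z2h->get;')
        # Only rename when the conversion actually changed the name
        [ "$file" != "$newfile" ] && mv -- "$file" "$newfile"
    done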