java swing special-characters

Padding text with Latin and foreign characters in Java


I have a delimited file that can be a pain to read, so I have a Swing application that takes the file's contents and puts them in a JTextArea. Japanese and Korean characters can take up more width than Latin ones, so I used FontRenderContext to help with that. I tried using Character.UnicodeBlock for the different languages, but that did not work. These methods have gotten me close. If the Japanese string is the longest of the bunch, it works, but if it is not, it gets padded 2 spaces longer than it should.

    public static int getDisplayWidth(String text, Font font) {
        FontMetrics metrics = new JLabel().getFontMetrics(font);
        int width = 0;
        int spaceWidth = metrics.charWidth(' '); // Get width of a monospaced space
        FontRenderContext frc = new FontRenderContext(null, true, true);

        for (char c : text.toCharArray()) {
            TextLayout layout = new TextLayout(String.valueOf(c), font, frc);
            double charWidth = layout.getBounds().getWidth();

            // this is close
            if (charWidth > spaceWidth) {
                width += Math.max(1, (int) Math.round(charWidth / (double) spaceWidth));
            } else {
                width += 1; // Treat half-width and Latin characters as taking 1 space
            }
        }
        return width;
    }

    private static String padToWidth(String text, int width) {
        //int textWidth = getDisplayWidth(text);
        int textWidth = getDisplayWidth(text, new Font("Monospaced", Font.PLAIN, 20));
        int padding = width - textWidth;
        return text + " ".repeat(Math.max(0, padding));
    }
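
One detail of the loop above that is easy to miss: `text.toCharArray()` visits UTF-16 code units, not characters, so a character outside the Basic Multilingual Plane would be measured as two separate halves. The sketch below only illustrates that difference; `CodePointDemo` and its method names are made up for this example, and U+20000 is an arbitrary supplementary ideograph picked for the demonstration. Every character in the sample data further down is inside the BMP, so this is probably not what causes the offset reported here.

```java
class CodePointDemo {
    // Number of UTF-16 code units: what a for-each over toCharArray() visits.
    static int charCount(String s) {
        return s.toCharArray().length;
    }

    // Number of Unicode code points: what text.codePoints() visits.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+5E73 U+4EEE U+540D: the three kanji from the sample data (all BMP).
        String bmp = "\u5E73\u4EEE\u540D";
        // U+20000: a CJK Extension B ideograph, stored as a surrogate pair.
        String supplementary = new String(Character.toChars(0x20000));

        System.out.println(charCount(bmp) + " vs " + codePointCount(bmp));                     // prints "3 vs 3"
        System.out.println(charCount(supplementary) + " vs " + codePointCount(supplementary)); // prints "2 vs 1"
    }
}
```

Iterating with `text.codePoints()` keeps each surrogate pair together, so a font-based measurement sees one glyph instead of two unpaired halves.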

I've been pretty stumped as to what is making the padding not correct. I can share the whole code and my sample data if needed as well. Just didn't want to flood the post.

Edit: sample data

asdasdasdasdasd|a||0|
qweas|aa|||
end|aaaa||平仮名 ひらがな|
filler|zz||qweas|
!|idk||pen|
@|names||ANYthing that is not pencils|
ㄱ넣튜ㅓ,ㅣㅎㅌ|korean|||
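
For context, padding only lines the pipes up if every cell in a column is padded to the widest cell of that column. Below is a minimal sketch of such a driver, assuming the file is split on `|`. It is not the code behind the link: `PipeAligner`, `align` and `estimateWidth` are hypothetical names, and `estimateWidth` is a font-free stand-in (2 columns for Han, Hiragana, Katakana and Hangul, 1 for everything else) rather than the font-based getDisplayWidth shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class PipeAligner {
    // Assumption: these four scripts are rendered two columns wide.
    static boolean isWide(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        return s == Character.UnicodeScript.HAN
                || s == Character.UnicodeScript.HIRAGANA
                || s == Character.UnicodeScript.KATAKANA
                || s == Character.UnicodeScript.HANGUL;
    }

    // Font-free width estimate in columns; a stand-in, not a measurement.
    static int estimateWidth(String text) {
        return text.codePoints().map(cp -> isWide(cp) ? 2 : 1).sum();
    }

    // Appends spaces until the text occupies the requested number of columns.
    static String padToWidth(String text, int width, ToIntFunction<String> widthOf) {
        int padding = width - widthOf.applyAsInt(text);
        return text + " ".repeat(Math.max(0, padding));
    }

    // Splits on '|', finds the widest cell per column, then pads every cell to it.
    static List<String> align(List<String> lines, ToIntFunction<String> widthOf) {
        List<String[]> rows = new ArrayList<>();
        int columns = 0;
        for (String line : lines) {
            String[] cells = line.split("\\|", -1); // -1 keeps trailing empty cells
            rows.add(cells);
            columns = Math.max(columns, cells.length);
        }
        int[] widest = new int[columns];
        for (String[] cells : rows) {
            for (int i = 0; i < cells.length; i++) {
                widest[i] = Math.max(widest[i], widthOf.applyAsInt(cells[i]));
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] cells : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cells.length; i++) {
                if (i > 0) sb.append('|');
                sb.append(padToWidth(cells[i], widest[i], widthOf));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two rows shaped like the sample data; the kanji are U+5E73 U+4EEE U+540D.
        List<String> lines = List.of("qweas|aa|||", "end|aaaa||\u5E73\u4EEE\u540D|");
        align(lines, PipeAligner::estimateWidth).forEach(System.out::println);
    }
}
```

The width function is passed in as a parameter, so a font-based measurement can be swapped in without touching the alignment logic. The script check is coarse: punctuation such as full-width commas is classified as a common script and counted as 1 column here.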

Whole code base if needed: https://sourceb.in/MxSSlWpFbx

Edit 2: here is a screenshot of the bug. [Image: offset pipes]


Solution

  • For those who run into this issue, this is what I did. I kept the same padding method. The comments from g00se and Basil Bourque were helpful in resolving this.

    import java.awt.Font;
    import java.awt.FontMetrics;
    import java.awt.font.FontRenderContext;
    import java.awt.font.TextLayout;
    import java.util.Set;
    import javax.swing.JLabel;

    // Hangul blocks that get the larger minimum width below.
    private static final Set<Character.UnicodeBlock> HANGUL_BLOCKS = Set.of(
            Character.UnicodeBlock.HANGUL_SYLLABLES,
            Character.UnicodeBlock.HANGUL_JAMO,
            Character.UnicodeBlock.HANGUL_COMPATIBILITY_JAMO);

    public static int getDisplayWidth(String text, Font font) {
        FontRenderContext frc = new FontRenderContext(null, true, true);
        FontMetrics metrics = new JLabel().getFontMetrics(font);
        int spaceWidth = metrics.charWidth(' '); // width of a space in the monospaced font
        double width = 0.0;

        // Iterate by code point so a surrogate pair is measured as one character.
        for (int codePoint : text.codePoints().toArray()) {
            String charAsString = new String(Character.toChars(codePoint));
            TextLayout layout = new TextLayout(charAsString, font, frc);
            float charWidth = (float) layout.getBounds().getWidth(); // pixel width of the glyph bounds

            if (charWidth > spaceWidth) {
                double ratio = Math.round(charWidth / (double) spaceWidth);
                // Wide characters count as at least 2 columns for Hangul (fine-tuned), 1.75 otherwise.
                width += isKoreanCharacter(codePoint) ? Math.max(2, ratio) : Math.max(1.75, ratio);
            } else {
                width += 1; // treat half-width and Latin characters as taking 1 space
            }
        }
        return (int) width;
    }

    private static boolean isKoreanCharacter(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        // UnicodeBlock.of returns null for unassigned code points; Set.of rejects null lookups.
        return block != null && HANGUL_BLOCKS.contains(block);
    }
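
To see what the 1.75 and 2 minimums do in practice, here is a minimal sketch that replays the same per-character rule on made-up pixel widths, so it runs without any font. The 12 px space and the 7, 16 and 20 px glyph widths are assumptions chosen for illustration, not measurements; `WidthWeights`, `weight` and `columns` are hypothetical names.

```java
class WidthWeights {
    // Same rule as the solution, with the measured widths passed in as numbers.
    static double weight(double charWidth, int spaceWidth, boolean korean) {
        if (charWidth > spaceWidth) {
            double ratio = Math.round(charWidth / (double) spaceWidth);
            return korean ? Math.max(2, ratio) : Math.max(1.75, ratio);
        }
        return 1; // half-width and Latin characters take one column
    }

    // Sums the weights and truncates, like the final (int) cast in the solution.
    static int columns(double[] charWidths, int spaceWidth, boolean korean) {
        double total = 0.0;
        for (double w : charWidths) {
            total += weight(w, spaceWidth, korean);
        }
        return (int) total;
    }

    public static void main(String[] args) {
        int space = 12; // assumed width of a space in pixels
        System.out.println(weight(7.0, space, false));  // prints 1.0
        System.out.println(weight(16.0, space, false)); // prints 1.75 (ratio rounds to 1)
        System.out.println(weight(16.0, space, true));  // prints 2.0
        System.out.println(weight(20.0, space, false)); // prints 2.0 (ratio rounds to 2)
        System.out.println(columns(new double[] {16, 16, 16}, space, false)); // prints 5
    }
}
```

With these numbers the minimums only matter when the measured ratio rounds down to 1. Three such non-Hangul glyphs add up to 5.25 and are truncated to 5 columns, while the same three glyphs flagged as Hangul give 6, so the result depends on how many wide characters a string contains as well as on the font.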