string go terminal rune

Get the width of Chinese strings correctly


I want to draw a border around the text 这是一个测试, but I cannot get its actual display width. With English text, it works perfectly.

Screenshot

Here is my analysis:

len tells me this:

这是一个测试 18
aaaaaaaaa 10
つのだ☆HIRO 16
aaaaaaaaaa 10

runewidth.StringWidth tells me this:

这是一个测试 12
aaaaaaaaa 10
つのだ☆HIRO 11
aaaaaaaaaa 10
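
A minimal program that reproduces the misalignment:

package main

import "fmt"

func main() {
    // Compare the Chinese text against ten ASCII characters.
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaaa | 10*a")
    fmt.Println()
    // Compare the Chinese text against nine ASCII characters.
    fmt.Println("这是一个测试 |")
    fmt.Println("aaaaaaaaa | 9*a")
    fmt.Println()
    fmt.Println("Both are not equal to the Chinese text.")
    fmt.Println("The (pipe) lines are not under each other.")
}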


Question:

How can I get my box (first screenshot) to appear correctly?


Solution

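  • Go strings are UTF-8 encoded, and len returns the number of bytes, not the number of characters. Each Chinese character here takes 3 bytes, while an ASCII character takes only 1 byte. That's by design.

    To count characters (runes) instead of bytes, use the standard unicode/utf8 package. Note that a rune count is still not a display width: a Chinese character is a single rune but usually occupies two terminal columns.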

    c := "这是一个测试"
    fmt.Printf("String: %s\nLength: %d\nRune Length: %d\n", c, len(c), utf8.RuneCountInString(c))
    // String: 这是一个测试
    // Length: 18
    // Rune Length: 6
    

    A more basic way to count runes is a for range loop, which iterates over a string rune by rune rather than byte by byte.

    count := 0
    for range "这是一个测试" {
        count++
    }
    fmt.Printf("Count=%d\n", count)
    // Count=6
    

    As for pretty-printing Chinese and English strings in tabular format, there seems to be no direct way, and text/tabwriter does not work in this case either. A small workaround is to use a csv writer with a tab as the separator:

    data := [][]string{
        {"这是一个测试", "|"},
        {"aaaaaaaaaa", "|"},
        {"つのだ☆HIRO", "|"},
        {"aaaaaaaaaa", "|"},
    }
    
    w := csv.NewWriter(os.Stdout)
    defer w.Flush()
    w.Comma = '\t'
    
    for _, row := range data {
        w.Write(row)
    }
    

    This should print the data as expected. Stack Overflow does not render the output the same way a terminal does, so run the snippet on the Go Playground or in a terminal to see the alignment.

    Note: This only works when the display widths of the strings are close enough to one another that they all end before the same tab stop. For longer or more varied strings, you'd need a further workaround.
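
For the box itself, one option is to pad each line by its estimated display width rather than by bytes or runes. The following is a minimal sketch using only the standard library. The helpers runeCells, displayWidth, padRight, and box are hypothetical names, not part of any package, and the width rule (Han, Hiragana, Katakana, and Hangul count as two cells, everything else as one) is a rough approximation of what runewidth.StringWidth from the question does more completely. It also assumes the terminal font renders wide characters in exactly two cells; if it does not, no amount of padding will line up the borders.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// runeCells is a hypothetical helper that guesses how many terminal cells a
// rune occupies: 2 for Han, Hiragana, Katakana and Hangul, 1 for anything
// else. This is an approximation, not a full East Asian Width table.
func runeCells(r rune) int {
	if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
		return 2
	}
	return 1
}

// displayWidth sums the estimated cell width of every rune in s.
func displayWidth(s string) int {
	w := 0
	for _, r := range s {
		w += runeCells(r)
	}
	return w
}

// padRight appends spaces until s fills width cells (by the estimate above).
func padRight(s string, width int) string {
	if gap := width - displayWidth(s); gap > 0 {
		return s + strings.Repeat(" ", gap)
	}
	return s
}

// box draws a border sized by estimated display width.
func box(lines []string) string {
	width := 0
	for _, l := range lines {
		if w := displayWidth(l); w > width {
			width = w
		}
	}
	border := "+" + strings.Repeat("-", width+2) + "+\n"
	var b strings.Builder
	b.WriteString(border)
	for _, l := range lines {
		b.WriteString("| " + padRight(l, width) + " |\n")
	}
	b.WriteString(border)
	return b.String()
}

func main() {
	lines := []string{"这是一个测试", "aaaaaaaaaa", "つのだ☆HIRO"}
	// Show how bytes, runes and estimated cells differ for each string.
	for _, l := range lines {
		fmt.Printf("%q bytes=%d runes=%d cells=%d\n",
			l, len(l), utf8.RuneCountInString(l), displayWidth(l))
	}
	fmt.Print(box(lines))
}
```

With this estimate the three strings come out at 12, 10, and 11 cells, the same values runewidth.StringWidth reported in the question. For real use, swapping displayWidth for runewidth.StringWidth is probably the safer choice, since it covers many more character ranges.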