unicode, utf-8, go

Remove diacritics using Go


How can I remove all diacritics from a given UTF-8 encoded string using Go? For example, transform the string "žůžo" into "zuzo". Is there a standard way?


Solution

  • You can use the golang.org/x/text libraries described in the Go blog post "Text normalization in Go".

    Here's an application of those libraries:

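    // Example derived from: http://blog.golang.org/normalization
    
    package main
    
    import (
        "fmt"
        "log"
        "unicode"
    
        "golang.org/x/text/runes"
        "golang.org/x/text/transform"
        "golang.org/x/text/unicode/norm"
    )
    
    func main() {
        // NFD splits each accented letter into a base letter plus combining marks,
        // runes.Remove drops the marks (category Mn: nonspacing marks),
        // and NFC recomposes whatever is left.
        // runes.Remove replaces transform.RemoveFunc, which is deprecated.
        t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
        result, _, err := transform.String(t, "žůžo")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(result) // zuzo
    }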
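
To see why the chain starts with norm.NFD, it may help to look at the mark-removal step on its own. The sketch below uses only the standard library and does not normalize anything; stripMarks is a hypothetical helper name, not part of any package. It suggests that the Mn filter has an effect only when the accents are present as separate combining runes, which is what NFD provides.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks (hypothetical helper) drops every rune in Unicode category Mn
// (nonspacing marks). It performs no normalization, so it only changes text
// that is already in decomposed form.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// "žůžo" written as base letters followed by combining marks
	// (U+030C combining caron, U+030A combining ring above).
	decomposed := "z\u030cu\u030az\u030co"
	// "žůžo" written with precomposed code points (U+017E, U+016F).
	precomposed := "\u017e\u016f\u017eo"

	fmt.Println(stripMarks(decomposed))                 // zuzo
	fmt.Println(stripMarks(precomposed) == precomposed) // true: no Mn runes to remove
}
```

The same reasoning explains a limit of the approach: letters such as "ł" or "ø" have no canonical decomposition, so NFD leaves them as single runes and the filter does not touch them.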