I need to compare strings in Go.
The problem is that I want to compare an accented word (café) with its non-accented form (cafe).
The first thing I do is convert my accented string to its non-accented form.
You can run the code here: https://play.golang.org/p/-eRUQeujZET
But every time I apply this transformation to a string, it adds extra zero bytes at the end. The example above prints:
bytes: [99 97 102 101 0] string: cafe
Since I need to compare the string returned by this process with its counterpart that never had the 'é' in the first place, I would need to remove the last byte (0) from the []byte.
After running some tests I noticed that the trailing 0s (sometimes more than one is added) don't change the printed string.
Am I missing something? Can I just remove all the zeros at the end of the []byte?
Here is my code to remove the 0s and compare the strings:
https://play.golang.org/p/HoueAGI4uUx
Since nobody works alone in this field, here are the articles I read to get to where I am now:
https://blog.golang.org/strings
https://blog.golang.org/normalization
This is your custom Transform() function:
func Transform(s string) ([]byte, error) {
	var t transform.Transformer
	t = transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
	dst := make([]byte, len(s))
	_, _, err := t.Transform(dst, []byte(s), true)
	if err != nil {
		return nil, err
	}
	return dst, nil
}
In it you call Transformer.Transform(), which also returns the number of bytes written to the destination, but you discard that return value.
The simplest fix is to store the nDst return value and slice the destination with it, because it holds the number of "useful" bytes (the bytes beyond nDst remain 0, as handed to you by the preceding make() call):
	nDst, _, err := t.Transform(dst, []byte(s), true)
	if err != nil {
		return nil, err
	}
	return dst[:nDst], nil
With this change, the returned slice will only contain the useful bytes without trailing zeros.
Output will be (try it on the Go Playground):
2009/11/10 23:00:00 bytes: [99 97 102 101] string: cafe