In Python 3, all strings are Unicode, so you only need to decode or encode when doing I/O; in the main part of your code you deal only with Unicode.
So I want to know: should I do the same in Go? Should I convert all strings to []rune at the input and have all my functions accept only the []rune type?
I'm new to Go, so I don't know how many third-party libraries accept runes in place of strings. If I use runes throughout my code, will the overhead of converting between []rune and string become a problem when I need to interact with a third-party library?
Should I ALWAYS use rune instead of string except when doing I/O?
There are several very useful packages that work with strings which you would find awkward to work with if your data is in arrays (or slices) of runes.
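As a rough illustration of that point, here is a minimal sketch that keeps the data as a plain string and lets the standard library interpret it as UTF-8. The helper name summarize and the sample text are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// summarize is a hypothetical helper: it takes a plain string and relies on
// standard packages that already understand UTF-8, so no []rune is needed.
func summarize(s string) (upper string, words int, runes int) {
	upper = strings.ToUpper(s)        // case mapping works on non-ASCII letters
	words = len(strings.Fields(s))    // splits on Unicode white space
	runes = utf8.RuneCountInString(s) // counts code points, not bytes
	return upper, words, runes
}

func main() {
	s := "h\u00e9llo w\u00f6rld" // "héllo wörld" with precomposed letters
	upper, words, runes := summarize(s)
	fmt.Println(upper, words, runes, len(s)) // len(s) is the byte count
}
```

With []rune data you would have to convert back to string before every one of these calls, which is where the awkwardness (and the copying) comes from.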
There are many cases where I have to get the character at an index,
It isn't safe to do that in general, partly because of combining characters but also because strings (or Unicode text in general) can contain many other difficult situations - perhaps a mix of left-to-right and right-to-left text etc.
Normalizing the text to one of the several normal forms might help deal with most combining characters but there will be some combinations that don't reduce to a single rune.
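To make the indexing problem concrete, here is a small sketch. The helper runeAt is hypothetical (it is not a standard library function), and the sample string is chosen so that a combining accent follows the letter it modifies.

```go
package main

import "fmt"

// runeAt is a hypothetical helper that returns the i-th code point of s
// (0-based) and whether it exists. It walks the string, so each call is O(n).
func runeAt(s string, i int) (rune, bool) {
	n := 0
	for _, r := range s {
		if n == i {
			return r, true
		}
		n++
	}
	return 0, false
}

func main() {
	// 'e' followed by U+0301 (combining acute accent), then 'a'.
	// A reader sees two characters, "éa".
	s := "e\u0301a"

	fmt.Println(len(s))         // number of bytes: 4
	fmt.Println(len([]rune(s))) // number of code points: 3

	r, _ := runeAt(s, 1)
	fmt.Printf("%U\n", r) // U+0301: the accent on its own, not the 'a'
}
```

So even rune-based indexing does not give you "the second character" here; converting to []rune only moves the problem from bytes to code points.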
I'm writing something like a parser to parse the text with emoji
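Most Unicode emoticons are just another code point, so they can be treated like most ordinary characters. Be aware that some emoji (flags and skin-tone variants, for example) are sequences of several code points.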
In many cases it is probably best to use a for ... range loop to walk through a string, since it decodes one rune per iteration.
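A minimal sketch of what ranging over a string gives you: the loop index is the byte offset where each rune starts, and the value is the decoded rune. The function name runeOffsets is made up for this example.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that returns the byte offset at which
// each rune of s begins, as reported by a for ... range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "a\U0001F600b" // 'a', the grinning-face emoji (4 bytes in UTF-8), 'b'

	for i, r := range s {
		// i jumps by the encoded width of the previous rune.
		fmt.Printf("offset %d: %q\n", i, r)
	}

	fmt.Println(runeOffsets(s)) // [0 1 5]
}
```

Because the offsets are byte positions, they can be used directly to slice the original string (for example s[1:5] is exactly the emoji) without any conversion to []rune.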
If you wanted to, for example, replace all 😀 with :-), this could perhaps be handled using strings.Replace() or by using for ... range with strings.Builder.
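Both approaches might look roughly like this. It is only a sketch: the function names are invented for the example, and it handles just the single code point U+1F600.

```go
package main

import (
	"fmt"
	"strings"
)

const grinning = '\U0001F600' // 😀

// replaceWithStrings uses strings.Replace; n = -1 means "replace all".
func replaceWithStrings(s string) string {
	return strings.Replace(s, string(grinning), ":-)", -1)
}

// replaceWithBuilder walks the string rune by rune and builds the result,
// which is easier to extend if you later need to map many emoji at once.
func replaceWithBuilder(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r == grinning {
			b.WriteString(":-)")
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	in := "hi \U0001F600 there \U0001F600!"
	fmt.Println(replaceWithStrings(in)) // hi :-) there :-)!
	fmt.Println(replaceWithBuilder(in)) // hi :-) there :-)!
}
```

Neither version needs the input converted to []rune first; the string stays a string from input to output.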
For me the most persuasive argument is that once you venture outside ASCII, text is weird, and Unicode is almost unfathomably weird. Plumbing its depths is best left to the experts who spend their lives grappling with its madness. If you would rather spend your time on the business-end features that matter more directly to you, your business and your customers, use the standard packages.