According to https://blog.golang.org/strings and my testing, it looks like when we range over a string, the characters we get are of type rune, but if we get them by str[index], they are of type byte. Why is that?
At the first level, the answer is that this is how the language is defined. The specification's section on string types tells us that:
A string value is a (possibly empty) sequence of bytes. The number of bytes is called the length of the string and is never negative. Strings are immutable: once created, it is impossible to change the contents of a string.
and:
A string's bytes can be accessed by integer indices 0 through len(s)-1.
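As a minimal sketch of what those two quotes mean in practice, the following program indexes a string that contains a non-ASCII character. The helper name byteValues and the sample string are illustrative assumptions, not something from the specification.

```go
package main

import "fmt"

// byteValues (hypothetical helper) returns the bytes obtained by
// indexing s at every position from 0 through len(s)-1.
func byteValues(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, s[i]) // s[i] has type byte
	}
	return out
}

func main() {
	s := "héllo" // 'é' is encoded as two bytes in UTF-8: 0xC3 0xA9

	fmt.Println(len(s))                // 6: the length counts bytes, not characters
	fmt.Printf("%T %#x\n", s[1], s[1]) // uint8 0xc3 (byte is an alias for uint8)
	fmt.Printf("% x\n", byteValues(s)) // 68 c3 a9 6c 6c 6f
}
```

Note that s[1] is only the first half of 'é'; indexing never decodes UTF-8.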
Meanwhile, range is a clause you can insert into a for statement, and the specification says:
The expression on the right in the "range" clause is called the range expression, which may be ... [a] string ...
and:
- For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
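The behaviour described in that quote can be observed with a small program. This is only a sketch: the runeStep type, the rangeSteps helper, and the sample strings are names and inputs assumed here for illustration.

```go
package main

import "fmt"

// runeStep (hypothetical type) records one iteration of a range loop
// over a string.
type runeStep struct {
	Index int  // byte index where the code point starts
	Rune  rune // decoded code point
}

// rangeSteps (hypothetical helper) collects every (index, rune) pair
// that a range loop over s produces.
func rangeSteps(s string) []runeStep {
	var steps []runeStep
	for i, r := range s {
		steps = append(steps, runeStep{i, r})
	}
	return steps
}

func main() {
	// 'é' occupies bytes 1 and 2, so the index jumps from 1 to 3.
	for _, st := range rangeSteps("héllo") {
		fmt.Printf("%d:%c ", st.Index, st.Rune)
	}
	fmt.Println() // 0:h 1:é 3:l 4:l 5:o

	// 0xFF is never valid in UTF-8: range yields U+FFFD and advances one byte.
	for _, st := range rangeSteps("a\xffb") {
		fmt.Printf("%d:%U ", st.Index, st.Rune)
	}
	fmt.Println() // 0:U+0061 1:U+FFFD 2:U+0062
}
```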
If you want to know why the language is defined that way, you really have to ask the definers themselves. However, note that if for ... range ranged only over the bytes, you'd need to construct your own fancier loops to range over the runes. Given that for ... range does work through the runes, if you want to work through the bytes in string s instead, you can write:
for i := 0; i < len(s); i++ {
...
}
and easily access s[i] inside the loop. You can also write:
for i, b := range []byte(s) {
...
}
and access both index i and byte b inside the loop. (Conversion from string to []byte, or vice versa, can require a copy since a []byte can be modified. In this case, though, the range does not modify it and the compiler can optimize away the copy. See icza's comment below or this answer to golang: []byte(string) vs []byte(*string).) So you have not lost any ability, just perhaps a smidgen of concision.
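To illustrate the trade-off, here is a rough sketch of the kind of "fancier loop" you would need for runes if range did not decode them for you, next to the byte-wise range form. It assumes the standard unicode/utf8 package; the helper names runesByHand and bytesByRange are made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesByHand (hypothetical helper) decodes s one code point at a time
// without relying on range over the string.
func runesByHand(s string) []rune {
	var out []rune
	for i := 0; i < len(s); {
		// For a non-empty input, size is at least 1, so the loop always advances.
		r, size := utf8.DecodeRuneInString(s[i:])
		out = append(out, r)
		i += size
	}
	return out
}

// bytesByRange (hypothetical helper) collects the bytes of s by ranging
// over []byte(s).
func bytesByRange(s string) []byte {
	var out []byte
	for _, b := range []byte(s) {
		out = append(out, b)
	}
	return out
}

func main() {
	s := "héllo"
	fmt.Println(string(runesByHand(s)) == s)               // true
	fmt.Println(len(runesByHand(s)), len(bytesByRange(s))) // 5 6
}
```

The manual rune loop needs an extra import and explicit index bookkeeping, while the byte loop stays a one-liner either way, which is consistent with the point above.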