I'm reading the book "The Go Programming Language". When it introduces strings, it says Go uses the UTF-8 encoding, so it's easy to check whether a string is a prefix/suffix of another string, using the functions below:
func HasPrefix(s, prefix string) bool {
    return len(s) >= len(prefix) && s[:len(prefix)] == prefix
}

func HasSuffix(s, suffix string) bool {
    return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
}
I wonder if there is any encoding for which the functions above would fail to check a prefix/suffix correctly?
One encoding that would break `HasSuffix` is Big5: bytes between (hex) 40 and 7E (inclusive) can be either a complete character or the second byte of a two-byte character.
UTF-16 or UTF-32 with byte order marks would break `HasSuffix` (because the BOM in `suffix` would generally not correspond to anything at the right position in `s`) even if both strings use the same byte order, and would break both `HasPrefix` and `HasSuffix` if they do not. (This is not an issue in practice, however, because byte order marks are only used for interchange, never inside a language's internal representation of strings.)
I haven't read the book you mention, but it may have in mind languages that don't require a specific encoding, where these functions would need to know the encoding of each string and handle the case where the two strings don't use the same encoding.
Arguably even UTF-8 doesn't have the stated property, in that these functions wouldn't recognize that "e with acute accent" (one Unicode character) is the same real-world character as "e, followed by a combining acute accent" (two Unicode characters). But that's obviously a much harder problem.