On Linux, the following code produces surprising output:
import Foundation
print("a\r\nb".components(separatedBy: "\n"))
print(("a\r\nb" as NSString).components(separatedBy: "\n"))
The following is the output.
["a\r\nb"]
["a\r", "b"]
Is this a bug? I tested this with Swift 6.1.2 on Ubuntu 20.04.
On macOS, the output is as follows:
["a\r", "b"]
["a\r", "b"]
This is intentional. Recall that a Swift Character is not the same as a Unicode code point. A Swift Character is an "extended grapheme cluster". Compare:
print("\r\n".count) // 1
print("\r\n".unicodeScalars.count) // 2
"\r\n" is a single Character, and "\n" is another, different, single Character. "a\r\nb" does not contain any occurrences of the Character that is "\n".
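To make the point above concrete, here is a small sketch that uses only the standard library (no Foundation). The variable names are illustrative, not from the question.

```swift
let text = "a\r\nb"

// Character view: CR-LF forms one extended grapheme cluster.
let characters = Array(text)
print(characters.count)                          // 3
print(characters.contains("\n" as Character))    // false
print(characters.contains("\r\n" as Character))  // true

// Unicode scalar view: CR and LF are two separate scalars.
let scalars = Array(text.unicodeScalars)
print(scalars.count)                             // 4
print(scalars.contains("\n" as Unicode.Scalar))  // true
```

So a Character-level search for "\n" finds nothing, while a scalar-level search does.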
On the other hand, NSString works differently. NSString is a sequence of UTF-16 code units. Its components(separatedBy:) method operates not on Characters, but on UTF-16 code units.
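As a rough illustration of the difference, the snippet below prints the UTF-16 code units that NSString works with, next to the Character count that String works with. It is only a sketch; the variable name is illustrative.

```swift
import Foundation

let text = "a\r\nb"

// The UTF-16 code units: "a", CR, LF, "b".
print(Array(text.utf16))              // [97, 13, 10, 98]

// NSString's length is measured in UTF-16 code units.
print(NSString(string: text).length)  // 4

// String's count is measured in Characters (grapheme clusters).
print(text.count)                     // 3
```

At the UTF-16 level, LF (10) is a separate unit, so a search for "\n" matches inside "\r\n".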
Note that on Apple platforms, String's components(separatedBy: "\n") does split the string at the Unicode scalar level rather than Character by Character, so that it stays compatible with NSString.components(separatedBy: "\n"). See the source:
public func components<T : StringProtocol>(separatedBy separator: T) -> [String] {
#if FOUNDATION_FRAMEWORK
    if let contiguousSubstring = _asContiguousUTF8Substring(from: startIndex..<endIndex) {
        let options: String.CompareOptions
        if separator == "\n" {
            // 106365366: Some clients intend to separate strings whose line separator is "\r\n" with "\n".
            // Maintain compatibility with `.literal` so that "\n" can match that in "\r\n" on the unicode scalar level.
            options = [.literal]
        } else {
            options = []
        }
        do {
            return try contiguousSubstring._components(separatedBy: Substring(separator), options: options)
        } catch {
            // Otherwise, inputs were unsupported - fallthrough to NSString implementation for compatibility
        }
    }
    return _ns.components(separatedBy: separator._ephemeralString)
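This whole special case is wrapped in #if FOUNDATION_FRAMEWORK, which is only turned on when building as Foundation.framework. On Linux the special case is therefore not compiled in, and "\n" is compared against whole Characters, so it never matches inside "\r\n".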
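If you need the same result on every platform, one option is to split below the Character level yourself. The helper below is a hypothetical sketch (splitOnLineFeed is not a Foundation or standard library API). It splits the UTF-8 view at the LF byte, which is safe because byte 0x0A never occurs inside a multi-byte UTF-8 sequence.

```swift
// Hypothetical helper, not part of Foundation or the standard library.
// Splits at every LF byte, keeping empty components like components(separatedBy:) does.
func splitOnLineFeed(_ text: String) -> [String] {
    text.utf8
        .split(separator: UInt8(ascii: "\n"), omittingEmptySubsequences: false)
        .map { String(decoding: $0, as: UTF8.self) }
}

print(splitOnLineFeed("a\r\nb"))  // ["a\r", "b"]
```

If the goal is really to split into lines, text.split(whereSeparator: \.isNewline) may be a better fit. It treats "\r\n" as a single separator, so no "\r" is left at the end of each piece, but it returns Substrings and drops empty pieces by default.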