swift ubuntu

On Linux, why does String.components(separatedBy: "\n") return an incorrect result?


On Linux, the following code produces unexpected output.

import Foundation

print("a\r\nb".components(separatedBy: "\n"))
print(("a\r\nb" as NSString).components(separatedBy: "\n"))

The following is the output.

["a\r\nb"]
["a\r", "b"]

Is this a bug? I tested this with Swift 6.1.2 on Ubuntu 20.04.

On macOS, the output is as follows.

["a\r", "b"]
["a\r", "b"]

Solution

  • This is intentional. Recall that a Swift Character is not the same as a Unicode code point. A Swift Character is an "extended grapheme cluster". Compare:

    print("\r\n".count) // 1
    print("\r\n".unicodeScalars.count) // 2
    

    "\r\n" is a single Character, and "\n" is another, different, single Character. "a\r\nb" does not contain any occurrences of the Character that is "\n".

    On the other hand, NSString works differently. NSString is a sequence of UTF-16 code units. Its components(separatedBy:) method operates not on Characters, but on UTF-16 code units.
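
    To make the difference concrete, here is a small sketch that inspects the same string through each view. It assumes only the standard library plus Foundation's NSString(string:) initializer; the variable name text is just for illustration.

    ```swift
    import Foundation

    let text = "a\r\nb"

    // Character view: "\r\n" is a single extended grapheme cluster,
    // so no element of the string equals the Character "\n".
    print(text.count)                              // 3
    print(text.contains("\n" as Character))        // false

    // Unicode scalar view: U+000A is present as its own element.
    print(text.unicodeScalars.count)               // 4
    print(text.unicodeScalars.contains("\n"))      // true

    // UTF-16 view: the unit NSString counts and searches in.
    print(text.utf16.count)                        // 4
    print(NSString(string: text).length)           // 4
    ```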


    Note that on Apple platforms, components(separatedBy: "\n") is special-cased to match "\n" at the Unicode scalar level, so that it stays compatible with NSString.components(separatedBy: "\n"). See the source:

        public func components<T : StringProtocol>(separatedBy separator: T) -> [String] {
    #if FOUNDATION_FRAMEWORK
            if let contiguousSubstring = _asContiguousUTF8Substring(from: startIndex..<endIndex) {
                let options: String.CompareOptions
                if separator == "\n" {
                    // 106365366: Some clients intend to separate strings whose line separator is "\r\n" with "\n".
                    // Maintain compatibility with `.literal` so that "\n" can match that in "\r\n" on the unicode scalar level.
                    options = [.literal]
                } else {
                    options = []
                }
    
                do {
                    return try contiguousSubstring._components(separatedBy: Substring(separator), options: options)
                } catch {
                    // Otherwise, inputs were unsupported - fallthrough to NSString implementation for compatibility
                }
            }
    
            return _ns.components(separatedBy: separator._ephemeralString)
    

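    This whole special case is wrapped in #if FOUNDATION_FRAMEWORK, which is only turned on when building as Foundation.framework, that is, for Apple platforms. The Linux build never takes this branch, so there "\n" is compared as a Character.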
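
    If you need the macOS result on Linux as well, one option is to split on the Unicode scalar view yourself. The helper name splitOnLineFeed below is hypothetical, not a Foundation API; treat this as a minimal sketch rather than a drop-in replacement for components(separatedBy:).

    ```swift
    // Hypothetical helper (not part of Foundation): splits on every U+000A scalar,
    // including the one inside a "\r\n" grapheme cluster.
    func splitOnLineFeed(_ text: String) -> [String] {
        text.unicodeScalars
            .split(separator: "\n" as Unicode.Scalar, omittingEmptySubsequences: false)
            .map { String($0) }
    }

    print(splitOnLineFeed("a\r\nb"))  // ["a\r", "b"]

    // Alternative: treat every newline Character ("\n", "\r\n", ...) as a separator.
    // Note that split(whereSeparator:) drops empty pieces by default.
    let lines = "a\r\nb".split(whereSeparator: \.isNewline).map { String($0) }
    print(lines)  // ["a", "b"]
    ```

    The first form keeps the trailing "\r" just like NSString does; the second is probably the better fit if what you actually want is a list of lines.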