swift string nsdata

How to detect encoding in Data based on a String?


I'm loading a text file whose encoding is unknown because it comes from other sources. The content arrives through macOS NSDocument's read method, which passes it on to my model's read method. The String initializer requires an encoding when it is created from Data, and if you assume the wrong one you may get nil. I've created a conditional cascade of potential encodings (which seems to be what other people do), but there has to be a better way to do this. Suggestions?

    override func read(from data: Data, ofType typeName: String) throws {
        model.read(from: data, ofType: typeName)
    }

In the model:

    func read(from data: Data, ofType typeName: String) {
        if let text = String(data: data, encoding: .utf8) {
            content = text
        } else if let text = String(data: data, encoding: .macOSRoman) {
            content = text
        } else if let text = String(data: data, encoding: .ascii) {
            content = text
        } else {
            content = "?????"
        }
    }

Solution

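  • You can extend Data with a stringEncoding property that tries to detect the string encoding. It relies on NSString's stringEncoding(for:encodingOptions:convertedString:usedLossyConversion:), which returns 0 when it cannot determine the encoding, so the property returns nil in that case: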

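    import Foundation

    extension Data {
        /// The detected string encoding, or nil if detection fails.
        var stringEncoding: String.Encoding? {
            // The converted string is not needed here, so pass nil for it.
            let rawValue = NSString.stringEncoding(for: self, encodingOptions: nil, convertedString: nil, usedLossyConversion: nil)
            // A raw value of 0 means no encoding could be determined.
            return rawValue == 0 ? nil : String.Encoding(rawValue: rawValue)
        }
    }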
    

    Because stringEncoding is optional and the String initializer expects a non-optional encoding, unwrap it before passing it along:

    if let encoding = data.stringEncoding,
       let string = String(data: data, encoding: encoding) {
        print(string)
    }
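
    The same NSString call can also hand back the decoded text directly, which avoids decoding the data a second time. The following is a minimal sketch rather than a definitive implementation: the method name detectedString(suggesting:) and its tuple result are hypothetical, and the choice to suggest UTF-8 and to disallow lossy conversion is an assumption about what a text editor would want. It relies on Apple's Foundation, so it is meant for macOS or iOS.

```swift
import Foundation

extension Data {
    /// Hypothetical helper: detects the encoding and returns the decoded
    /// string in a single call. Returns nil if no encoding could be determined.
    func detectedString(suggesting suggested: [String.Encoding] = [.utf8])
        -> (string: String, encoding: String.Encoding, lossy: Bool)? {
        var converted: NSString?
        var lossy: ObjCBool = false
        // Assumption: hint at likely encodings and refuse lossy conversion.
        let options: [StringEncodingDetectionOptionsKey: Any] = [
            .suggestedEncodingsKey: suggested.map { NSNumber(value: $0.rawValue) },
            .allowLossyKey: false
        ]
        let rawValue = NSString.stringEncoding(for: self,
                                               encodingOptions: options,
                                               convertedString: &converted,
                                               usedLossyConversion: &lossy)
        // 0 means detection failed; otherwise the converted string should be set.
        guard rawValue != 0, let string = converted as String? else { return nil }
        return (string, String.Encoding(rawValue: rawValue), lossy.boolValue)
    }
}
```

    In the model's read method this could replace the cascade: if data.detectedString() returns a result, assign its string to content, and otherwise report the failure instead of storing a placeholder. Detection is heuristic, so short or ambiguous input may still be guessed incorrectly.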