swift character-encoding nsdata jsondecoder

Character encoding of a `Data` from a `URLSession` to create objects from JSON in Swift


In Swift, I am trying to download JSON data from a URL, then use it to create & populate a Swift object.

I do not know the character encoding of the returned JSON.

I have seen some uses of JSONDecoder#decode<T>(_ type: T.Type, from data: Data).

How does the above function know what character encoding to use?

The Data that I've seen used comes from the first parameter of the completionHandler passed to the following URLSession function:

func dataTask(
    with request: URLRequest,
    completionHandler: @escaping @Sendable (Data?, URLResponse?, (any Error)?) -> Void
) -> URLSessionDataTask

I don't see any character encoding info in the Data class, but I do see it in URLResponse#textEncodingName.

Does Data somehow know what its character encoding is? Or does JSONDecoder#decode<T>(…) try to autodetect the character encoding? Or?

If Data doesn't contain the character encoding info itself, how can I use it from URLResponse#textEncodingName in some function to create & populate a new object?

Also, if the input is not proper JSON, I want to output the non-JSON text. From what I've seen, I'm supposed to use String(data:encoding:) to convert from Data to String?. Due to existing code from someone else, I currently do not have the URLResponse available where I need to call this (I just have the Data).

If the Data has the character encoding, is there a function I can call that just takes the Data without requiring the String.Encoding argument? Or is there some way to read the character encoding from the Data so I can use it to provide the String.Encoding argument?

If I can avoid directly using the URLResponse, it will save me from needing to refactor everything to make the URLResponse available in certain places.


Solution

  • Data does not know what encoding it is encoded with. Data is just a collection of bytes. It doesn't even necessarily represent text at all.
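To make that concrete, here is a small sketch showing the same bytes read under two different encodings. The byte values are just an illustration chosen for this example; nothing in the Data says which reading is the intended one.

```swift
import Foundation

// Five raw bytes. Data stores nothing about how they should be interpreted.
let bytes = Data([0x63, 0x61, 0x66, 0xC3, 0xA9])

// Read as UTF-8, the pair 0xC3 0xA9 is the single character "é".
let asUTF8 = String(data: bytes, encoding: .utf8)

// Read as ISO-8859-1 (Latin-1), every byte is its own character.
let asLatin1 = String(data: bytes, encoding: .isoLatin1)

print(asUTF8 ?? "nil")   // café
print(asLatin1 ?? "nil") // cafÃ©
```

Both conversions succeed, which is why the encoding has to come from somewhere outside the Data.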

    Though JSONDecoder does not document what encodings it supports, JSONSerialization.jsonObject(with:options:) does. Its documentation says:

    "The data must be in one of the 5 supported encodings listed in the JSON specification: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. The data may or may not have a BOM."

    As far as I know, JSONDecoder uses the JSONSerialization APIs under the hood, in which case it would support the same encodings too. From a quick check, this is at least true for the JSONDecoder implementation in swift-foundation, which autodetects the encoding by inspecting the first bytes of the data (a BOM if present, otherwise the pattern of zero bytes).

    So if you can assume that the encoding will be one of those listed above, you can just use JSONDecoder as usual.
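As a rough check of that, here is a minimal sketch that feeds UTF-16LE bytes (without a BOM) to JSONDecoder. The `User` type and the JSON text are made up for this example.

```swift
import Foundation

// Hypothetical model type, used only for this illustration.
struct User: Decodable {
    let name: String
}

let json = #"{"name": "Zoë"}"#

// Encode the JSON text as UTF-16LE bytes (no BOM) instead of the usual UTF-8.
let utf16Data = json.data(using: .utf16LittleEndian)!

// decode(_:from:) has no encoding parameter; the decoder has to work it out
// from the bytes alone.
let user = try? JSONDecoder().decode(User.self, from: utf16Data)

print(user?.name ?? "decoding failed")
```

Note that the call site is identical to the UTF-8 case, so existing code that only has the Data does not need to change for these five encodings.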

    Otherwise, you would have to read the textEncodingName from the URLResponse (or autodetect it), decode the text, and re-encode it into data using a supported encoding.
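A minimal sketch of that decode and re-encode step could look like the following. The name lookup is a hypothetical, deliberately incomplete helper written for this example; on Apple platforms, CFStringConvertIANACharSetNameToEncoding together with CFStringConvertEncodingToNSStringEncoding could be used for the lookup instead. `City` and `transcodedToUTF8` are also names invented here.

```swift
import Foundation

// Hypothetical helper: maps a few IANA charset names (the kind of value found
// in URLResponse.textEncodingName) to String.Encoding. Not a complete table.
func stringEncoding(forIANAName name: String?) -> String.Encoding? {
    switch name?.lowercased() {
    case "utf-8": return .utf8
    case "utf-16": return .utf16
    case "us-ascii": return .ascii
    case "iso-8859-1": return .isoLatin1
    case "windows-1252": return .windowsCP1252
    default: return nil
    }
}

// Decodes `data` as text in the named encoding, then re-encodes it as UTF-8.
// Returns nil if the name is not in the table or the bytes are not valid in
// that encoding.
func transcodedToUTF8(_ data: Data, textEncodingName: String?) -> Data? {
    guard let encoding = stringEncoding(forIANAName: textEncodingName),
          let text = String(data: data, encoding: encoding) else {
        return nil
    }
    return text.data(using: .utf8)
}

// Hypothetical model type for the usage example.
struct City: Decodable {
    let name: String
}

// Simulate a response body that arrived as ISO-8859-1.
let latin1Data = #"{"name": "Málaga"}"#.data(using: .isoLatin1)!
let utf8Data = transcodedToUTF8(latin1Data, textEncodingName: "ISO-8859-1")
let city = utf8Data.flatMap { try? JSONDecoder().decode(City.self, from: $0) }

print(city?.name ?? "decoding failed")
```

This does need the textEncodingName string, but only the string, so it may be enough to pass that one value along rather than the whole URLResponse.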

    You also mentioned that you want to get the response data as plain text if it is not valid JSON. In that case, you must get the encoding via textEncodingName (or autodetect it), as well.
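If refactoring to pass the encoding along really is not an option, one possible fallback is to guess. The sketch below tries JSON first, then strict UTF-8, then a caller-supplied fallback encoding. This is a heuristic and an assumption of this example, not something Foundation does for you: text in some other encoding would come out garbled rather than fail. `Message`, `Payload`, and `parse` are hypothetical names.

```swift
import Foundation

// Hypothetical model type for the expected JSON.
struct Message: Decodable {
    let text: String
}

// Either the decoded object or the raw body rendered as text.
enum Payload {
    case json(Message)
    case text(String)
}

// Tries JSON first; if that fails, renders the bytes as text.
// `fallbackEncoding` stands in for whatever textEncodingName (or a detector)
// would report. Latin-1 is the default because it accepts any byte sequence.
func parse(_ data: Data, fallbackEncoding: String.Encoding = .isoLatin1) -> Payload {
    if let message = try? JSONDecoder().decode(Message.self, from: data) {
        return .json(message)
    }
    // Strict UTF-8 decoding returns nil when the bytes are not valid UTF-8.
    if let text = String(data: data, encoding: .utf8) {
        return .text(text)
    }
    return .text(String(data: data, encoding: fallbackEncoding) ?? "")
}

// A plain-text error body instead of JSON.
switch parse(Data("Service unavailable".utf8)) {
case .json(let message): print("JSON:", message.text)
case .text(let text): print("Not JSON:", text)
}
```

Trying UTF-8 before the fallback is a deliberate choice here: invalid UTF-8 is rejected, whereas Latin-1 never fails and so can only be the last resort.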