swift utf-8 openai-api

How do I properly decode French characters from the OpenAI API JSON in Swift?


I am having trouble decoding French output from the OpenAI API. The issue occurs in the terminal as well as in Xcode: all of the French special characters are shown as hex escape codes (for example \u00e9), even though I am using UTF-8 encoding. I am not sure what I am doing wrong.

I have tried various OpenAI models and different coding methods. Here are my decoding class and its output.

Code:

    // MARK: - Initializers
    init() {
        setupRequest()
    }
    
    // MARK: - Private Methods
    private func setupRequest() {
        guard let apiUrl = URL(string: url) else { return }
        self.request = URLRequest(url: apiUrl)
        self.request?.httpMethod = "POST"
        self.request?.allHTTPHeaderFields = ["Content-Type": "application/json; charset=UTF-8", "Authorization": "Bearer \(apiKey)"]
    }
}

// MARK: - Private Methods
extension GPTNetwork {
    private func setupRequestData() -> [String: Any] {
        return [
            "model": "gpt-4-turbo",
            "input": "\(basePrompt)"
        ]
    }
}

// MARK: - Public Methods
extension GPTNetwork {
    public func getGPTResponse() async throws -> String {
        guard var request else { return "" }
        request.httpBody = try JSONSerialization.data(withJSONObject: setupRequestData())
        
        do {
            let (data, _) = try await URLSession.shared.data(for: request)
            let responseString = String(data: data, encoding: .utf8)
            return responseString ?? ""
        } catch {
            throw error
        }
    }
}

Response from the OpenAI API:

{
  "id": "resp_6887bcc97edc81a3a5a81f92ae1918e40d30a71e9f57907d",
  "object": "response",
  "created_at": 1753726153,
  "status": "completed",
  "background": false,
  "error": null,
  "incomplete_details": null,
  "instructions": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "model": "gpt-4-turbo-2024-04-09",
  "output": [
    {
      "id": "msg_6887bcca35d481a3963cab8dfc68af300d30a71e9f57907d",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "logprobs": [],
          "text": "Dans une petite ville pittoresque du sud de la France, se trouve un march\u00e9 hebdomadaire plein de vie et de couleur. Chaque samedi matin, les habitants se rassemblent pour acheter des produits frais : fruits juteux, l\u00e9gumes croquants, fromages odorants et viandes savoureuses. Les vendeurs, chaleureux et accueillants, discutent avec leurs clients, partageant des recettes et des conseils pour choisir les meilleurs produits. L'atmosph\u00e8re est anim\u00e9e par des rires et des conversations, tandis que les enfants courent entre les \u00e9tals, attir\u00e9s par les sucreries artisanales. Ce march\u00e9 est un v\u00e9ritable c\u0153ur battant de la communaut\u00e9 o\u00f9 chacun se sent connect\u00e9 \u00e0 la terre et \u00e0 ses voisins."
        }
      ],
      "role": "assistant"
    }
  ],
  "parallel_tool_calls": true,
  "previous_response_id": null,
  "prompt_cache_key": null,
  "reasoning": {
    "effort": null,
    "summary": null
  },
  "safety_identifier": null,
  "service_tier": "default",
  "store": true,
  "temperature": 1.0,
  "text": {
    "format": {
      "type": "text"
    }
  },
  "tool_choice": "auto",
  "tools": [],
  "top_logprobs": 0,
  "top_p": 1.0,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 16,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 181,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 197
  },
  "user": null,
  "metadata": {}
}

Solution

  • There is nothing wrong with the JSON response from the server; this is correct JSON:

    "text": "Dans une ..., se trouve un march\u00e9 ..."
    

    However, you should not be calling

    String(data: data, encoding: .utf8)
    

    to turn the returned response into a string. That only gives you the raw JSON text, with escape sequences such as \u00e9 still unprocessed, and does you no good at all.

    Instead, you should operate directly on the Data object data: either call JSONSerialization.jsonObject(with:) and parse the resulting dictionary by hand, or define a set of nested Codable structs that match the structure of the expected response and decode them with JSONDecoder. Either way, the JSON parser resolves the \uXXXX escapes for you, so when you obtain the value of the "text" property you will find that you have a correctly formed Swift String containing "marché" (and so forth).
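
The Codable route can be sketched as follows. This is a minimal example, not a full model of the API: the names GPTResponse, Output, Content and extractText(from:) are made up for illustration, and only the keys needed to reach "text" in the response shown in the question are declared. It assumes the text lives under output → content → text, as it does in that response.

```swift
import Foundation

// Hypothetical types: only the fields needed to reach the generated text
// are modeled. JSONDecoder ignores any keys that are not declared here.
struct GPTResponse: Decodable {
    struct Output: Decodable {
        struct Content: Decodable {
            let type: String
            let text: String?
        }
        let type: String
        let content: [Content]?
    }
    let output: [Output]
}

/// Decodes the response body and joins every "output_text" item it contains.
func extractText(from data: Data) throws -> String {
    let response = try JSONDecoder().decode(GPTResponse.self, from: data)
    return response.output
        .flatMap { $0.content ?? [] }
        .filter { $0.type == "output_text" }
        .compactMap(\.text)
        .joined()
}

// Trimmed-down sample shaped like the response in the question.
// The raw string literal keeps \u00e9 as a literal JSON escape.
let sample = #"{"output":[{"type":"message","content":[{"type":"output_text","text":"un march\u00e9 plein de vie"}]}]}"#
let text = try extractText(from: Data(sample.utf8))
print(text) // un marché plein de vie
```

In getGPTResponse(), this would replace the String(data:encoding:) call, for example with `return try extractText(from: data)`.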