I'm using a generative AI API to return text responses as JSON strings, which I intend to feed into an application in real time. The problem is that the JSON returned by the API often contains small errors, most commonly misplaced double quotes. These syntax errors raise exceptions in my Python code when I parse the string as JSON.
For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'
As you can see, the value for "test" contains several unescaped double quotes, so parsing it as JSON raises an error. What I want to do is use a regex to convert the inner double quotes to single quotes, so that the result looks as follows:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'
The best I can do is as follows:
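import re

def repl_call(m):
    preq = m.group(1)   # delimiter (and whitespace) before the opening quote
    qbody = m.group(2)  # everything between the outer quotes
    qbody = re.sub(r'"', "'", qbody)  # inner double quotes -> single quotes
    return preq + '"' + qbody + '"'

# `text` holds the malformed JSON string shown above
print(re.sub(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])', repl_call, text))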
The code above successfully returns the intended result. However, if I add a comma inside the value, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}
...the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'
:(
I have tried improving my prompt (prompt engineering) so that the model avoids the inner double quotes and returns only a valid JSON string. This works to some degree, but I still get enough syntax errors that I have to retry the same prompt multiple times, which incurs unnecessary delays and costs.
My question is: is there a common function or regex pattern I can apply in Python to fix my JSON string so that it cleans up syntax errors, specifically misplaced double quotes?
I'm open to a variety of suggestions, including Python packages that deal with JSON string cleansing, or advice on GenAI tools that enforce JSON output. I presently use Gemini, which I like a lot, but it doesn't seem to allow JSON enforcement as explicitly as OpenAI's API does.
If you are requesting JSON back, you should be using response_mime_type; then you should not have these issues with parsing the JSON.
from dotenv import load_dotenv
import google.generativeai as genai
import os

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']

model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    generation_config={"response_mime_type": "application/json"})

prompt = """
List 5 popular cookie recipes.
Using this JSON schema:
Recipe = {"recipe_name": str}
Return a `list[Recipe]`
"""

response = model.generate_content(prompt)
print(response.text)
Just remember to make sure that the JSON schema you tell it to use in the prompt is itself correct JSON, with all the commas where they should be, or the model may build the output incorrectly.
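Note that response.text is still a string, so it has to be parsed before use. Here is a minimal sketch of parsing and sanity-checking it against the Recipe shape from the prompt. The helper name parse_recipes and the sample string are my own stand-ins for illustration (the sample takes the place of response.text so the snippet runs without an API key):

```python
import json

def parse_recipes(raw):
    """Parse model output and check it is a list of {"recipe_name": str} objects."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("recipe_name"), str):
            raise ValueError(f"unexpected item: {item!r}")
    return data

# Made-up stand-in for response.text
sample = '[{"recipe_name": "Chocolate Chip"}, {"recipe_name": "Oatmeal Raisin"}]'
print([r["recipe_name"] for r in parse_recipes(sample)])
```

In real use you would call parse_recipes(response.text) and treat either exception as a signal to retry or log the raw output.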
Another option is to use a response schema.
from dotenv import load_dotenv
import google.generativeai as genai
import os
import typing_extensions as typing

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']

class Recipe(typing.TypedDict):
    recipe_name: str

model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    # Pass the schema object to the `response_schema` field
    generation_config={"response_mime_type": "application/json",
                       "response_schema": list[Recipe]})

prompt = "List 5 popular cookie recipes"

response = model.generate_content(prompt)
print(response.text)
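If you want to keep a safety net for the occasional bad response, the retry you are already doing by hand can be wrapped in a small helper. This is only a sketch: generate_json and its generate argument are hypothetical names of mine, where generate is any zero-argument callable that returns the raw response text.

```python
import json

def generate_json(generate, max_attempts=3):
    """Call `generate()` until its output parses as JSON, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error
```

With the model above, the call would look something like generate_json(lambda: model.generate_content(prompt).text). Capping the attempts keeps the extra delay and cost bounded.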
See JSON mode in the Gemini API documentation.
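For strings that have already come back malformed, a stricter variant of the regex from the question can serve as a last-resort fallback. The sketch below is a heuristic of mine, not a general JSON repair: it treats a double quote as closing a string only when what follows looks like JSON structure (a colon, a closing bracket or brace, a comma followed by the start of another JSON value, or the end of the input). A comma followed by plain text, as in "a", test, is therefore kept inside the string. The names loads_lenient and _requote are made up for illustration, and the repair is attempted only after a normal json.loads has failed, so valid input is never touched.

```python
import json
import re

# Heuristic: a quote closes a string only if JSON structure follows it.
_STRING = re.compile(
    r'([:\[,{]\s*)"(.*?)"'
    r'(?=\s*(?::|[\]}]|,\s*(?:["\[{\d-]|true\b|false\b|null\b)|$))'
)

def _requote(m):
    # Turn inner double quotes (escaped or not) into single quotes
    body = re.sub(r'\\?"', "'", m.group(2))
    return m.group(1) + '"' + body + '"'

def loads_lenient(text):
    """Parse JSON, falling back to the quote-repair heuristic on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # May still raise if the heuristic cannot fix the input
        return json.loads(_STRING.sub(_requote, text))

bad = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'
print(loads_lenient(bad))
```

It will still guess wrong when a stray quote inside a value happens to be followed by something that looks like structure (for example a colon), so I would only use it behind the response_mime_type setting, not instead of it.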