I am running qwen2.5vl:7b with Ollama and want the model to analyze an image.
When running:
import dspy

lm = dspy.LM(model="ollama_chat/qwen2.5vl:latest", api_base="http://...")
dspy.configure(lm=lm)
img = dspy.Image.from_file("image.png")
response = lm("Describe this image", image=img)
I get
TypeError: Object of type Image is not JSON serializable
Following the instructions from https://www.langtrace.ai/blog/attribute-extraction-from-images-using-dspy, I do not get an error, but the output seems to be unrelated to the image.
import base64

class SceneDescriptionSignature(dspy.Signature):
    """
    ...
    """
    image: dspy.Image = dspy.InputField(desc="""...""")
    scene_description = dspy.OutputField(desc="...")

# Encode the image file as a base64 data URI
with open("image.png", "rb") as image_file:
    base64_data = base64.b64encode(
        image_file.read()
    ).decode("utf-8")
image_data_uri = f"data:image/png;base64,{base64_data}"
scene_description = dspy.Predict(SceneDescriptionSignature)(image=image_data_uri)
scene_description["scene_description"]
Output:
This image appears to be a digital artwork or painting that features a landscape scene. The artwork is rich in detail and uses a variety of colors, predominantly blues, greens, and browns, which suggest a natural setting. The foreground includes what looks like a body of water, possibly a lake or river, with reflections of the surrounding environment. The middle ground shows a variety of trees and possibly a small settlement or village, with structures that could be houses or buildings. The background is dominated by a mountain range, which adds depth to the scene. The overall composition is serene and evokes a sense of tranquility, typical of landscape art.
I am pretty sure the model is not actually analyzing the image. How should I pass the image to it?
As of DSPy v3.0.2, you create a signature and pass a dspy.Image object as the input, rather than a data URI string, like so:
describe = dspy.Predict(dspy.Signature("image -> scene_description"))
img = dspy.Image.from_file("image.png")
print(describe(image=img).scene_description)
Or, if using a class-based Signature:
class SceneDescriptionSignature(dspy.Signature):
    """Describe the contents of an image in detail."""
    image: dspy.Image = dspy.InputField(desc="...")
    scene_description: str = dspy.OutputField(desc="...")

describe = dspy.Predict(SceneDescriptionSignature)
img = dspy.Image.from_file("image.png")
print(describe(image=img).scene_description)
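To check whether the model is really looking at the image rather than producing a generic description, it can help to test with an image whose content you already know. Below is a minimal sketch that writes a solid red square using only the standard library and then asks the model to describe it. The helper names `make_solid_png` and `describe_test_image`, the file name `red_square.png`, and the signature class `KnownImageSignature` are hypothetical, and the sketch assumes `dspy.configure(lm=lm)` has already been called with a vision-capable model.

```python
import struct
import zlib


def make_solid_png(width: int, height: int, rgb: tuple) -> bytes:
    """Encode a single-colour RGB image as PNG bytes (standard library only)."""

    def chunk(tag: bytes, data: bytes) -> bytes:
        # A PNG chunk is: 4-byte length, 4-byte tag, data, CRC32 over tag + data.
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, colour type 2 (RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0, followed by the RGB triples.
    row = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(row * height)
    signature = b"\x89PNG\r\n\x1a\n"
    return signature + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def describe_test_image(path: str = "red_square.png") -> str:
    """Write a red square to disk and ask the configured LM to describe it."""
    # Imported lazily so the PNG helper above works without dspy installed.
    # Assumes dspy.configure(lm=...) was called beforehand.
    import dspy

    class KnownImageSignature(dspy.Signature):
        """Describe the contents of an image in detail."""

        image: dspy.Image = dspy.InputField()
        scene_description: str = dspy.OutputField()

    with open(path, "wb") as f:
        f.write(make_solid_png(64, 64, (255, 0, 0)))

    describe = dspy.Predict(KnownImageSignature)
    return describe(image=dspy.Image.from_file(path)).scene_description
```

If `describe_test_image()` comes back with something about a landscape instead of a plain red square, the image is most likely not reaching the model.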