How do I pass an image to DSPy for analysis?

I am running qwen2.5vl:7b with Ollama and want the model to analyze an image.

When running:

import dspy

lm = dspy.LM(model="ollama_chat/qwen2.5vl:latest", api_base="http://...")
dspy.configure(lm=lm)
img = dspy.Image.from_file("image.png")
response = lm("Describe this image", image=img)

I get:

TypeError: Object of type Image is not JSON serializable

Following the instructions from https://www.langtrace.ai/blog/attribute-extraction-from-images-using-dspy, I do not get an error, but the output seems to be unrelated to the image.

import base64

import dspy

class SceneDescriptionSignature(dspy.Signature):
    """
    ...
    """
    image: dspy.Image = dspy.InputField(desc="""...""")
    scene_description = dspy.OutputField(desc="...")

with open("image.png", "rb") as image_file:
    base64_data = base64.b64encode(
        image_file.read()
    ).decode("utf-8")
image_data_uri = f"data:image/png;base64,{base64_data}"

scene_description = dspy.Predict(SceneDescriptionSignature)(image=image_data_uri)
scene_description["scene_description"]

Output:

This image appears to be a digital artwork or painting that features a landscape scene. The artwork is rich in detail and uses a variety of colors, predominantly blues, greens, and browns, which suggest a natural setting. The foreground includes what looks like a body of water, possibly a lake or river, with reflections of the surrounding environment. The middle ground shows a variety of trees and possibly a small settlement or village, with structures that could be houses or buildings. The background is dominated by a mountain range, which adds depth to the scene. The overall composition is serene and evokes a sense of tranquility, typical of landscape art.

I am pretty sure the model is not really analyzing the image. How should I pass the image to it?

Solution

As of DSPy v3.0.2, create a signature and pass a dspy.Image object as the input, rather than a data URI string for the image:

describe = dspy.Predict(dspy.Signature("image -> scene_description"))
img = dspy.Image.from_file("image.png")

print(describe(image=img).scene_description)

Or, if you prefer a class-based Signature:

class SceneDescriptionSignature(dspy.Signature):
    """Describe the contents of an image in detail."""
    image: dspy.Image = dspy.InputField(desc="...")
    scene_description: str = dspy.OutputField(desc="...")

describe = dspy.Predict(SceneDescriptionSignature)
img = dspy.Image.from_file("image.png")

print(describe(image=img).scene_description)
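
If you already have code that builds a data URI, as in the question, a minimal sketch of adapting it could look like the following. The idea is to wrap the URI in dspy.Image instead of passing the bare string. The helper names image_to_data_uri and describe_image are hypothetical and only for illustration. The sketch assumes that dspy.Image accepts a data URI through its url field and that dspy.configure(lm=...) has already been called with a vision-capable model.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI (stdlib only)."""
    # Guess the MIME type from the file extension instead of hard-coding PNG.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def describe_image(path: str) -> str:
    """Hypothetical helper: send the image to the configured LM via DSPy.

    Assumes dspy is installed and dspy.configure(lm=...) was already called.
    """
    import dspy  # imported lazily so image_to_data_uri works without dspy

    describe = dspy.Predict(dspy.Signature("image -> scene_description"))
    # Assumption: dspy.Image takes the data URI via its `url` field.
    # Wrapping it here, rather than passing the raw string, is the point.
    img = dspy.Image(url=image_to_data_uri(path))
    return describe(image=img).scene_description
```

With an LM configured, print(describe_image("image.png")) would run the prediction. Calling dspy.inspect_history(n=1) afterwards prints the last prompt, which may help confirm whether the image was attached as an image rather than sent as plain text.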