Tags: python, ollama, gemma

Ollama multimodal - Gemma not seeing image


This sample, multimodal/main.py, appears to show Ollama accepting image input.

I am trying to do the same with an image loaded from my machine, using the gemma2:27b model. The model works fine in plain chat, so that is not the issue.

My code:

import os.path
import PIL.Image
from dotenv import load_dotenv
from ollama import generate

load_dotenv()
CHAT_MODEL_NAME = os.getenv("MODEL_NAME_LATEST")

image_path = os.path.join("data", "image_one.jpg")
test_image = PIL.Image.open(image_path)

# test 1:

for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[test_image], stream=True):
    print(response['response'], end='', flush=True)

# response: ollama._types.RequestError: image must be bytes, path-like object, or file-like object

# test 2: bytes

for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[test_image.tobytes()], stream=True):
    print(response['response'], end='', flush=True)

# response: Please provide me with the image!

# test 3: Path
for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[image_path], stream=True):
    print(response['response'], end='', flush=True)

# response: Please provide me with the image!

How do I properly load an image for Gemma?

Cross-posted on the issue forum, 289.


Solution

  • The Gemma2 models are not multimodal; they accept only text as input.

    If you want to process images, you need PaliGemma, which is not yet supported by Ollama (you can follow this issue about it).

    You may find some PaliGemma examples in the Gemma Cookbook GitHub repo.
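As an aside, even with a vision-capable model, test 2 would still fail in a subtle way: `Image.tobytes()` returns raw decoded pixels with no file header, while the Ollama client expects encoded image bytes (or a path, as in test 3). A minimal sketch of the distinction, plus hedged usage against a vision model such as `llava` (the model name and the local setup are assumptions, not something from the question):

```python
import io

import PIL.Image

# Demonstrate why Image.tobytes() is the wrong payload: it yields raw
# decoded pixels with no JPEG/PNG header, not an encoded image file.
img = PIL.Image.new("RGB", (2, 2), "red")
buf = io.BytesIO()
img.save(buf, format="JPEG")
encoded = buf.getvalue()   # an actual JPEG byte stream (starts with FF D8)
raw = img.tobytes()        # 2 * 2 * 3 = 12 raw RGB bytes, no header
assert encoded[:2] == b"\xff\xd8"
assert len(raw) == 12

if __name__ == "__main__":
    # Hypothetical usage against a local Ollama server with a vision model
    # pulled first, e.g. `ollama pull llava`.
    from ollama import generate

    with open("data/image_one.jpg", "rb") as f:  # path from the question
        image_bytes = f.read()  # encoded JPEG bytes, not tobytes()

    for part in generate("llava", "What do you see?",
                         images=[image_bytes], stream=True):
        print(part["response"], end="", flush=True)
```

Passing the file path string (as in test 3) is equivalent to passing the encoded bytes; the client reads and base64-encodes the file for you.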