This sample, multimodal/main.py, appears to show Ollama accepting image input.
I am trying to do the same with an image loaded from my machine, using the gemma2:27b model. The model works fine for plain text chat, so that is not the issue.
import os.path

import PIL.Image
from dotenv import load_dotenv
from ollama import generate

load_dotenv()
CHAT_MODEL_NAME = os.getenv("MODEL_NAME_LATEST")  # set to gemma2:27b in my .env

image_path = os.path.join("data", "image_one.jpg")
test_image = PIL.Image.open(image_path)
# test 1:
for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[test_image], stream=True):
    print(response['response'], end='', flush=True)
# response: ollama._types.RequestError: image must be bytes, path-like object, or file-like object
# test 2: bytes
for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[test_image.tobytes()], stream=True):
    print(response['response'], end='', flush=True)
# response: Please provide me with the image!
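# note: tobytes() returns PIL's raw pixel buffer, not an encoded image file, so even a
# multimodal model would have nothing it could decode here. A minimal sketch of producing
# real encoded bytes instead (assuming JPEG output is acceptable; jpeg_bytes is illustrative):
import io
buf = io.BytesIO()
test_image.save(buf, format="JPEG")  # re-encode the image as an in-memory JPEG file
jpeg_bytes = buf.getvalue()          # encoded file bytes, the kind images=[...] expects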
# test 3: Path
for response in generate(CHAT_MODEL_NAME, 'What do you see', images=[image_path], stream=True):
    print(response['response'], end='', flush=True)
# response: Please provide me with the image!
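# (aside: the RequestError from test 1 says the client accepts bytes, path-like, or
# file-like objects, so passing image_path as in test 3 looks like a valid call form;
# the failure here appears to come from the model rather than from the client)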
How do I properly load an image for Gemma?
Cross-posted on the issue forum (#289).
The Gemma2 models are not multimodal. They accept only text as input.
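For reference, the images argument itself does work in exactly this call shape once the model is vision-capable. A minimal sketch using llava, a multimodal model Ollama does support (assuming it has been pulled locally with ollama pull llava):

from ollama import generate

# a vision-capable model actually reads the image passed via images=[...]
for response in generate('llava', 'What do you see', images=['data/image_one.jpg'], stream=True):
    print(response['response'], end='', flush=True)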
If you want to process images with a Gemma-family model, you need to use PaliGemma, which is not supported by Ollama yet (you can follow this issue about it).
You may find some PaliGemma examples in the Gemma Cookbook GitHub repo.
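Until Ollama support lands, a minimal sketch of running PaliGemma through the Hugging Face transformers library, assuming access to the gated google/paligemma-3b-mix-224 checkpoint (the model id, prompt, and max_new_tokens value are illustrative):

import PIL.Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # illustrative; requires accepting the license on Hugging Face
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = PIL.Image.open("data/image_one.jpg")
inputs = processor(text="What do you see?", images=image, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=50)
# decode only the newly generated tokens, skipping the prompt
print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))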