Tags: godot, godot4, ollama

Slow Ollama API - how to make sure the GPU is used


I made a simple demo of a chatbox interface in Godot that lets you chat with a language model running under Ollama. Currently, the interface between Godot and the language model goes through the Ollama API. The response time is about 30 seconds.

If I chat with the model directly through the Ollama CLI, the response time is much lower (under 1 second), and it's noticeably lower even when I call the API with curl:

curl http://localhost:11434/api/generate -d '{ "model": "qwen2:1.5b", "prompt": "What is water made of?", "stream": false}'

Here is the code snippet I am using to interact with Ollama:

func send_to_ollama(message):
    var url = "http://localhost:11434/api/generate"
    var headers = ["Content-Type: application/json"]
    var body = JSON.stringify({
        "model": "qwen2:1.5b",
        "prompt": message,
        "stream": false
    })
    # Send the POST request through an HTTPRequest node in the scene.
    $HTTPRequest.request(url, headers, HTTPClient.METHOD_POST, body)

Do you spot anything wrong? Am I calling the API correctly? Do I need to tell Ollama somehow that I want it to use the GPU?
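For reference, the response side looks roughly like this (a sketch; it assumes the request is sent through an HTTPRequest child node named HTTPRequest, which is a placeholder, not something stated in the snippet above):

func _ready():
    # Listen for finished HTTP requests (Godot 4 signal syntax).
    $HTTPRequest.request_completed.connect(_on_request_completed)

func _on_request_completed(result, response_code, headers, body):
    if response_code != 200:
        push_error("Ollama returned HTTP %d" % response_code)
        return
    # With "stream": false, Ollama replies with a single JSON object
    # whose "response" field contains the full generated text.
    var data = JSON.parse_string(body.get_string_from_utf8())
    if data != null and data.has("response"):
        print(data["response"])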


Solution

  • Problem fixed by running Ollama in its official Docker image (see https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image)
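For reference, a GPU-enabled run based on that post looks roughly like this (it assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed; there is a separate ollama/ollama:rocm image for AMD GPUs):

# Start the official Ollama image with all GPUs exposed to the container.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run the same model inside the container.
docker exec -it ollama ollama run qwen2:1.5b

The Godot code needs no changes, since the container publishes the API on the same http://localhost:11434 endpoint.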