I made a simple demo of a chat interface in Godot that lets you talk to a language model running under Ollama. Currently, the interface between Godot and the language model uses the Ollama API, and the response time is about 30 seconds.
If I chat with the model directly through the Ollama CLI, the response time is much lower (under 1 second), and it is still noticeably lower when I call the API with curl:

curl http://localhost:11434/api/generate -d '{ "model": "qwen2:1.5b", "prompt": "What is water made of?", "stream": false }'
Here is the code snippet I am using to interact with Ollama:
func send_to_ollama(message):
    var url = "http://localhost:11434/api/generate"
    var headers = ["Content-Type: application/json"]
    # Non-streaming request: Ollama returns the whole completion in one reply
    var body = JSON.stringify({
        "model": "qwen2:1.5b",
        "prompt": message,
        "stream": false
    })
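For completeness, the rest of the function is a simplified sketch along these lines; it assumes an HTTPRequest child node named HTTPRequest whose request_completed signal is connected to _on_request_completed:

    # ...continues inside send_to_ollama, after building the body:
    var error = $HTTPRequest.request(url, headers, HTTPClient.METHOD_POST, body)
    if error != OK:
        push_error("Request failed to start: %d" % error)

func _on_request_completed(result, response_code, response_headers, response_body):
    # The generated text is in the "response" field of the JSON reply
    var reply = JSON.parse_string(response_body.get_string_from_utf8())
    if reply != null:
        print(reply["response"])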
Do you spot anything wrong? Am I calling the API correctly? And should I somehow specify that I want Ollama to use the GPU?
Problem fixed by running Ollama via Docker (see here: https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image).
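For reference, the GPU-enabled command from that post (NVIDIA GPUs; it assumes the NVIDIA Container Toolkit is installed):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

The model can then be started inside the container with: docker exec -it ollama ollama run qwen2:1.5b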