I deployed an ADK agent to Agent Engine and I'm trying to send it multimodal queries. I need to send text, images and voice (audio) to the agent, but nothing I've tried works. The documentation is poor, and even following its instructions doesn't work. I can't find anything online.
If I send this, it doesn't work:
const res = await fetch(url, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
    Accept: "text/event-stream, application/json",
  },
  body: JSON.stringify({
    class_method: "async_stream_query",
    input: {
      user_id: "sdg5d456f464g564d",
      session_id: "21d23s45f46",
      message: [
        { text: "Do you like the photo?" },
        {
          file_data: {
            file_uri: "gs://...",
            mime_type: "image/png",
          },
        },
      ],
    },
  }),
});
But this works:
const res = await fetch(url, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
    Accept: "text/event-stream, application/json",
  },
  body: JSON.stringify({
    class_method: "async_stream_query",
    input: {
      user_id: "sdg5d456f464g564d",
      session_id: "21d23s45f46",
      message: "How are you?",
    },
  }),
});
However, I need to send images and audio to the agent too.
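The docs aren’t very clear, and I ran into the same issue. I managed to get it working by making message a single object with role and parts instead of a bare array of parts (note that this payload calls stream_query, not async_stream_query):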
{
  "class_method": "stream_query",
  "input": {
    "user_id": "...",
    "session_id": "...",
    "message": {
      "role": "user",
      "parts": [
        {
          "file_data": {
            "file_uri": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
            "mime_type": "image/jpeg"
          }
        },
        {
          "text": "this image"
        }
      ]
    }
  }
}
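In case it helps, here is a rough JavaScript sketch that builds the same { role, parts } payload and sends it with the fetch call from the question. The helper names (buildQueryPayload, textPart, filePart, sendQuery) are made up for illustration and are not part of any SDK; url and accessToken are whatever you already use. The payload above only shows an image, so sending audio as a file_data part with an audio MIME type is an assumption, not something confirmed in this thread.

```javascript
// Hypothetical helpers (not part of any SDK): build the { role, parts }
// message shape from the working payload above.
function textPart(text) {
  return { text };
}

function filePart(fileUri, mimeType) {
  return { file_data: { file_uri: fileUri, mime_type: mimeType } };
}

// classMethod defaults to "stream_query", as in the working payload;
// pass "async_stream_query" to try the method used in the question.
function buildQueryPayload({ userId, sessionId, parts, classMethod = "stream_query" }) {
  if (!Array.isArray(parts) || parts.length === 0) {
    throw new TypeError("parts must be a non-empty array");
  }
  return {
    class_method: classMethod,
    input: {
      user_id: userId,
      session_id: sessionId,
      // A single object with role and parts, not a bare array of parts.
      message: { role: "user", parts },
    },
  };
}

// Same request as in the question, with a basic status check added.
async function sendQuery(url, accessToken, payload) {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      Accept: "text/event-stream, application/json",
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${await res.text()}`);
  }
  return res;
}

// Usage (IDs and URIs are placeholders):
// const payload = buildQueryPayload({
//   userId: "...",
//   sessionId: "...",
//   parts: [
//     filePart("gs://cloud-samples-data/generative-ai/image/scones.jpg", "image/jpeg"),
//     textPart("Do you like the photo?"),
//   ],
// });
// const res = await sendQuery(url, accessToken, payload);
```

If the request still fails, the error thrown by sendQuery includes the response body, which is usually more informative than a silent failure.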