google-cloud-platform  google-cloud-vertex-ai  google-gemini  google-agent-development-kit

Vertex AI Agent Engine: multimodal query using the REST API


I deployed an ADK agent to Agent Engine and I'm trying to send multimodal queries to it. I need to send text, images, and audio (voice) to the agent, but nothing I've tried works. The documentation is awful, and even when I follow its instructions the request fails. I can't find anything online.

This request doesn't work:

    const res = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
        Accept: "text/event-stream, application/json",
      },
      body: JSON.stringify({
        class_method: "async_stream_query",
        input: {
          user_id: "sdg5d456f464g564d",
          session_id: "21d23s45f46",
          message: [
            { text: "Do you like the photo?" },
            {
              file_data: {
                file_uri: "gs://...",
                mime_type: "image/png",
              },
            },
          ],
        },
      }),
    });

But this works:

    const res = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
        Accept: "text/event-stream, application/json",
      },
      body: JSON.stringify({
        class_method: "async_stream_query",
        input: {
          user_id: "sdg5d456f464g564d",
          session_id: "21d23s45f46",
          message: "How are you?",
        },
      }),
    });

However, I need to send images and audio to the agent too.


Solution

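  • I had the same issue, and the docs aren't very clear on this. I got it working by sending message as a single content object with role and parts, rather than as a bare array of parts. Note that this payload also uses stream_query as the class_method: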

    {
        "class_method": "stream_query",
        "input": {
            "user_id": "...",
            "session_id": "...",
            "message": {
                "role": "user",
                "parts": [
                    {
                        "file_data": {
                            "file_uri": "gs://cloud-samples-data/generative-ai/image/scones.jpg",
                            "mime_type": "image/jpeg"
                        }
                    },
                    {
                        "text": "this image"
                    }
                ]
            }
        }
    }
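
    For reference, here is a minimal sketch of how the fetch call from the question could be adapted to this payload shape. The helper names (filePart, buildMultimodalBody, sendMultimodalQuery) are hypothetical, and url and accessToken are assumed to be obtained the same way as in the question. I have only confirmed the image case above; an audio part is assumed to use the same file_data shape with an audio MIME type.

```javascript
// Hypothetical helper: builds one file part in the shape used by the working payload.
function filePart(fileUri, mimeType) {
  return { file_data: { file_uri: fileUri, mime_type: mimeType } };
}

// Hypothetical helper: wraps the parts in a { role, parts } content object
// instead of passing a bare array as `message`.
// The default class_method mirrors the working payload above.
function buildMultimodalBody(userId, sessionId, parts, classMethod = "stream_query") {
  if (!Array.isArray(parts) || parts.length === 0) {
    throw new Error("parts must be a non-empty array");
  }
  return {
    class_method: classMethod,
    input: {
      user_id: userId,
      session_id: sessionId,
      message: { role: "user", parts },
    },
  };
}

// Hypothetical helper: same request as in the question, with an explicit status check.
// `url` and `accessToken` are assumed to come from your own setup.
async function sendMultimodalQuery(url, accessToken, body) {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
      Accept: "text/event-stream, application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Surface the server's error text, which usually explains a rejected payload.
    throw new Error(`Request failed: ${res.status} ${await res.text()}`);
  }
  return res;
}
```

    Usage would look something like this (the commented-out audio line is an untested assumption):

```javascript
const body = buildMultimodalBody("sdg5d456f464g564d", "21d23s45f46", [
  filePart("gs://cloud-samples-data/generative-ai/image/scones.jpg", "image/jpeg"),
  // filePart("gs://...", "audio/mpeg"),  // assumed to work like the image part
  { text: "Do you like the photo?" },
]);
// const res = await sendMultimodalQuery(url, accessToken, body);
```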