google-cloud-platform google-api google-cloud-vertex-ai google-generativeai

How to correctly structure the 'video' object for Veo 3.1 endpoint?

I'm trying to use the video extension feature with the Veo 3.1 model using the Google Generative Language API. I am sending requests to the endpoint:

https://generativelanguage.googleapis.com/v1beta/models/veo-3.1-generate-preview:predictLongRunning

My goal is to provide an initial video segment for the model to extend. However, every time I include the video object in my instances payload, the API rejects the request. It always complains about the video object, suggesting my format is wrong.

"`bytesBase64Encoded` isn't supported by this model. Please remove it or refer to the Gemini API documentation for supported usage."

Here is the basic structure of the JSON payload I'm sending. I've elided the bytesBase64Encoded string here for brevity.

{
  "instances": [
    {
      "prompt": "People from a card are becoming alive",
      "video": {
        "bytesBase64Encoded": "..."
      }
    }
  ],
  "parameters": {
    "sampleCount": 1,
    "resolution": "720p",
    "aspectRatio": "9:16",
    "durationSeconds": 4
  }
}

I have tried several variations for the video object based on how other Google APIs work, but none are successful:

"video": { "bytesBase64Encoded": "..." }
"video": { "gcsUri": "gs://my-bucket/my-video.mp4" }
"video": { "uri": "gs://my-bucket/my-video.mp4" }

My main issue is that I cannot find any official Google Cloud documentation for this specific endpoint or the veo-3.1 model. All available documentation seems to be for different models or the Vertex AI endpoints, which appear to use a different structure.

Has anyone successfully used the video extension feature with this veo-3.1-generate-preview endpoint? What is the correct JSON structure (or is there another endpoint for this) for passing a video, either as bytes or as a GCS URI, in the instances payload? Could you share any documentation or a working example payload for this specific Veo 3.1 API? Any help would be greatly appreciated.

Solution

The following input works for me:

"video": { "uri": "${initial_video_id}" }

Here, initial_video_id is the URI of the video returned by a previous Veo generation (note: you can only extend videos generated by Veo).
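If your prompt may contain quotes or backslashes, it can be safer to let jq assemble the request body instead of interpolating strings into JSON by hand. This is only a minimal sketch: build_extension_payload is a hypothetical helper name, and the parameter values simply mirror the ones used in the scripts in this answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Google tooling): build the extension
# request body with jq so special characters in the prompt are escaped.
build_extension_payload() {
  local prompt="$1" video_uri="$2"
  jq -n --arg prompt "$prompt" --arg uri "$video_uri" '{
    instances: [{ prompt: $prompt, video: { uri: $uri } }],
    parameters: {
      sampleCount: 1,
      resolution: "720p",
      aspectRatio: "16:9",
      durationSeconds: 8
    }
  }'
}

# Example usage, assuming BASE_URL, GEMINI_API_KEY, extension_prompt and
# initial_video_id are set as in the scripts in this answer:
# payload=$(build_extension_payload "$extension_prompt" "$initial_video_id")
# curl -s "${BASE_URL}/models/veo-3.1-generate-preview:predictLongRunning" \
#   -H "x-goog-api-key: $GEMINI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -X POST -d "$payload"
```

The `--arg` flag passes each value to jq as a string, so the output is valid JSON even if the prompt contains double quotes.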

Here are the detailed scripts to test this:

Generate the initial video:

#!/usr/bin/env bash
# Set your API key first: export GEMINI_API_KEY="YOUR_API_KEY"
if [[ -z "$GEMINI_API_KEY" ]]; then
  echo "Error: GEMINI_API_KEY environment variable is not set."
  exit 1
fi

# GEMINI API Base URL
BASE_URL="https://generativelanguage.googleapis.com/v1beta"
VIDEO_OUTPUT_FILE="veo_generated_video.mp4"

echo "--- Step 1: Requesting initial video generation ---"
echo "Prompt: An origami butterfly flaps its wings and flies out of the french doors into the garden."

# Send request to generate video and capture the operation name into a variable.
operation_response=$(curl -s "${BASE_URL}/models/veo-3.1-generate-preview:predictLongRunning" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances": [{
        "prompt": "An origami butterfly flaps its wings and flies out of the french doors into the garden."
      }
    ],
    "parameters": {
      "sampleCount": 1,
      "resolution": "720p",
      "aspectRatio": "16:9",
      "durationSeconds": 8
    }
  }')

operation_name=$(echo "${operation_response}" | jq -r .name)

if [[ -z "$operation_name" || "$operation_name" == "null" ]]; then
  echo "Error: Failed to start video generation. Operation name not found."
  exit 1
fi

echo "Operation Name: ${operation_name}"
echo "--- Step 2: Polling for video generation completion ---"

# Poll the operation status until the video is ready
while true; do
  echo "Polling operation status for ${operation_name}..."
  # Get the full JSON status and store it in a variable.
  status_response=$(curl -s -H "x-goog-api-key: $GEMINI_API_KEY" "${BASE_URL}/${operation_name}")

  # Check the "done" field from the JSON stored in the variable.
  is_done=$(echo "${status_response}" | jq .done)

  if [ "${is_done}" = "true" ]; then
    echo "Operation completed."

    # Check for an error in the response
    error_message=$(echo "${status_response}" | jq -r .error.message)
    if [[ "${error_message}" != "null" ]]; then
      echo "Error: Video generation failed with: ${error_message}"
      error_details=$(echo "${status_response}" | jq -r .error.details)
      if [[ "${error_details}" != "null" ]]; then
        echo "Error Details: ${error_details}"
      fi
      exit 1
    fi

    # Extract the download URI from the final response.
    video_uri=$(echo "${status_response}" | jq -r '.response.generateVideoResponse.generatedSamples[0].video.uri')

    if [[ -z "$video_uri" || "$video_uri" == "null" ]]; then
      echo "Error: No video URI found in the operation response."
      exit 1
    fi

    echo "Generated Video URI: ${video_uri}"

    # Download the video using the URI and API key and follow redirects.
    echo "Attempting to download video to ${VIDEO_OUTPUT_FILE}..."
    if curl -L -o "${VIDEO_OUTPUT_FILE}" -H "x-goog-api-key: $GEMINI_API_KEY" "${video_uri}"; then
      echo "Successfully downloaded video to ${VIDEO_OUTPUT_FILE}"
    else
      echo "Error: Failed to download video from ${video_uri}"
      exit 1
    fi
    break
  fi
  # Wait for 10 seconds before checking again.
  sleep 10
done

echo "Script finished."

Copy and export the Generated Video URI (video_uri). For example:

export initial_video_id="https://generativelanguage.googleapis.com/v1beta/files/xxxx:download?alt=media"
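Since gs:// URIs and raw bytes were rejected in the question, a quick sanity check before sending the extension request may save a round trip. The sketch below is purely a heuristic: it assumes Veo output URIs start with the Files prefix seen in the example above, and looks_like_veo_file_uri is a hypothetical helper name. The API may return or accept other forms.

```shell
#!/usr/bin/env bash
# Heuristic only: assumes Veo output URIs begin with the Gemini Files prefix
# shown in the example URI in this answer. Not an official validation rule.
looks_like_veo_file_uri() {
  case "$1" in
    https://generativelanguage.googleapis.com/v1beta/files/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Try the check on the example URI and on a GCS URI from the question.
for uri in \
  "https://generativelanguage.googleapis.com/v1beta/files/xxxx:download?alt=media" \
  "gs://my-bucket/my-video.mp4"; do
  if looks_like_veo_file_uri "$uri"; then
    echo "looks usable: $uri"
  else
    echo "probably not a Veo output URI: $uri"
  fi
done
```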

Extend the initial video:

#!/usr/bin/env bash
if [[ -z "$GEMINI_API_KEY" ]]; then
  echo "Error: GEMINI_API_KEY environment variable is not set."
  exit 1
fi

# GEMINI API Base URL
BASE_URL="https://generativelanguage.googleapis.com/v1beta"
EXTENDED_VIDEO_OUTPUT_FILE="veo_extended_video.mp4"

# --- REQUIRED: Set initial_video_id ---
# Manually set it here or ensure it's exported in your environment.
if [[ -z "$initial_video_id" ]]; then
  echo "Error: initial_video_id is not set."
  exit 1
fi
echo "Using Initial Video ID for extension: ${initial_video_id}"

# --- Step 3: Requesting video extension ---
echo "--- Step 3: Requesting video extension using Initial Video ID ---"
extension_prompt="Track the butterfly into the garden as it lands on an orange origami flower. A fluffy white puppy runs up and gently pats the flower."
echo "Extension Prompt: \"${extension_prompt}\""


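# Note: the prompt is interpolated directly into the JSON body below,
# so it must not contain unescaped double quotes or backslashes.
extension_operation_response=$(curl -s "${BASE_URL}/models/veo-3.1-generate-preview:predictLongRunning" \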
 -H "x-goog-api-key: $GEMINI_API_KEY" \
 -H "Content-Type: application/json" \
 -X "POST" \
 -d "{
  \"instances\": [{
    \"prompt\": \"${extension_prompt}\",
    \"video\": {
      \"uri\": \"${initial_video_id}\"
    }
  }],
  \"parameters\": {
    \"sampleCount\": 1,
    \"resolution\": \"720p\",
    \"aspectRatio\": \"16:9\",
    \"durationSeconds\": 8
  }
}")

extension_operation_name=$(echo "${extension_operation_response}" | jq -r .name)

if [[ -z "$extension_operation_name" || "$extension_operation_name" == "null" ]]; then
  echo "Error: Failed to start video extension. Operation name not found."
  exit 1
fi

echo "Extension Operation Name: ${extension_operation_name}"
echo "--- Step 4: Polling for extension video completion ---"

# --- Step 4: Poll the extension operation status ---
while true; do
  echo "Polling extension operation status for ${extension_operation_name}..."
  status_response=$(curl -s -H "x-goog-api-key: $GEMINI_API_KEY" "${BASE_URL}/${extension_operation_name}")
  is_done=$(echo "${status_response}" | jq .done)

  if [ "${is_done}" = "true" ]; then

    # Check for an error in the response
    error_message=$(echo "${status_response}" | jq -r .error.message)
    if [[ "${error_message}" != "null" ]]; then
      echo "Error: Video extension failed with: ${error_message}"
      error_details=$(echo "${status_response}" | jq -r .error.details)
      if [[ "${error_details}" != "null" ]]; then
        echo "Error Details: ${error_details}"
      fi
      exit 1
    fi

    # Extract the download URI for the extended video.
    extended_video_uri=$(echo "${status_response}" | jq -r '.response.generateVideoResponse.generatedSamples[0].video.uri')

    if [[ -z "$extended_video_uri" || "$extended_video_uri" == "null" ]]; then
      echo "Error: No video URI found in the extension operation response."
      exit 1
    fi

    echo "Extended video generation complete!"
    echo "Generated Extended Video URI: ${extended_video_uri}"

    # Download the extended video. This corresponds to the Python SDK's `client.files.download`.
    echo "Attempting to download extended video to ${EXTENDED_VIDEO_OUTPUT_FILE}..."
    if curl -L -o "${EXTENDED_VIDEO_OUTPUT_FILE}" -H "x-goog-api-key: $GEMINI_API_KEY" "${extended_video_uri}"; then
      echo "Successfully downloaded extended video to ${EXTENDED_VIDEO_OUTPUT_FILE}"
    else
      echo "Error: Failed to download extended video from ${extended_video_uri}"
    fi
    break
  fi
  echo "Waiting for extended video generation to complete..."
  sleep 10
done

echo "Script finished."
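Both polling loops above run until the operation reports done, with no upper bound. If you want a timeout, one option is a small wrapper like the following. This is a sketch under assumptions: poll_until_done and fetch_status are hypothetical helper names, and the 60 attempts at 10-second intervals in the usage comment are arbitrary values, not documented limits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a status-fetching command repeatedly until its
# JSON output has "done": true, giving up after MAX_ATTEMPTS tries.
# Usage: poll_until_done MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 and prints the final JSON on success, 1 on timeout,
# 2 if the fetch command itself fails.
poll_until_done() {
  local max_attempts="$1" interval="$2"
  shift 2
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    status=$("$@") || return 2
    if [[ "$(jq -r '.done // false' <<<"$status")" == "true" ]]; then
      printf '%s\n' "$status"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Fetch one status snapshot, using the same request as the loops above.
fetch_status() {
  curl -s -H "x-goog-api-key: $GEMINI_API_KEY" "${BASE_URL}/$1"
}

# Example usage: wait at most 60 attempts, 10 seconds apart.
# final=$(poll_until_done 60 10 fetch_status "$extension_operation_name") || exit 1
# echo "$final" | jq -r '.response.generateVideoResponse.generatedSamples[0].video.uri'
```

Passing the fetch command as arguments keeps the loop independent of curl, so it can be exercised with a stub that prints canned JSON.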

This feature is also supported in the Python SDK, and I suggest using the SDK rather than raw REST calls.