python · server · openai-api · google-gemini

How to send large videos to Gemini AI API 1.5 Pro for inference?


I'm currently working with the Gemini AI API 1.5 Pro (latest version) and need to send large video files for inference. These videos are several hundred megabytes each (~700MB) but are within the API's constraints (e.g., less than 1 hour in length). I want to upload them once and perform inference without re-uploading.

In GPT-4o, there was an option to use image_urls to reference images. Is there a similar method or best practice for handling large video files with the Gemini AI API 1.5 Pro?

The videos are too large to send repeatedly, so an efficient method for uploading and referencing them is crucial.

Any guidance on API endpoints, required parameters, or example code snippets would be greatly appreciated.


Solution

  • In your situation, how about the following sample scripts?

    Sample script 1:

    Before you test the following script, please update google-generativeai to the latest version.

    import google.generativeai as genai
    import time
    
    apiKey = "###" # Please set your API key.
    video_file_name = "sample.mp4" # Please set your video file with the path.
    display_name = "sampleDisplayName" # Please set the display name of the uploaded file on Gemini. The file is searched from the file list using this value.
    
    genai.configure(api_key=apiKey)
    
    # Get file list in Gemini
    fileList = genai.list_files(page_size=100)
    
    # Check uploaded file.
    video_file = next((f for f in fileList if f.display_name == display_name), None)
    if video_file is None:
        print("Uploading file...")
        video_file = genai.upload_file(path=video_file_name, display_name=display_name, resumable=True)
        print(f"Completed upload: {video_file.uri}")
    else:
        print(f"File URI: {video_file.uri}")
    
    # Check the state of the uploaded file.
    while video_file.state.name == "PROCESSING":
        print(".", end="")
        time.sleep(10)
        video_file = genai.get_file(video_file.name)
    
    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)
    
    # Generate content using the uploaded file.
    prompt = "Describe this video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")
    print("Making LLM inference request...")
    response = model.generate_content([video_file, prompt], request_options={"timeout": 600})
    print(response.text)
    

    In this sample script, if the file has already been uploaded, the existing file is reused; otherwise the file is uploaded first and then used. The file is looked up in the file list by its display_name.

    Sample script 2:

    As another approach, if you can supply the value of name directly, the following sample script can also be used. In this case, name must be unique among your uploaded files.
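As a side note, the File API generally restricts custom name values to lowercase letters, digits, and dashes. If your local filenames contain other characters, a small helper like the following can derive a usable name from the path (to_files_api_name is a hypothetical convenience function, not part of the SDK, and the exact length limit and character rules are assumptions to verify against the current API documentation):

```python
import re

def to_files_api_name(path: str, max_len: int = 40) -> str:
    """Derive a Files-API-style name (lowercase alphanumerics and
    dashes) from a local file path. Hypothetical helper; the exact
    length limit and character rules are assumptions to verify."""
    # Take the final path component and lowercase it.
    stem = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    # Collapse every run of disallowed characters into a single dash.
    name = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    # Truncate and drop any trailing dash left by the cut.
    return name[:max_len].rstrip("-")
```

For example, to_files_api_name("My Sample Video.mp4") yields "my-sample-video-mp4", which can then be passed as the name argument of genai.upload_file.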

    import google.generativeai as genai
    import time
    
    apiKey = "###" # Please set your API key.
    video_file_name = "sample.mp4" # Please set your video file with the path.
    name = "sample-name-1" # Please set the name of the uploaded file on Gemini. The file is searched from the file list using this value.
    
    genai.configure(api_key=apiKey)
    
    # Check uploaded file.
    try:
        video_file = genai.get_file(f"files/{name}")
        print(f"File URI: {video_file.uri}")
    except Exception:
        # The file was not found among the uploaded files, so upload it.
        print("Uploading file...")
        video_file = genai.upload_file(path=video_file_name, name=name, resumable=True)
        print(f"Completed upload: {video_file.uri}")
    
    # Check the state of the uploaded file.
    while video_file.state.name == "PROCESSING":
        print(".", end="")
        time.sleep(10)
        video_file = genai.get_file(video_file.name)
    
    if video_file.state.name == "FAILED":
        raise ValueError(video_file.state.name)
    
    # Generate content using the uploaded file.
    prompt = "Describe this video."
    model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")
    print("Making LLM inference request...")
    response = model.generate_content([video_file, prompt], request_options={"timeout": 600})
    print(response.text)
    

    This script produces the same result as the first one.
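One caveat: files uploaded through the File API are retained for roughly 48 hours and are then deleted automatically, so the "upload once" approach only holds within that window; after expiry, both scripts above simply fall back to re-uploading. If you want to anticipate expiry from the file's create_time field rather than wait for a lookup failure, a sketch like this can help (is_probably_expired is a hypothetical helper, not part of the SDK, and the 48-hour retention period should be confirmed against the current documentation):

```python
import datetime

# Assumed File API retention window (~48 hours); verify in the docs.
RETENTION = datetime.timedelta(hours=48)

def is_probably_expired(create_time, now=None):
    """Return True if an uploaded file is past the assumed retention
    window and likely needs re-uploading. create_time is expected to
    be a timezone-aware datetime, e.g. video_file.create_time."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now - create_time >= RETENTION
```

A script that caches file URIs longer than a couple of days could call this before reusing a stored reference and re-upload when it returns True.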
