javascript · node.js · firebase · google-cloud-vertex-ai

How to Use the Vertex API With Long Text Prompts


Overview

I've been using the Vertex API relatively successfully for the past few months, but I've noticed that when the text part of a prompt becomes extremely long (for example, around 130,000 characters), the API starts failing.

Implementation Details

API Infrastructure

I've tried two approaches to integrating with Vertex:

  1. Using the vertexai-preview package that ships with the Firebase JS SDK
  2. Using the Vertex AI Node.js SDK in a Cloud Function with increased (1 GB) memory allocation and a longer (120 s) maximum runtime (a config sketch follows)
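
For reference, the second approach's function configuration looks roughly like this (a sketch using the 1st-gen firebase-functions API; the function name and trigger are illustrative):

const functions = require('firebase-functions');

// Sketch only: 1 GB of memory and a 120 s timeout, matching the setup described above.
exports.generateWithVertex = functions
  .runWith({ memory: '1GB', timeoutSeconds: 120 })
  .https.onCall(async (data, context) => {
    // ... call the Vertex AI SDK here ...
  });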

Package Structure

All of my calls to Vertex follow the documentation's pattern, where the "files" sent to the LLM are referenced via a Cloud Storage URI and the text portions of the prompt are sent as inline text parts. Like this:

// Assumes the @google-cloud/vertexai SDK; the project, location, and model name below are placeholders.
const { VertexAI } = require('@google-cloud/vertexai');

const vertexAI = new VertexAI({ project: 'my-project-id', location: 'us-central1' });
const generativeVisionModel = vertexAI.getGenerativeModel({ model: 'gemini-1.5-pro' });

async function multiPartContent() {
    const filePart = {fileData: {fileUri: "gs://generativeai-downloads/images/scones.jpg", mimeType: "image/jpeg"}};
    const textPart = {text: 'What is this picture about?'};
    const request = {
        contents: [{role: 'user', parts: [textPart, filePart]}],
    };
    const streamingResult = await generativeVisionModel.generateContentStream(request);
    for await (const item of streamingResult.stream) {
        console.log('stream chunk: ', JSON.stringify(item));
    }
    const aggregatedResponse = await streamingResult.response;
    console.log(aggregatedResponse.candidates[0].content);
}

In my case, I am using the generateContentStream approach.

Expected Behavior

Given the massive context window, I expect to be able to send lots of information to the LLM and then get a response back.

Observed Behavior

vertexai-preview client-side package

When using the vertexai-preview package, I get a FirebaseError with no message property once the requests get larger, i.e. as I include more files and more text.

I can confirm that my usage is nowhere near the 2M-token context window; these heavier requests are usually around 200k tokens.
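
For reference, token counts like these can be double-checked with the countTokens method that the model object exposes in the Vertex AI Node.js SDK (a sketch, reusing the same contents that would be sent to generateContentStream):

// Count the tokens for the request before generating, to confirm
// it is well under the model's context window.
const { totalTokens } = await generativeModel.countTokens({
  contents: [{ role: 'user', parts }],
});
console.log(`Request size: ${totalTokens} tokens`);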

Vertex AI server-side approach

Here's a relevant code block from my Cloud Function:

const req = {
  contents: [{ role: "user", parts }],
};

console.log(`Initiating content generation for docId: ${docId}`);
const streamingResp = await generativeModel.generateContentStream(req);

This logic works for smaller requests, but heavier requests fail. In the Cloud Logging output I see the "Initiating content generation" entry and then nothing else: even though the block above sits inside a try/catch, no error is logged and no further log lines appear. The process simply stops.
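
For context, the surrounding error handling is roughly of this shape (a sketch, not the exact code; note that generateContentStream can also throw while the stream is being iterated, so that part sits inside the same try block):

try {
  console.log(`Initiating content generation for docId: ${docId}`);
  const streamingResp = await generativeModel.generateContentStream(req);

  // Errors can also surface here, while the stream is being consumed.
  for await (const item of streamingResp.stream) {
    console.log('stream chunk received');
  }

  const aggregatedResponse = await streamingResp.response;
  console.log(`Finished generation for docId: ${docId}`, aggregatedResponse.candidates[0].finishReason);
} catch (err) {
  // Log everything available on the error before giving up.
  console.error(`Vertex AI request failed for docId: ${docId}`, err);
  throw err;
}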

Some things I've tried

Chunking the text parts

I've tried converting long text strings into multiple smaller (e.g. ~50k-character) text parts, so the parts array I send the LLM contains several shorter text parts instead of one very long one.
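
A minimal sketch of that chunking, assuming a simple fixed-size split (the chunk size and helper name are illustrative):

// Split a long string into ~50k-character text parts (illustrative helper).
function toTextParts(longText, chunkSize = 50000) {
  const parts = [];
  for (let i = 0; i < longText.length; i += chunkSize) {
    parts.push({ text: longText.slice(i, i + chunkSize) });
  }
  return parts;
}

// Use the chunked text parts (plus any fileData parts, as before) in the request.
const parts = toTextParts(longText);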

This didn't work at all.

Sending the long text part as a fileUri part

I've tried converting long text strings into stored plain text files, then sending them as fileUri parts.
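
Roughly, that looks like the sketch below (assuming the @google-cloud/storage SDK; the bucket and object names are illustrative):

const { Storage } = require('@google-cloud/storage');
const storage = new Storage();

// Store the long text as a plain-text object in Cloud Storage.
const file = storage.bucket('my-bucket').file(`prompts/${docId}.txt`);
await file.save(longText, { contentType: 'text/plain' });

// Reference it from the request as a fileData part instead of a text part.
const longTextPart = {
  fileData: { fileUri: `gs://my-bucket/prompts/${docId}.txt`, mimeType: 'text/plain' },
};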

This approach does seem to improve reliability. It does create something of a prompt-engineering problem, though, because part of the prompt now lives inside the stored text file sent to the LLM rather than in the request itself.

Summary

Overall, I'm finding it difficult to work with the Vertex API on these larger requests. The API claims to support requests of this size, but in practice my higher-token requests fail completely, with errors that tell me nothing useful.

I'd love to know how to approach this.


Solution

  • Handling Large Requests with Vertex AI: Lessons Learned

    After extensive testing, I discovered several key insights about working with Vertex AI for large-scale AI operations. Here's what I learned and how I solved it:

    The Problem with Vertex AI Preview Package

    The vertexai-preview package, while convenient for simple implementations, has significant limitations when handling large, long-running requests like the ones described above.

    The Solution: Architecture Changes

    1. Move Away from Direct Client-Server Connection

    Instead of this:

    // Client directly waiting for AI response
    const response = await vertexai.generateContent(prompt);
    

    Use this pattern:

    // Client (web app using the Firebase JS SDK)
    import { getFirestore, doc, onSnapshot } from 'firebase/firestore';

    const db = getFirestore();

    // 1. Initiate the request (placeholder: e.g. a thin wrapper around
    //    httpsCallable('processAIRequest') that returns a request ID)
    const requestId = await initiateAIProcess(prompt);

    // 2. Listen for updates written by the Cloud Function
    onSnapshot(doc(db, 'aiResponses', requestId), (snapshot) => {
      const response = snapshot.data();
      if (response.status === 'complete') {
        // Handle completion
      }
    });

    // Server (Cloud Functions for Firebase, 1st-gen API)
    const functions = require('firebase-functions');

    exports.processAIRequest = functions
      .runWith({
        timeoutSeconds: 540,  // 9 minutes
        memory: '1GB'
      })
      .https.onCall(async (data) => {
        // Process the AI request with the Vertex AI SDK,
        // then write the result and status to Firestore (aiResponses/{requestId})
      });
    

    2. Use Custom Cloud Functions

    Rather than relying on the Preview package's default behavior, define your own Cloud Function and set its memory and timeout explicitly, as in the processAIRequest example above.

    3. Implement Progress Tracking

    For long-running operations, write intermediate status updates to Firestore so the client can track progress rather than waiting on a single long-lived request.
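
    A sketch of what that could look like inside the Cloud Function (the document path, field names, and status values are illustrative; generativeModel is assumed to be a model created with the Vertex AI SDK):

    const admin = require('firebase-admin');
    admin.initializeApp();

    async function runWithProgress(requestId, req) {
      const statusRef = admin.firestore().doc(`aiResponses/${requestId}`);
      await statusRef.set({ status: 'processing', chunksReceived: 0 });

      const streamingResp = await generativeModel.generateContentStream(req);

      let chunks = 0;
      for await (const item of streamingResp.stream) {
        chunks += 1;
        // Periodically record progress so the client's onSnapshot listener can show it.
        if (chunks % 5 === 0) {
          await statusRef.update({ chunksReceived: chunks });
        }
      }

      const aggregated = await streamingResp.response;
      await statusRef.set(
        { status: 'complete', text: aggregated.candidates[0].content.parts[0].text },
        { merge: true }
      );
    }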

    Key Takeaways

    1. The Vertex AI Preview package is best suited for:

      • Simple AI integrations
      • Quick response requirements
      • Limited data processing
    2. For production applications, especially those handling large amounts of data:

      • Use custom Cloud Functions
      • Implement asynchronous processing
      • Design for potential disconnections
      • Monitor and handle timeouts explicitly
    3. Consider breaking large requests into smaller, manageable chunks if possible

    This approach has proven much more reliable for handling complex AI operations with large datasets.