I'm using Gemini for my RAG implementation — in particular, the most recent model, GEMINI_25_FLASH_PREVIEW_04_17. I'm also using Gemini's REST API to embed the text before upserting it into my vector DB.
I'm trying to find the number of tokens just before I pass the text to the endpoint for embedding creation. Google's Vertex Python API provides this functionality, as explained in this accepted answer and the connected article: https://www.googlecloudcommunity.com/gc/AI-ML/Please-share-Gemini-tokenize-information/m-p/709495
However, when I use this, I get the following exception:
{ "detail": "Model text-embedding-004 is not supported. Supported models: gemini-1.0-pro-001, gemini-1.0-pro-002, gemini-1.5-pro-001, gemini-1.5-flash-001, gemini-1.5-flash-002, gemini-1.5-pro-002.\n" }
I have a suspicion this API hasn't been updated. If so, am I correct to assume that tokenisation remains the same across the Gemini family? If not, what other method can I use to count the tokens? I know there are some other Gemini-compatible tokenisers, but I'd rather use Gemini's own solution.
You are using an older SDK surface (vertexai.preview) which doesn't support the newer Gemini models (e.g. 2.0, 2.5). I would suggest you use the latest unified Google GenAI SDK instead:
Install the Python SDK:
pip install --upgrade google-genai
Execute the following code:
from google import genai
from google.genai.types import HttpOptions

# Picks up the API key from the GOOGLE_API_KEY (or GEMINI_API_KEY) environment variable.
client = genai.Client(http_options=HttpOptions(api_version="v1"))

response = client.models.count_tokens(
    model="gemini-2.5-flash-preview-04-17",
    contents="Hello World",
)
print(response)
# Example output:
# total_tokens=10
# cached_content_token_count=None
For details, refer to this doc.
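Since your goal is to check token counts before embedding, you could wrap count_tokens in a small helper that filters out chunks exceeding the embedding model's input limit before upserting. This is a hedged sketch: the 2048-token limit is text-embedding-004's documented input cap (verify against the current docs for your model), and filter_embeddable / the whitespace stand-in counter are hypothetical names for illustration.

```python
from typing import Callable, List

# Assumed input limit for text-embedding-004 — check the model's docs.
EMBED_TOKEN_LIMIT = 2048

def filter_embeddable(texts: List[str],
                      count_tokens: Callable[[str], int],
                      limit: int = EMBED_TOKEN_LIMIT) -> List[str]:
    """Keep only texts whose token count fits within the embedding input limit."""
    return [t for t in texts if count_tokens(t) <= limit]

# In practice, count_tokens would wrap the SDK call, e.g.:
# def count_with_sdk(text: str) -> int:
#     return client.models.count_tokens(
#         model="gemini-2.5-flash-preview-04-17", contents=text
#     ).total_tokens

# Demo with a crude whitespace-split counter as a stand-in for the API call:
chunks = ["short chunk", "word " * 3000]
kept = filter_embeddable(chunks, lambda t: len(t.split()))
print(kept)  # only the short chunk survives the 2048-token limit
```

Note the caveat from your question still applies: the counts come from a Gemini chat model's tokenizer, which may not match the embedding model's tokenizer exactly, so treat the limit check as approximate.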