Tags: python, openai-api, large-language-model, word-embedding

Create native image embeddings and then run a similarity search with text


Is it generally possible to create image embeddings directly (without accompanying text) and store them in a database? The aim is to make the content of the images searchable later via a text input in the front end, using a similarity search. Is this feasible?

Ideally, I don't want to use any OCR and would like to embed the images natively.


Solution

  • Have you looked into multimodal embedding models?

    A commercial option would be Amazon's Titan Multimodal Embeddings G1 model. Another is Cohere's Embed, which is multimodal too.

    There are also open-source options on Hugging Face - see e.g. here.
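
  • To illustrate the retrieval side, here is a minimal sketch. It assumes a multimodal model (such as a CLIP variant) has already produced image embeddings and a text-query embedding in the same vector space; the model-loading lines in the comment (model name included) are hypothetical placeholders, and only the cosine-similarity search over stored vectors is shown concretely:

    ```python
    import numpy as np

    # Assumption: the embeddings below would come from a multimodal model,
    # e.g. (hypothetical, adjust to your chosen model/library):
    #   model = load_multimodal_model("clip-variant")
    #   image_embeddings = model.encode_images(list_of_images)
    #   query_embedding  = model.encode_text("a photo of a dog")

    def cosine_search(query_embedding, image_embeddings, top_k=3):
        """Return indices of the top_k stored embeddings most similar to the query."""
        q = query_embedding / np.linalg.norm(query_embedding)
        m = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
        scores = m @ q                     # cosine similarity against every stored image
        return np.argsort(scores)[::-1][:top_k]

    # Synthetic demo: three fake 4-d "image" embeddings; the query is closest to index 1.
    stored = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])
    query = np.array([0.1, 0.9, 0.0, 0.0])
    print(cosine_search(query, stored, top_k=1))  # → [1]
    ```

    In production, a vector database (e.g. with approximate nearest-neighbor indexing) would replace the brute-force `argsort`, but the principle is the same: both images and text queries live in one embedding space, so no OCR is needed.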