Is it generally possible to create image embeddings directly (without additional text) and store them in a database? The aim is to make the content of the images findable later via a text input in the front end using a similarity search. Is this feasible?
Ideally I don't want to use any OCR and would rather embed the images natively.
Have you looked into multimodal embedding models?
One commercial option is Amazon's Titan Multimodal Embeddings G1 model. Another is Cohere's Embed, which is also multimodal.
There are also open-source options on Hugging Face - see e.g. here.
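To make the idea concrete: a multimodal embedding model maps images and text into the same vector space, so you can store one vector per image and later compare a text query's vector against them. Below is a minimal sketch of just the storage and search step in plain Python. The vectors are made-up toy values standing in for real model output, and the names `image_index` and `search` are hypothetical, not part of any of the products mentioned above. In practice the vectors would come from whichever model you pick and live in a vector database rather than a dict.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# ASSUMPTION: toy 3-dimensional vectors standing in for real image embeddings.
# A real multimodal model would return vectors with hundreds of dimensions.
image_index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "invoice.png": [0.1, 0.9, 0.2],
    "beach.jpg": [0.0, 0.2, 0.9],
}


def search(query_vector, index, top_k=2):
    """Return the top_k (image name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# ASSUMPTION: pretend this is the embedding of the user's text query,
# produced by the same model that embedded the images.
query_vector = [0.8, 0.2, 0.1]

results = search(query_vector, image_index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The important part is that the query text and the images are embedded by the same model; otherwise the vectors are not comparable. No OCR or captioning step is involved.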