
Content-based image search on Linux


I'm looking for an image that contains certain text on my machine, which runs Ubuntu 12.04.

Say, for example, I'm looking for "Some text here," as in the following image: some text

I want to be able to find any larger images containing that text on my hard drive: larger image

Is there a way to search my machine for that?

Thanks for any tips!


Solution

  • Check out tesseract, it should do the job: https://code.google.com/p/tesseract-ocr/wiki/ReadMe

    You can run:

    tesseract Sometext_big.png out

    tesseract appends .txt to the output name you give it, so out.txt will contain "Some text here".

    Then it's just a matter of some shell scripting: use find to locate all images of a particular type, run each one through tesseract, and check whether the output file contains the text you want.
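
The scripting step could look roughly like the sketch below. It is not a tested, definitive solution: the function name find_images_with_text and the OCR_CMD variable are made up for illustration, the list of file extensions is an assumption, and it relies on tesseract writing its result to <output base>.txt as described above. File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: print the path of every image under a directory whose OCR text
# contains a given phrase. find_images_with_text and OCR_CMD are hypothetical
# names chosen for this example; they are not part of tesseract.

# OCR program to run; override it to try a different engine.
OCR_CMD=${OCR_CMD:-tesseract}

find_images_with_text() {
    dir=$1      # directory to search
    needle=$2   # phrase to look for (fixed string, case-insensitive)

    tmpdir=$(mktemp -d) || return 1

    # Assumed set of extensions; extend as needed.
    find "$dir" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) |
    while IFS= read -r img; do
        # Remove the previous result so a failed OCR run cannot match stale text.
        rm -f "$tmpdir/out.txt"

        # tesseract writes the recognized text to "$tmpdir/out.txt".
        "$OCR_CMD" "$img" "$tmpdir/out" </dev/null >/dev/null 2>&1

        if [ -f "$tmpdir/out.txt" ] && grep -qiF -- "$needle" "$tmpdir/out.txt"; then
            printf '%s\n' "$img"
        fi
    done

    rm -rf "$tmpdir"
}
```

After sourcing the file, a call such as find_images_with_text "$HOME/Pictures" "Some text here" would list the matching images. Because grep matches line by line, a phrase that the OCR output wraps across two lines will be missed, and OCR errors in the recognized text will also cause misses, so searching for a short, distinctive fragment of the phrase is likely to be more reliable.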