
Google Vision AI does not include correct documentation


I am using the following code to do OCR using Google Vision AI:

import base64

from google.cloud import vision

client = vision.ImageAnnotatorClient()
bindata = base64.b64decode(b64data)  # b64data is a Base64-encoded image file
image = vision.Image(content=bindata)
results = client.text_detection(image=image)

This works fine for most images. I have now found an image in which the text is not detected, but when I rotate the image, the text is detected.

So I am looking for extra options that I could pass to the text_detection method, but I cannot find good reference documentation for it anywhere. Even the official Google documentation does not include that method.

Where can I find the right documentation for the text_detection method? (Or, better, how can I improve the OCR detection so that it correctly detects upside-down text?)


Solution

  • Google's documentation page Detect text in images references text_detection, but it is unclear to me where this method is defined. It is available on vision_v1.ImageAnnotatorClient, yet it is not obviously documented in the SDK reference, in APIs Explorer, or in the googleapis (gRPC) repo.

    Further down the same page, the documentation uses APIs Explorer and annotate_image, which supports providing features.

    See the gRPC message called Feature, as defined in Google's googleapis repo.

    from google.cloud import vision
    from google.cloud.vision_v1.types import (
        AnnotateImageRequest,
        Feature,
        Image,
        ImageContext,
        TextDetectionParams,
    )
    
    client = vision.ImageAnnotatorClient()
    
    try:
        with open("question.png", "rb") as image_file:
            content = image_file.read()
    except FileNotFoundError:
        # There is nothing to annotate without the image, so stop here.
        raise SystemExit("Image file not found.")
    
    image = Image(content=content)
    features = [
        Feature(
            model="builtin/latest",
            type_=Feature.Type.DOCUMENT_TEXT_DETECTION,
        ),
    ]
    image_context = ImageContext(
        language_hints=["en"],
        text_detection_params=TextDetectionParams(
            enable_text_detection_confidence_score=True
        ),
    )
    
    rqst = AnnotateImageRequest(
        image=image,
        features=features,
        image_context=image_context,
    )
    
    resp = client.annotate_image(request=rqst)
    print(resp)
    

    question.png is a screenshot of your question.

    Yields:

    text_annotations { ... }
    ...
    full_text_annotation {
      pages { ... }
      text: "Google Vision Al does not include correct documentation\nAsked today Modified today Viewed 21 times\nPart of Google Cloud Collective\nVote\nI am using the following code to do OCR using Google Vision Al:\nfrom google.cloud import vision\nclient = vision. ImageAnnotatorClient()\nbindata = base64. b64decode (b64data)\nimage = vision. Image (content=bindata)\n#b64data is a Base64 encoded image file\nresults = client.text_detection (image-image)\nWhich works fine for most images. I found now an image where text is not detected, but when I\nrotate the image, it does get detected.\nSo, I am looking for extra options I could possibly give the text_detection method, but I can\nnowhere find good reference documentation on it. Even the official Google documentation does\nnot include that method.\nWhere can I find the right documentation for the text_detection method? (Or, better, how can I\nimprove the OCR detection to correctly detect upside down text)."
    }
    ...
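
As far as I can tell, the single-feature helpers such as text_detection forward extra keyword arguments (for example image_context) to annotate_image, so the same ImageContext can be passed there too. With enable_text_detection_confidence_score set, each word in full_text_annotation should carry a confidence value. The following is only a minimal sketch: the helper names detect_with_context and low_confidence_words and the 0.8 threshold are my own assumptions, not part of the library.

```python
def detect_with_context(client, image, image_context):
    """Call the text_detection helper, passing image_context as a keyword.

    Assumption: the helper forwards image_context to annotate_image.
    """
    return client.text_detection(image=image, image_context=image_context)


def low_confidence_words(response, threshold=0.8):
    """Return (word, confidence) pairs whose confidence is below threshold.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    flagged = []
    # Walk the hierarchy: pages -> blocks -> paragraphs -> words -> symbols.
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    # A word's text is the concatenation of its symbols.
                    text = "".join(symbol.text for symbol in word.symbols)
                    if word.confidence < threshold:
                        flagged.append((text, word.confidence))
    return flagged
```

With the client, image and image_context objects from the code above, this could be used as resp = detect_with_context(client, image, image_context) followed by low_confidence_words(resp). It only reports which words the service was unsure about; it does not by itself correct the orientation of upside-down text.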