I have a Python application that uses Tesseract to detect checkboxes in scanned images. It works perfectly on my local machine, but when I push the code to Bluemix with the python-tesseract buildpack, it fails to generate the output file, which suggests that Tesseract is not being detected on Bluemix.
applications:
- path: .
  memory: 512M
  instances: 1
  domain: mybluemix.net
  name: edge-noise-detector-bluemix
  host: edge-noise-detector-bluemix
  disk_quota: 1024M
  buildpack: https://github.com/LeoKotschenreuther/python-tesseract-buildpack.git
Flask
numpy
Pillow==4.1.1
pycparser
pyOpenSSL
pyparsing
pytesseract
python-dateutil
python-swiftclient
pytz
PyWavelets
scikit-image
scipy
requests
matplotlib==1.4.3
opencv-python
cf_deployment_tracker
tesseract
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/app/.heroku/python/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "server.py", line 217, in predict_square_checkboxes
    ImgOcr = image_hocr_class.ocr_hocr('temporary.png')
  File "/home/vcap/app/src/image_hocr_class.py", line 39, in __init__
    self.HTMLTree = xml.etree.ElementTree.parse(self.HOCRFileName).getroot()
  File "/app/.heroku/python/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/app/.heroku/python/lib/python3.6/xml/etree/ElementTree.py", line 586, in parse
    source = open(source, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'images/8e297b93a39f1e08a490f72c8db53bf0.hocr'
This normally happens when pytesseract cannot locate the tesseract executable. I'm not sure how to get this working on Bluemix. Has anyone got Python with Tesseract working on Bluemix? Please help.
IBM Cloud gives you several ways to run your applications. Cloud Foundry runtimes are one of them, but they don't seem a good fit for your situation. Whenever your app needs a dependency that has to be installed on the system, you have to create a custom buildpack, which can be a rather complex task (https://docs.cloudfoundry.org/buildpacks/custom.html). Have you considered Docker/Kubernetes? If your application has a number of dependencies (like tesseract in your case), I would suggest building it in a Kubernetes environment instead. Have a look at these resources:
https://hub.docker.com/r/tesseractshadow/tesseract4re/
https://console.bluemix.net/docs/containers/container_index.html#container_index
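Whichever platform you end up on, it may be worth confirming the diagnosis first, since a missing .hocr file only hints that the tesseract binary was never run. Below is a minimal sketch, using only the standard library, that checks whether an executable named `tesseract` is on the PATH and asks it for its version. The helper names `find_tesseract` and `tesseract_version` are made up for illustration, and I am assuming the binary is called `tesseract` in your container.

```python
import shutil
import subprocess


def find_tesseract(names=("tesseract",)):
    """Return the full path of the first candidate found on PATH, or None.

    The default name "tesseract" is an assumption about how the
    buildpack or image installs the binary.
    """
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def tesseract_version(path):
    """Return the first line printed by `<path> --version`, or None on failure."""
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # some builds print the version to stderr
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract binary not found on PATH")
    else:
        print("tesseract found at", path, "->", tesseract_version(path))
```

You could call this at application startup and log the result. If the binary turns out to exist somewhere that is not on the PATH, pytesseract lets you point at it explicitly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path.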