
JNIUS & TIKA - error trying to parseToString


I tried to run tika-app through jnius but ran into a problem (macOS Sierra, Java 1.8 JDK, Python 2.7 and Python 3.6). Everything works (the output of tika.detect is correct) until the parseToString call. That call seems to trigger a pop-up window; the same call works when made from a plain Java program. Under jnius, though, execution just stops: there is no output and no error.

import os

os.environ['CLASSPATH'] = "tika-app-1.14.jar"
from jnius import autoclass
from jnius import JavaException

# Import the Java classes
Tika = autoclass('org.apache.tika.Tika')
Metadata = autoclass('org.apache.tika.metadata.Metadata')
File = autoclass('java.io.File')

# Catch and report the exception if parsing fails
try:
    file = File('./source/test.doc')
    tika = Tika()
    meta = Metadata()
    detectText = tika.detect(file)
    print(detectText)  # works; the output is: application/msword
    contentText = tika.parseToString(file)  # execution stops here; nothing after this line runs
    print(contentText)
except (JavaException,UnicodeDecodeError) as e:
    print("ERROR: %s" % (e))

Solution

  • I finally found the solution. A JVM option was missing: the one that tells the Tika jar to run in headless mode.

    # The config has to be set before importing jnius
    import jnius_config
    jnius_config.add_options('-Djava.awt.headless=true')
    
    import os
    os.environ['CLASSPATH'] = "tika-app-1.14.jar"
    
    from jnius import autoclass
    
    ## Import the Java classes we are going to need
    Tika = autoclass('org.apache.tika.Tika')
    Metadata = autoclass('org.apache.tika.metadata.Metadata')
    FileInputStream = autoclass('java.io.FileInputStream')
    
    tika = Tika()
    meta = Metadata()
    text = tika.parseToString(FileInputStream("./source/test.doc"), meta)
    print(text)
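
Since the fix depends on the order of the imports, a small guard can make that ordering explicit. This is only a sketch: configure_jvm and HEADLESS_FLAG are hypothetical names of my own, not part of pyjnius or Tika, and the helper assumes jnius_config.add_options is the only pyjnius call needed. It refuses to run once jnius has been imported, because by then the JVM is already started and its options can no longer be changed.

```python
import os
import sys

# The JVM flag from the answer above: run AWT without a display.
HEADLESS_FLAG = "-Djava.awt.headless=true"


def configure_jvm(classpath, options=(HEADLESS_FLAG,)):
    """Hypothetical helper: set CLASSPATH and JVM options before `import jnius`.

    Returns True if the options were handed to jnius_config, and False if
    pyjnius is not installed (only CLASSPATH is set in that case).
    """
    # Importing jnius starts the JVM, so options added later have no effect.
    if "jnius" in sys.modules:
        raise RuntimeError(
            "jnius is already imported; JVM options must be set before that"
        )

    os.environ["CLASSPATH"] = classpath

    try:
        import jnius_config
    except ImportError:
        return False

    jnius_config.add_options(*options)
    return True
```

Call configure_jvm("tika-app-1.14.jar") at the very top of the script, and only afterwards run `from jnius import autoclass` and the Tika code shown above.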