java apache apache-tika sling jackrabbit

NoClassDefFoundError in Sling logs when uploading docx, xlsx, pptx


I get the error below (one per file) whenever I upload an Office 2007 document (e.g. pptx, docx, xlsx) into Sling. I am using the Sling 6 stable standalone release.

Is anyone else experiencing this? Are there any known issues with the Tika bundle?

Thanks

23.01.2013 14:32:27.248 *WARN* [jackrabbit-pool-1] org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField Failed to extract text from a binary property
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@5217e8de
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
                at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
                at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
                at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:174)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
                at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
                at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:256)
                at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:196)
                at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:94)
                at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:45)
                at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:111)
                at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:86)
                at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:47)
                at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
                ... 11 more
Caused by: java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
                at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
                at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
                ... 19 more
Caused by: java.lang.NoClassDefFoundError: org/openxmlformats/schemas/wordprocessingml/x2006/main/SettingsDocument$Factory
                at org.apache.poi.xwpf.usermodel.XWPFSettings.readFrom(XWPFSettings.java:129)
                at org.apache.poi.xwpf.usermodel.XWPFSettings.<init>(XWPFSettings.java:43)
                ... 24 more
Caused by: java.lang.ClassNotFoundException: org.openxmlformats.schemas.wordprocessingml.x2006.main.SettingsDocument$Factory not found by org.apache.tika.bundle [63]
                at org.apache.felix.framework.ModuleImpl.findClassOrResourceByDelegation(ModuleImpl.java:787)
                at org.apache.felix.framework.ModuleImpl.access$400(ModuleImpl.java:71)
                at org.apache.felix.framework.ModuleImpl$ModuleClassLoader.loadClass(ModuleImpl.java:1768)
                at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
                ... 26 more

Solution

  • This was caused by missing or incorrect dependencies in the Tika 0.6 bundle: the OOXML schema classes (such as SettingsDocument$Factory in the stack trace) could not be found by org.apache.tika.bundle.

    I had to rebuild Tika 0.6 with the changes below to get it working, and then replaced the Tika bundle inside the Sling standalone jar file. Please let me know if there is a better way to do this, as I am a Java beginner. Thanks

    Changes made to tika-0.6/tika-parsers/pom.xml:

    Added:

    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>ooxml-schemas</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml-schemas</artifactId>
      <version>${poi.version}</version>
    </dependency>
    

    Removed:

    <dependency>
      <groupId>org.apache.geronimo.specs</groupId>
      <artifactId>geronimo-stax-api_1.0_spec</artifactId>
      <version>1.0.1</version>
    </dependency>
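
Before swapping a rebuilt bundle into the Sling standalone jar, it may help to confirm that the class named in the stack trace is actually packaged in it. The following is only a minimal sketch, not part of the original fix: the class name JarClassCheck is hypothetical, and it assumes the bundle either inlines its dependencies or embeds them as jar files one level deep. It only shows that the class file is present in the archive; whether the OSGi framework can resolve it at runtime also depends on the bundle manifest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: checks whether a jar (or a jar embedded one level
// inside it) contains the .class entry for a given fully qualified class name.
public class JarClassCheck {

    // Default taken from the NoClassDefFoundError in the stack trace.
    private static final String DEFAULT_CLASS =
        "org.openxmlformats.schemas.wordprocessingml.x2006.main.SettingsDocument$Factory";

    public static boolean containsClass(Path jar, String className) throws IOException {
        // A class a.b.C$D is stored in a jar as the entry a/b/C$D.class.
        String entryName = className.replace('.', '/') + ".class";
        try (InputStream in = Files.newInputStream(jar)) {
            return scan(new ZipInputStream(in), entryName, true);
        }
    }

    private static boolean scan(ZipInputStream zip, String entryName, boolean descend)
            throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals(entryName)) {
                return true;
            }
            if (descend && entry.getName().endsWith(".jar")) {
                // Read the embedded jar straight from the current entry's data.
                // The inner stream is deliberately not closed, because closing
                // it would also close the outer stream.
                if (scan(new ZipInputStream(zip), entryName, false)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JarClassCheck <bundle.jar> [class.name]");
            return;
        }
        String className = args.length > 1 ? args[1] : DEFAULT_CLASS;
        boolean found = containsClass(Paths.get(args[0]), className);
        System.out.println(className + (found ? " found" : " NOT found"));
    }
}
```

Run it against the rebuilt bundle jar (the path is whatever your build produced), passing the jar as the first argument. A "NOT found" result would suggest that the schema jars still did not make it into the bundle.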