I tried to use the TikaOnDotNet NuGet package in projects targeting .NET Core 3.1 and .NET 5.
The sample code is as follows.
using System;
using System.IO;
using TikaOnDotNet.TextExtraction;

namespace tika
{
    class Program
    {
        static void Main(string[] args)
        {
            var textExtractor = new TextExtractor();
            var original = new FileInfo(Path.Combine(Directory.GetCurrentDirectory(), @"pptexamples.ppt"));
            var wordDocContents = textExtractor.Extract(original.FullName);
        }
    }
}
The call to textExtractor.Extract throws the following exception.
TikaOnDotNet.TextExtraction.TextExtractionException: "Extraction of text from the file '/Users/serhatonal/Projects/tika/tika/bin/Debug/netcoreapp3.1/pptexamples.ppt' failed." ---> TikaOnDotNet.TextExtraction.TextExtractionException: "Extraction failed." ---> System.MissingMethodException: "Method not found: 'Void System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.Security.AccessControl.FileSystemRights, System.IO.FileShare, Int32, System.IO.FileOptions)'."
at Java_java_io_FileDescriptor.open(String name, FileMode fileMode, FileAccess fileAccess)
at java.io.FileDescriptor.open(String , FileMode , FileAccess )
at java.io.FileDescriptor.open(String , Int32 , Int32 )
at java.io.FileDescriptor.openReadOnly(String )
at Java_java_io_RandomAccessFile.open0(Object _this, String name, Int32 mode, FileDescriptor fd, Int32 O_RDWR)
at java.io.RandomAccessFile.open0(String , Int32 )
at java.io.RandomAccessFile.open(String , Int32 )
at java.io.RandomAccessFile..ctor(File file, String mode)
at java.util.zip.ZipFile..ctor(File file, Int32 mode, Charset charset)
at java.util.zip.ZipFile..ctor(File file, Int32 mode)
at java.util.jar.JarFile..ctor(File file, Boolean verify, Int32 mode)
at java.util.jar.JarFile..ctor(String name)
at IKVM.NativeCode.ikvm.runtime.AssemblyClassLoader.lazyDefinePackages(ClassLoader _this)
at ikvm.runtime.AssemblyClassLoader.lazyDefinePackages()
at ikvm.runtime.AssemblyClassLoader.lazyDefinePackagesCheck()
at ikvm.runtime.AssemblyClassLoader.getPackage(String name)
at java.lang.Package.getPackage(Class )
at java.lang.Class.getPackage()
at org.apache.tika.mime.MimeTypesFactory.create(String coreFilePath, String extensionFilePath, ClassLoader classLoader)
at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(ClassLoader classLoader)
at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(ClassLoader )
at org.apache.tika.config.TikaConfig..ctor()
at org.apache.tika.config.TikaConfig.getDefaultConfig()
at org.apache.tika.parser.AutoDetectParser..ctor()
at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func`2 streamFactory, Stream outputStream)
--- End of inner exception stack trace ---
at TikaOnDotNet.TextExtraction.Stream.StreamTextExtractor.Extract(Func`2 streamFactory, Stream outputStream)
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](Func`2 streamFactory, Func`3 extractionResultAssembler)
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func`3 extractionResultAssembler)
--- End of inner exception stack trace ---
at TikaOnDotNet.TextExtraction.TextExtractor.Extract[TExtractionResult](String filePath, Func`3 extractionResultAssembler)
at TikaOnDotNet.TextExtraction.TextExtractor.Extract(String filePath)
at tika.Program.Main(String[] args) in /Users/serhatonal/Projects/tika/tika/Program.cs:16
According to the issue below, the error "System.MissingMethodException: "Method not found: 'Void System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.Security.AccessControl.FileSystemRights, System.IO.FileShare, Int32, System.IO.FileOptions)'."" is considered fixed as of the .NET 5 release, but the problem still persists for me.
https://github.com/dotnet/runtime/issues/30435
Is anyone else having the same issue?
IKVM, the basis of the library (TikaOnDotNet is a Java port), is not compatible with .NET Core.
https://github.com/KevM/tikaondotnet/issues/136#issuecomment-583695410
Unfortunately, that is why the exception is thrown.
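To see this on your own runtime, a small reflection check along the following lines may help. It is only a sketch: it looks for any public FileStream constructor that takes a FileSystemRights parameter (the overload named in the MissingMethodException above) and prints the runtime description alongside the result. The parameter type is matched by its full name so that no extra assembly reference is assumed, and the class name FileStreamCtorCheck is just a placeholder.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

class FileStreamCtorCheck
{
    static void Main()
    {
        // Full name of the parameter type that appears in the exception message.
        const string rightsTypeName = "System.Security.AccessControl.FileSystemRights";

        // Scan the public instance constructors of FileStream for one that
        // accepts a FileSystemRights argument.
        bool found = typeof(FileStream)
            .GetConstructors()
            .Any(ctor => ctor.GetParameters()
                .Any(p => p.ParameterType.FullName == rightsTypeName));

        // Report which runtime the check ran on and what it found.
        Console.WriteLine(RuntimeInformation.FrameworkDescription);
        Console.WriteLine(found
            ? "FileStream constructor with FileSystemRights: present"
            : "FileStream constructor with FileSystemRights: missing");
    }
}
```

If the check reports the constructor as missing on the target runtime, any library that was compiled against that constructor (as the IKVM frames in the stack trace suggest) would fail in the same way regardless of how TextExtractor is called.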