I am trying to implement a basic index writer in PyLucene. I am usually a Java dev, but due to technical constraints I am doing this in Python; otherwise this wouldn't be a problem. I am following the sample in the PyLucene tarball, but my code fails as soon as it opens the index directory:
import lucene
from java.io import File
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.document import Document, Field
from org.apache.lucene.index import IndexWriter, IndexWriterConfig
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import Version
from org.apache.lucene.store import IOContext
lucene.initVM()
fl = File('index')
indexDir = SimpleFSDirectory(fl)
writerConfig = IndexWriterConfig(Version.LUCENE_6_4_1, StandardAnalyzer())
The problem I am having is that whenever I run this I get the following error:
Traceback (most recent call last):
  File "Indexer.py", line 40, in <module>
    indexer = Indexer()
  File "Indexer.py", line 22, in __init__
    indexDir = SimpleFSDirectory(fl)
lucene.InvalidArgsError: (<type 'SimpleFSDirectory'>, '__init__', (<File: index>,))
I have checked the Java code here, and there appears to be a constructor public SimpleFSDirectory(File path). Judging by the traceback, that is exactly what I am passing in. Am I missing something from JCC?
This is using Lucene 6.4.1 and I can import lucene and jcc successfully.
Some of the documentation still shows this:
fl = File('index')
indexDir = SimpleFSDirectory(fl)
In newer versions (I am using PyLucene based on Lucene 6.4.1), SimpleFSDirectory expects a Path instead of a File. (Such are the joys of using a Java library in Python: the brevity of Java with the type safety of Python.) I also hadn't realized that I had to call attachCurrentThread.
Corrected code:
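from java.nio.file import Paths  # SimpleFSDirectory takes a java.nio.file.Path here, not a java.io.File
path = Paths.get('index')
self.index_directory = SimpleFSDirectory(path)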
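For anyone who wants the whole thing in one place, here is a minimal sketch of how the Path fix and the thread attachment might fit together. It assumes PyLucene built against Lucene 6.x. The function name build_index and the field name "body" are made up for illustration, and the PyLucene imports sit inside the function only so the module can be loaded before the JVM is started. Note that it uses the single-argument IndexWriterConfig(analyzer) constructor rather than passing a Version.

```python
def build_index(index_path, texts):
    """Index each string in `texts` as one document and return how many were added.

    Hypothetical helper for illustration; assumes PyLucene on Lucene 6.x.
    """
    import lucene
    from java.nio.file import Paths
    from org.apache.lucene.analysis.standard import StandardAnalyzer
    from org.apache.lucene.document import Document, Field, TextField
    from org.apache.lucene.index import IndexWriter, IndexWriterConfig
    from org.apache.lucene.store import SimpleFSDirectory

    # Start the JVM once per process. If it is already running (for example
    # because this is called from a worker thread), attach the current thread.
    env = lucene.getVMEnv()
    if env is None:
        lucene.initVM()
    else:
        env.attachCurrentThread()

    # Pass a java.nio.file.Path, not a java.io.File.
    directory = SimpleFSDirectory(Paths.get(index_path))
    writer = IndexWriter(directory, IndexWriterConfig(StandardAnalyzer()))
    added = 0
    try:
        for text in texts:
            doc = Document()
            # "body" is an arbitrary field name chosen for this sketch.
            doc.add(TextField("body", text, Field.Store.YES))
            writer.addDocument(doc)
            added += 1
        writer.commit()
    finally:
        writer.close()
        directory.close()
    return added
```

Calling build_index('index', ['hello world']) should create the index under ./index. If a different class raises the same InvalidArgsError, it is worth checking the Javadoc for the Lucene version your PyLucene was actually built against, since older samples target older constructors.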