I am using PyLucene to build a search system, and I am using TREC data to test it. I have successfully written the indexer and searcher code. Now I want to use TREC topics to evaluate my system. For this there is a class named TrecTopicsReader, which reads the queries from a TREC-formatted topics file. But the readQueries(BufferedReader reader) method of that class needs the topics file passed to it as a BufferedReader object.
How can I do this in PyLucene? BufferedReader is not available in the default PyLucene JCC build.
After waiting for someone to answer, I also asked this question on the PyLucene developer mailing list.
Andi Vajda replied there, so I am answering this question on Andi's behalf.
Quoting Andi:
In the PyLucene Makefile find the jcc invocation and add java.io.BufferedReader to the long command line (don't forget the ending \ as needed) and rebuild PyLucene.
More information:
In the PyLucene Makefile you will find the line GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \. Within that invocation there should be a line like --package java.io. Add the class you need (java.io.BufferedReader) there so that JCC wraps it and makes it available to the Python code.
Then compile and install PyLucene again. (You can find the info about compilation & installation at PyLucene's documentation or you can also use this).
Also, to make a BufferedReader object from a file you will need FileReader, so add java.io.FileReader as well.
Just for completeness: after adding these lines, my GENERATE looks like this:
GENERATE=$(JCC) $(foreach jar,$(JARS),--jar $(jar)) \
$(JCCFLAGS) --use_full_names \
--package java.lang java.lang.System \
java.lang.Runtime \
--package java.util java.util.Arrays \
java.util.Collections \
java.util.HashMap \
java.util.HashSet \
java.util.TreeSet \
java.lang.IllegalStateException \
java.lang.IndexOutOfBoundsException \
java.util.NoSuchElementException \
java.text.SimpleDateFormat \
java.text.DecimalFormat \
java.text.Collator \
--package java.util.concurrent java.util.concurrent.Executors \
--package java.util.regex \
--package java.io java.io.StringReader \
java.io.InputStreamReader \
java.io.FileInputStream \
java.io.BufferedReader \
java.io.FileReader \
--exclude org.apache.lucene.sandbox.queries.regex.JakartaRegexpCapabilities \
--exclude org.apache.regexp.RegexpTunnel \
--python lucene \
--mapping org.apache.lucene.document.Document 'get:(Ljava/lang/String;)Ljava/lang/String;' \
--mapping java.util.Properties 'getProperty:(Ljava/lang/String;)Ljava/lang/String;' \
--sequence java.util.AbstractList 'size:()I' 'get:(I)Ljava/lang/Object;' \
org.apache.lucene.index.IndexWriter:getReader \
--version $(LUCENE_VER) \
--module python/collections.py \
--module python/ICUNormalizer2Filter.py \
--module python/ICUFoldingFilter.py \
--module python/ICUTransformFilter.py \
$(RESOURCES) \
--files $(NUM_FILES)
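Once PyLucene has been rebuilt with these classes, a quick smoke test from Python might look like the sketch below. It assumes the --use_full_names layout from the GENERATE line above, so the classes import as java.io.*. The helper name read_first_line is hypothetical and only serves the illustration.

```python
def read_first_line(path):
    """Return the first line of `path`, read through the JCC-wrapped java.io classes."""
    # Imported lazily so this module can be loaded even where PyLucene is missing.
    import lucene

    # The JVM has to be running before any wrapped Java class is used.
    if lucene.getVMEnv() is None:
        lucene.initVM()

    # These imports only work if both classes were added to the JCC command line.
    from java.io import BufferedReader, FileReader

    reader = BufferedReader(FileReader(path))
    try:
        return reader.readLine()  # None once the end of the file is reached
    finally:
        reader.close()
```

If the rebuild did not pick up the new classes, the java.io import inside the function should fail with an ImportError, which makes a call such as read_first_line("topics.txt") (file name assumed) a cheap check before moving on.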
Doing this alone doesn't suffice. You also have to compile the Lucene benchmark library, which is not included in the installed libs by default, because TrecTopicsReader lives in the benchmark API.
To compile and install benchmark:
You have to modify the build.xml inside the main lucene folder (the one that contains the benchmark folder), and then include the resulting jar in the main Makefile so that it is installed into the Python libs as part of the egg.
build.xml:
You have to make three modifications. For simplicity, follow jar-test-framework: wherever it appears, create a similar pattern for jar-benchmark.
The three changes you have to make are:
1) Replace
<target name="package" depends="jar-core, jar-test-framework, build-modules, init-dist, documentation"/>
with
<target name="package" depends="jar-core, jar-test-framework, jar-benchmark, build-modules, init-dist, documentation"/>
2) For the rule
<target name="jar" depends="jar-core,jar-test-framework"
description="Jars core, codecs, test-framework, and all modules">
<modules-crawl target="jar-core"/>
</target>
replace it with
<target name="jar" depends="jar-core,jar-test-framework, jar-benchmark"
description="Jars core, codecs, test-framework, and all modules">
<modules-crawl target="jar-core"/>
</target>
3) Add the following target/rule after the target named jar-test-framework
<target name="jar-benchmark">
<ant dir="${common.dir}/benchmark" target="jar-core" inheritAll="false">
<propertyset refid="uptodate.and.compiled.properties"/>
</ant>
</target>
Makefile:
Here also you have to make three modifications. For simplicity, follow HIGHLIGHTER_JAR and add similar rules for BENCHMARK_JAR. The three changes you have to make are:
1) Find JARS+=$(HIGHLIGHTER_JAR) and add JARS+=$(BENCHMARK_JAR) after it in the same manner.
2) Find HIGHLIGHTER_JAR=$(LUCENE)/build/highlighter/lucene-highlighter-$(LUCENE_VER).jar and add BENCHMARK_JAR=$(LUCENE)/build/benchmark/lucene-benchmark-$(LUCENE_VER).jar after that line in the same manner.
3) Find the rule $(ANALYZERS_JAR): and add another rule for $(BENCHMARK_JAR): after it:
$(BENCHMARK_JAR): $(LUCENE_JAR)
cd $(LUCENE)/benchmark; $(ANT) -Dversion=$(LUCENE_VER) compile
For completeness, here are my final Makefile and build.xml files.
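With both rebuilds in place, reading the topics from Python could be sketched roughly as follows. This is a minimal sketch, not tested against every PyLucene version: it assumes the --use_full_names layout used above and that the benchmark jar ended up in the installed egg. The function names read_trec_topics and quality_queries_to_pairs are hypothetical helpers, and the default field name "title" is my assumption about what TrecTopicsReader stores for each topic.

```python
def quality_queries_to_pairs(queries, field="title"):
    """Turn QualityQuery-like objects into (query_id, text) tuples."""
    return [(q.getQueryID(), q.getValue(field)) for q in queries]


def read_trec_topics(path, field="title"):
    """Read a TREC topics file and return a list of (query_id, text) tuples."""
    # Imported lazily so this module can be loaded even where PyLucene is missing.
    import lucene

    # The JVM has to be running before any wrapped Java class is used.
    if lucene.getVMEnv() is None:
        lucene.initVM()

    # java.io classes come from the extended JCC command line,
    # TrecTopicsReader from the benchmark jar added to JARS.
    from java.io import BufferedReader, FileReader
    from org.apache.lucene.benchmark.quality.trec import TrecTopicsReader

    reader = BufferedReader(FileReader(path))
    try:
        queries = TrecTopicsReader().readQueries(reader)
    finally:
        reader.close()  # closing an already closed reader is harmless
    return quality_queries_to_pairs(queries, field)
```

A call such as read_trec_topics("topics.txt") (file name assumed) would then give the query IDs and texts, which can be fed to the searcher one by one.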