I'm using jython-standalone-2.7.2.jar within a Java application and executing a simple script that needs to parse a small XML file. For some reason, it takes almost two seconds for make_parser() to return a parser object.
from xml.sax import make_parser
import time
start_time = time.time()
parser = make_parser()
print("--- %s seconds ---" % (time.time() - start_time))
Outputs:
--- 1.79200005531 seconds ---
Is there any way to speed this up per script run without "writing Java pseudo-code" in the Jython script?
I'm not sure what is taking up so much time - perhaps it is the lookup being performed? Since make_parser() takes a list of parsers, what would one supply in this list in order to avoid the lookup (in the context of Jython)?
Indeed, initialising the parser is slow. Jython uses Java's SAX implementation under the hood, and creating all the Python wrapper objects for it is what takes the time.
If you had a pure Python parser, you could plug it into the list that make_parser takes. That might reduce the long startup time, but the actual parsing would be slower. However, I'm not aware of a pure Python implementation of an XML parser. CPython's default (xml.sax.expatreader) uses a C module.
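For reference, this is roughly what passing a list to make_parser looks like. It is only a sketch: the names in the list are module names that get tried in order before the built-in defaults, and xml.sax.expatreader is used here purely as a placeholder (it is CPython's driver; under Jython you would substitute whatever driver module you want tried first). I have not measured whether this changes the startup time in Jython.

```python
from xml.sax import make_parser
import time

# Module names to try first, in order. If none of them can be loaded,
# make_parser falls back to its default list.
# "xml.sax.expatreader" is a placeholder for illustration only.
preferred = ["xml.sax.expatreader"]

start = time.time()
parser = make_parser(preferred)
elapsed = time.time() - start

print("make_parser took %.3f seconds" % elapsed)
```

If the timing is the same with and without the list, the lookup is not where the time goes.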
On the bright side: once the parser has been created, parsing itself, or creating a second parser, should be pretty quick.
Therefore, if you need to parse various small XML files (as you indicate in your comment), you could create a global PythonInterpreter instance in your Java app and create the parser object once in that instance with make_parser(). Here is a skeleton (this is Java code):
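import org.python.util.PythonInterpreter;

// Create the interpreter once and keep it around, so the slow make_parser() call happens only once.
PythonInterpreter pi = new PythonInterpreter();
pi.exec("from xml.sax import make_parser\n" +
        "parser = make_parser()");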
The subsequent scripts can then all use parser, as long as you execute them through the same PythonInterpreter instance, like this (Java again):
pi.execfile("your-jython-script.py");
your-jython-script.py might contain something like this: (This is Python)
with open('your-first-xml.xml') as f:
    # parser was pre-created by the Java host; set a ContentHandler on it first to receive the events
    parser.parse(f)
.
.
.
Note that the Python script doesn't need the XML import because it uses the pre-created Python object parser.
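To make the reuse pattern concrete, here is a minimal sketch of what a script might do with the shared parser. The ElementCounter handler and the count_elements helper are made up for illustration, the documents are in-memory byte strings rather than real files, and the make_parser() call stands in for the object the Java host would have created already.

```python
from io import BytesIO
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler: counts how often each element name occurs."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


# Stand-in for the parser pre-created by the Java host application.
parser = make_parser()


def count_elements(source):
    # Attach a fresh handler for each document, then reuse the same parser.
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.counts


first = count_elements(BytesIO(b"<root><item/><item/></root>"))
second = count_elements(BytesIO(b"<a><b/></a>"))
print(first)
print(second)
```

The point is that only the handler changes between documents; the expensive parser object is created once and parsed with many times.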