java html dom xerces j9

Xerces behaving differently on SUN JRE v1.5 and IBM J9 v1.5


I am trying to parse some HTML using NekoHTML.

The problem is that the code snippet below works fine on the Sun JDK 1.5.0_01 (this is when I run it from Eclipse with the Sun JRE). When the same code runs on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323, JIT enabled), it does not work (this is when I use IBM RAD for development).

NodeList tags = doc.getElementsByTagName("td"); 

for (int i = 0; i < tags.getLength(); i++) 
{
 Element elem = (Element) tags.item(i);
 // do something with elem
}

By "works fine" I mean that I get a list of "td" elements which I can process further. On J9 the for loop is never entered, so tags.getLength() must be returning 0.

I am using the latest version of NekoHTML (along with the bundled Xerces jars). The doc in the code above is of type org.w3c.dom.Document (the runtime class is org.apache.html.dom.HTMLDocumentImpl).

The IBM J9 details are as follows:

java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
J9VM - 20070322_12058_lHdSMR
JIT  - 20070109_1805ifx3_r8
GC   - WASIFIX_2007)
JCL  - 20070131

Any idea, suggestion or workaround is appreciated. Thanks.


Solution

  • I have 2 ideas.

    1. I have just verified that Xerces is part of the JRE installation, so I believe it reaches your application's classpath from there. Sun and IBM probably ship different versions of Xerces. So, as a first step, check this, and try replacing the Xerces you have under IBM with Sun's version. If that helps, you have two options: keep running IBM Java with Sun's Xerces, or keep investigating what is wrong with IBM's Xerces.
    2. Are there other differences between your dev and production environments? Are they the same operating system? Is there a chance that you are using, for example, Windows for development and Unix for production, while your XML is written on Windows with \r\n as the line separator? Or, further: if your XML contains Unicode characters and was written on Windows, it may start with a special (invisible) prefix, a byte order mark, that marks it as Unicode. This prefix may cause the parser to fail.
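
To check the first idea, it may help to print where the parser classes are actually loaded from on each JVM. The following is only a sketch: the class name ParserOrigin and the helper originOf are made up for illustration, and it assumes that classes bundled with the JRE report no code source, which is the usual behaviour but not guaranteed on every VM.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParserOrigin {

    /** Describes where a class was loaded from (a jar or directory URL). */
    public static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes shipped inside the JRE itself usually have no code source.
            return "bootstrap (bundled with the JRE)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // The JAXP factory the running JVM actually selects.
        Class<?> factory = DocumentBuilderFactory.newInstance().getClass();
        System.out.println(factory.getName() + " <- " + originOf(factory));

        // The runtime document class named in the question.
        String name = "org.apache.html.dom.HTMLDocumentImpl";
        try {
            Class<?> impl = Class.forName(name);
            System.out.println(name + " <- " + originOf(impl));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Run it with the same classpath under both the Sun JRE and IBM J9 and compare the output. If the two runs report different jars (or one reports the JRE itself), the JVMs are not using the same Xerces.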
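
For the second idea, one way to rule out the invisible prefix is to strip it before handing the stream to the parser. This is a rough sketch with a hypothetical helper class BomSkipper. It only handles the UTF-8 byte order mark (bytes EF BB BF) and leaves UTF-16 marks alone, so treat it as a diagnostic aid rather than a complete fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int count = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (count < 3) {
            int read = pushback.read(head, count, 3 - count);
            if (read < 0) {
                break;
            }
            count += read;
        }
        boolean bom = count == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && count > 0) {
            // Not a BOM: put the bytes back so the parser sees the full input.
            pushback.unread(head, 0, count);
        }
        return pushback;
    }
}
```

Wrap the input with skipUtf8Bom(...) before parsing. If the "td" elements then show up on J9, the prefix was the culprit.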