Tags: jquery, xml, xml-parsing, sbml

Techniques for writing an XML parser in JavaScript


Many questions have already been asked about how to write an XML parser, mainly for a website or other applications.

Some tutorials have also proved useful, including:

http://www.switchonthecode.com/tutorials/xml-parsing-with-jquery

However, I am trying to write a parser for the SBML file format (Systems Biology Markup Language):

Specifications - http://sbml.org/Documents/Specifications

I have been trying to hardcode the parser and while it works for the case I have, it will not work for every section.

    $(document).ready(function () {
        // Let jQuery fetch the file and have the browser parse it as XML.
        $.ajax({
            type: "GET",
            url: "sbml.xml",
            dataType: "xml",
            success: parseXml
        });
    });

    // Walk the parsed document and print each compartment's id and size.
    function parseXml(xml) {
        var $output = $("#output");
        $output.append("Output loaded <br />");
        $(xml).find("model").each(function () {
            $output.append("Found model <br />");
            // Search within this model only, not the whole document.
            $(this).find("listOfCompartments").each(function () {
                $output.append("List of Compartments found <br />");
                $(this).children().each(function () {
                    var id = $(this).attr("id");
                    var size = $(this).attr("size");
                    $output.append("Compartment <br />");
                    $output.append("Id: " + id + ", Size: " + size + "<br />");
                });
            });
        });
    }

As the specification is quite big (8 pages) and prone to change, is there a better way to write a parser for such a case?

Would it be possible to make an array of all the possible nodes and loop through it rather than hardcoding everything? Would this be more efficient?


Solution

  • Do not write an XML parser unless there is no alternative. The XML spec contains many features (such as parameter entities, internal subsets, etc.) that a parser must handle and that are quite involved. Excellent parsers already exist for virtually every language, and you should use one of those.

    If you write it yourself, you will write a parser that implements only part of the spec. It will certainly break in the future, and that will only cause problems for you and your collaborators.

    UPDATE: Distinguish between PARSING and manipulating the DOM. You do not want to parse the XML, you want the browser to do it for you (and it will). You want to manipulate the DOM, maybe with XPath.
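
As a rough illustration of manipulating the DOM rather than parsing, the sketch below walks whatever element tree the browser hands back and converts it to plain objects without naming any SBML element. `elementToObject` is a hypothetical helper, not part of jQuery or any SBML library; it assumes each node exposes the standard DOM `nodeType`, `nodeName`, `attributes` and `childNodes` properties.

```javascript
// Hypothetical helper: convert a DOM element into a plain object,
// recursing over child elements so that no tag name is hardcoded.
function elementToObject(el) {
  var result = { name: el.nodeName, attributes: {}, children: [] };
  var i;

  // On a real DOM element, attributes is an array-like NamedNodeMap.
  for (i = 0; i < el.attributes.length; i++) {
    result.attributes[el.attributes[i].name] = el.attributes[i].value;
  }

  // Keep element nodes only (nodeType 1); text and comments are skipped.
  for (i = 0; i < el.childNodes.length; i++) {
    if (el.childNodes[i].nodeType === 1) {
      result.children.push(elementToObject(el.childNodes[i]));
    }
  }
  return result;
}
```

In the `success` callback from the question, `elementToObject(xml.documentElement)` would be one way to get a nested structure for the whole file, which the application can then inspect or loop over.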

    UPDATE: I am not an expert but here is a fairly recent example of a parser in a MS environment.

    XML Parser in Microsoft Browser:
    Microsoft’s XML parser is a COM component that comes with Internet Explorer 5 and higher. To load the XML parser in JavaScript, you have to follow a series of steps.
    
        1. Create instance of XML Parser:
    
        <script type="text/javascript">  
             var xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
        </script>
    
        This loads the XML parser into memory, where it waits for an XML document. The component is released automatically when you close the browser window or the browser. Here xmlDoc holds the XML object for JavaScript.
    

    Other browsers will have similar parsers.
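
For comparison, here is a hedged sketch of choosing between the two families of built-in parser. Standards-based browsers expose `DOMParser`, while the `Microsoft.XMLDOM` ActiveX object shown above exists only in older versions of Internet Explorer. `parseXmlString` is a name made up for this example, and error handling for malformed XML is left out.

```javascript
// Hypothetical wrapper: parse an XML string with whichever built-in
// parser the current environment provides.
function parseXmlString(text) {
  if (typeof DOMParser !== "undefined") {
    // Standard API in current browsers.
    return new DOMParser().parseFromString(text, "application/xml");
  }
  if (typeof ActiveXObject !== "undefined") {
    // Legacy Internet Explorer only.
    var doc = new ActiveXObject("Microsoft.XMLDOM");
    doc.async = false; // make loadXML finish before returning
    doc.loadXML(text);
    return doc;
  }
  throw new Error("No built-in XML parser available");
}
```

Either branch returns a DOM document, so the rest of the code can work on the tree without caring which parser produced it. When the file is fetched with `$.ajax` and `dataType: "xml"` as in the question, jQuery already hands over a parsed document and no such wrapper is needed.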

    UPDATE3: "did you create a parser for CML..."? Not really. I took part in the development of XML and its parsers in 1997 (Norbert Mikula, Tim Bray and others). In fact we redesigned XML as a result of the difficulty of parsing XML.

    XML parsers create either a SAX event stream or a DOM and in theory all parsers should create the same. This is referred to as the Infoset. It has removed all the syntactic variations in XML (quoting, CDATA, entities, etc.). It is generally referred to as the DOM.

    I think you mean - "how do I turn the infoset into something specialised for my application"? If so, yes - I have written extensive code to manipulate the raw infoset. In my case it is to create specialised subclasses of XML Elements. Thus I have CMLMolecule, CMLAtom, etc. The code is in JUMBO (CMLXOM): https://bitbucket.org/wwmm/cmlxom

    This is the same philosophy as has been adopted by (say) MathML and SVG - they have specialised subclasses.

    It's quite a lot of work - I have used both automatic and handcrafted approaches. I don't like the W3C DOM as a base and I'd advise a DOM where you can subclass Element. But if you are intending to write the definitive SBML JavaScript DOM then I would not discourage you.
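
One way to picture specialised element classes on top of the W3C DOM, where `Element` itself is awkward to subclass, is to wrap nodes instead. The sketch below is only an illustration under stated assumptions: `SbmlElement`, `Compartment`, `registry` and `wrap` are invented names, not part of any SBML or CML library, and only the `id` and `size` attributes from the question are used. It also assumes elements are unprefixed, so that `nodeName` is the bare tag name.

```javascript
// Base wrapper: holds a DOM element and exposes generic attribute access.
function SbmlElement(node) {
  this.node = node;
}
SbmlElement.prototype.attr = function (name) {
  return this.node.getAttribute(name);
};

// Specialised wrapper for <compartment>; inherits from SbmlElement.
function Compartment(node) {
  SbmlElement.call(this, node);
}
Compartment.prototype = Object.create(SbmlElement.prototype);
Compartment.prototype.constructor = Compartment;
Compartment.prototype.getSize = function () {
  // parseFloat yields NaN when the attribute is missing.
  return parseFloat(this.attr("size"));
};

// Registry mapping tag names to wrapper constructors. Supporting a new
// element type means adding one entry here, not another nested loop.
var registry = { compartment: Compartment };

// Pick the specialised wrapper if one is registered, else the base class.
function wrap(node) {
  var known = Object.prototype.hasOwnProperty.call(registry, node.nodeName);
  var Ctor = known ? registry[node.nodeName] : SbmlElement;
  return new Ctor(node);
}
```

With something like this, the "array of all the possible nodes" from the question becomes the registry: elements without a specialised class still get the generic wrapper, so changes to the specification only require touching the entries that changed.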

    I did do this for CML in JavaScript some time ago but the browsers had flaky DOMs and I may need to revisit this. It's almost essential for doing interactive graphics.

    Look forward to hearing from you