
How do you build an XML parser?


Can anyone direct me to a good tutorial on building an XML parser? I realize most languages already have libraries for this task, but I'm interested in learning about the grammar of XML and the theory behind how parsers work. I've tried searching for something that explains this but have been unable to find anything.


Solution

  • I don't think there is enough demand for people to write such tutorials, and as I commented, I don't think general parser techniques are of much help. XML is not something the usual lex+yacc approach handles well (the lexer part is more of a problem than the parser part, for what that's worth).

    I know most production-ready XML parsers are beasts, but you might be best off starting by reading one. Java has a few examples, and xmlpull might be among the simplest proper parsers. Woodstox and Xerces are the most compliant ("full") parsers, with large codebases, so they are definitely not light reading. But they handle everything an XML parser should, so they can be educational too. Beware of half-baked fake parsers that skip checks the XML specification mandates (Javolution, for example, checks very few things: it does none of the character validity checks and does not detect duplicate attribute names).

    Another thing to read is, obviously, the XML specification. It is one of the most well-written specifications IMO: accurate and complete, even if not exactly light reading. And considering all it covers, it's actually not all that long.
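
To give a feel for the two checks mentioned above (character validity and duplicate attribute names), here is a minimal Python sketch. It is only an illustration, not how Woodstox, Xerces, or any other real parser is written. The ranges in `is_xml_char` follow the `Char` production of XML 1.0. The names `XmlError` and `parse_start_tag` are made up for this example, and the name pattern is deliberately simplified to ASCII (the real `Name` production allows far more of Unicode). Entity and character references in attribute values are not supported at all.

```python
import re


class XmlError(ValueError):
    """Hypothetical error type for this sketch."""


def is_xml_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Simplification: ASCII-only names. The specification's Name production
# permits many more characters.
_NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
_TAG_OPEN = re.compile(r"<(" + _NAME + r")")
# Simplification: no '&' in values, so references are rejected, not expanded.
_ATTR = re.compile(r"\s+(" + _NAME + r")\s*=\s*(\"[^<&\"]*\"|'[^<&']*')")
_TAG_CLOSE = re.compile(r"\s*(/?)>")


def parse_start_tag(text: str):
    """Parse one start tag (or empty-element tag) at the start of text.

    Returns (element_name, attributes_dict) or raises XmlError.
    """
    # Check 1: every character must be a legal XML character.
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            raise XmlError(f"illegal character U+{ord(ch):04X} at offset {i}")

    m = _TAG_OPEN.match(text)
    if m is None:
        raise XmlError("expected a start tag")
    name, pos = m.group(1), m.end()

    attrs = {}
    while True:
        m = _ATTR.match(text, pos)
        if m is None:
            break
        attr_name, quoted = m.group(1), m.group(2)
        # Check 2: an attribute name may appear only once per tag.
        if attr_name in attrs:
            raise XmlError(f"duplicate attribute {attr_name!r}")
        attrs[attr_name] = quoted[1:-1]  # strip the surrounding quotes
        pos = m.end()

    if _TAG_CLOSE.match(text, pos) is None:
        raise XmlError(f"malformed start tag at offset {pos}")
    return name, attrs
```

Calling `parse_start_tag('<a x="1" x="2">')` raises `XmlError` because `x` is repeated, and a NUL character anywhere in the input is rejected before any tag parsing happens. A real parser has many more checks of this kind, which is a large part of why the compliant ones are so big.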