
Converting XML or HTML to wiki markup - what approach would you choose?


I need to convert HTML documents (generated from DocBook XML documents) to wiki markup, specifically PmWiki markup. The goal is to include the company's application operations guides in our newly created wiki. This leaves me with two options:

  1. Convert the HTML files (generated from the DocBook XML) to wiki markup
  2. Convert the DocBook XML directly to wiki markup

Since the HTML files are generated by a DocBook-to-HTML converter, the tag structure of the HTML documents does not vary much; only the contents differ.

I am looking for a solution that I can implement quickly myself. I will have to do this conversion once now, and then again every time new versions of the application operations guides are created.

Solutions that I've thought of so far:

  1. Convert the HTML to wiki markup with a Perl or PHP script based on regular expressions.
  2. Convert the DocBook XML directly to wiki markup. Since it is XML, I could use Java for the XML parsing. The risk here is that I am not as familiar with the DocBook XML format as I am with HTML, so this may take some time to learn.
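
For illustration, option 2 could be sketched roughly as follows. This is a minimal sketch in Python rather than Java, and it rests on several assumptions: the DocBook source uses un-namespaced DocBook 4-style element names (`section`, `title`, `para`, `itemizedlist`, `orderedlist`, `emphasis`, `literal`, `ulink`), only that small subset is handled, and the function names `inline`, `convert` and `docbook_to_pmwiki` are hypothetical. Real operations guides would almost certainly need more element mappings.

```python
import xml.etree.ElementTree as ET


def inline(elem):
    """Render an element's mixed content (text plus inline children) as PmWiki markup."""
    parts = [elem.text or ""]
    for child in elem:
        body = inline(child)
        if child.tag == "emphasis":
            # DocBook marks bold as <emphasis role="bold">; plain emphasis becomes italic.
            quote = "'''" if child.get("role") == "bold" else "''"
            parts.append(quote + body + quote)
        elif child.tag == "literal":
            parts.append("@@" + body + "@@")
        elif child.tag == "ulink":
            parts.append("[[" + child.get("url", "") + " | " + body + "]]")
        else:
            parts.append(body)  # unknown inline element: keep its text only
        parts.append(child.tail or "")
    # Collapse the line breaks and indentation of the XML source into single spaces.
    return " ".join("".join(parts).split())


def convert(elem, depth=1, out=None):
    """Walk block-level children and collect PmWiki blocks; depth sets the heading level."""
    out = [] if out is None else out
    for child in elem:
        if child.tag == "title":
            out.append("!" * depth + " " + inline(child))
        elif child.tag == "para":
            out.append(inline(child))
        elif child.tag in ("itemizedlist", "orderedlist"):
            bullet = "*" if child.tag == "itemizedlist" else "#"
            items = []
            for item in child.findall("listitem"):
                items.append(bullet + " " + " ".join(inline(p) for p in item))
            out.append("\n".join(items))
        elif child.tag in ("section", "chapter", "sect1", "sect2"):
            convert(child, depth + 1, out)
        # Any other block element is skipped in this sketch.
    return out


def docbook_to_pmwiki(xml_text):
    """Convert a DocBook document (as a string) to PmWiki markup."""
    root = ET.fromstring(xml_text)
    return "\n\n".join(convert(root))


# Made-up sample input, not taken from a real operations guide.
SAMPLE = """<article>
  <title>Operations Guide</title>
  <section>
    <title>Starting the server</title>
    <para>Run <literal>start.sh</literal> and check the
      <emphasis role="bold">status</emphasis> page.</para>
    <itemizedlist>
      <listitem><para>Check the log</para></listitem>
      <listitem><para>Notify <emphasis>operations</emphasis></para></listitem>
    </itemizedlist>
  </section>
</article>"""

print(docbook_to_pmwiki(SAMPLE))
```

The point of the sketch is that a tree walk with a small tag-to-markup table covers the common cases, and unknown elements degrade to plain text instead of leaking tags into the wiki page.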

What approach would you choose for this work?

Update:

I just tried a PmWiki extension called ConvertHTML. It did not work well, because it does not convert all HTML tags (some are left unconverted in the wiki text), as its documentation says:

PmWiki markup does not support all of the HTML markup so a 100% conversion is not possible. However, PmWiki can make replacements to the text as it is being edited or saved. ConvertHTML implements a relatively comprehensive set of rules for converting HTML tags to wiki markup.


Solution

  • DocBook to Wiki might be useful, though it converts from DocBook to MediaWiki markup, not PmWiki markup.

  • There is a Perl module that converts HTML to various wiki dialects: HTML::WikiConverter. Since your DocBook is already being rendered to HTML, that might also work.
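
To make the HTML route more concrete, here is a minimal sketch of what an HTML-to-PmWiki converter does internally. It is written in Python with the standard `html.parser` module and is not HTML::WikiConverter or its API; the class name `PmWikiConverter`, the function `html_to_pmwiki`, and the small set of handled tags are assumptions for illustration only. It relies on the point made in the question that converter-generated HTML has a regular tag structure.

```python
import re
from html.parser import HTMLParser


class PmWikiConverter(HTMLParser):
    # Inline tags that map to the same marker on open and close.
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''",
              "code": "@@", "tt": "@@"}
    HEADINGS = {"h1": "!", "h2": "!!", "h3": "!!!", "h4": "!!!!"}

    def __init__(self):
        super().__init__()
        self.out = []         # output fragments, joined at the end
        self.lists = []       # stack of "*" / "#" for open lists
        self.in_link = False  # True while inside an <a href=...>

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in ("ul", "ol"):
            self.lists.append("*" if tag == "ul" else "#")
        elif tag == "li" and self.lists:
            # Repeat the marker once per nesting level.
            self.out.append("\n" + self.lists[-1] * len(self.lists) + " ")
        elif tag == "p" and not self.lists:
            self.out.append("\n\n")  # paragraphs inside list items stay inline
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # named anchors without href are ignored
                self.out.append("[[" + href + " | ")
                self.in_link = True

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n")
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self.out.append("\n")
        elif tag == "a" and self.in_link:
            self.out.append("]]")
            self.in_link = False

    def handle_data(self, data):
        # Collapse source line breaks and indentation to single spaces.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_pmwiki(html):
    parser = PmWikiConverter()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


# Made-up sample loosely modeled on generated HTML; the URL is a placeholder.
SAMPLE = (
    '<h2 class="title"><a name="id1"></a>Starting the server</h2>\n'
    '<p>Run <code>start.sh</code> and read the '
    '<a href="http://example.com/status">status page</a>.</p>\n'
    '<ul><li><p>Check the <b>log</b></p></li>'
    '<li><p>Notify operations</p></li></ul>'
)

print(html_to_pmwiki(SAMPLE))
```

An event-based parser like this tends to be more robust than bare regular expressions, because nesting (lists, links inside paragraphs) is tracked explicitly and unsupported tags are dropped rather than passed through to the wiki.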