
Best markup format for future-proofing large text chunks?


I have a number of records (up to 100) containing sizeable chunks of text that need semantic markup (lists, headings, tables, links, quotations, etc.) before being stored in a reusable file format.

Once stored, the text is likely to remain more or less unchanged, and it should stay usable for as many years into the future as possible.

It contains some non-ASCII characters, so UTF-8 is required. I started with HTML, then considered Markdown, but I'd like to know which markup format people consider the most future-proof for long-term storage. The content is initially for a (mostly static) website, but may later be used for other outputs.
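Since UTF-8 is a hard requirement, it may be worth checking mechanically that every stored file really is valid UTF-8 before archiving it. Below is a minimal Python sketch, assuming the records are kept as `.md` files somewhere under one directory; the function names `is_valid_utf8` and `find_non_utf8` are hypothetical, not part of any existing tool.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 with strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8(root: Path, pattern: str = "*.md") -> list[Path]:
    """List files under root (searched recursively) whose bytes are not valid UTF-8."""
    return [
        p
        for p in sorted(root.rglob(pattern))
        if p.is_file() and not is_valid_utf8(p.read_bytes())
    ]
```

Strict decoding rejects stray Latin-1 bytes such as `b"caf\xe9"`, a common way non-ASCII text ends up silently mis-encoded after passing through an editor or tool with the wrong default encoding.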

Finally, I'd welcome opinions on the choice of storage for long-term use: a database, separate documents, or something else? Records will change infrequently and be edited by only 1-3 people, while read access should increase over time.


Update:

I've settled on the subset of features (e.g. tables) common to MultiMarkdown, PHP Markdown Extra and Kramdown as the text format (plain Markdown omits too many HTML elements), and I'm converting the resulting files to HTML with Kramdown. Now I'm trying out iOS Markdown editors that can handle extended Markdown and sync via Dropbox to my desktop and laptop.


Solution

  • Any storage not designed for long-term archiving will eventually break.

    It is not so much a question of database vs. filesystem as of how to ensure that no (silent) data corruption happens, and how to migrate the data over time. I can't give you a definitive answer, because it depends on many factors (including cost), but here are a few resources:

    I have no definitive answer for the format question, but I think HTML + UTF-8 should still be readable decades from now. Whichever format you choose, document it.
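One common way to detect silent corruption (a technique added here for illustration, not taken from the answer above) is a fixity manifest: record a SHA-256 digest for every file once, keep it alongside the data, and re-verify it periodically and after every migration. Below is a minimal Python sketch using only the standard library; `build_manifest`, `verify_manifest`, `format_manifest` and `demo` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems


def format_manifest(manifest: dict[str, str]) -> str:
    """Render the manifest as 'digest  path' lines, one per file."""
    return "".join(f"{digest}  {rel}\n" for rel, digest in manifest.items())


def demo() -> tuple[list[str], list[str]]:
    """Build a manifest, corrupt one file, and show that verification catches it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.md").write_text("# Title\n", encoding="utf-8")
        (root / "b.md").write_text("café\n", encoding="utf-8")
        manifest = build_manifest(root)
        before = verify_manifest(root, manifest)
        (root / "b.md").write_bytes(b"caf\xe9\n")  # simulate silent corruption
        after = verify_manifest(root, manifest)
    return before, after
```

Calling `demo()` should report no problems before the simulated corruption and flag `b.md` afterwards. The `format_manifest` output follows the `digest  path` layout printed by GNU `sha256sum`, so `sha256sum -c` should be able to check it too, assuming file names contain no newlines or backslashes.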