
What library does GitHub use for parsing Markdown?


GitHub "uses" GitHub Flavored Markdown, but I haven't been able to find out what that means exactly. What parsing library do they use on the client to render the preview(s)?

Is the same lib used for *.md files, issues, and wiki pages?

Bonus points if you can point me to a resource that shows how GitHub Flavored Markdown and CommonMark overlap and how they differ.


Solution

  • point me to a resource that shows how GitHub Flavored Markdown and CommonMark overlap and how they differ.

    2025: as suggested in Schwern's answer, GitHub seems to be using gjtorikian/commonmarker, a Ruby wrapper for the kivikakk/comrak (CommonMark parser) Rust crate.
    That crate is itself a port of github/cmark-gfm, with issue 371 illustrating that "it seems like GitHub has abandoned keeping up with the common mark spec.
    Because of the extensions and add-ons GitHub doesn't even follow all of the GFM specs".

    I'm specifically looking for info on how the header id attributes are generated, which is not covered at all in spec.

    kivikakk/comrak issue 93 deals with anchor IDs, pointing to src/html.rs#Anchorizer as the code that converts header strings to canonical, unique, but still human-readable anchors.

    The process seems to be:

    1. Lowercasing, where the header text is converted to lowercase.
    2. Character filtering, where only certain characters are allowed:
      • Spaces (which are later converted to dashes)
      • Dashes (-)
      • Letters
      • Unicode marks
      • Numbers
      • Connector punctuation (such as underscores)
    3. Space replacement, where each space is replaced with a dash (-).
    4. Uniqueness, where a set of already generated anchors is maintained. If the anchor already exists, a numeric suffix (e.g., -1, -2, etc.) is appended until a unique ID is produced.
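
    The four steps above can be sketched in Python as follows. This is only an illustration of the process as described here, not comrak's actual code: the class name Anchorizer mirrors the Rust type mentioned above, but the anchorize method and the _is_allowed helper are hypothetical names, and the use of Unicode general categories (L, M, N, Pc) to decide which characters survive is an assumption based on the list in step 2.

```python
import unicodedata


def _is_allowed(ch: str) -> bool:
    """Hypothetical helper: keep spaces, dashes, letters, marks, numbers
    and connector punctuation (assumed to map to Unicode categories
    L*, M*, N* and Pc)."""
    if ch in " -":
        return True
    category = unicodedata.category(ch)
    return category[0] in "LMN" or category == "Pc"


class Anchorizer:
    """Sketch of a header-to-anchor converter that remembers past anchors."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def anchorize(self, header: str) -> str:
        # Step 1: lowercase the header text.
        lowered = header.lower()
        # Step 2: drop every character that is not explicitly allowed.
        kept = "".join(ch for ch in lowered if _is_allowed(ch))
        # Step 3: replace each space with a dash.
        base = kept.replace(" ", "-")
        # Step 4: append -1, -2, ... until the anchor is unique.
        anchor = base
        suffix = 0
        while anchor in self._seen:
            suffix += 1
            anchor = f"{base}-{suffix}"
        self._seen.add(anchor)
        return anchor


anchors = Anchorizer()
print(anchors.anchorize("Hello, World!"))  # hello-world
print(anchors.anchorize("Hello World"))    # hello-world-1
```

    Because the set of seen anchors lives on the instance, one Anchorizer would be used per rendered document, so that duplicate headings in the same page get distinct IDs.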

    2017:

    This has been officially documented since March 2017: see "A formal spec for GitHub Flavored Markdown".

    Starting today, all Markdown user content hosted in our website, including user comments, wikis, and .md files in repositories will be parsed and rendered following a formal specification for GitHub Flavored Markdown.

    The same post goes on to explain:

    This formal specification is based on CommonMark, an ambitious project to formally specify the Markdown syntax used by many websites on the internet in a way that reflects its real world usage.
    CommonMark allows people to continue using Markdown the same way they always have, while offering developers a comprehensive specification and reference implementations to interoperate and display Markdown in a consistent way between platforms.

    The post explains why GitHub needed its own extensions on top of CommonMark:

    Taking the CommonMark spec and re-engineering our current user content stack around it is not a trivial endeavour.
    The main issue we struggled with is that the spec (and hence its reference implementations) focuses strictly on the common subset of Markdown that is supported by the original Perl implementation.
    This does not include some of the extended features that have been always available on GitHub. Most notably, support for tables, strikethrough, autolinks and task lists are missing.

    In order to fully specify the version of Markdown we use at GitHub (known as GFM), we had to formally define the syntax and semantics of these features, something which we had never done before. We did this on top of the existing CommonMark spec, taking special care to ensure that our extensions are a strict and optional superset of the original specification.