perl xbrl

Reading and parsing an XBRL file in Perl (or converting it into normal XML / JSON)


I'm drawing a blank on this. XBRL seems to be based on XML, yet there seems to be no agreed structure for it. I'm taking data from http://download.companieshouse.gov.uk/en_monthlyaccountsdata.html and I want to parse the files into usable data.

How are you supposed to process XBRL files and output usable data structures? For instance, I want to see what the gross turnover was in last year's return.

This must be possible, otherwise what is the point of Companies House providing the data?

Any guidance is much appreciated! I feel like I'm going round and round in circles with this one.


Solution

  • XBRL follows the XBRL specifications, which are built on XML. Companies House uses the Inline XBRL (iXBRL) variant of XBRL in which the XBRL tags are embedded in an HTML document.

    It's not accurate to say that the documents don't follow any defined structure; they follow the above specifications and are validated as doing so upon receipt by Companies House.

    However, the iXBRL reports collected by Companies House are financial reports, which follow the applicable accounting standards, and those standards permit quite a lot of variation in exactly what each company reports.

    Data in XBRL is tagged by associating a value (e.g. 1,000) with a concept (e.g. "Revenue") and some dimensions (such as period and units).

    The accounting terms (such as "Assets", "Revenue", etc) are defined as concepts in a taxonomy. Because of the variation permitted by the accounting standards, you may find that not all companies disclose the concepts that you are looking for.

    In the case of Companies House data, this is further complicated by the fact that many smaller companies can and do file abbreviated accounts which don't include the Profit and Loss statement, so "turnover" often simply isn't reported. The filing of iXBRL to Companies House is optional, and many companies choose to make their data less accessible by filing on paper.

    To make the data easier to work with, I would strongly recommend using an existing XBRL processor that will take care of reading not only the iXBRL report but also the associated taxonomy.

    The most widely used open source processor is Arelle, and there are many commercial processors available too (see https://software.xbrl.org).

    Arelle will allow you to work with XBRL data via a Python API, or it can be used to convert it to the newer xBRL-JSON format.
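
To make the "value + concept + dimensions" idea above concrete, here is a minimal sketch in Python (standard library only) that lists the numeric facts tagged in an iXBRL document. It is not Arelle's API and not a substitute for a real processor: the function name `extract_numeric_facts`, the sample document and the `ex:` taxonomy prefix are all made up for illustration, and the code assumes the report is well-formed XHTML. It deliberately returns the displayed text as-is, without applying the `format`, `scale` or `sign` attributes and without resolving contexts, units or the taxonomy, which is exactly the work a processor does for you.

```python
import xml.etree.ElementTree as ET

# The Inline XBRL 1.0 and 1.1 namespaces; a given report uses one of them.
IX_NAMESPACES = (
    "http://www.xbrl.org/2008/inlineXBRL",
    "http://www.xbrl.org/2013/inlineXBRL",
)


def extract_numeric_facts(ixbrl_text):
    """Return one dict per ix:nonFraction element found in the document.

    Hypothetical helper for illustration only: values are the raw
    displayed text, not normalised numbers.
    """
    root = ET.fromstring(ixbrl_text)
    facts = []
    for ns in IX_NAMESPACES:
        for el in root.iter("{%s}nonFraction" % ns):
            facts.append({
                "concept": el.get("name"),        # e.g. a "Turnover" concept
                "context": el.get("contextRef"),  # points at entity + period
                "unit": el.get("unitRef"),        # points at e.g. a GBP unit
                "scale": el.get("scale"),         # may be None
                "sign": el.get("sign"),           # "-" for negatives, or None
                "text": "".join(el.itertext()).strip(),
            })
    return facts


# Made-up sample; "ex:Turnover" is NOT a real taxonomy concept name.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:ex="http://example.com/hypothetical-taxonomy">
  <body>
    <p>Turnover:
      <ix:nonFraction name="ex:Turnover" contextRef="cur"
                      unitRef="GBP" decimals="0">1,000</ix:nonFraction>
    </p>
  </body>
</html>"""

if __name__ == "__main__":
    for fact in extract_numeric_facts(SAMPLE):
        print(fact["concept"], fact["context"], fact["unit"], fact["text"])
```

For a file on disk, `ET.parse(path).getroot()` would replace the `ET.fromstring` call. Note that the concept names you get back depend on the taxonomy each filer used, so a search for a turnover concept will legitimately come back empty for the many companies that file abbreviated accounts.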