pandasxmlxbrl

Convert NBB XBRL into DataFrame


--Edit--
Added the missing namespace for open

Summary

The National Bank of Belgium (NBB) provides financial statements of public companies available to the public, via its API. Since 2022, these can be requested as JSON-files. Before only in XBRL.

This question is about XBRL.

Sources:

As far as I understand, and maybe simplified, XBRL is regular XML file with the addition of being able to refer to a taxonomy, i.e. providing a "dictionary" for the attributes within elements to provide semantic meaning to the content encapsulated in the elements.

Example

<xbrldi:typedMember dimension="dim:afnp"><open:str>John</open:str></xbrldi:typedMember>

Given the namespaces are available, at one point or another, dim:afnp should point to a label first_name or equal.

I would like to be able to convert this XBRL data into a Pandas DataFrame (or .csv)

Reproduce the obstacle

Since an API key is necessary, I'll provide a piece of anonymised code that I believe should be sufficient. The taxonomy is available for download (.zip) in the second link I provided. It is simply a directory with organised folders that, similar to the namespaces, eventually leads to other xml/xld files.

Data from the API looks like this:

<xbrl 
xml:lang="nl"
 xmlns="http://www.xbrl.org/2003/instance"
 xmlns:link="http://www.xbrl.org/2003/linkbase"
 xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
 xmlns:xlink="http://www.w3.org/1999/xlink"
 xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
 xmlns:dim="http://www.nbb.be/be/fr/cbso/dict/dim"
 xmlns:open="http://www.nbb.be/be/fr/cbso/dict/dom/open"
>
<!--More namespaces are given but I believe irrelevant here-->

<link:schemaRef xlink:type="simple" xlink:href="http://www.nbb.be/be/fr/cbso/fws/23.0/mod/m87/m87-f.xsd"/>

<!--The file repeats itself with <context> elements, just different id's.-->

<context id="c19">
<entity><identifier scheme="http://www.fgov.be">Some number</identifier></entity>
<period><instant>Some date</instant></period>
<scenario>
<xbrldi:typedMember dimension="dim:afnp"><open:str>John</open:str></xbrldi:typedMember>
<xbrldi:typedMember dimension="dim:annp"><open:str>Doe</open:str></xbrldi:typedMember>
<xbrldi:explicitMember dimension="dim:bas">bas:m31</xbrldi:explicitMember>
<xbrldi:explicitMember dimension="dim:ctc">ctc:m1</xbrldi:explicitMember>
<xbrldi:explicitMember dimension="dim:part">part:m2</xbrldi:explicitMember>
<xbrldi:explicitMember dimension="dim:psn">psn:m12</xbrldi:explicitMember>
</scenario>
</context>
</xbrl>"""

stringIO_xml = StringIO(xml)

What have I've tried?

I started with Arelle but, in my opinion, their documentation is not very helpful, at least not for a layman. I did not really get results and I felt the problem was providing the taxonomy as a local folder (and not HTTP-link). Also, their GUI would not accept the .zip file (it crashed). I have not tried the CLI.

I also read about Brel, which I believe is more focussed on EDGAR, from the U.S.A.

However, I feel it's more interesting to explore Pandas pd.read_xml() or lxml and work towards a customised script to decode the financial information.

I noticed pd.read_xml() offers the parameter namespaces which accepts a dict. Since the taxonomy is static, a conversion to a dictionary would only have to be done once.

I thought maybe providing a link to the local folder might help.

I have tried:

pd.read_xml(stringIO_xml, namespaces={'dim':'file:////Users/myname/Downloads/nbb-cbso-23.0.1/www.nbb.be/be/fr/cbso/dict/dim'})

Three /// for the file, one / as the start of my absolute path.

But get an error: XMLSyntaxError: Namespace prefix open on str is not defined, line 16, column 51

If I read the error correct, that is about <open:str>John</open:str>, to which I do not even know how to counter that.

My questions

I will try to make it specific. Based on the information I've provided:

I am aware I should not ask broad, open questions so I'll ask more directly.


Solution

  • Well, working with these kind of files is certainly complex. As you say, it would be best to work with the lxml library which offers more versatility as compared to pandas.read_xml().

    But get an error: XMLSyntaxError: Namespace prefix open on str is not defined, line 16, column 51

    Based on the minimal example from the data in the API and the namespaces you showed in your question, I would say that you are missing a namespace for the open:str tag. This should solve the first error you are encountering with your MRE:

    from lxml import etree
    import pandas as pd
    from io import StringIO
    
    xml = """
    <xbrl 
        xml:lang="nl" 
        xmlns="http://www.xbrl.org/2003/instance" 
        xmlns:link="http://www.xbrl.org/2003/linkbase" 
        xmlns:xbrldi="http://xbrl.org/2006/xbrldi" 
        xmlns:xlink="http://www.w3.org/1999/xlink" 
        xmlns:iso4217="http://www.xbrl.org/2003/iso4217" 
        xmlns:dim="http://www.nbb.be/be/fr/cbso/dict/dim" 
        xmlns:open="http://www.example.com/open" ### Example of the missing part  
    >
    <link:schemaRef xlink:type="simple" xlink:href="http://www.nbb.be/be/fr/cbso/fws/23.0/mod/m87/m87-f.xsd"/>
    <context id="c19">
    <entity><identifier scheme="http://www.fgov.be">Some number</identifier></entity>
    <period><instant>Some date</instant></period>
    <scenario>
    <xbrldi:typedMember dimension="dim:afnp"><open:str>John</open:str></xbrldi:typedMember>
    <xbrldi:typedMember dimension="dim:annp"><open:str>Doe</open:str></xbrldi:typedMember>
    <xbrldi:explicitMember dimension="dim:bas">bas:m31</xbrldi:explicitMember>
    <xbrldi:explicitMember dimension="dim:ctc">ctc:m1</xbrldi:explicitMember>
    <xbrldi:explicitMember dimension="dim:part">part:m2</xbrldi:explicitMember>
    <xbrldi:explicitMember dimension="dim:psn">psn:m12</xbrldi:explicitMember>
    </scenario>
    </context>
    </xbrl>"""
    
    stringIO_xml = StringIO(xml)
    tree = etree.parse(stringIO_xml)
    root = tree.getroot()
    
    namespaces = {
        'xbrl': 'http://www.xbrl.org/2003/instance',
        'xbrldi': 'http://xbrl.org/2006/xbrldi',
        'dim': 'http://www.nbb.be/be/fr/cbso/dict/dim',
        'open': 'http://www.example.com/open' ### Example of the missing part
    }
    

    Then, to access the elements of the XPATH you need to work your way into the tree structure like so:

    data = []
    for context in root.findall('.//xbrl:context', namespaces):
        context_id = context.get('id')
        entity = context.find('.//xbrl:entity/xbrl:identifier', namespaces).text
        period = context.find('.//xbrl:period/xbrl:instant', namespaces).text
        scenario = context.find('.//xbrl:scenario', namespaces)
        
        row = {'context_id': context_id, 'entity': entity, 'period': period}
        
        for member in scenario.findall('.//xbrldi:typedMember', namespaces):
            dimension = member.get('dimension')
            value = member.find('.//open:str', namespaces).text
            row[dimension] = value
        
        for member in scenario.findall('.//xbrldi:explicitMember', namespaces):
            dimension = member.get('dimension')
            value = member.text
            row[dimension] = value
        
        data.append(row)
    
    df = pd.DataFrame(data)
    

    This will result in the following dataframe: enter image description here

    Again, I worked through the example you provided and not from the zipped files, which could lead to additional problems depending on how the XML structure is composed. However, I would say it should be pretty straightforward once you load the XML into a string.