
How to convert draw.io compressed data from an exported PNG file to XML


I exported a draw.io diagram as a PNG from the local draw.io desktop application. The XML is embedded somewhere in this PNG file, probably in the tEXt chunk. I am trying to "borrow" the draw.io JS implementation of parsePng and port it to Python. According to https://www.diagrams.net/blog/xml-in-png the XML is supposed to be stored in a zTXt chunk, but I only see a tEXt chunk.

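import png

filename = "./image3.png"
reader = png.Reader(filename=filename)
# chunks() yields (chunk_type, chunk_bytes) pairs; in this file IHDR is followed by tEXt
ihdr, text, *rest = reader.chunks()

chunk_type, chunk_bytes = text

# A tEXt chunk holds "keyword\0text"; both parts are Latin-1 encoded
vals = chunk_bytes.decode("latin-1").split("\0")
print(vals)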

These are the available chunks:

python test.py
b'IHDR' 13
b'tEXt' 1031
b'IDAT' 4709
b'IEND' 0
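The chunk listing above can also be produced without pypng, because the PNG chunk layout (4-byte length, 4-byte type, data, 4-byte CRC) is easy to walk with the standard library. The following is a minimal sketch; the names `read_png_chunks` and `png_text_entries` are hypothetical helpers, not part of pypng or draw.io. It decodes both tEXt and zTXt, so it works whichever chunk the exporter used, and for brevity it does not verify CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_chunks(data: bytes):
    """Yield (chunk_type, chunk_data) for every chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        chunk_data = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # skip length, type, data and CRC (CRC not checked here)
        yield chunk_type, chunk_data
        if chunk_type == b"IEND":
            break


def png_text_entries(data: bytes) -> dict:
    """Collect tEXt and zTXt chunks as {keyword: text}."""
    entries = {}
    for chunk_type, chunk_data in read_png_chunks(data):
        if chunk_type == b"tEXt":
            # tEXt: keyword, NUL separator, uncompressed Latin-1 text
            keyword, _, text = chunk_data.partition(b"\0")
            entries[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"zTXt":
            # zTXt: keyword, NUL, 1-byte compression method (0 = zlib), zlib stream
            keyword, _, rest = chunk_data.partition(b"\0")
            text = zlib.decompress(rest[1:])
            entries[keyword.decode("latin-1")] = text.decode("latin-1")
    return entries
```

For the file in the question, `png_text_entries(open("./image3.png", "rb").read())["mxfile"]` should return the same percent-encoded string as `vals[1]`.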

Output I am getting now (I assume the XML is hidden somewhere in this string, probably Base64-encoded, but I cannot get it out):

['mxfile', '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E']

Output I would like to get (at least what's inside the root tag):

<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="Electron" modified="2021-11-15T12:30:17.738Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="f7nqQOQ3-W-PKNeU6aKq" version="14.5.1" type="device">
  <diagram id="3ZARfinUemRlELbDbWll" name="Page-1">
    <mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
      <root>
        <mxCell id="0" />
        <mxCell id="1" parent="0" />
        <mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
          <mxGeometry relative="1" as="geometry" />
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
          <mxGeometry x="350" y="350" width="120" height="60" as="geometry" />
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
          <mxGeometry x="350" y="480" width="120" height="80" as="geometry" />
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>

The XML above represents this draw.io diagram (image not included here).

Note: This might be a duplicate of an existing question, but that one does not have a proper answer (How to programatically extract XML data from draw.io PNG).


Solution

  • The output you're getting now is URI-encoded (percent-encoded). Decoding it, for example with urllib.parse.unquote, yields this result:

    <mxfile host="Electron" modified="2021-11-15T10:44:54.487Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="S6Lk2QkhAN9aeDDzQv4n" version="14.5.1" type="device"><diagram id="3ZARfinUemRlELbDbWll" name="Page-1">tZTBcoIwEEC/hmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd/1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk+22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs+N5Cpf+6Uyp3oAjqyzKnpZsF7NG0bPNnXA1/gNO3dXrU/XP7XrbYOVsn2lVKfTnA/6bGWu/gbeZf6c2/3ZseDlfHfu7oAmbLXw==</diagram></mxfile>
    

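    The diagram data is the text content of the <diagram> element. Thanks to this handy tool on draw.io, we can see that this data is compressed with pako, a JavaScript port of zlib: it is raw DEFLATE output (no zlib header) that has been Base64-encoded, and the inflated text is itself URI-encoded.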

    Thankfully, another Stack Overflow user has already written Python equivalents of the pako methods. Using those, we can continue your program to get the diagram's XML:

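    from urllib.parse import quote, unquote
    import xml.etree.ElementTree as ET
    import zlib
    import base64
    
    
    # Python equivalents of the JavaScript/pako helpers
    # (not all of them are needed for decoding)
    def js_encode_uri_component(data):
        # Like JS encodeURIComponent(): these characters stay unescaped
        return quote(data, safe='~()*!.\'')
    
    
    def js_decode_uri_component(data):
        # Like JS decodeURIComponent(): %XX escapes are decoded as UTF-8
        return unquote(data)
    
    
    def js_string_to_byte(data):
        return bytes(data, 'iso-8859-1')
    
    
    def js_bytes_to_string(data):
        return data.decode('iso-8859-1')
    
    
    def js_btoa(data):
        return base64.b64encode(data)
    
    
    def js_atob(data):
        return base64.b64decode(data)
    
    
    def pako_inflate_raw(data):
        # Like pako.inflateRaw(): wbits=-15 means raw DEFLATE (no zlib header or checksum)
        decompress = zlib.decompressobj(-15)
        decompressed_data = decompress.decompress(data)
        decompressed_data += decompress.flush()
        return decompressed_data
    
    
    # The percent-encoded tEXt payload (vals[1] in the question's script)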
    original_data = '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E'
    uri_decoded_data = js_decode_uri_component(original_data)
    ## Extract the diagram data: the text of <diagram>, the first child of <mxfile>
    root = ET.fromstring(uri_decoded_data)
    diagram_data = root[0].text
    ## Decode Base64, then inflate the raw DEFLATE stream
    diagram_data = js_atob(diagram_data)
    decompressed_diagram_data = pako_inflate_raw(diagram_data)
    ## Turn the decompressed bytes into a string; the result is itself URI-encoded
    string_diagram_data = js_bytes_to_string(decompressed_diagram_data)
    string_diagram_data = js_decode_uri_component(string_diagram_data)
    print(string_diagram_data)
    

    Output (Formatted):

    <mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
        <root>
            <mxCell id="0"/>
            <mxCell id="1" parent="0"/>
            <mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
                <mxGeometry relative="1" as="geometry"/>
            </mxCell>
            <mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
                <mxGeometry x="350" y="350" width="120" height="60" as="geometry"/>
            </mxCell>
            <mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
                <mxGeometry x="350" y="480" width="120" height="80" as="geometry"/>
            </mxCell>
        </root>
    </mxGraphModel>
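The steps in the answer can be folded into a reusable function. The sketch below is a restructuring, not the answer's original code: the names `decompress_diagram`, `compress_diagram` and `extract_models` are hypothetical, not a draw.io or pako API. It returns one mxGraphModel string per <diagram> (page). It also assumes that some draw.io versions save a page uncompressed, with <mxGraphModel> as a child element of <diagram>. In that case `root[0].text` in the code above would be None. `compress_diagram` simply runs the decoding steps in reverse (URI-encode, raw deflate, Base64). It is handy for tests, but it has not been checked byte-for-byte against draw.io's own output.

```python
import base64
import xml.etree.ElementTree as ET
import zlib
from urllib.parse import quote, unquote


def decompress_diagram(text: str) -> str:
    """Reverse the <diagram> encoding: Base64 -> raw inflate -> URI-decode."""
    raw = base64.b64decode(text)
    inflated = zlib.decompress(raw, -15)  # wbits=-15: raw DEFLATE, like pako.inflateRaw
    return unquote(inflated.decode("latin-1"))


def compress_diagram(xml: str) -> str:
    """Inverse of decompress_diagram: URI-encode -> raw deflate -> Base64."""
    encoded = quote(xml, safe="~()*!.'").encode("latin-1")
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw DEFLATE, no header
    deflated = compressor.compress(encoded) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


def extract_models(text_chunk_value: str) -> list:
    """Return one mxGraphModel XML string per <diagram> in the tEXt payload."""
    mxfile = ET.fromstring(unquote(text_chunk_value))
    models = []
    for diagram in mxfile.iter("diagram"):
        model = diagram.find("mxGraphModel")
        if model is not None:
            # Uncompressed page: the model is already a child element
            models.append(ET.tostring(model, encoding="unicode"))
        else:
            # Compressed page: the element text is the encoded model
            models.append(decompress_diagram(diagram.text.strip()))
    return models
```

With the question's script, `extract_models(vals[1])[0]` should give the same mxGraphModel XML as the output above (unformatted).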