I exported a draw.io diagram as a PNG from the local draw.io application. The XML is embedded somewhere in this PNG file, probably in the "tEXt" chunk. I am trying to "borrow" the draw.io JS implementation of parsePng and convert it to Python. The XML is supposed to be stored in a zTXt chunk, but I only see tEXt (https://www.diagrams.net/blog/xml-in-png).
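import png  # pypng

filename = "./image3.png"
im = png.Reader(filename)
# Assumes the tEXt chunk comes directly after IHDR, as it does in this file
ihdr, text, *rest = im.chunks()
chunk_type, chunk_bytes = text
# A tEXt chunk is "keyword\0text", so split on the NUL separator
vals = chunk_bytes.decode("utf-8").split("\x00")
print(vals)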
These are the available chunks:
python test.py
b'IHDR' 13
b'tEXt' 1031
b'IDAT' 4709
b'IEND' 0
This is the output I am getting now (I assume the XML is hidden somewhere in this string, probably Base64-encoded, but I cannot get it out):
['mxfile', '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E']
Output I would like to get (at least what's inside the root tag):
<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="Electron" modified="2021-11-15T12:30:17.738Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="f7nqQOQ3-W-PKNeU6aKq" version="14.5.1" type="device">
<diagram id="3ZARfinUemRlELbDbWll" name="Page-1">
<mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0" />
<mxCell id="1" parent="0" />
<mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
<mxGeometry relative="1" as="geometry" />
</mxCell>
<mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
<mxGeometry x="350" y="350" width="120" height="60" as="geometry" />
</mxCell>
<mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
<mxGeometry x="350" y="480" width="120" height="80" as="geometry" />
</mxCell>
</root>
</mxGraphModel>
</diagram>
</mxfile>
The above XML represents this draw.io diagram:
Note: This might be a duplicate of an existing question, but that one does not have a proper answer (How to programatically extract XML data from draw.io PNG)
The output you're getting now is URI-encoded. Decoding it yields this result:
<mxfile host="Electron" modified="2021-11-15T10:44:54.487Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="S6Lk2QkhAN9aeDDzQv4n" version="14.5.1" type="device"><diagram id="3ZARfinUemRlELbDbWll" name="Page-1">tZTBcoIwEEC/hmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd/1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk+22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs+N5Cpf+6Uyp3oAjqyzKnpZsF7NG0bPNnXA1/gNO3dXrU/XP7XrbYOVsn2lVKfTnA/6bGWu/gbeZf6c2/3ZseDlfHfu7oAmbLXw==</diagram></mxfile>
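As a minimal sketch of that first step, Python's urllib.parse.unquote does the same job as JavaScript's decodeURIComponent for this kind of input. The sample string here is a shortened, made-up fragment, not the full chunk text:

```python
from urllib.parse import unquote

# Shortened illustrative fragment in the same percent-encoded form
sample = "%3Cmxfile%20host%3D%22Electron%22%3E"

# %3C is "<", %20 is a space, %3D is "=", %22 is a double quote, %3E is ">"
decoded = unquote(sample)
print(decoded)
```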
We can see the data is contained in the diagram tag. Thanks to this handy tool on draw.io, we can see this data is Base64-encoded and compressed using pako, a JavaScript port of zlib.
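To make the layering concrete, here is a small round-trip sketch. It assumes the diagram text is produced by URI-encoding the XML, compressing it with raw deflate, and then Base64-encoding the result, which is simply the reverse of the decoding steps used in this answer. The helper names and the sample XML are made up for illustration:

```python
import base64
import zlib
from urllib.parse import quote, unquote


def deflate_raw(data):
    # wbits=-15 selects a raw deflate stream (no zlib header or checksum)
    compressor = zlib.compressobj(-1, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data):
    decompressor = zlib.decompressobj(-15)
    return decompressor.decompress(data) + decompressor.flush()


xml = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'

# Encode: URI-encode, then raw deflate, then Base64
uri_encoded = quote(xml, safe="~()*!.'").encode("ascii")
encoded = base64.b64encode(deflate_raw(uri_encoded)).decode("ascii")

# Decode: Base64, then raw inflate, then URI-decode
decoded = unquote(inflate_raw(base64.b64decode(encoded)).decode("ascii"))
print(decoded == xml)
```

The negative window size is the important detail: a plain zlib.decompress call with default arguments expects a zlib header and fails on raw deflate data.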
Thankfully, another user on Stack Overflow has already written Python equivalents of the pako methods. Using those, we can extend your program to get the diagram's XML:
from urllib.parse import quote, unquote
import xml.etree.ElementTree as ET
import zlib
import base64
def js_encode_uri_component(data):
    return quote(data, safe='~()*!.\'')

def js_decode_uri_component(data):
    return unquote(data)

def js_string_to_byte(data):
    return bytes(data, 'iso-8859-1')

def js_bytes_to_string(data):
    return data.decode('iso-8859-1')

def js_btoa(data):
    return base64.b64encode(data)

def js_atob(data):
    return base64.b64decode(data)

def pako_inflate_raw(data):
    # wbits=-15 means raw deflate data without a zlib header
    decompress = zlib.decompressobj(-15)
    decompressed_data = decompress.decompress(data)
    decompressed_data += decompress.flush()
    return decompressed_data
original_data = '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E'
uri_decoded_data = js_decode_uri_component(original_data)
## Extract diagram data from resulting XML
root = ET.fromstring(uri_decoded_data)
diagram_data = root[0].text
## Decode Base64
diagram_data = js_atob(diagram_data)
decompressed_diagram_data = pako_inflate_raw(diagram_data)
## Turn decompressed data into a usable string
string_diagram_data = js_bytes_to_string(decompressed_diagram_data)
string_diagram_data = js_decode_uri_component(string_diagram_data)
print(string_diagram_data)
Output (Formatted):
<mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
<root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
<mxGeometry relative="1" as="geometry"/>
</mxCell>
<mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
<mxGeometry x="350" y="350" width="120" height="60" as="geometry"/>
</mxCell>
<mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
<mxGeometry x="350" y="480" width="120" height="80" as="geometry"/>
</mxCell>
</root>
</mxGraphModel>
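If you would rather not paste the chunk text into the script, the steps above can be combined with a small PNG chunk reader. The following is only a sketch using the standard library: the function names are made up, it assumes the XML lives in a tEXt chunk with the keyword mxfile (as in your output), and it assumes every diagram tag holds compressed text, as in this file. It does not verify chunk CRCs.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib
from urllib.parse import unquote

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(png_bytes):
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield chunk_type, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length


def extract_drawio_xml(png_bytes):
    """Return a list with the decoded XML of each diagram page."""
    for chunk_type, data in iter_png_chunks(png_bytes):
        if chunk_type != b"tEXt":
            continue
        # tEXt payload is "keyword\0text"
        keyword, _, text = data.partition(b"\x00")
        if keyword != b"mxfile":
            continue
        mxfile = ET.fromstring(unquote(text.decode("latin-1")))
        pages = []
        for diagram in mxfile.iter("diagram"):
            # Base64 decode, raw inflate (wbits=-15), then URI-decode
            raw = zlib.decompress(base64.b64decode(diagram.text), -15)
            pages.append(unquote(raw.decode("latin-1")))
        return pages
    return []
```

With your file this would be called as extract_drawio_xml(open("./image3.png", "rb").read()), and each list entry should hold one page's mxGraphModel XML.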