I am attempting to inspect a Power BI .pbix file using Python's zipfile library. When unzipping the .pbix archive, I get the following structure:
DataMashup
DataModel
DiagramLayout
Metadata
Report/
Report/Layout
Report/StaticResources/
Report/StaticResources/SharedResources/
Report/StaticResources/SharedResources/BaseThemes/
Report/StaticResources/SharedResources/BaseThemes/CY18SU07.json
SecurityBindings
Settings
Version
[Content_Types].xml
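A listing like this can be produced with zipfile, as in the minimal sketch below. It also prints each member's compressed and uncompressed size, which shows how DataMashup is stored. The helper name list_pbix_members is hypothetical. It accepts either a path or an open binary file because zipfile.ZipFile does.

```python
import zipfile


def list_pbix_members(pbix):
    """Return (name, compressed size, uncompressed size) for every member of a .pbix.

    `pbix` may be a path or a binary file-like object; zipfile.ZipFile accepts both.
    """
    with zipfile.ZipFile(pbix) as zf:
        return [(info.filename, info.compress_size, info.file_size) for info in zf.infolist()]
```

For example, list_pbix_members("report.pbix") returns one tuple per member. Here "report.pbix" is a placeholder path.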
It appears that the DataMashup file in the .pbix archive is some sort of off-brand archive of a directory. The DataMashup object does not appear to be compressed, as I can easily read XML data when printing the object in the Python interpreter.
Using 7-Zip, I am able to access everything within:
DataMashup/
    Config/
        Package.xml
    Formulas/
        Section1.m  # M and/or DAX-looking stuff
    [Content_Types].xml
How can I discover the format of the DataMashup archive-within-an-archive?
One clue is in the binary data at the top of the DataMashup object: \x00\x00\x00\x00\x07\x05\x00\x00PK, which may indicate PKZIP.
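Suppose the DataMashup blob follows MS-QDEFF, as the answer below suggests. Then those eight bytes can be read as two little-endian unsigned 32-bit integers: a version that should be 0, followed by the byte length of the zip that begins at offset 8 with the PK signature. Here is a small sketch under that assumption. The function name read_datamashup_header is hypothetical.

```python
import struct


def read_datamashup_header(data: bytes):
    """Decode the first 8 bytes of a DataMashup blob.

    Assumes the MS-QDEFF layout: a 4-byte little-endian version (expected 0)
    followed by a 4-byte little-endian length of the zip that starts at offset 8.
    """
    version, package_len = struct.unpack_from("<II", data, 0)
    starts_with_zip = data[8:10] == b"PK"  # zip local file header magic
    return version, package_len, starts_with_zip
```

For the bytes quoted above, read_datamashup_header(b"\x00\x00\x00\x00\x07\x05\x00\x00PK") returns (0, 1287, True), since 0x0507 is 1287.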
Another clue may be this output when attempting to use unzip on the DataMashup file:
$ unzip DataMashup
Archive: DataMashup
warning [DataMashup]: 6215 extra bytes at beginning or within zipfile
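Under the same MS-QDEFF assumption, the "extra bytes" are likely two things. One is the 8-byte prefix (the Offset = 8 in the 7za output below). The other is the length-prefixed sections stored after the zip (the Tail Size). Slicing the zip out by its declared length should let Python's zipfile open it directly on any OS. The helper name open_package_parts is hypothetical.

```python
import io
import struct
import zipfile


def open_package_parts(data: bytes) -> zipfile.ZipFile:
    """Return the zip embedded in a DataMashup blob as a ZipFile.

    Assumes the MS-QDEFF layout: 4-byte version, 4-byte little-endian length,
    then that many bytes of zip. Slicing on the length drops both the 8-byte
    prefix and whatever follows the zip.
    """
    version, package_len = struct.unpack_from("<II", data, 0)
    if version != 0:
        raise ValueError(f"unexpected DataMashup version {version}")
    package = data[8:8 + package_len]
    if len(package) != package_len:
        raise ValueError("DataMashup is shorter than its declared package length")
    return zipfile.ZipFile(io.BytesIO(package))
```

Usage: with open_package_parts(blob) as zf: print(zf.namelist()), where blob holds the raw DataMashup bytes.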
I was able to uncompress the DataMashup directory on Linux using 7za:
WARNINGS:
There are data after the end of archive
--
Path = DataMashup
Type = zip
WARNINGS:
There are data after the end of archive
Offset = 8
Physical Size = 1303
Tail Size = 5148
Everything is Ok
Archives with Warnings: 1
Warnings: 1
Files: 3
Size: 2040
Compressed: 6459
Despite the warnings, the files appear okay. Unfortunately, this does not help me on Windows.
.pbix files are zip archives, so the first step is to unzip the file. DataMashup follows the MS-QDEFF spec.
The DataMashup file within the archive is itself an archive: a zip wrapped in the MS-QDEFF binary format. It contains Section1.m, which holds the query source definitions.
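The pure-Python sketch below covers the whole path from .pbix to Section1.m, so it should also work on Windows. It assumes the MS-QDEFF top-level layout: a 4-byte version followed by four length-prefixed fields (package parts, permissions, metadata, permission bindings). Each length is a little-endian unsigned 32-bit integer. The function names are hypothetical. The sketch does not decode the permissions or metadata payloads.

```python
import io
import struct
import zipfile


def split_datamashup(data: bytes) -> dict:
    """Split a DataMashup blob into its length-prefixed top-level fields.

    Assumes the MS-QDEFF layout: a 4-byte version followed by four
    (4-byte little-endian length, payload) pairs.
    """
    (version,) = struct.unpack_from("<I", data, 0)
    offset = 4
    sections = {"version": version}
    for name in ("package_parts", "permissions", "metadata", "permission_bindings"):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        sections[name] = data[offset:offset + length]
        offset += length
    return sections


def extract_section1_m(pbix) -> str:
    """Return the M source of Formulas/Section1.m from a .pbix (path or file object)."""
    with zipfile.ZipFile(pbix) as outer:
        mashup = outer.read("DataMashup")
    package = split_datamashup(mashup)["package_parts"]
    with zipfile.ZipFile(io.BytesIO(package)) as inner:
        # utf-8-sig strips a leading byte-order mark if one is present.
        return inner.read("Formulas/Section1.m").decode("utf-8-sig")
```

Usage: print(extract_section1_m("report.pbix")), where "report.pbix" is a placeholder path.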
Here is a really good tutorial in C#:
https://www.titanwolf.org/Network/q/8acb9f29-4b28-400b-b8df-cbe523edcb01/y
And another one using PowerShell:
https://querypower.medium.com/extracting-power-queries-41fd73d3d6a2