pythonxmlreporting-servicesrdlminidom

Using Python to scrape DataSet and Query data from RDL


I set out today with the intent to parse an SSRS RDL file (XML) using Python in order to gather the DataSet and Query data. A recent project has me back tracking on a variety of reports and data sources with the intention of consolidating and cleaning up what we have published.

I was able to use this script to create a CSV file with the following columns: system path|report file name|command type|command text|

It's not very elegant, but it works.

What I'm hoping to be able to do with this post is solicit for any of you experts out there who have either tried this already or are experienced in XML parsing with Python to take a shot at cleaning it up and provided the ability to:

Here is the full code in my "rdlparser.py" file:

import sys, os

from xml.dom import minidom
xmldoc = minidom.parse(sys.argv[1])

content = ""
TargetFile = sys.argv[1].split(".", 1)[0] + ".csv"
numberOfQueryNodes = 0

queryNodes = xmldoc.getElementsByTagName('Query')
numberOfQueryNodes = queryNodes.length -1


while (numberOfQueryNodes > -1):
    content = content + os.path.abspath(sys.argv[1])+ '|'+ sys.argv[1].split(".", 1)[0]+ '|' 
    outputNode = queryNodes.__getitem__(numberOfQueryNodes)
    children = [child for child in outputNode.childNodes if child.nodeType==1]
    numberOfQueryNodes = numberOfQueryNodes - 1
    for node in children:
        if node.firstChild.nodeValue != '\n          ':
            if node.firstChild.nodeValue != 'true':
                content = content + node.firstChild.nodeValue + '|'
    content = content + '\n'

fp = open(TargetFile, 'wb')
fp.write(content)
fp.close()

Solution

  • I know you asked for Python; but I figured Powershell's built in xml handling capabilities would make this fairly simple. While I'm sure it is not guru level, I think it came out pretty nicely (the lines starting with # are comments):

    # The directory to search 
    $searchpath = "C:\"
    
    # List all rdl files    from the given search path recusrivley searching sub folders, store results into a variable
    $files = gci $searchpath -recurse -filter "*.rdl" | SELECT FullName, DirectoryName, Name 
    
    # for each of the found files pass the folder and file name  and the xml content
    $files | % {$Directory = $_.DirectoryName; $Name = $_.Name; [xml](gc $_.FullName)}
                # in the xml content navigate to the the DataSets Element
                | % {$_.Report.DataSets} 
                        # for each query retrieve the Report directory , File Name, DataSource Name, Command Type, Command Text output thwese to a csv file
                        | % {$_.DataSet.Query} | SELECT  @{N="Path";E={$Directory}}, @{N="File";E={$Name}}, DataSourceName, CommandType, CommandText | Export-Csv Test.csv -notype