pythonpandasxmlexport-to-xml

How to remove prefix from every attribute in xml using python


I am trying to generate xml file from pandas dataframe using pd.to_xml() method.

I have data in SQL-Server and trying to read from it.

id first_name last_name email gender ip_address
1 Erika Pickless epickless0@twitter.com female 133.146.172.220
2 Victor Stodd vstodd1@foxnews.com male 111.125.132.0
3 Becky Berrisford bberrisford2@shutterfly.com female 200.62.247.237
RawData = pd.read_sql_query('''select * from RFP.dbo.MOCK_DATA;''', conn)


RawData.to_xml('output.xml', attr_cols=['id', 'first_name', 'last_name', 'email', 
'gender', 'ip_address'],namespaces={"soap": "http://example1.com", "wsse": 
"http://docs.example2.org/wss/"}, prefix="soap")

I am getteing the following output

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row soap:index="0" soap:id="1" soap:first_name="Erika" soap:last_name="Pickless" soap:email="epickless0@twitter.com" soap:gender="Female" soap:ip_address="133.146.172.220"/>
  <soap:row soap:index="1" soap:id="2" soap:first_name="Victoir" soap:last_name="Stodd" soap:email="vstodd1@foxnews.com" soap:gender="Male" soap:ip_address="111.125.132.0"/>
  <soap:row soap:index="2" soap:id="3" soap:first_name="Becky" soap:last_name="Berrisford" soap:email="bberrisford2@shutterfly.com" soap:gender="Female" soap:ip_address="200.62.247.237"/>
</soap:data>

The above output is almost correct but, I want little different output: Issue: The prefix "soap" is attached to every attribute in the above output.

But I want following output:

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row index="0" id="1" first_name="Erika" last_name="Pickless" email="epickless0@twitter.com" gender="Female" ip_address="133.146.172.220"/>
  <soap:row index="1" id="2" first_name="Victoir" last_name="Stodd" email="vstodd1@foxnews.com" gender="Male" ip_address="111.125.132.0"/>
  <soap:row index="2" id="3" first_name="Becky" last_name="Berrisford" email="bberrisford2@shutterfly.com" gender="Female" ip_address="200.62.247.237"/>    
</soap:data>

Solution

  • Since XML is an open-ended design standard and Pandas cannot support all possible output specifications with default arguments, you need a customized solution.

    Therefore consider running XSLT, the special-purpose language designed to transform XML files, which is supported in DataFrame.to_xml() using the default lxml parser and stylesheet argument. Below XSLT will run after your raw output to remove namespace prefixes from all attributes:

    XSLT (save as .xsl script, a special .xml file)

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
        <xsl:strip-space elements="*"/>
    
        <!-- IDENTITY TRANSFORM -->
        <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
        </xsl:template>
    
        <!-- REMOVE NAMESPACE PREFIXES FROM ALL ATTRIBUTES -->
        <xsl:template match="@*">
          <xsl:attribute name="{local-name()}">
            <xsl:value-of select="."/>
          </xsl:attribute>
        </xsl:template>
    </xsl:stylesheet>
    

    Online Demo

    Python

    RawData.to_xml(
        "output.xml", 
        attr_cols=["id", "first_name", "last_name", "email", "gender", "ip_address"],
        namespaces={"soap": "http://example1.com", "wsse": "http://docs.example2.org/wss/"}, 
        prefix="soap",
        stylesheet="style.xsl"
    )