I am trying to generate xml file from pandas dataframe using pd.to_xml() method.
I have data in SQL-Server and trying to read from it.
id | first_name | last_name | gender | ip_address | |
---|---|---|---|---|---|
1 | Erika | Pickless | epickless0@twitter.com | female | 133.146.172.220 |
2 | Victor | Stodd | vstodd1@foxnews.com | male | 111.125.132.0 |
3 | Becky | Berrisford | bberrisford2@shutterfly.com | female | 200.62.247.237 |
RawData = pd.read_sql_query('''select * from RFP.dbo.MOCK_DATA;''', conn)
RawData.to_xml('output.xml', attr_cols=['id', 'first_name', 'last_name', 'email',
'gender', 'ip_address'],namespaces={"soap": "http://example1.com", "wsse":
"http://docs.example2.org/wss/"}, prefix="soap")
I am getteing the following output
<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
<soap:row soap:index="0" soap:id="1" soap:first_name="Erika" soap:last_name="Pickless" soap:email="epickless0@twitter.com" soap:gender="Female" soap:ip_address="133.146.172.220"/>
<soap:row soap:index="1" soap:id="2" soap:first_name="Victoir" soap:last_name="Stodd" soap:email="vstodd1@foxnews.com" soap:gender="Male" soap:ip_address="111.125.132.0"/>
<soap:row soap:index="2" soap:id="3" soap:first_name="Becky" soap:last_name="Berrisford" soap:email="bberrisford2@shutterfly.com" soap:gender="Female" soap:ip_address="200.62.247.237"/>
</soap:data>
The above output is almost correct but, I want little different output: Issue: The prefix "soap" is attached to every attribute in the above output.
But I want following output:
<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
<soap:row index="0" id="1" first_name="Erika" last_name="Pickless" email="epickless0@twitter.com" gender="Female" ip_address="133.146.172.220"/>
<soap:row index="1" id="2" first_name="Victoir" last_name="Stodd" email="vstodd1@foxnews.com" gender="Male" ip_address="111.125.132.0"/>
<soap:row index="2" id="3" first_name="Becky" last_name="Berrisford" email="bberrisford2@shutterfly.com" gender="Female" ip_address="200.62.247.237"/>
</soap:data>
Since XML is an open-ended design standard and Pandas cannot support all possible output specifications with default arguments, you need a customized solution.
Therefore consider running XSLT, the special-purpose language designed to transform XML files, which is supported in DataFrame.to_xml()
using the default lxml
parser and stylesheet
argument. Below XSLT will run after your raw output to remove namespace prefixes from all attributes:
XSLT (save as .xsl script, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- REMOVE NAMESPACE PREFIXES FROM ALL ATTRIBUTES -->
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Python
RawData.to_xml(
"output.xml",
attr_cols=["id", "first_name", "last_name", "email", "gender", "ip_address"],
namespaces={"soap": "http://example1.com", "wsse": "http://docs.example2.org/wss/"},
prefix="soap",
stylesheet="style.xsl"
)