I am trying to understand why the code:
import pandas
xml = '''
<ROOT>
<ELEM atr="anything">1</ELEM>
<ELEM atr="anything">2</ELEM>
<ELEM atr="anything">3</ELEM>
<ELEM atr="anything">4</ELEM>
<ELEM atr="anything">5</ELEM>
<ELEM atr="anything">6</ELEM>
<ELEM atr="anything">7</ELEM>
<ELEM atr="anything">8</ELEM>
<ELEM atr="anything">9</ELEM>
<ELEM atr="anything">10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())
... works as expected and prints:
        atr  ELEM
0  anything     1
1  anything     2
2  anything     3
3  anything     4
4  anything     5
5  anything     6
6  anything     7
7  anything     8
8  anything     9
9  anything    10
Yet the following code:
import pandas
xml = '''
<ROOT>
<ELEM>1</ELEM>
<ELEM>2</ELEM>
<ELEM>3</ELEM>
<ELEM>4</ELEM>
<ELEM>5</ELEM>
<ELEM>6</ELEM>
<ELEM>7</ELEM>
<ELEM>8</ELEM>
<ELEM>9</ELEM>
<ELEM>10</ELEM>
</ROOT>
'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df.to_string())
results in the error:
ValueError: xpath does not return any nodes or attributes. Be sure to specify in `xpath` the parent nodes of children and attributes to parse. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.
I have read the documentation here: https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
And also checked my xpath here (code above is just a minimal example, actual XML I use is more complex): https://freeonlineformatter.com/xpath-validator/
In a nutshell, I need to read a list of XML child elements at a known xpath into a pandas DataFrame. The child elements have no attributes, but all have text values. I want to get a DataFrame with one column containing these values. What am I doing wrong?
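If you check the documentation, pandas expects the XML to have rows with columns: the nodes matched by xpath are the rows, and their attributes and child elements become the columns. In your first example, each <ELEM> is a row, and the atr attribute is a column. In your second example, the <ELEM> nodes have neither attributes nor child elements, so there are no columns, which is what the error message means by "the parent nodes of children and attributes to parse". If you had <ELEM><VAL>1</VAL></ELEM>, it should work, because VAL would be the column.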
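To illustrate the wrapped form, here is a minimal sketch rather than a definitive fix. It makes two assumptions that are not in the question: it passes parser="etree" so that lxml is not required, and because the standard-library parser does not accept absolute paths, it uses the relative xpath './ELEM' in place of '/ROOT/ELEM'. The XML is wrapped in io.StringIO because newer pandas versions warn about passing literal XML strings.

```python
import io

import pandas as pd

# Same data as in the question, but each value is wrapped in a child
# element <VAL>, so every <ELEM> row has something to use as a column.
xml = """<ROOT>
<ELEM><VAL>1</VAL></ELEM>
<ELEM><VAL>2</VAL></ELEM>
<ELEM><VAL>3</VAL></ELEM>
</ROOT>"""

# Assumption: parser="etree" (standard library) with a relative xpath.
# With the default lxml parser, xpath='/ROOT/ELEM' should select the same rows.
df = pd.read_xml(io.StringIO(xml), xpath="./ELEM", parser="etree")
print(df.to_string())
```

The resulting frame has a single column named VAL, one row per <ELEM>.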
https://pandas.pydata.org/docs/reference/api/pandas.read_xml.html
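If changing the XML is not an option, one possible workaround is to skip read_xml and collect the text values yourself. The sketch below assumes the standard-library xml.etree.ElementTree module is acceptable for your real document; the column name "ELEM" and the conversion with pandas.to_numeric are illustrative choices, not something read_xml would do for you.

```python
import xml.etree.ElementTree as ET

import pandas as pd

xml = """<ROOT>
<ELEM>1</ELEM>
<ELEM>2</ELEM>
<ELEM>3</ELEM>
</ROOT>"""

# fromstring returns the <ROOT> element, so the relative path "./ELEM"
# selects the same nodes as the absolute xpath /ROOT/ELEM.
root = ET.fromstring(xml)
values = [elem.text for elem in root.findall("./ELEM")]

# Build a one-column DataFrame; to_numeric turns the text into integers.
# Drop that call if the values should stay as strings.
df = pd.DataFrame({"ELEM": pd.to_numeric(values)})
print(df.to_string())
```

For a more complex document, only the path passed to findall needs to change, as long as it stays within the limited XPath subset that ElementTree supports.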