pythonxmltocsv

I am learning to work with xml files and parsing and saving them in csv format


I am using the following code to parse a xml file write to csv, however all I am able to get a csv file created with no columns and rows.

import csv
import xml.etree.ElementTree as ET

def xml_to_csv(xml_file, csv_file):
    # Parse the XML file
    tree = ET.parse(xml_file)
    root = tree.getroot()
    # Create the CSV file and write the header row
    with open(csv_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        header = [child.tag for child in root[0]]
        writer.writerow(header)
        # Write the data rows
        for record in root:
            row = [record.find(header[i]).text for i in range(len(header))]
            writer.writerow(row)

xml_to_csv("Badges.xml", "Badges.csv")

This a sample of the xml file I am using

<?xml version="1.0" encoding="utf-8"?>
<badges>
  <row Id="1" UserId="4" Name="Analytical" Date="2011-09-13T20:36:20.830" Class="3" TagBased="False" />
  <row Id="2" UserId="2" Name="Autobiographer" Date="2011-09-13T20:36:20.877" Class="3" TagBased="False" />
  <row Id="3" UserId="3" Name="Autobiographer" Date="2011-09-13T20:36:20.877" Class="3" TagBased="False" />
  <row Id="4" UserId="6" Name="Autobiographer" Date="2011-09-13T20:36:20.877" Class="3" TagBased="False" />
  <row Id="5" UserId="3" Name="Analytical" Date="2011-09-13T20:41:19.543" Class="3" TagBased="False" />
  <row Id="6" UserId="9" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="7" UserId="11" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="8" UserId="14" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="9" UserId="15" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="10" UserId="17" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="11" UserId="19" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="12" UserId="20" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="13" UserId="22" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="14" UserId="25" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="15" UserId="29" Name="Autobiographer" Date="2011-09-13T20:56:19.377" Class="3" TagBased="False" />
  <row Id="16" UserId="9" Name="Student" Date="2011-09-13T20:56:20.097" Class="3" TagBased="False" />
  <row Id="17" UserId="26" Name="Supporter" Date="2011-09-13T20:56:20.110" Class="3" TagBased="False" />
  <row Id="18" UserId="14" Name="Analytical" Date="2011-09-13T21:01:19.337" Class="3" TagBased="False" />
  <row Id="19" UserId="30" Name="Autobiographer" Date="2011-09-13T21:01:19.353" Class="3" TagBased="False" />
  <row Id="20" UserId="35" Name="Autobiographer" Date="2011-09-13T21:01:19.353" Class="3" TagBased="False" />
  <row Id="21" UserId="8" Name="Supporter" Date="2011-09-13T21:01:19.680" Class="3" TagBased="False" />
  <row Id="22" UserId="49" Name="Autobiographer" Date="2011-09-13T21:11:33.813" Class="3" TagBased="False" />
  <row Id="23" UserId="43" Name="Supporter" Date="2011-09-13T21:16:34.260" Class="3" TagBased="False" />
  <row Id="24" UserId="54" Name="Autobiographer" Date="2011-09-13T21:21:33.843" Class="3" TagBased="False" />
  <row Id="25" UserId="8" Name="Student" Date="2011-09-13T21:21:34.360" Class="3" TagBased="False" />
  <row Id="26" UserId="43" Name="Student" Date="2011-09-13T21:21:34.360" Class="3" TagBased="False" />
  <row Id="27" UserId="4" Name="Supporter" Date="2011-09-13T21:21:34.377" Class="3" TagBased="False" />
  <row Id="28" UserId="39" Name="Supporter" Date="2011-09-13T21:21:34.377" Class="3" TagBased="False" />
  <row Id="29" UserId="44" Name="Teacher" Date="2011-09-13T21:21:34.437" Class="3" TagBased="False" />
  <row Id="30" UserId="56" Name="Autobiographer" Date="2011-09-13T21:26:34.850" Class="3" TagBased="False" />
  <row Id="31" UserId="28" Name="Editor" Date="2011-09-13T21:26:34.943" Class="3" TagBased="False" />
  <row Id="32" UserId="39" Name="Editor" Date="2011-09-13T21:26:34.943" Class="3" TagBased="False" />
  <row Id="33" UserId="20" Name="Student" Date="2011-09-13T21:26:35.273" Class="3" TagBased="False" />
  <row Id="34" UserId="31" Name="Autobiographer" Date="2011-09-13T21:31:34.907" Class="3" TagBased="False" />
  <row Id="35" UserId="53" Name="Critic" Date="2011-09-13T21:31:34.967" Class="3" TagBased="False" />
  <row Id="36" UserId="43" Name="Editor" Date="2011-09-13T21:31:34.983" Class="3" TagBased="False" />
  <row Id="37" UserId="10" Name="Student" Date="2011-09-13T21:31:35.217" Class="3" TagBased="False" />
  <row Id="38" UserId="28" Name="Student" Date="2011-09-13T21:31:35.217" Class="3" TagBased="False" />
  <row Id="39" UserId="39" Name="Student" Date="2011-09-13T21:31:35.217" Class="3" TagBased="False" />
  <row Id="40" UserId="52" Name="Supporter" Date="2011-09-13T21:31:35.247" Class="3" TagBased="False" />
</badges>

Can I get some help in where my code is going wrong or my interpretation of the xml file structure is going wrong. Thanks.


Solution

  • The interactive console can be helpful here. If you look at your root variable in the console you can see that, for your data file, each tag is one of your rows and thus each root element (root[index]) contains a dictionary with the values, for each row, that you wish to write out to a CSV:

    >>> root[0].tag
    'row'
    
    >>> root[0].attrib
    {'Id': '1', 'UserId': '4', 'Name': 'Analytical', 'Date': '2011-09-13T20:36:20.830', 'Class': '3', 'TagBased': 'False'}
    
    >>> root[0].items()
    [('Id', '1'), ('UserId', '4'), ('Name', 'Analytical'), ('Date', '2011-09-13T20:36:20.830'), ('Class', '3'), ('TagBased', 'False')]
    
    ...
    
    >>> root[1].items()
    [('Id', '2'), ('UserId', '2'), ('Name', 'Autobiographer'), ('Date', '2011-09-13T20:36:20.877'), ('Class', '3'), ('TagBased', 'False')]
    

    As the items method returns a list of tuples you could do something like this to get the line of values you desire:

    line = [value[1] for value in root[0].items()]
    

    which returns:

    ['1', '4', 'Analytical', '2011-09-13T20:36:20.830', '3', 'False']
    

    So for the whole file you could do something like:

    for row in root:
        line = [value[1] for value in row.items()]
        print(line)
    

    which returns:

    ['1', '4', 'Analytical', '2011-09-13T20:36:20.830', '3', 'False']
    ['2', '2', 'Autobiographer', '2011-09-13T20:36:20.877', '3', 'False']
    ['3', '3', 'Autobiographer', '2011-09-13T20:36:20.877', '3', 'False']
    ['4', '6', 'Autobiographer', '2011-09-13T20:36:20.877', '3', 'False']
    ['5', '3', 'Analytical', '2011-09-13T20:41:19.543', '3', 'False']
    ...
    ['36', '43', 'Editor', '2011-09-13T21:31:34.983', '3', 'False']
    ['37', '10', 'Student', '2011-09-13T21:31:35.217', '3', 'False']
    ['38', '28', 'Student', '2011-09-13T21:31:35.217', '3', 'False']
    ['39', '39', 'Student', '2011-09-13T21:31:35.217', '3', 'False']
    ['40', '52', 'Supporter', '2011-09-13T21:31:35.247', '3', 'False']
    

    So you can see how you can use this loop above to write to your CSV.

    You can get your header for the CSV by using the method keys:

    >>> root[0].keys()
    ['Id', 'UserId', 'Name', 'Date', 'Class', 'TagBased']