Tags: python, xml, csv, xmltocsv

Convert XML to CSV with Python


My friends,

With the following code, I am trying to convert the XML at https://issat.ttn.tn/cu/export/akouda.php to a CSV file.

The code:

import requests
import xml.etree.ElementTree as Xet
import pandas as pd
from html import unescape
url = "https://issat.ttn.tn/cu/export/akouda.php"

s = unescape(requests.get(url).text)[5:-6]

# Select every child of <phases> together with every <time> element
df = pd.read_xml(s, xpath="//phases/* | //time")
#df["value"] = df["value"].ffill()
df.to_csv('output0.csv')

Here are some of the results:

,value,phases,id,act_energy,react_energy,current_inst,voltage_inst,power_inst,power_fact,thd
0,2022-04-14 15:45:00,,,,,,,,,
1,,,0.0,0.3000000000001819,0.4324445747717669,2.0,241.7,0.27,0.57,27.39
2,,,1.0,0.0,0.0,13.06,242.5,0.66,0.2,22.69
3,,,2.0,0.0,0.0,1.07,243.7,0.15,0.58,48.05
4,2022-04-14 15:30:00,,,,,,,,,
5,,,0.0,0.2999999999999545,0.108885460271677,1.02,240.4,0.23,0.94,23.7
6,,,1.0,0.0,0.0,14.54,241.0,0.86,0.24,23.99
7,,,2.0,0.0,0.0,1.07,243.5,0.15,0.59,48.08
8,2022-04-14 15:15:00,,,,,,,,,
9,,,0.0,0.3999999999998636,0.5618044649492236,0.7,243.1,0.1,0.58,42.46
10,,,1.0,0.0,0.0,17.82,241.9,1.99,0.46,33.59
11,,,2.0,0.0,0.0,1.08,246.3,0.15,0.58,51.09
12,2022-04-14 15:00:00,,,,,,,,,
13,,,0.0,0.6000000000001364,0.8427066974243144,0.71,241.7,0.1,0.58,44.02
14,,,1.0,0.0,0.0,18.74,240.5,2.21,0.49,31.3
15,,,2.0,0.0,0.0,1.08,245.3,0.15,0.58,51.77

I need to:

  1. remove rows such as 0, 4, 8 and 12, which contain a date but no readings.
  2. keep only the rows where id = 1.
  3. remove the phases column.

Can anyone help, please?


Solution

  • Consider running two read_xml calls with different xpath expressions, using attrs_only for the <time> elements. Because the two results line up row for row (one <phase> with @id=1 for each <time>), join them on the index:

    ...
    time_df = pd.read_xml(s, xpath="//time", attrs_only=True, names=["time"])
    phase_df = pd.read_xml(s, xpath="//phase[@id=1]")
    
    time_phase_df = time_df.join(phase_df)
    time_phase_df
                         time  id  act_energy  ...  power_inst  power_fact    thd
    0     2022-04-15 00:00:00   1           0  ...        0.84        0.28  22.35
    1     2022-04-14 23:45:00   1           0  ...        0.83        0.28  23.16
    2     2022-04-14 23:30:00   1           0  ...        0.83        0.28  22.43
    3     2022-04-14 23:15:00   1           0  ...        0.83        0.28  22.56
    4     2022-04-14 23:00:00   1           0  ...        0.82        0.28  22.57
                      ...  ..         ...  ...         ...         ...    ...
    1289  2022-04-01 02:15:00   1           0  ...        0.69        0.25  22.70
    1290  2022-04-01 02:00:00   1           0  ...        0.69        0.25  22.66
    1291  2022-04-01 01:45:00   1           0  ...        0.69        0.25  22.46
    1292  2022-04-01 01:30:00   1           0  ...        0.69        0.25  22.00
    1293  2022-04-01 01:25:00   1           0  ...        0.69        0.25  22.34
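
To try the join without fetching the live feed, here is a minimal self-contained sketch. The XML layout in SAMPLE_XML is an assumption inferred from the columns shown in the question (a value attribute on <time>, an id attribute on <phase>), not the actual feed; the readings are copied from the question's output. It also assumes parser="etree" with ElementTree-style paths so that lxml is not required; with the default lxml parser the xpath strings above should work unchanged.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the structure is assumed, the numbers come from the question.
SAMPLE_XML = """<data>
  <time value="2022-04-14 15:45:00">
    <phases>
      <phase id="0"><voltage_inst>241.7</voltage_inst><power_inst>0.27</power_inst></phase>
      <phase id="1"><voltage_inst>242.5</voltage_inst><power_inst>0.66</power_inst></phase>
      <phase id="2"><voltage_inst>243.7</voltage_inst><power_inst>0.15</power_inst></phase>
    </phases>
  </time>
  <time value="2022-04-14 15:30:00">
    <phases>
      <phase id="0"><voltage_inst>240.4</voltage_inst><power_inst>0.23</power_inst></phase>
      <phase id="1"><voltage_inst>241.0</voltage_inst><power_inst>0.86</power_inst></phase>
      <phase id="2"><voltage_inst>243.5</voltage_inst><power_inst>0.15</power_inst></phase>
    </phases>
  </time>
</data>"""

# First call: only the attributes of each <time>, renamed from "value" to "time".
time_df = pd.read_xml(
    StringIO(SAMPLE_XML),
    xpath=".//time",
    attrs_only=True,
    names=["time"],
    parser="etree",
)

# Second call: only the <phase> elements whose id attribute is 1.
phase_df = pd.read_xml(
    StringIO(SAMPLE_XML),
    xpath=".//phase[@id='1']",
    parser="etree",
)

# Both frames have one row per <time>, so an index join pairs them up.
time_phase_df = time_df.join(phase_df)
time_phase_df.to_csv("output_phase1.csv", index=False)
```

The index join is only valid while every <time> contains exactly one <phase> with id 1; if some timestamps lack that phase, the rows would shift and pair readings with the wrong time.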
    

    As of pandas 1.5, read_xml also supports parsing dates:

    time_df = pd.read_xml(
        s, xpath="//time", attrs_only=True, names=["time"], parse_dates=["time"]
    )
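
As an alternative that is not part of the answer above, the single combined frame from the question can be cleaned up directly, following the commented-out ffill line in the original code. The sketch below is hedged: it rebuilds a small frame by hand with the same shape as the question's output (only two of the reading columns, for brevity) instead of calling read_xml, and the output file name is a placeholder.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the frame produced by the question's read_xml call:
# date-only rows followed by one row per phase id.
df = pd.DataFrame(
    {
        "value": ["2022-04-14 15:45:00", None, None, None,
                  "2022-04-14 15:30:00", None, None, None],
        "phases": [np.nan] * 8,
        "id": [np.nan, 0.0, 1.0, 2.0, np.nan, 0.0, 1.0, 2.0],
        "voltage_inst": [np.nan, 241.7, 242.5, 243.7, np.nan, 240.4, 241.0, 243.5],
        "power_inst": [np.nan, 0.27, 0.66, 0.15, np.nan, 0.23, 0.86, 0.15],
    }
)

# Carry each timestamp down onto the reading rows that follow it.
df["value"] = df["value"].ffill()

result = (
    df[df["id"] == 1]                    # drops date-only rows (NaN id) and other phases
    .drop(columns=["phases"])            # the phases column is always empty
    .rename(columns={"value": "time"})
    .astype({"id": int})
    .reset_index(drop=True)
)

result.to_csv("output_phase1_ffill.csv", index=False)
```

Filtering on id == 1 covers the first two requirements at once, because the date-only rows have a missing id and so never match.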