python xml xml.etree

In an XML file, how do I get a tag containing segmentation points that is placed after the key tag (not inside it)?


I have an XML file that contains segmentation points, but I don't know how to get them. I guess it is not well structured, because the points sit in a tag that comes after the tag containing the "Point_px" string (they are not inside the "Point_px" tag itself). My question is: what is the most efficient way to get the tags that contain the points?

This is what I use to get the segments now.

import xml.etree.ElementTree as ET

class XML_files:
    # other codes
    def get_points(self):
        anns = self.xml[0][1][0][5].iter() # self.xml carries the info
        segs = []
        a = -2
        for i,x in enumerate(anns):
            if x.text == "Point_px":
                a = i
            if a+1 == i:
                segs.append([a.text for a in x.findall("string")])
        # Parse "(x, y)" strings into [x, y]; keep floats so the result matches the expected output
        segs = [[[float(value) for value in tuples.strip("()").split(", ")] for tuples in part_cord] for part_cord in segs]

        return segs

This is what the files look like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Images</key>
    <array>
        <dict>
            "other tags"
            <array>
                <dict>
                    "other tags"
                    <key>Point_px</key>
                    <array>
                        <string>(468.612000, 2109.979980)</string>
                    </array>
                    "other tags"
                </dict>
                <dict>
                    "other tags"
                    <key>Point_px</key>
                    <array>
                        <string>(932.369019, 2154.489990)</string>
                        <string>(935.320984, 2151.000000)</string>
                        <string>(940.689026, 2149.389893)</string>
                        <string>(945.788025, 2149.659912)</string>
                        <string>(949.544983, 2151.810059)</string>
                        <string>(952.228027, 2154.219971)</string>
                        <string>(954.911987, 2158.520020)</string>
                        <string>(954.911987, 2162.540039)</string>
                        <string>(953.570007, 2167.100098)</string>
                        <string>(951.422974, 2170.590088)</string>
                        <string>(947.129028, 2173.540039)</string>
                        <string>(943.104004, 2173.810059)</string>
                        <string>(938.809998, 2173.280029)</string>
                        <string>(934.784973, 2171.669922)</string>
                        <string>(932.638000, 2167.909912)</string>
                        <string>(931.296021, 2164.149902)</string>
                        <string>(931.026978, 2159.320068)</string>
                    </array>
                    "other tags"
                </dict>
                <dict>
                    "other tags"
                    <key>Point_px</key>
                    <array>
                        <string>(1347.459961, 1894.459961)</string>
                    </array>
                    "other tags"
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>

The expected output is a list like the one below:

[[[468.612000, 2109.979980]],
[[932.369019, 2154.489990],
[935.320984, 2151.000000],
[940.689026, 2149.389893],
[945.788025, 2149.659912],
[949.544983, 2151.810059],
[952.228027, 2154.219971],
[954.911987, 2158.520020],
[954.911987, 2162.540039],
[953.570007, 2167.100098],
[951.422974, 2170.590088],
[947.129028, 2173.540039],
[943.104004, 2173.810059],
[938.809998, 2173.280029],
[934.784973, 2171.669922],
[932.638000, 2167.909912],
[931.296021, 2164.149902],
[931.026978, 2159.320068]],
[[1347.459961, 1894.459961]]]

Solution

  • For your XML structure, you are better off using lxml, because it supports full XPath 1.0, whereas ElementTree implements only a limited subset.

    Also, note that the XML has to be well formed before it can be parsed; in particular, the <plist> element needs an opening tag as well as a closing one.

    Assuming that's fixed, try this:

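    from lxml import etree

    images = b"""[your xml above, fixed]"""
    # Parse from bytes: lxml rejects str input that carries an encoding declaration
    doc = etree.fromstring(images)

    segs = []
    for d in doc.xpath('//dict[./key[.="Point_px"]]'):
        # Take only the <array> that directly follows the Point_px key
        sg = d.xpath('./key[.="Point_px"]/following-sibling::array[1]/string/text()')
        # "(468.612000, 2109.979980)" -> [468.612, 2109.97998]
        segs.append([[float(v) for v in s.strip('()').split(',')] for s in sg])
    segs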

    The output should be your expected output.
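
If you would rather stay with the standard library, the same "key followed by its value" idea can probably be expressed with ElementTree alone. The following is only a minimal sketch: the trimmed-down `SAMPLE` document, the `Regions` key name, and the helper names `parse_point` and `get_points` are made up for illustration and are not part of the question's code. It relies on the assumption that, inside a plist `<dict>`, every `<key>` is immediately followed by its value element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample in the same shape as the question's file.
# "Regions" is a placeholder for whatever key the real file uses there.
SAMPLE = """<plist version="1.0">
<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>Regions</key>
            <array>
                <dict>
                    <key>Point_px</key>
                    <array>
                        <string>(468.612000, 2109.979980)</string>
                    </array>
                </dict>
                <dict>
                    <key>Point_px</key>
                    <array>
                        <string>(932.369019, 2154.489990)</string>
                        <string>(935.320984, 2151.000000)</string>
                    </array>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>"""


def parse_point(text):
    """Turn a string such as "(468.612000, 2109.979980)" into a list of two floats."""
    return [float(v) for v in text.strip().strip("()").split(",")]


def get_points(root, key_name="Point_px"):
    """Collect the <string> values of the <array> that follows each matching <key>."""
    segs = []
    for d in root.iter("dict"):
        children = list(d)
        # Pair every child with its next sibling: (key, value), (value, key), ...
        for key, value in zip(children, children[1:]):
            if key.tag == "key" and key.text == key_name and value.tag == "array":
                segs.append([parse_point(s.text) for s in value.findall("string")])
    return segs


segs = get_points(ET.fromstring(SAMPLE))
print(segs)
```

Pairing each child with its next sibling avoids the index bookkeeping (`a = -2`, `a+1 == i`) in the question's loop, and checking `value.tag == "array"` keeps unrelated keys from being picked up.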