I have an XML file that contains segmentation points, but I don't know how to extract them. I suspect it is not well structured, because the points sit in a tag that follows a tag containing the string "Point_px" (they are not inside the "Point_px" tag itself). My question is: what is the most efficient way to get the tags that contain the points?
This is what I use to get the segments now:
import xml.etree.ElementTree as ET
class XML_files:
    # other code
    def get_points(self):
        anns = self.xml[0][1][0][5].iter()  # self.xml carries the info
        segs = []
        a = -2
        for i, x in enumerate(anns):
            if x.text == "Point_px":
                a = i
            if a + 1 == i:
                segs.append([a.text for a in x.findall("string")])
        segs = [[[int(float(value)) for value in tuples.strip("()").split(", ")] for tuples in part_cord] for part_cord in segs]
        return segs
This is what the files look like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Images</key>
<array>
<dict>
"other tags"
<array>
<dict>
"other tags"
<key>Point_px</key>
<array>
<string>(468.612000, 2109.979980)</string>
</array>
"other tags"
</dict>
<dict>
"other tags"
<key>Point_px</key>
<array>
<string>(932.369019, 2154.489990)</string>
<string>(935.320984, 2151.000000)</string>
<string>(940.689026, 2149.389893)</string>
<string>(945.788025, 2149.659912)</string>
<string>(949.544983, 2151.810059)</string>
<string>(952.228027, 2154.219971)</string>
<string>(954.911987, 2158.520020)</string>
<string>(954.911987, 2162.540039)</string>
<string>(953.570007, 2167.100098)</string>
<string>(951.422974, 2170.590088)</string>
<string>(947.129028, 2173.540039)</string>
<string>(943.104004, 2173.810059)</string>
<string>(938.809998, 2173.280029)</string>
<string>(934.784973, 2171.669922)</string>
<string>(932.638000, 2167.909912)</string>
<string>(931.296021, 2164.149902)</string>
<string>(931.026978, 2159.320068)</string>
</array>
"other tags"
</dict>
<dict>
"other tags"
<key>Point_px</key>
<array>
<string>(1347.459961, 1894.459961)</string>
</array>
"other tags"
</dict>
</array>
</dict>
</array>
</dict>
</plist>
The expected output is a list like the one below:
[[[468.612000, 2109.979980]],
[[932.369019, 2154.489990],
[935.320984, 2151.000000],
[940.689026, 2149.389893],
[945.788025, 2149.659912],
[949.544983, 2151.810059],
[952.228027, 2154.219971],
[954.911987, 2158.520020],
[954.911987, 2162.540039],
[953.570007, 2167.100098],
[951.422974, 2170.590088],
[947.129028, 2173.540039],
[943.104004, 2173.810059],
[938.809998, 2173.280029],
[934.784973, 2171.669922],
[932.638000, 2167.909912],
[931.296021, 2164.149902],
[931.026978, 2159.320068]],
[[1347.459961, 1894.459961]]]
For your XML structure, you are better off using lxml, because its XPath support is more complete than ElementTree's.
Also, note that the xml in your question isn't well formed (because the <plist>
element is never opened).
Assuming that's fixed, try this:
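from lxml import etree
images = """[your xml above, fixed]"""
# Encode to bytes: lxml rejects str input that carries an encoding declaration.
doc = etree.fromstring(images.encode("utf-8"))
segs = []
# Each <key>Point_px</key> is followed by the sibling <array> that holds its points.
for arr in doc.xpath('//key[.="Point_px"]/following-sibling::array[1]'):
    points = arr.xpath('./string/text()')
    # "(x, y)" -> [x, y] as floats
    segs.append([[float(v) for v in p.strip('()').split(',')] for p in points])
segs
The output should match your expected output: one list per Point_px key, each holding [x, y] float pairs.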
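Since the DOCTYPE marks the file as an Apple property list, another option is the standard-library plistlib module, which turns `<dict>`, `<array>` and `<string>` into Python dicts, lists and strings, so no positional indexing or XPath is needed. The following is only a minimal sketch: `find_point_lists`, `parse_point` and `SAMPLE` are hypothetical names, and the `ROIs` key in the sample stands in for whatever key the elided "other tags" really use. It assumes the real file is a valid plist once the placeholders are replaced.

```python
import plistlib


def parse_point(text):
    """Turn a string like "(468.612000, 2109.979980)" into [468.612, 2109.97998]."""
    return [float(v) for v in text.strip("()").split(",")]


def find_point_lists(node, key="Point_px"):
    """Recursively collect every list stored under `key`, in document order."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, list):
                found.append([parse_point(s) for s in v])
            else:
                found.extend(find_point_lists(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_point_lists(item, key))
    return found


# Cut-down stand-in for the real file; "ROIs" is an assumed key name.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Images</key>
  <array>
    <dict>
      <key>ROIs</key>
      <array>
        <dict>
          <key>Point_px</key>
          <array>
            <string>(468.612000, 2109.979980)</string>
          </array>
        </dict>
        <dict>
          <key>Point_px</key>
          <array>
            <string>(932.369019, 2154.489990)</string>
            <string>(935.320984, 2151.000000)</string>
          </array>
        </dict>
      </array>
    </dict>
  </array>
</dict>
</plist>"""

segs = find_point_lists(plistlib.loads(SAMPLE))
print(segs)
```

For a file on disk, `plistlib.load` accepts a file object opened in binary mode, so `find_point_lists(plistlib.load(f))` should work the same way. Because the search is recursive, it does not depend on how deeply the Point_px keys are nested.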