I am trying to read a text file in a specific format, extract coordinates from it, and store them in an ordered dict. One set in the text file consists of a title line followed by lines of x and y coordinates. Each coordinate line always starts with "." followed by "\t" (tab). One text file contains multiple such sets. My idea is to collect each set's x and y values into lists and append these to an ordered dict. In the end, each entry in the ordered dict should be a list of lists, with the number of inner lists equal to the number of sets.
An illustration of what the text file looks like:
Freehand green 2 2 0,0 289618 .
. 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3 104269.6,38270.6 104256.3,38254.0
. 104239.7,38237.4 104223.0,38220.7 104209.7,38204.1 104193.1,38194.1 104176.4,38187.5
Freehand green 2 3 0,0 63980 .
. 99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8
My code:
from collections import OrderedDict
import re

dict_roi = OrderedDict([
    ("title", []),
    ("X", []),
    ("Y", [])])

with open(elements_file, "r") as f:
    try:
        # pattern to match to get coordinates
        pattern = re.compile(".\t\d+.*")
        # loop through lines and find title line and line with coordinates
        for i, line in enumerate(f):
            # get title line
            if line.startswith('Freehand'):
                dict_roi['title'].append(line)
                # initiate empty list per set
                XX = []
                YY = []
            # line with coordinates starts with .\t
            # if pattern matches and line starts with .\t, get the coordinates
            for match in re.finditer(pattern, line):
                if line.startswith('.\t'):
                    nln = "{}".format(line[2:].strip())
                    val = nln.split('{:6.1f}')
                    # data-massaging to get to the coordinates
                    for v in val:
                        coordinates_list = v.split("\t")
                        for c in coordinates_list:
                            x, y = c.split(',')
                            print(x, y)
                            XX.append(float(x))
                            YY.append(float(y))
                    # this should append one list per set
                    dict_roi['X'].append(XX)
                    dict_roi['Y'].append(YY)
    except ValueError:
        print("Exiting")
print(dict_roi)
Ideally, I would like to have an ordered dict which would give me something like:
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
But my output looks like this:
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4],
[99803.4, 99826.7, 99843.3, 99860.0, 99876.6]])
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5],
[37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
I get multiple copies of the list from each set. For example, here the X and Y lists from the first set are duplicated. It probably has something to do with clearing the lists after appending, or with the placement of the empty lists XX and YY. But I have tried many variations and always get either the output above or a list per line instead of a list per set in the ordered dict.
Does anyone have any idea how to restructure this code so that I get the output shown in the ideal case?
I simplified it slightly by not using a regular expression. Instead, for each line the coordinates are stored in a list named coords. Each x will have an even index, and each y an odd one. Thus, slicing this list will give you your XX and YY.
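As a minimal illustration of the even/odd slicing on a single coordinate line (the names tokens, xs and ys are only for this sketch, and the sample line is shortened from the question's data):

```python
line = ". 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3"

# Treat commas and any whitespace (spaces or tabs) as separators,
# then drop the leading "." marker.
tokens = line.replace(",", " ").split()
coords = [float(t) for t in tokens if t != "."]

xs = coords[0::2]  # even indices 0, 2, 4, ... hold the x values
ys = coords[1::2]  # odd indices 1, 3, 5, ... hold the y values

print(xs)
print(ys)
```

The full version, applied to every set in the input: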
from collections import OrderedDict
input_text = '''Freehand green 2 2 0,0 289618 .
. 104326.2,38323.8 104309.6,38307.2 104286.3,38287.3 104269.6,38270.6 104256.3,38254.0
. 104239.7,38237.4 104223.0,38220.7 104209.7,38204.1 104193.1,38194.1 104176.4,38187.5
Freehand green 2 3 0,0 63980 .
. 99803.4,37296.2 99826.7,37306.2 99843.3,37312.8 99860.0,37316.2 99876.6,37322.8'''
dict_roi = OrderedDict([('title', []),
('X', []),
('Y', [])])
lines = input_text.split('\n')
Xs = []
Ys = []
for i, line in enumerate(lines):
    # When a line contains a title
    if line.startswith('Freehand'):
        dict_roi['title'].append(line)
        # Store the previous set (if any) before starting a new one
        if Xs and Ys:
            dict_roi['X'].append(Xs)
            dict_roi['Y'].append(Ys)
            Xs = []
            Ys = []
    # When a line is empty
    elif not line:
        continue
    # When a line contains coordinates
    else:
        line = line.replace('\n', '')
        line = line.replace('\t', ',')
        line = line.replace(' ', ',')
        coords = line.split(',')
        coords = [e for e in coords if e != '.' and e]
        coords = [float(c) for c in coords]
        # Xs are even, Ys are odd
        Xs += coords[0::2]
        Ys += coords[1::2]
# Store the last set, which has no following title line to trigger it
dict_roi['X'].append(Xs)
dict_roi['Y'].append(Ys)
print(dict_roi)
Output:
OrderedDict([('title', ['Freehand green 2 2 0,0 289618 . ', 'Freehand green 2 3 0,0 63980 . ']),
('X', [[104326.2, 104309.6, 104286.3, 104269.6, 104256.3, 104239.7, 104223.0, 104209.7, 104193.1, 104176.4], [99803.4, 99826.7, 99843.3, 99860.0, 99876.6]]),
('Y', [[38323.8, 38307.2, 38287.3, 38270.6, 38254.0, 38237.4, 38220.7, 38204.1, 38194.1, 38187.5], [37296.2, 37306.2, 37312.8, 37316.2, 37322.8]])])
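As for why the question's code duplicates lists: judging from its output, the append to dict_roi appears to run once per coordinate line rather than once per set, so the same XX object is appended twice for a set with two coordinate lines. One way to sidestep this, sketched below under the assumption that every set starts with a line beginning with "Freehand", is to register a fresh list exactly once when the title is seen and then fill it in place. The function name parse_rois and its title_prefix parameter are hypothetical, and the sample data is shortened.

```python
from collections import OrderedDict


def parse_rois(lines, title_prefix="Freehand"):
    """Group coordinates by set; `lines` is any iterable of strings."""
    rois = OrderedDict([("title", []), ("X", []), ("Y", [])])
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(title_prefix):
            # New set: register fresh lists once; later lines fill them in place.
            rois["title"].append(line)
            rois["X"].append([])
            rois["Y"].append([])
        elif rois["X"]:
            # Coordinate line: a "." marker, then "x,y" pairs separated by
            # whitespace. Lines appearing before any title are ignored.
            for pair in line.split():
                if pair == ".":
                    continue
                x, y = pair.split(",")
                rois["X"][-1].append(float(x))
                rois["Y"][-1].append(float(y))
    return rois


sample = """Freehand green 2 2 0,0 289618 .
. 104326.2,38323.8 104309.6,38307.2
. 104286.3,38287.3
Freehand green 2 3 0,0 63980 .
. 99803.4,37296.2 99826.7,37306.2"""

result = parse_rois(sample.splitlines())
print(result["X"])
print(result["Y"])
```

Because an open file is also an iterable of lines, the same function should work when passed the file object from the question's `with open(elements_file, "r") as f:` block, without reading the whole file into memory first. Note that this sketch strips whitespace from the title lines.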