I have a 2D map where each pixel contains a spectrum. I want to convert the data from this format:
X Y Wave Intensity
-34727.180000 -4204.820000 1.484622 139.193512
-34727.180000 -4204.820000 1.484043 120.991280
-34727.180000 -4204.820000 1.483465 125.905304
-34726.180000 -4204.820000 1.483465 131.262970
-34726.180000 -4204.820000 1.482887 122.784081
-34726.180000 -4204.820000 1.482309 129.853088
-34725.180000 -4204.820000 1.483465 129.655670
-34725.180000 -4204.820000 1.482887 119.567032
-34725.180000 -4204.820000 1.482309 126.097000
-34727.180000 -4203.820000 1.463490 124.331985
-34727.180000 -4203.820000 1.462927 138.189377
-34727.180000 -4203.820000 1.462364 127.824867
to a dictionary where the keys are a tuple of the X,Y coordinate, and the values are a 3-by-2 numpy array. For example:
DICT = {
(-34727.180000, -4204.820000): [[1.484622, 139.193512], [1.484043, 120.991280], [1.483465, 125.905304]],
(-34726.180000, -4204.820000): [[1.482887, 122.784081], [1.482887, 122.784081], [1.482309, 129.853088]],
(-34725.180000, -4204.820000): [[1.483465, 129.655670], [1.482887, 119.567032], [1.482309, 126.097000]],
(-34727.180000, -4203.820000): [[1.463490, 124.331985], [1.462927, 138.189377], [1.462927, 138.189377]]}
This example is simplified; my actual map contains more than four pixels (X,Y coordinates), and there are 512 Wave-Intensity pairs for each coordinate. I hope the solution can be generalized from a four pixel map to a 400 pixel map, and each array from a 3-by-2 numpy array to an 512-by-2 numpy array.
The ultimate goal is to take the Wave-Intensity pairs for each coordinate, fit them to a Gaussian distribution, find the (maximum) amplitude for this distribution, and plot that maximum for each X,Y coordinate. This part of the problem does not need to be included in a solution, but if someone adds a solution to this part of the problem, that would be excellent!
I am open to approaches that do not involve a dictionary (e.g. a 4D numpy array), but at the moment I cannot see another way. Feel free to recommend an alternate approach. Currently, I am importing the data in its original format using pandas
:
import pandas as pd
IN_PATH = r'PATH_TO_FILE'
FNAME = r'\FILENAME.txt'
data = pd.read_csv(IN_PATH+FNAME, sep='\t', skiprows=1)
data.columns = ["X", "Y", "Wave", "Intensity"]
Thanks in advance!
First pandas.DataFrame.set_index
the coordinates, pandas.DataFrame.agg
list
along axis=1
, pandas.DataFrame.groupby
the indices, then pandas.DataFrame.groupby.agg
to list, and convert pandas.Series.to_dict
:
>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).to_dict()
{(-34727.18, -4204.82): [[1.484622, 139.193512],
[1.484043, 120.99128],
[1.483465, 125.905304]],
(-34727.18, -4203.82): [[1.46349, 124.331985],
[1.462927, 138.189377],
[1.462364, 127.824867]],
(-34726.18, -4204.82): [[1.483465, 131.26297],
[1.482887, 122.784081],
[1.482309, 129.853088]],
(-34725.18, -4204.82): [[1.483465, 129.65567],
[1.482887, 119.567032],
[1.482309, 126.097]]}
This will give the results in list
, if you want array, you can pandas.Series.transform
to numpy.array
:
>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).transform(np.array).to_dict()
{(-34727.18, -4204.82): array([[ 1.484622, 139.193512],
[ 1.484043, 120.99128 ],
[ 1.483465, 125.905304]]),
(-34727.18, -4203.82): array([[ 1.46349 , 124.331985],
[ 1.462927, 138.189377],
[ 1.462364, 127.824867]]),
(-34726.18, -4204.82): array([[ 1.483465, 131.26297 ],
[ 1.482887, 122.784081],
[ 1.482309, 129.853088]]),
(-34725.18, -4204.82): array([[ 1.483465, 129.65567 ],
[ 1.482887, 119.567032],
[ 1.482309, 126.097 ]])}