python-3.x numpy data-science self-organizing-maps

Python Output Conversion


I am using Python 3.6, and the minisom package gives me output in the following format:

defaultdict(list,{(9,1):[array([0.1,0.3,0.5,0.9]),array([0.2,0.6,0.8,0.9])],(3,2):[array([1,3,5,9]),array([2,6,8,9])] })

I would like to convert it into a pandas DataFrame like the one shown below:

X   Y   V1  V2  V3  V4
9   1   0.1 0.3 0.5 0.9
9   1   0.2 0.6 0.8 0.9
3   2   1   3   5   9
3   2   2   6   8   9

I appreciate your help on this.


Solution

  • I would try something like this:

    >>> x
    defaultdict(<class 'list'>, {(9, 1): [array([0.1, 0.3, 0.5, 0.9]), array([0.2, 0.6, 0.8, 0.9])], (3, 2): [array([1, 3, 5, 9]), array([2, 6, 8, 9])]})
    >>> df=pd.DataFrame()
    >>> df[["X", "Y", "V1", "V2", "V3", "V4"]]=pd.DataFrame(pd.DataFrame.from_dict(x, orient="index").stack().reset_index().drop("level_1", axis=1).rename(columns={0: "val"}, inplace=False).apply(lambda x: [el_inner for el in x.values for el_inner in el], axis=1).to_list())
    >>> df
       X  Y   V1   V2   V3   V4
    0  9  1  0.1  0.3  0.5  0.9
    1  9  1  0.2  0.6  0.8  0.9
    2  3  2  1.0  3.0  5.0  9.0
    3  3  2  2.0  6.0  8.0  9.0
    >>> df.dtypes
    X       int64
    Y       int64
    V1    float64
    V2    float64
    V3    float64
    V4    float64
    dtype: object
    

    Alternatively:

    >>> df=pd.DataFrame.from_dict(x, orient="index").stack().reset_index().drop("level_1", axis=1).rename(columns={0: "val"}, inplace=False).apply(lambda x: pd.Series({"x": x.level_0[0], "y": x.level_0[1], "v1": x.val[0], "v2": x.val[1], "v3": x.val[2], "v4": x.val[3]}), axis=1)
    >>> df
         x    y   v1   v2   v3   v4
    0  9.0  1.0  0.1  0.3  0.5  0.9
    1  9.0  1.0  0.2  0.6  0.8  0.9
    2  3.0  2.0  1.0  3.0  5.0  9.0
    3  3.0  2.0  2.0  6.0  8.0  9.0
    >>> df.dtypes
    x     float64
    y     float64
    v1    float64
    v2    float64
    v3    float64
    v4    float64
    dtype: object
    

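    Here x and y come out as floats because each row is built as a single Series, which upcasts the integer coordinates to match the float values. If you want to convert x and y back to int: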

    >>> df[["x", "y"]]=df[["x", "y"]].astype(int)
    >>> df
       x  y   v1   v2   v3   v4
    0  9  1  0.1  0.3  0.5  0.9
    1  9  1  0.2  0.6  0.8  0.9
    2  3  2  1.0  3.0  5.0  9.0
    3  3  2  2.0  6.0  8.0  9.0
    >>> df.dtypes
    x       int32
    y       int32
    v1    float64
    v2    float64
    v3    float64
    v4    float64
    dtype: object
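
A shorter route may also be worth trying. Since every key is an (X, Y) tuple and every value is a list of equal-length arrays, the rows can be flattened with a comprehension and handed straight to the DataFrame constructor. This is only a sketch: the dictionary is rebuilt from the sample in the question, the column names follow the desired output, and `rows` is just an illustrative name.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (stand-in for the minisom output).
x = defaultdict(list, {
    (9, 1): [np.array([0.1, 0.3, 0.5, 0.9]), np.array([0.2, 0.6, 0.8, 0.9])],
    (3, 2): [np.array([1, 3, 5, 9]), np.array([2, 6, 8, 9])],
})

# One row per array: the key supplies X and Y, the array supplies V1..V4.
# tolist() turns the NumPy values into plain Python numbers.
rows = [[*key, *vec.tolist()] for key, vecs in x.items() for vec in vecs]

df = pd.DataFrame(rows, columns=["X", "Y", "V1", "V2", "V3", "V4"])
print(df)
print(df.dtypes)
```

Because X and Y never pass through a float Series here, they should stay integers without a separate astype step.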