pythonpython-3.xmatrixbiopythonskbio

Square a matrix in python


Hello let say I have a df such as :

G1  G2  VALUE 
SP1 SP2 1
SP1 SP3 2
SP1 SP4 3
SP2 SP3 4
SP2 SP4 5
SP3 SP4 6 

how can I get a the data as square ? (i.e., have the same number of rows and columns)

with something like

data = [[0,  1,  2,  3],
[1,  0, 4, 5],
[9, 10,  0,  8,  7],
[2, 4,  0,  6],
[3,  5,  6,  0]]

ids = ['SP1','SP2','SP3','SP4]

dm = DistanceMatrix(data, ids) (function from skbio package)

and get a matrix :

    SP1 SP2 SP3 SP4
SP1 0   1   2   3
SP2 1   0   4   5
SP3 2   4   0   6
SP4 3   5   6   0

And if some of you are familiar with it, how can we also do the same thing but with a 1/2 matrix :

SP1 0 
SP2 1   0   
SP3 2   4   0   
SP4 3   5   6   0
    SP1 SP2 SP3 SP4

(here is mor for biopython) thanks a lot for your help


Other exemple

d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}
df = pd.DataFrame(data=d)

I should get :

SP1 0
SP2 3   0
SP3 1   2  0
   SP1 SP2 SP3 

and

SP1 0   3  1
SP2 3   0  2
SP3 1   2  0
   SP1 SP2 SP3 

Solution

  • I think this is what you're looking for, more or less:

    In [257]: df
    Out[257]: 
        G1   G2  VALUE
    0  SP1  SP2      1
    1  SP1  SP3      2
    2  SP1  SP4      3
    3  SP2  SP3      4
    4  SP2  SP4      5
    5  SP3  SP4      6
    
    In [258]: df.pivot(index='G1', columns='G2', values='VALUE')
    Out[258]: 
    G2   SP2  SP3  SP4
    G1                
    SP1  1.0  2.0  3.0
    SP2  NaN  4.0  5.0
    SP3  NaN  NaN  6.0
    
    In [259]: df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0)
    Out[259]: 
    G2   SP2  SP3  SP4
    G1                
    SP1  1.0  2.0  3.0
    SP2  0.0  4.0  5.0
    SP3  0.0  0.0  6.0
    

    In response to the edit to the question:

    In [277]: d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}
    
    In [278]: df = pd.DataFrame(data=d)
    
    In [279]: d = df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0).to_dict()
    
    In [280]: for s,dd in {**d}.items(): 
         ...:     for t,v in {**dd}.items(): 
         ...:         d.setdefault(t, {})[s] = v 
         ...:
    
    In [281]: d
    Out[281]: 
    {'SP1': {'SP1': 0.0, 'SP2': 3.0, 'SP3': 1.0},
     'SP3': {'SP1': 1.0, 'SP2': 2.0},
     'SP2': {'SP1': 3.0, 'SP3': 2.0}}
    
    In [282]: pd.DataFrame(data=d)
    Out[282]: 
         SP1  SP3  SP2
    SP1  0.0  1.0  3.0
    SP2  3.0  2.0  NaN
    SP3  1.0  NaN  2.0
    
    In [283]: pd.DataFrame(data=d).fillna(value=0)
    Out[283]: 
         SP1  SP3  SP2
    SP1  0.0  1.0  3.0
    SP2  3.0  2.0  0.0
    SP3  1.0  0.0  2.0