pythonarrayspandasdataframerms

Calculate RMS, Count, SUM to array inside all columns of pandas dataframe


I would like to calculate RMS, Count, SUM to array inside all columns of pandas dataframe and then fulfill outputs into new three dataframes as shown below

enter image description here

P.S > solution should deal with N numbers of columns, in my case, I have around 300 columns x,y,z,a,b,c ....... etc ...... N

ID x y z ….. EF407412 [471, 1084, 1360, 2284] [1408, 1572, 2277] [977, 1003, 1493, 1519, 1650, 1676, 2804] ….. KM043272 [2173] [1293, 2354,] [1200] ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..

Dataframe_RMS
ID x y z ….. EF407412 1454.5749 1792.4263 1685.6893 ….. KM043272 2173 1899.1004 1200 ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..

Dataframe_Count
ID x y z ….. EF407412 4 3 7 ….. KM043272 1 2 1 ….. KM043273 ….. ….. ….. ….. ….. ….. ….. ….. …..


Solution

  • Updating answer as per the OP's comment - for any number of Columns Check Below code:

    import pandas as pd 
    from ast import literal_eval
    import numpy as np 
    
    df = pd.DataFrame({'ID':['EF407412','KM043272']
                       , 'x': ['[471, 1084, 1360, 2284]','[2173]']
                       , 'y': ['[1408, 1572, 2277]','[1293, 2354,]']
                       , 'z': ['[977, 1003, 1493, 1519, 1650, 1676, 2804]','[1200]']} )
    
    col_num = df.shape[1]
    

    COUNT

    df[[i+"_count" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
                                                        format( *(tuple([len(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).\
                                                        astype('str').str.split(',', expand=True).values
    
    df[['ID']+([ col for col in df.columns if col.endswith('count')])]
    

    OUTPUT:

    enter image description here

    SUM

    df[[i+"_sum" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
                                                        format( *(tuple([sum(literal_eval(x[col])) for col in df.columns[1:col_num]] ))),axis=1).\
                                                        astype('str').str.split(',', expand=True).values
    
    df[['ID']+([ col for col in df.columns if col.endswith('sum')])]
    

    Output:

    enter image description here

    RMS

    df[[i+"_rms" for i in df.columns[1:col_num]]] = df.apply(lambda x: ("{},"*(col_num-1))[:-1].\
                                                        format( *(tuple([np.sqrt(np.mean(np.square(literal_eval(x[col])))) for col in df.columns[1:col_num]] ))),axis=1).\
                                                        astype('str').str.split(',', expand=True).values
    
    df[['ID']+([ col for col in df.columns if col.endswith('rms')])]
    

    Output:

    enter image description here