pythonpandasvaex

python pandas - is there any faster way to do explode operation according to the requirement


The code is as following the input dataframe is enter image description here

import pandas as pd 
import numpy as np
df = pd.DataFrame([('bird', 'Falconiformes', 2),
                      ('bird', 'Psittaciformes', 4),
                      ('mammal', 'Carnivora', 8),
                     ('mammal', 'Primates', np.nan),
                    ('mammal', 'Carnivora', 12)],
                     index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                    columns=('class', 'order', 'count'))

unique_class= df['class'].unique().tolist()
temp_list = []
for i in range(len(unique_class)):         
    temp =df[df['class']==unique_class[i]].reset_index(drop=True)
    pipe_values= temp['order'].values.tolist()
    count_values=temp['count'].values.tolist()
    Stri= "|".join(pipe_values)
    for j in range(len(pipe_values)):
        df1=temp[:1]
        df1['order']=  Stri         
        df1['count'+str(j)]=count_values[j]
        
    
    temp_list.append(df1)
    final = pd.concat(temp_list)
    final

the output is enter image description here

The problem is when huge data come - around 1 million - the process is slow, is there any other logic or inbuild function in pandas that we can do, also how will I do that using vaex library thanks


Solution

  • You could do it within Pandas (not sure why you need to combine the data this way):

    Groupby on class, convert everything to string, and aggregate with python's str.join:

    temp = df.set_index('class').astype(str).groupby(level=[0]).agg("|".join)
    

    Split count into individual columns and recombine with temp:

     anoda = (temp['count'].str.split("|", expand=True)
                           .astype(float)
                           .add_prefix('count')
              )
    
    temp.join(anoda).drop(columns='count')
    
    
                                   order  count0  count1  count2
    class                                                       
    bird    Falconiformes|Psittaciformes     2.0     4.0     NaN
    mammal  Carnivora|Primates|Carnivora     8.0     NaN    12.0