pandasduplicatesmultiple-columnsreshapelreshape

Rearranging dataframe dimensions


I have a df with lots of columns :

   date  risk lev   chemical  weight    date    risk lev   chemical  weight
 15-5-16   5         Potasium   5mg    15-5-16      3       Sodium     7 mg 
 14-5-16   6         Potasium   10mg   14-5-16      2       Sodium     2 mg  

I would like to re-arrange it so that each 4 columns breaks into a new row , such as the df is like this:

   date  risk lev   chemical  weight
 15-5-16      5      Potasium   5mg   
 15-5-16      3       Sodium    7mg  
 14-5-16      6      Potasium   10mg   
 14-5-16      2       Sodium    2mg

Sorry that I do not include my try but is the first time I counter this issue and not sure how to proceed


Solution

  • First remove duplicates with column names and then use pd.lreshape with dict created by dict comprehension:

    s = df.columns.to_series()
    df.columns = s.add(s.groupby(s).cumcount().astype(str))
    print (df)
         date0  risk lev0 chemical0 weight0    date1  risk lev1 chemical1 weight1
    0  15-5-16          5  Potasium     5mg  15-5-16          3    Sodium     7mg
    1  14-5-16          6  Potasium    10mg  14-5-16          2    Sodium     2mg
    
    
    cols = ['date','risk lev','chemical','weight']
    d = {x:df.columns[df.columns.str.startswith(x)].tolist() for x in cols}
    print (d)
    {'date': ['date0', 'date1'], 
     'weight': ['weight0', 'weight1'], 
     'risk lev': ['risk lev0', 'risk lev1'], 
     'chemical': ['chemical0', 'chemical1']}
    
    df = pd.lreshape(df, d)
    print (df)
          date weight  risk lev  chemical
    0  15-5-16    5mg         5  Potasium
    1  14-5-16   10mg         6  Potasium
    2  15-5-16    7mg         3    Sodium
    3  14-5-16    2mg         2    Sodium