pythonpandasdataframecolumnsorting

Copy corresponding dataframe column values to new dataframe with fixed column names python


I have a data frame (df1) as shown below. The elements in the 1st column start repeating after a certain number but the corresponding 2nd column elements are unique

    0        1
0   A        Ok
1   B        1234
2   C        True
3   D        Yes
4   A        Ok
5   B        6789
6   C        False
7   D        No 

I have created a new DataFrame(df2) with column names as the elements in the 1st column of df1 without repetition

column_names = df1[0].drop_duplicates()
df2 = pd.DataFrame(columns=column_names)

I want to copy the elements in the 2nd column of df1 under the column names in df2 as shown

    A     B       C      D
0   Ok   1234    True    Yes
1   Ok   6789    False   Yes
2   Ok   1567    False   N/A

Can anyone help me with this?


Solution

  • group the repeating values in the first column with cumcount and then use that in the pivot.

    # create a temp idx, which is cumcount of unique values in rows, and then use that with pivot
    # finally reset index and drop the temporary created column
    
    (df.assign(idx=df.groupby([0]).cumcount())
     .pivot(index='idx', columns=0, values=1)
     .reset_index()
     .drop(columns='idx')
     .sort_index(axis=1)
    )
    
    0    A     B       C     D
    0   Ok  1234    True    Yes
    1   Ok  6789    False   No
    

    Alternately, i modified the provided data and here the order of the column is preserved

    data = {0: {0: 'Z', 1: 'B', 2: 'X', 3: 'D', 4: 'Z', 5: 'B', 6: 'X', 7: 'D'},
            1: {0: 'Ok',  1: '1234',  2: 'True',  3: 'Yes',  4: 'Ok',  5: '6789',  6: 'False',  7: 'No'}}
    df=pd.DataFrame(data)
    
    (df.assign(idx=df.groupby([0]).cumcount())
     .groupby(['idx',0],sort=False )[1].sum().unstack() 
     .reset_index()
     .drop(columns='idx')
    )
    
         Z     B       X     D
    0   Ok  1234    True    Yes
    1   Ok  6789    False   No