pandastranspose

Way to transpose a Pandas DataFraa repeated blocks of 4 rows with values in the same column, with dupes


I am trying to transpose 2 columns into 4 rows, but it's been extremely tricky as there is duplicate lines, making pivoting difficult.

What I have is this data as df:

   Device      Serial   Property      Value
0  Computer2A  ABCSFT2  Display Name  Microsoft Edge
1  Computer2A  ABCSFT2  Install Date  2/22/2025
2  Computer2A  ABCSFT2  Publisher     Microsoft
3  Computer2A  ABCSFT2  Version       133.0.3065.92
4  Computer2A  ABCSFT2  Display Name  MSVC++ 2022 X64
5  Computer2A  ABCSFT2  Install Date  8/13/2024
6  Computer2A  ABCSFT2  Publisher     Microsoft
7  Computer2A  ABCSFT2  Version       14.40.33810
8  Computer2A  ABCSFT2  Display Name  MSVC++ 2022 X86
9  Computer2A  ABCSFT2  Install Date  8/13/2024
10 Computer2A  ABCSFT2  Publisher     Microsoft
11 Computer2A  ABCSFT2  Version       14.40.33810
[...]

What I want is this:

Device      Serial   Display Name     Install Date  Publisher  Version
Computer2A  ABCSFT2  Microsoft Edge   2/22/2025     Microsoft  133.0.3065.92
Computer2A  ABCSFT2  MSVC++ 2022 X64  8/13/2024     Microsoft  14.40.33810
Computer2A  ABCSFT2  MSVC++ 2022 X86  8/13/2024     Microsoft  14.40.33810

The closest I've been able to come up with is using categoricals and unstacking, but it doesn't produce wanted results.

Code with df already defined (see here for origin):

cats = ['Display Name','Install Date','Publisher','Version']
df['Property'] = pd.Categorical(df['Property'], categories=cats, ordered=True)

df = (df.set_index(['Device','Serial','Property'], append=True)['Value']
       .unstack()
       .reset_index()
       .rename_axis(None, axis=1)
       .sort_values(['Device','Serial']))

What is produced:

       Device      Serial   Display Name     Install Date  Publisher  Version
12510  Computer01  STWF2G3  Microsoft Edge   NaN           NaN        NaN
12511  Computer01  STWF2G3  NaN              2/22/2025     NaN        NaN
12512  Computer01  STWF2G3  NaN              NaN           Microsoft  NaN
12513  Computer01  STWF2G3  NaN              NaN           NaN        133.0.3065.92
12514  Computer01  STWF2G3  MSVC++ 2022 X64  NaN           NaN        NaN
12515  Computer01  STWF2G3  NaN              8/13/2024     NaN        NaN
12516  Computer01  STWF2G3  NaN              NaN           Microsoft  NaN
12517  Computer01  STWF2G3  NaN              NaN           NaN        14.40.33810
12518  Computer01  STWF2G3  MSVC++ 2022 X86  NaN           NaN        NaN
12519  Computer01  STWF2G3  NaN              8/13/2024     NaN        NaN
12520  Computer01  STWF2G3  NaN              NaN           Microsoft  NaN
12521  Computer01  STWF2G3  NaN              NaN           NaN        14.40.33810
[...]

What am I missing here? Does anyone know?

I've tried pivot_table() and melt(), but they aren't viable options because of various reasons. Pivoting seems to be designed for numbers, not text.

I have a feeling the issue lies with using Append=True, but I have duplicate identifiers with the left two columns, so I often get this error using many recommended code I've found, and I have no idea how to work with the data:

ValueError: Index contains duplicate entries, cannot reshape

Any help would be appreciated. Thanks.


Solution

  • De-duplicate with groupby.cumcount, then pivot:

    index = ['Device', 'Serial']
    cols = ['Property']
    values = 'Value'
    
    (df.assign(n=df.groupby(index+cols).cumcount())
       .pivot(index=index+['n'], columns=cols, values=values)
       .reset_index(index).rename_axis(index=None, columns=None)
    )
    

    Output:

           Device   Serial     Display Name Install Date  Publisher        Version
    0  Computer2A  ABCSFT2   Microsoft Edge    2/22/2025  Microsoft  133.0.3065.92
    1  Computer2A  ABCSFT2  MSVC++ 2022 X64    8/13/2024  Microsoft    14.40.33810
    2  Computer2A  ABCSFT2  MSVC++ 2022 X86    8/13/2024  Microsoft    14.40.33810