I am trying to transpose 2 columns into 4 rows, but it's been extremely tricky as there is duplicate lines, making pivoting difficult.
What I have is this data as df
:
Device Serial Property Value
0 Computer2A ABCSFT2 Display Name Microsoft Edge
1 Computer2A ABCSFT2 Install Date 2/22/2025
2 Computer2A ABCSFT2 Publisher Microsoft
3 Computer2A ABCSFT2 Version 133.0.3065.92
4 Computer2A ABCSFT2 Display Name MSVC++ 2022 X64
5 Computer2A ABCSFT2 Install Date 8/13/2024
6 Computer2A ABCSFT2 Publisher Microsoft
7 Computer2A ABCSFT2 Version 14.40.33810
8 Computer2A ABCSFT2 Display Name MSVC++ 2022 X86
9 Computer2A ABCSFT2 Install Date 8/13/2024
10 Computer2A ABCSFT2 Publisher Microsoft
11 Computer2A ABCSFT2 Version 14.40.33810
[...]
What I want is this:
Device Serial Display Name Install Date Publisher Version
Computer2A ABCSFT2 Microsoft Edge 2/22/2025 Microsoft 133.0.3065.92
Computer2A ABCSFT2 MSVC++ 2022 X64 8/13/2024 Microsoft 14.40.33810
Computer2A ABCSFT2 MSVC++ 2022 X86 8/13/2024 Microsoft 14.40.33810
The closest I've been able to come up with is using categoricals and unstacking, but it doesn't produce wanted results.
Code with df
already defined (see here for origin):
cats = ['Display Name','Install Date','Publisher','Version']
df['Property'] = pd.Categorical(df['Property'], categories=cats, ordered=True)
df = (df.set_index(['Device','Serial','Property'], append=True)['Value']
.unstack()
.reset_index()
.rename_axis(None, axis=1)
.sort_values(['Device','Serial']))
What is produced:
Device Serial Display Name Install Date Publisher Version
12510 Computer01 STWF2G3 Microsoft Edge NaN NaN NaN
12511 Computer01 STWF2G3 NaN 2/22/2025 NaN NaN
12512 Computer01 STWF2G3 NaN NaN Microsoft NaN
12513 Computer01 STWF2G3 NaN NaN NaN 133.0.3065.92
12514 Computer01 STWF2G3 MSVC++ 2022 X64 NaN NaN NaN
12515 Computer01 STWF2G3 NaN 8/13/2024 NaN NaN
12516 Computer01 STWF2G3 NaN NaN Microsoft NaN
12517 Computer01 STWF2G3 NaN NaN NaN 14.40.33810
12518 Computer01 STWF2G3 MSVC++ 2022 X86 NaN NaN NaN
12519 Computer01 STWF2G3 NaN 8/13/2024 NaN NaN
12520 Computer01 STWF2G3 NaN NaN Microsoft NaN
12521 Computer01 STWF2G3 NaN NaN NaN 14.40.33810
[...]
What am I missing here? Does anyone know?
I've tried pivot_table() and melt(), but they aren't viable options because of various reasons. Pivoting seems to be designed for numbers, not text.
I have a feeling the issue lies with using Append=True, but I have duplicate identifiers with the left two columns, so I often get this error using many recommended code I've found, and I have no idea how to work with the data:
ValueError: Index contains duplicate entries, cannot reshape
Any help would be appreciated. Thanks.
De-duplicate with groupby.cumcount
, then pivot
:
index = ['Device', 'Serial']
cols = ['Property']
values = 'Value'
(df.assign(n=df.groupby(index+cols).cumcount())
.pivot(index=index+['n'], columns=cols, values=values)
.reset_index(index).rename_axis(index=None, columns=None)
)
Output:
Device Serial Display Name Install Date Publisher Version
0 Computer2A ABCSFT2 Microsoft Edge 2/22/2025 Microsoft 133.0.3065.92
1 Computer2A ABCSFT2 MSVC++ 2022 X64 8/13/2024 Microsoft 14.40.33810
2 Computer2A ABCSFT2 MSVC++ 2022 X86 8/13/2024 Microsoft 14.40.33810