I'm currently try to take accelerometer data samples and combine them into a single dataframe for use with machine learning. There are 100 entries per sample saved to csv's and each gets loaded in through a for loop and the merged through a for loop. After that my results look something like this:
ID | X | Y | Z | Class |
---|---|---|---|---|
ID 1 | 4 | 2 | 6 | Action 1 |
ID 1 | 4 | 2 | 5 | Action 1 |
ID 2 | 3 | 4 | 4 | Action 2 |
ID 2 | 2 | 2 | 1 | Action 2 |
ID 3 | 1 | 4 | 5 | Action 3 |
ID 3 | 2 | 3 | 2 | Action 3 |
ID 4 | 2 | 1 | 5 | Action 1 |
ID 4 | 2 | 2 | 5 | Action 1 |
I'm wanting to get a wide format table to make the data easier to use with machine learning. I'm trying to get it to look more along the lines of this second table:
ID | X1 | Y1 | Z1 | X2 | Y2 | Z2 | Class |
---|---|---|---|---|---|---|---|
ID 1 | 4 | 2 | 6 | 4 | 2 | 5 | Action 1 |
ID 2 | 3 | 4 | 4 | 2 | 2 | 1 | Action 2 |
ID 3 | 1 | 4 | 5 | 2 | 3 | 2 | Action 3 |
ID 4 | 2 | 1 | 5 | 2 | 2 | 5 | Action 1 |
I've looked into melts, stacking and unstacking, groupby options, and for loops where I try to move everything over. Nothing has gotten me close to what I've been trying to accomplish. Just looking for advice on how to proceed or if there is a better method.
Try:
df['tmp'] = df.groupby('ID').cumcount() + 1
df = df.pivot(index=['ID', 'Class'], columns=['tmp'])
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()
print(df)
Prints:
ID Class X1 X2 Y1 Y2 Z1 Z2
0 ID 1 Action 1 4 4 2 2 6 5
1 ID 2 Action 2 3 2 4 2 4 1
2 ID 3 Action 3 1 2 4 3 5 2
3 ID 4 Action 1 2 2 1 2 5 5