I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'qty': [10, 7, 2, 1],
                   'status 1': [5, 2, 2, 0],
                   'status 2': [3, 2, 0, 1],
                   'status 3': [2, 3, 0, 0]
                   })
Each row has a qty of items. Each item has exactly one status (1, 2 or 3).
So qty = status 1 + status 2 + status 3 on every row.
I would like to:
Duplicate each row as many times as its "qty" value
Then edit the 3 status columns (or fill a new column) so that each duplicated row carries just 1 status.
The output should look like this:
Edit: the order is not important, but I will need to keep other columns of my initial df.
My (incomplete) solution so far - I found a way to duplicate the rows using this :
df2 = df.loc[df.index.repeat(df['qty'])].reset_index(drop=True)
But I can't find a way to fill the status.
Do I need to use a for loop approach to fill the status?
Should I do this transform in 1 or 2 steps?
Something like: for each initial row, the first n duplicated rows take the first status, where n is the value of status 1, the next rows take the second status, and so on.
The output could maybe look like:
Edit 1: Thank you for your answers!
Last question: now I'm trying to integrate this into my actual df. What is the best approach to apply these methods to my df, which contains many other columns?
I will answer my last question myself:
Split df in 2: dfstatus and dfwithoutstatus, keeping the qty column in both
Apply one of your methods on dfstatus
Apply my method on dfwithoutstatus (a simple duplication)
Merge on index
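The steps above can be sketched roughly as follows. This is only one way to do it, under a few assumptions: the extra column `name` is hypothetical and stands in for the "many other columns", the status part is expanded with `numpy.repeat` rather than with any specific method from the answers, and both halves get a fresh 0..n-1 index before being joined (joining on the original, duplicated index would multiply rows).

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'name' stands in for the other columns of the real df.
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'qty': [10, 7, 2, 1],
    'status 1': [5, 2, 2, 0],
    'status 2': [3, 2, 0, 1],
    'status 3': [2, 3, 0, 0],
})
status_cols = ['status 1', 'status 2', 'status 3']

# The approach relies on qty being the sum of the status counts on every row.
assert (df[status_cols].sum(axis=1) == df['qty']).all()

# Step 1: split off the non-status columns (qty stays with them).
df_without_status = df.drop(columns=status_cols)

# Step 2: expand the status counts into one label per item.
# Counts are flattened row by row, so the labels are tiled in the same order.
counts = df[status_cols].to_numpy().ravel()
labels = np.tile(status_cols, len(df))
df_status = pd.DataFrame({'status': np.repeat(labels, counts)})

# Step 3: plain duplication of the other columns, qty times per row.
df_other = (
    df_without_status
    .loc[df_without_status.index.repeat(df['qty'])]
    .reset_index(drop=True)
)

# Step 4: both halves are in the same row order, so join on the fresh index.
result = df_other.join(df_status)
```

Because both halves are expanded row by row in the original order, the positional join lines each item up with the other columns of the row it came from.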
Thank you all for your answers.
Best
Here is a possible solution:
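import numpy as np
import pandas as pd
# Identity matrix: row k is the one-hot encoding of the k-th status column.
# This assumes 'qty' is the first column and every other column is a status.
E = pd.DataFrame(np.eye(df.shape[1] - 1, dtype=int))
# Repeat each 'qty' value qty times.
result = pd.DataFrame(
    df['qty'].reindex(df.index.repeat(df['qty'])).reset_index(drop=True),
)
# For each original row i, repeat one-hot row k as many times as the k-th status count.
result[df.columns[1:]] = pd.concat(
    [E.reindex(E.index.repeat(df.iloc[i, 1:]))
     for i in range(len(df))],
).reset_index(
    drop=True,
)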
Here is the result:
>>> result
    qty  status 1  status 2  status 3
0    10         1         0         0
1    10         1         0         0
2    10         1         0         0
3    10         1         0         0
4    10         1         0         0
5    10         0         1         0
6    10         0         1         0
7    10         0         1         0
8    10         0         0         1
9    10         0         0         1
10    7         1         0         0
11    7         1         0         0
12    7         0         1         0
13    7         0         1         0
14    7         0         0         1
15    7         0         0         1
16    7         0         0         1
17    2         1         0         0
18    2         1         0         0
19    1         0         1         0
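If a single status column is preferred over three 0/1 columns, the one-hot rows can be collapsed with `idxmax(axis=1)`. The sketch below is a variant of the approach, not the code above: it builds the same one-hot layout with `numpy.repeat` instead of a Python loop, and the names `status_cols`, `one_hot` and the new `status` column are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'qty': [10, 7, 2, 1],
    'status 1': [5, 2, 2, 0],
    'status 2': [3, 2, 0, 1],
    'status 3': [2, 3, 0, 0],
})
status_cols = ['status 1', 'status 2', 'status 3']

# One identity block per original row, stacked vertically.
eye = np.tile(np.eye(len(status_cols), dtype=int), (len(df), 1))

# Repeat each one-hot row by the matching status count (flattened row by row).
counts = df[status_cols].to_numpy().ravel()
one_hot = pd.DataFrame(np.repeat(eye, counts, axis=0), columns=status_cols)

# Put the repeated qty in front, as in the result shown above.
qty = df['qty'].to_numpy()
one_hot.insert(0, 'qty', np.repeat(qty, qty))

# Collapse the three 0/1 columns into one label: the column holding the 1.
one_hot['status'] = one_hot[status_cols].idxmax(axis=1)
```

Each expanded row contains exactly one 1, so `idxmax` returns the name of that column without ambiguity.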