Replicating rows in a pandas DataFrame (Python)


My dataframe has multiple columns. One of them, named "multiple", is a boolean flag stored as 1s and 0s. I want to replicate each row where df.multiple == 1 four times, so that it appears five times in total, and leave the other rows unchanged. How can I do that? (I don't want duplicated index values; the result should get a fresh index.)

example input:
df=
index strings  multiple
0        A        0
1        B        1
2        C        1
3        D        0
4        E        1


Expected output:

index strings  multiple
0        A        0
1        B        1
2        B        1
3        B        1
4        B        1
5        B        1
6        C        1
7        C        1
8        C        1
9        C        1
10       C        1
11       D        0
12       E        1
13       E        1
14       E        1
15       E        1
16       E        1

Solution

  • Here is another alternative, based on @Vinzent's answer. It builds the repeat counts the same way (df.multiple.values*4+1 is 1 for rows with multiple == 0 and 5 for rows with multiple == 1), but it doesn't need to reconstruct the full dataframe. Instead it selects rows by a repeated index. This solution is ~30% faster on the provided dataset and on larger datasets.

    import numpy as np
    df.loc[np.repeat(df.multiple, df.multiple.values*4+1).index].reset_index(drop=True)
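
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same indexing idea. It swaps `np.repeat` on the column for `Index.repeat`, which is a substitution and not part of the answer above. The frame is rebuilt from the question's example, the names `repeats` and `result` are only illustrative, and it assumes the index labels are unique (otherwise `.loc` would pull extra rows).

```python
import pandas as pd

# Sample frame rebuilt from the question's example input.
df = pd.DataFrame({
    "strings": ["A", "B", "C", "D", "E"],
    "multiple": [0, 1, 1, 0, 1],
})

# Per-row repeat counts: 1 where multiple == 0, 5 where multiple == 1
# (the original row plus 4 copies).
repeats = df["multiple"].to_numpy() * 4 + 1

# Index.repeat duplicates each index label by its count, .loc selects
# those labels in order, and reset_index(drop=True) renumbers the rows.
result = df.loc[df.index.repeat(repeats)].reset_index(drop=True)

print(result)
```

Changing the `4` in `repeats` changes how many extra copies each flagged row gets.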