python pandas dataframe cycle

How do I repeat one DataFrame to match the length of another DataFrame?


I want to combine two DataFrames of unequal length into a new DataFrame with the length of the longer one. Specifically, I want to pad the shorter DataFrame by repeating its rows until it is long enough.

I know this is possible for lists using itertools.cycle as follows:

from itertools import cycle

x = range(7)
y = range(43)

combined = zip(cycle(x), y)

Now I want to do the same for DataFrames:

import pandas as pd

df1 = pd.DataFrame(...)  # length 7
df2 = pd.DataFrame(...)  # length 43

df_comb = pd.concat([cycle(df1),df2], axis=1)

Of course this doesn't work, but I don't know whether there is a built-in option for this or whether I have to repeat the DataFrame manually.


Solution

  • To combine the two DataFrames into an output as long as the longer input, with the shorter input repeating from the start like itertools.cycle, you can compute a common key (with numpy.arange and the modulo (%) operator) and merge on it:

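    import numpy as np

    # Positional key on each side, wrapped by the other frame's length
    out = (df1.merge(df2, left_on=np.arange(len(df1))%len(df2),
                          right_on=np.arange(len(df2))%len(df1))
              .drop(columns=['key_0'])  # drop the helper key added by merge
          )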
    

    Output:

      col1 col2 col3 col4
    0    A    X    a    Y
    1    B    X    b    Y
    2    C    X    c    Y
    3    D    X    a    Y
    4    E    X    b    Y
    5    F    X    c    Y
    6    G    X    a    Y
    

    Intermediate result, before dropping the merge key:

       key_0 col1 col2 col3 col4
    0      0    A    X    a    Y
    1      1    B    X    b    Y
    2      2    C    X    c    Y
    3      0    D    X    a    Y
    4      1    E    X    b    Y
    5      2    F    X    c    Y
    6      0    G    X    a    Y
    

    Used inputs:

    # df1
      col1 col2
    0    A    X
    1    B    X
    2    C    X
    3    D    X
    4    E    X
    5    F    X
    6    G    X
    
    # df2
      col3 col4
    0    a    Y
    1    b    Y
    2    c    Y
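
For reference, here is a self-contained sketch of the same modulo idea done positionally instead of through a merge. It rebuilds the inputs shown above and cycles the shorter frame with `iloc`. The helper name `cycle_concat` is hypothetical (not a pandas function), and the sketch assumes both frames are non-empty and that their original indexes can be discarded.

```python
import numpy as np
import pandas as pd

# Inputs rebuilt from the "Used inputs" tables above.
df1 = pd.DataFrame({"col1": list("ABCDEFG"), "col2": ["X"] * 7})
df2 = pd.DataFrame({"col3": list("abc"), "col4": ["Y"] * 3})


def cycle_concat(a, b):
    """Hypothetical helper: place a and b side by side, cycling the shorter one.

    Assumes both frames are non-empty; original indexes are dropped.
    """
    n = max(len(a), len(b))
    pos = np.arange(n)
    # Row positions wrap around with modulo, like itertools.cycle.
    # For the longer frame, pos % len(frame) is just pos, so it is unchanged.
    a_rep = a.iloc[pos % len(a)].reset_index(drop=True)
    b_rep = b.iloc[pos % len(b)].reset_index(drop=True)
    return pd.concat([a_rep, b_rep], axis=1)


out = cycle_concat(df1, df2)
print(out)
```

This variant does not depend on which frame is passed first. With the merge, as far as I can tell, an inner join keeps the order of the left keys, so putting the shorter frame on the left would give the right number of rows but grouped by the shorter frame's rows rather than in the longer frame's order.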