python pandas datetime concatenation replicate

Duplicate datetime values in a dataframe column using a list of numbers


I have a dataframe with a date column stored as strings, like this:

>>> df2
       date     a    b
0  2020/1/1   8.0  5.0
1  2020/1/2  10.0  7.0
2  2020/1/3   6.0  1.0
3  2020/1/4   6.0  3.0

I want to use its 'date' column to generate a new index in which each date is repeated a different number of times, by multiplying it by a list, like this:

>>> idx_list = [2,3,1,2]
>>> df2.date*idx_list

but I got an unexpected result:

>>> df2.date*idx_list
0            2020/1/12020/1/1
1    2020/1/22020/1/22020/1/2
2                    2020/1/3
3            2020/1/42020/1/4

I want the new index to be a sequential series with each date on its own row, like:

0 2020/1/1
1 2020/1/1
2 2020/1/2
3 2020/1/2
4 2020/1/2
5 2020/1/3
6 2020/1/4
7 2020/1/4

How do I do that?


Solution

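  • Multiplying a string column by integers applies Python string multiplication to each element, which is why the dates were concatenated. To duplicate column values as separate rows, use Series.repeat instead. The length of idx_list must match the length of the column.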

    import pandas as pd

    df2 = pd.DataFrame({'date': ['2020/1/1', '2020/1/2', '2020/1/3', '2020/1/4'],
                        'a':    [8.0, 10.0, 6.0, 6.0],
                        'b':    [5.0, 7.0, 1.0, 3.0]})
    idx_list = [2, 3, 1, 2]
    # repeat the i-th date idx_list[i] times
    df2['date'].repeat(idx_list)
    
    
    0    2020/1/1
    0    2020/1/1
    1    2020/1/2
    1    2020/1/2
    1    2020/1/2
    2    2020/1/3
    3    2020/1/4
    3    2020/1/4
    Name: date, dtype: object
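
    The repeated series above keeps the original row labels (0, 0, 1, 1, 1, ...), whereas the question asks for a sequential 0..7 index. A minimal sketch of one way to get that, assuming the same df2 and idx_list as above; the names new_dates and parsed are only illustrative, and the conversion with pd.to_datetime is an optional extra step that assumes the strings follow a year/month/day layout:

```python
import pandas as pd

df2 = pd.DataFrame({'date': ['2020/1/1', '2020/1/2', '2020/1/3', '2020/1/4'],
                    'a':    [8.0, 10.0, 6.0, 6.0],
                    'b':    [5.0, 7.0, 1.0, 3.0]})
idx_list = [2, 3, 1, 2]

# repeat keeps the original labels; reset_index(drop=True) replaces
# them with a fresh sequential index 0..7
new_dates = df2['date'].repeat(idx_list).reset_index(drop=True)
print(new_dates)

# optional: convert the strings to real datetimes
# (assumes a year/month/day layout, as in the sample data)
parsed = pd.to_datetime(new_dates, format='%Y/%m/%d')
print(parsed.dtype)
```

    Passing drop=True discards the old labels instead of adding them as an extra column.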
    

    If you want to duplicate the rows of the entire dataframe, make date the index, then use Index.repeat to duplicate the index labels and loc to select the corresponding rows.

    # make date the index
    df2 = df2.set_index('date')
    idx_list = [2,3,1,2]
    # use repeat and loc to create duplicated rows
    df2 = df2.loc[df2.index.repeat(idx_list)]
    print(df2)
    
    
                 a    b
    date               
    2020/1/1   8.0  5.0
    2020/1/1   8.0  5.0
    2020/1/2  10.0  7.0
    2020/1/2  10.0  7.0
    2020/1/2  10.0  7.0
    2020/1/3   6.0  1.0
    2020/1/4   6.0  3.0
    2020/1/4   6.0  3.0
    

    Calling reset_index() afterwards turns date back into a column.
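
    As a rough sketch of that last step, and of a variant that skips set_index altogether, the following assumes the original (unmodified) df2 and idx_list; the names via_index and direct are hypothetical and only used for illustration:

```python
import pandas as pd

df2 = pd.DataFrame({'date': ['2020/1/1', '2020/1/2', '2020/1/3', '2020/1/4'],
                    'a':    [8.0, 10.0, 6.0, 6.0],
                    'b':    [5.0, 7.0, 1.0, 3.0]})
idx_list = [2, 3, 1, 2]

# Variant A: date as index, repeat the labels, then restore date as a column
via_index = df2.set_index('date')
via_index = via_index.loc[via_index.index.repeat(idx_list)].reset_index()

# Variant B: repeat the default integer index directly, then renumber the rows
direct = df2.loc[df2.index.repeat(idx_list)].reset_index(drop=True)

print(via_index)
print(direct)
```

    Both variants should give eight rows with date, a and b as columns. Variant B avoids moving date in and out of the index, but it relies on the dataframe's index labels being unique.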