pandas panel-data longitudinal

Generate a Pandas dataframe for joining longitudinal data


I have disparate longitudinal data and want to create a "scaffolding" DataFrame to join them to. There are N individuals, and each individual's time series should be exactly Y periods long (uniform longitudinal segments). I'm trying to find a clean way to build this scaffolding DataFrame, with one column for the individual ID and another for time, without using loops. Let's say that Y = 10. Here's a demo of what I have in mind, for two individuals:

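import numpy as np
import pandas as pd

# One block of Y = 10 time periods
timeseries = pd.DataFrame(np.arange(10), columns=['DATE'])

# Copy the block once per individual and tag it with that individual's ID
block1 = timeseries.copy()
block1['ID'] = 1

block2 = timeseries.copy()
block2['ID'] = 2

# Stack the blocks and put ID first
example = pd.concat([block1, block2])
example[['ID', 'DATE']]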

Building this out with a loop N times isn't the end of the world, but there's got to be a better way to do it.


Solution

  • Use assign in a list comprehension and concat:

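    N = 10  # number of individuals
    example = pd.concat([timeseries.assign(ID=n+1) for n in range(N)])[['ID', 'DATE']]
    

    Alternative:

    N = 10  # number of individuals
    example = (pd.concat([timeseries]*N)
                 # integer-divide the row position by the block length to get the ID
                 .assign(ID=lambda d: np.arange(len(d))//len(timeseries)+1)
                 [['ID', 'DATE']]
               )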
    

    output:

        ID  DATE
    0    1     0
    1    1     1
    2    1     2
    3    1     3
    4    1     4
    ..  ..   ...
    5   10     5
    6   10     6
    7   10     7
    8   10     8
    9   10     9
    
    [100 rows x 2 columns]
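
A possible loop-free variant, not part of the answer above: build the scaffold as the Cartesian product of IDs and periods with `pd.MultiIndex.from_product`, then left-join the real data onto it. This is a minimal sketch; `N = 3`, the `observed` frame, and its `value` column are made up for illustration.

```python
import numpy as np
import pandas as pd

N = 3   # number of individuals (illustrative value)
Y = 10  # periods per individual, as in the question

# Every (ID, DATE) combination, flattened into two ordinary columns
scaffold = pd.MultiIndex.from_product(
    [np.arange(1, N + 1), np.arange(Y)], names=["ID", "DATE"]
).to_frame(index=False)

# Hypothetical sparse observations, invented for this example only
observed = pd.DataFrame(
    {"ID": [1, 1, 3], "DATE": [0, 4, 9], "value": [0.5, 1.5, 2.5]}
)

# A left join keeps every scaffold row; periods with no observation get NaN
panel = scaffold.merge(observed, on=["ID", "DATE"], how="left")
```

Unlike the `concat` versions, this gives the scaffold a fresh 0..N*Y-1 index rather than a repeating 0..Y-1 one, which may or may not matter for the later joins.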