pandas, dataframe, csv, io, python-3.11

Reshape stacked form data in csv file using pandas


I have a comma-delimited input file with the format below, and I'm looking for a fairly easy/fast way to convert it into a normally shaped DataFrame in pandas. The CSV file has all of the data stacked into two columns, with each data block separated by a near-empty row (a single comma), as shown below. Note that for ease of explanation I made the timestamp values identical across three of the populated blocks, but in reality they can differ:

Trace Name,SignalName1
Signal,<signal info>
Timestamp,Value
,
Trace Name,SignalName2
Signal,<signal info>
Timestamp,Value
,
Trace Name,SignalName3
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,13
2023-10-04 15:36:43.829083 EDT,14
2023-10-04 15:36:43.895651 EDT,17
2023-10-04 15:36:43.931145 EDT,11
,
Trace Name,SignalName4
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,131
2023-10-04 15:36:43.829083 EDT,238
2023-10-04 15:36:43.895651 EDT,413
2023-10-04 15:36:43.931145 EDT,689
,
Trace Name,SignalName5
Signal,<signal info>
Timestamp,Value
,
Trace Name,SignalName6
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,9867
2023-10-04 15:36:43.829083 EDT,1257
2023-10-04 15:36:43.895651 EDT,5736
2023-10-04 15:36:43.931145 EDT,4935
,
Trace Name,SignalName7
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757194 EDT,98670
2023-10-04 15:36:43.829084 EDT,12570
2023-10-04 15:36:43.895652 EDT,57360
2023-10-04 15:36:43.931146 EDT,49350
,
Trace Name,SignalName8
Signal,<signal info>
Timestamp,Value
,
Trace Name,SignalName9
Signal,<signal info>
Timestamp,Value
,

The desired output after reshaping should look like the following:

    Timestamp                   SignalName3 SignalName4 SignalName6 SignalName7
0   10/4/2023 15:36:43.757193   13          131         9867        nan
1   10/4/2023 15:36:43.757194   nan         nan         nan         98670
2   10/4/2023 15:36:43.829083   14          238         1257        nan
3   10/4/2023 15:36:43.829084   nan         nan         nan         12570
4   10/4/2023 15:36:43.895651   17          413         5736        nan
5   10/4/2023 15:36:43.895652   nan         nan         nan         57360
6   10/4/2023 15:36:43.931145   11          689         4935        nan
7   10/4/2023 15:36:43.931146   nan         nan         nan         49350 
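
For anyone who wants to reproduce this, the sample above can be saved verbatim to a file. As a hypothetical, shortened stand-in (just one empty block and one populated block, written to trace.txt, the file name used in the solution below):

    # Hypothetical test setup: write a truncated version of the sample above
    # (one empty block and one populated block) to trace.txt, the file name
    # assumed by the solution below.
    sample = (
        "Trace Name,SignalName2\n"
        "Signal,<signal info>\n"
        "Timestamp,Value\n"
        ",\n"
        "Trace Name,SignalName3\n"
        "Signal,<signal info>\n"
        "Timestamp,Value\n"
        "2023-10-04 15:36:43.757193 EDT,13\n"
        "2023-10-04 15:36:43.829083 EDT,14\n"
        "2023-10-04 15:36:43.895651 EDT,17\n"
        "2023-10-04 15:36:43.931145 EDT,11\n"
        ",\n"
    )
    with open('trace.txt', 'w') as fp:
        fp.write(sample)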

Solution

  • The file is pretty easy to parse:

    from io import StringIO

    import pandas as pd

    # Requires Python >= 3.8 (walrus operator)
    with open('trace.txt') as fp:
        data = []
        for row in fp:
            if row.startswith('Trace'):
                signal = row.split(',')[1].strip()   # e.g. 'SignalName3'
                next(fp)                             # skip the 'Signal,<signal info>' row
                buf = StringIO()
                # collect the block's rows until the ',' separator (or EOF)
                while (row := fp.readline()) and not row.startswith(','):
                    buf.write(row)
                buf.seek(0)
                df = pd.read_csv(buf, header=0, names=['Timestamp', signal], index_col=0)
                if not df.empty:                     # keep only populated blocks
                    # drop the ' EDT' suffix before parsing the timestamps
                    df.index = pd.to_datetime(df.index.str[:-4])
                    data.append(df)
    

    Output:

    >>> data
    [                            SignalName3
     Timestamp                              
     2023-10-04 15:36:43.757193           13
     2023-10-04 15:36:43.829083           14
     2023-10-04 15:36:43.895651           17
     2023-10-04 15:36:43.931145           11,
                                 SignalName4
     Timestamp                              
     2023-10-04 15:36:43.757193          131
     2023-10-04 15:36:43.829083          238
     2023-10-04 15:36:43.895651          413
     2023-10-04 15:36:43.931145          689,
                                 SignalName6
     Timestamp                              
     2023-10-04 15:36:43.757193         9867
     2023-10-04 15:36:43.829083         1257
     2023-10-04 15:36:43.895651         5736
     2023-10-04 15:36:43.931145         4935,
                                 SignalName7
     Timestamp                              
     2023-10-04 15:36:43.757194        98670
     2023-10-04 15:36:43.829084        12570
     2023-10-04 15:36:43.895652        57360
     2023-10-04 15:36:43.931146        49350]
    
    >>> pd.concat(data, axis=1)
                                SignalName3  SignalName4  SignalName6  SignalName7
    Timestamp                                                                     
    2023-10-04 15:36:43.757193         13.0        131.0       9867.0          NaN
    2023-10-04 15:36:43.757194          NaN          NaN          NaN      98670.0
    2023-10-04 15:36:43.829083         14.0        238.0       1257.0          NaN
    2023-10-04 15:36:43.829084          NaN          NaN          NaN      12570.0
    2023-10-04 15:36:43.895651         17.0        413.0       5736.0          NaN
    2023-10-04 15:36:43.895652          NaN          NaN          NaN      57360.0
    2023-10-04 15:36:43.931145         11.0        689.0       4935.0          NaN
    2023-10-04 15:36:43.931146          NaN          NaN          NaN      49350.0
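
    To match the layout asked for in the question exactly (a default integer index with Timestamp as a regular column), a reset_index on the concatenated frame should be all that is left; the name out below is just for illustration:

    out = (
        pd.concat(data, axis=1)   # align the per-signal frames on their Timestamp index
          .reset_index()          # turn the Timestamp index back into a 'Timestamp' column
    )

    Note that the columns are upcast to float during the outer alignment so the non-matching timestamps can hold NaN, which corresponds to the nan entries in the desired output.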