pythonbulkdatetimeindexcsv-import

DateTime Parse from multiple csv files


Searched myself silly, but couldn't find the answer.

Basically I want to import a number of GPS files with the aim to know the location of each GPS at any given time.

I wanted to use Panda's datetime index for this. What I can't seem to figure out is how to align this data.

My result is that each gps starts a new timedate index, I think I'm overwriting my timedata with every import.

I've tried creating a df outside the for loop first, but not with great results.

csv1

csv2

This is my code:

import pandas as pd
import glob
import os
from datetime import datetime
from pandas import ExcelWriter

pattern = '*.csv'
csv_files = glob.glob(pattern)
frames = []


for csv in csv_files:
    with open(csv) as fp:
        skip = next(filter(
            lambda x: x[1].startswith('trkpt'),
            enumerate(fp)
        ))[0] + 1
    df = pd.read_csv(csv, usecols = ['lat','lon','ele','time'], parse_dates=['time'], skiprows=skip)
    df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
    df = df.set_index('DateTime')
    df.rename(columns={'lat':'lat' + ' ' + csv,'lon':'lon' + ' ' + csv,'ele':'ele' + ' ' + csv}, inplace=True)
    df.drop(['time'], axis=1, inplace=True)
    frames.append(df)

df = pd.concat(frames)

df.to_csv('GPS Export.csv', sep=',')

File example

trkpt                   

ID  trksegID    lat lon ele time
1   1   -32.46226206    116.0619373 311.6   2021-01-22T01:54:03Z
2   1   -32.46225444    116.0619245 311.6   2021-01-22T01:54:04Z
3   1   -32.46225762    116.0619227 314.97  2021-01-22T01:54:05Z
4   1   -32.46226215    116.0619119 316.41  2021-01-22T01:54:06Z
5   1   -32.46226123    116.0618896 317.85  2021-01-22T01:54:07Z
6   1   -32.46225611    116.0618791 317.85  2021-01-22T01:54:08Z
7   1   -32.46224949    116.0618693 316.41  2021-01-22T01:54:09Z
8   1   -32.46224086    116.0618602 314.97  2021-01-22T01:54:10Z
9   1   -32.46223943    116.0618525 314.49  2021-01-22T01:54:11Z
10  1   -32.46225385    116.0618722 314.49  2021-01-22T01:54:12Z

also got a small problem with the date formatting, but I can live with that


Solution

  • The solution would be to set the datetime index after concatenating the files. The snippet below assumes that all csv's are formatted similarly to your csv snippet and skips the first row with trkpt. It also adds a column with the csv filename since it appears you wish to do some renaming or postprocessing with the filename.

    import glob
    import pandas as pd
    import os
    
    df = pd.concat([pd.read_csv(fp, skiprows=1).assign(filename=os.path.basename(fp)) for fp in glob.glob('*.csv')])
    df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
    df = df.set_index('DateTime')
    df.drop(['time'], axis=1, inplace=True)
    

    Output:

    | DateTime                  |   ID |   trksegID |      lat |     lon |    ele | filename   |
    |:--------------------------|-----:|-----------:|---------:|--------:|-------:|:-----------|
    | 2021-01-22 01:54:03+00:00 |    1 |          1 | -32.4623 | 116.062 | 311.6  | 2.csv      |
    | 2021-01-22 01:54:04+00:00 |    2 |          1 | -32.4623 | 116.062 | 311.6  | 2.csv      |
    | 2021-01-22 01:54:05+00:00 |    3 |          1 | -32.4623 | 116.062 | 314.97 | 2.csv      |
    | 2021-01-22 01:54:06+00:00 |    4 |          1 | -32.4623 | 116.062 | 316.41 | 2.csv      |
    | 2021-01-22 01:54:07+00:00 |    5 |          1 | -32.4623 | 116.062 | 317.85 | 2.csv      |
    | 2021-01-22 01:54:08+00:00 |    6 |          1 | -32.4623 | 116.062 | 317.85 | 2.csv      |
    | 2021-01-22 01:54:09+00:00 |    7 |          1 | -32.4622 | 116.062 | 316.41 | 1.csv      |
    | 2021-01-22 01:54:10+00:00 |    8 |          1 | -32.4622 | 116.062 | 314.97 | 1.csv      |
    | 2021-01-22 01:54:11+00:00 |    9 |          1 | -32.4622 | 116.062 | 314.49 | 1.csv      |
    | 2021-01-22 01:54:12+00:00 |   10 |          1 | -32.4623 | 116.062 | 314.49 | 1.csv      |