python object-slicing

Select specific substring - dynamic indexing



I have several .csv files with 2 slightly different formats.

Format 1: X-XXX_2020-11-05_13-54-55-555__XX.csv
Format 2: X-XXX_2020-11-05_13-54-55-555__XXX.csv

I need to extract the datetime field to add it to a pandas DataFrame. Normally I would just use simple slicing:
datetime.datetime.strptime(string1[-31:-8], "%Y-%m-%d_%H-%M-%S-%f")

which gives me the desired result, but only for Format 1.

For Format 2 I need to shift the slice indices by 1 because of the longer ending.
Also, I cannot index from the start because of other operations.

At the moment I work around it with an if statement like this:

def tdate():
    if string1[-7]=='X':
        return datetime.datetime.strptime(string1[-32:-9], "%Y-%m-%d_%H-%M-%S-%f")
    else:
        return datetime.datetime.strptime(string1[-31:-8], "%Y-%m-%d_%H-%M-%S-%f")

Is there a simpler way to make "dynamic" indexes so I can avoid defining an additional function?

Thank you!


Solution

  • Using str.split with list slicing

    Ex:

    import datetime
    
    for i in ("X-XXX_2020-11-05_13-54-55-555__XX.csv", "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
        print(datetime.datetime.strptime("_".join(i.split("_")[1:3]), "%Y-%m-%d_%H-%M-%S-%f"))
    

    Or using a regex.

    Ex:

    import re
    import datetime
    
    for i in ("X-XXX_2020-11-05_13-54-55-555__XX.csv", "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
        d = re.search(r"(?<=_)(.+)(?=__)", i)
        if d:
            print(datetime.datetime.strptime(d.group(1), "%Y-%m-%d_%H-%M-%S-%f"))
    

    Output:

    2020-11-05 13:54:55.555000
    2020-11-05 13:54:55.555000
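
Since the question mentions that indexing from the start is not an option, a variant that anchors on the right-hand side may also help. The sketch below is not part of the original answer: it assumes every filename contains a final "__" separator and that the timestamp directly before it is always 23 characters long, as in the two sample names. The names extract_timestamp, TIMESTAMP_FORMAT and TIMESTAMP_LENGTH are made up for illustration.

```python
import datetime

# Assumption: the timestamp always looks like "2020-11-05_13-54-55-555".
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S-%f"
TIMESTAMP_LENGTH = 23  # len("2020-11-05_13-54-55-555")


def extract_timestamp(filename):
    """Parse the datetime that sits just before the last '__' in filename."""
    # Split once from the right, so the length of the ending
    # ("XX.csv" vs "XXX.csv") no longer affects the indices.
    # Raises ValueError if the filename contains no "__".
    stem, _ending = filename.rsplit("__", 1)
    # The timestamp is the fixed-width tail of what is left.
    return datetime.datetime.strptime(stem[-TIMESTAMP_LENGTH:], TIMESTAMP_FORMAT)


for name in ("X-XXX_2020-11-05_13-54-55-555__XX.csv",
             "X-XXX_2020-11-05_13-54-55-555__XXX.csv"):
    print(extract_timestamp(name))
```

Both sample names print 2020-11-05 13:54:55.555000. Because only the text to the left of the last "__" is sliced, anything in front of the timestamp can vary in length as well.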