regex scala apache-beam spotify-scio

How can I extract the date from a .txt file whose name contains the date? (Scala)


I have a .txt file as input for my Apache Beam project, which is written in Scala using Spotify's Scio.

val input = args.getOrElse("input", "/home/user/Downloads/trade-20181001.txt")

How can I extract the date 2018-10-01 (1st October) from the file name? Thank you!


Solution

  • For the example above I would simply use the regex below. It matches any string that ends in exactly 8 digits followed by .txt, and captures those 8 digits.

    (?<dateTime>\d{8})\.txt$
    
    (?<dateTime> is the start of a named capture group.
    \d{8} means exactly 8 digits.
    ) is the end of the named capture group.
    \. means match the character . literally.
    txt means match txt literally.
    $ means that the string ends there and nothing comes after it.
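
As a rough sketch of how this could look in Scala: the snippet below applies the same pattern through java.util.regex (which supports named groups) and then parses the captured digits into a LocalDate. The object name FileNameDate and the helper extractDate are hypothetical names chosen for illustration, and it assumes the digits are laid out as yyyyMMdd, as in the question.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.regex.Pattern
import scala.util.Try

// Hypothetical helper object, not part of Scio or Beam.
object FileNameDate {
  // Same regex as above: 8 digits captured as "dateTime", then ".txt" at the end.
  private val DatePattern: Pattern =
    Pattern.compile("""(?<dateTime>\d{8})\.txt$""")

  // Returns None if the path does not match or the digits are not a valid date.
  // Assumption: the 8 digits are in yyyyMMdd order.
  def extractDate(path: String): Option[LocalDate] = {
    val matcher = DatePattern.matcher(path)
    if (matcher.find())
      Try(LocalDate.parse(matcher.group("dateTime"), DateTimeFormatter.BASIC_ISO_DATE)).toOption
    else
      None
  }
}
```

With the path from the question, FileNameDate.extractDate(input) should give Some(2018-10-01), and a name without 8 trailing digits gives None.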
    

    If you cannot use named capture groups in your program, you can use the regex below without one and then strip the trailing .txt from the match.

    \d{8}\.txt$
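
A minimal sketch of this group-free variant, assuming Scala's own scala.util.matching.Regex: the whole match (digits plus extension) is found first, and the .txt suffix is then stripped off. The names FileNameDigits and extractDigits are made up for this example.

```scala
// Hypothetical helper object for the variant without a capture group.
object FileNameDigits {
  // Matches 8 digits followed by ".txt" at the end of the string.
  private val DigitsThenTxt = """\d{8}\.txt$""".r

  // Returns the 8 digits as a string, or None if the path does not match.
  def extractDigits(path: String): Option[String] =
    DigitsThenTxt.findFirstIn(path).map(_.stripSuffix(".txt"))
}
```

The result is still a plain string such as 20181001, so it would need a separate parsing step if an actual date value is required.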