pyspark azure-databricks

Check whether the date in a blob storage filename is in MMDDYYYY format


I have a file from blob storage

newfile = "Supervisor_08292024_095618.csv"

I want to check whether the date in the filename is in MMDDYYYY format.

I tried to create a regex for the expected filename pattern:

import re

pattern1 = r'Supervisor_[0-9]{2}[0-9]{2}[0-9]{4}_[0-9]{5}.parquet'

if re.match(pattern1, newfile):
    file_valid = "True"
else:
    file_valid = "False"

print(file_valid)

The result is True because the file Supervisor_08292024_095618 matches pattern1. But when I swap MM and DD (Supervisor_29082024_095618.parquet), the result is still True. That is wrong, because the second file is in DDMMYYYY format.


Solution

  • You can tighten the pattern so that the first two digits must be a valid month (01-12) and the next two a valid day (01-31):

    import re
    
    # Raw string so \d is passed to the regex engine; \. matches a literal dot.
    pattern = r'^Supervisor_(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\d{4}_[0-9]{6}\.parquet$'
    
    filename = "Supervisor_08292024_095618.parquet"
        
    if re.match(pattern, filename):
        file_valid = "True"
    else:
        file_valid = "False"
    
    print(file_valid)
    

    With a filename in the correct format, such as Supervisor_08292024_095618.parquet, the code prints True.

    With a filename that is not in the desired format, such as Supervisor_29082024_095618.parquet, it prints False.
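The regex above only checks digit ranges, so it would still accept a date such as 02312024 (February 31). If calendar validity matters, one possible extension is to extract the eight digits and let `datetime.strptime` parse them. The sketch below is not part of the original answer: the helper name `is_valid_mmddyyyy_filename` and the constant `FILENAME_RE` are hypothetical, and it assumes the filename always follows the `Supervisor_<8 digits>_<6 digits>.parquet` layout shown in the question.

```python
import re
from datetime import datetime

# Hypothetical helper, not from the original answer.
# Assumed layout: Supervisor_<8 digits>_<6 digits>.parquet
FILENAME_RE = re.compile(r"^Supervisor_(\d{8})_\d{6}\.parquet$")


def is_valid_mmddyyyy_filename(filename: str) -> bool:
    """Return True if the date part parses as a real MMDDYYYY calendar date."""
    match = FILENAME_RE.match(filename)
    if match is None:
        # Wrong prefix, wrong digit counts, or wrong extension.
        return False
    try:
        # strptime raises ValueError for impossible dates
        # (month 29, February 31, and so on).
        datetime.strptime(match.group(1), "%m%d%Y")
    except ValueError:
        return False
    return True


print(is_valid_mmddyyyy_filename("Supervisor_08292024_095618.parquet"))
print(is_valid_mmddyyyy_filename("Supervisor_29082024_095618.parquet"))
```

Note that neither approach can detect a swapped date when the day is 12 or lower: 05082024 is a valid MMDDYYYY date (May 8) and also a valid DDMMYYYY date (5 August), so the filename alone does not say which one was meant.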