I am trying to get the created time of a file stored in ADLS Gen2. The file is generated by a downstream process. In Databricks, a DataFrame is created by reading the file, and I need the created time of the file added as a column in the DataFrame.
I tried using dbutils, but it only gives me modificationTime, which changes whenever the file is modified. I also tried os.stat, which returns st_ctime, but that value also changes when the file is modified, which is not what I expected.
Dbutils code
filepath='mount path of the file'
modificationTime=dbutils.fs.ls(filepath)[0].modificationTime
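As a side note, the modificationTime returned by dbutils.fs.ls is, as far as I can tell, an epoch timestamp in milliseconds, so it needs to be divided by 1000 before conversion. A small helper to turn it into a readable datetime (the timestamp below is just an illustrative value, not taken from any real file):

```python
from datetime import datetime, timezone

def ms_to_datetime(ms):
    """Convert epoch milliseconds (as returned by dbutils.fs.ls) to a UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Example with a hypothetical timestamp:
print(ms_to_datetime(1700000000000))  # 2023-11-14 22:13:20+00:00
```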
os.stat code
from datetime import datetime
import os

statinfo = os.stat('/dbfs/' + filepath)
create_date = datetime.fromtimestamp(statinfo.st_ctime)
Any help would be appreciated
To get the creation time of a file stored in Azure Data Lake Storage (ADLS) Gen2, use the Azure Storage Blob SDK for Python instead of relying on dbutils or os.stat, which can yield inconsistent results. The SDK exposes blob properties, including the creation time. You can use the code below to get the creation time of a file stored in an ADLS account:
from azure.storage.blob import BlobServiceClient
connection_string = "<connectionString>"
container_name = "<containerName>"
file_path = "<filePath>"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_path)
properties = blob_client.get_blob_properties()
print(f"File: {file_path}")
print(" Creation Time:", properties.creation_time)
print(" Last Modified Time:", properties.last_modified)
print(" Size (in bytes):", properties.size)
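Both creation_time and last_modified come back from the SDK as timezone-aware datetime objects, so you can subtract them directly to see how long after creation the file was last modified. A minimal sketch with made-up stand-in values (not real blob properties):

```python
from datetime import datetime, timezone

# Hypothetical values standing in for properties.creation_time and
# properties.last_modified, which the SDK returns as timezone-aware datetimes.
creation_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_modified = datetime(2024, 1, 3, 18, 30, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
delta = last_modified - creation_time
print(f"Modified {delta.days} days and {delta.seconds // 3600} hours after creation")
```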
You will be able to see the difference between Creation Time and Last Modified Time in the output. If you want to get the Creation Time and Last Modified Time of multiple files in a directory, you can use the code below:
from azure.storage.blob import BlobServiceClient
connection_string = "<connectionString>"
container_name = "<containerName>"
directory_path = "<directory>"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client(container_name)
files = dbutils.fs.ls("<mountPath>")
for file_info in files:
    file_path = f"{directory_path}/{file_info.name}"  # Construct relative path within container
    blob_client = container_client.get_blob_client(blob=file_path)
    try:
        properties = blob_client.get_blob_properties()
        print(f"File: {file_path}")
        print(" Creation Time:", properties.creation_time)
        print(" Last Modified Time:", properties.last_modified)
        print(" Size (in bytes):", properties.size)
    except Exception as e:
        print(f"Error retrieving properties for {file_path}: {e}")
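To get back to the original goal of adding the creation time as a DataFrame column, you can collect the loop's results into (path, creation_time) pairs and join them onto your DataFrame by file path. A minimal sketch; the paths and timestamps below are made up, and the Spark calls in the comments assume a standard Databricks session:

```python
from datetime import datetime, timezone

# Hypothetical results collected from the loop above: one (path, creation_time)
# pair per blob. In practice you would append to this list inside the try block.
rows = [
    ("dir/file1.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("dir/file2.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]

# In Databricks you could then build a lookup DataFrame and join it onto your
# data by file path (the source path of each row is available via the
# input_file_name() function in pyspark.sql.functions):
#   lookup_df = spark.createDataFrame(rows, ["file_path", "created_time"])
#   df = df.join(lookup_df, on="file_path", how="left")
for path, created in rows:
    print(path, created.isoformat())
```

For a single file, an even simpler option is df.withColumn("created_time", lit(properties.creation_time)).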