
Unable to read the current version of a Delta table in Azure ML Studio using a data asset


I am trying to create a data asset on ADLS Gen2 and read a Delta table stored in an ADLS Gen2 folder laid out like this:

/
└── my-data
    ├── _delta_log
    ├── part-0000-xxx.parquet
    └── part-0001-xxx.parquet

Currently, I create the data asset as a file dataset using the ML v1 APIs, but reading the table returns every row (including the deleted ones) rather than the most recent version.
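For context, deleted rows show up because a Delta table keeps old Parquet files on disk and records which ones are current in `_delta_log`; reading every Parquet file in the folder ignores that log. The following is a minimal sketch of how the log determines the current snapshot, assuming only JSON commit files are present (checkpoint files are ignored). The function name `active_files` is hypothetical and not part of any Azure ML or Delta library.

```python
import json
from pathlib import Path


def active_files(table_dir, version=None):
    """Replay the JSON commits in _delta_log and return the live data files.

    Simplified sketch: checkpoint files are ignored, so this only works
    while every JSON commit is still present in the log folder.
    """
    log_dir = Path(table_dir) / "_delta_log"
    live = set()
    # Commit files are named with a zero-padded version number,
    # so sorting by name also sorts by version.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        if version is not None and int(commit.stem) > version:
            break
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                # The file stays on disk but is no longer part of the table.
                live.discard(action["remove"]["path"])
    return sorted(live)
```

A reader that lists the folder directly sees both the live and the removed files, which matches the symptom described above.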

I have also tried all the other data asset types for Azure ML v1/v2. Ideally, I want to read the most recent version of the Delta table and also have the option to select a different version.

No success. How can I resolve this?


Solution

  • For the code below to work, you first need to create an MLTable data asset with the correct folder path (the Delta table folder, i.e. the one that contains _delta_log).

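    import time

    import mltable
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    # Current UTC time in ISO 8601 form; reading "as of now" yields the latest table version.
    current_timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    # Connect to the workspace described by the local config.json.
    ml_client = MLClient.from_config(credential=DefaultAzureCredential())
    data_asset = ml_client.data.get("<enter your ml table name>", version="1")

    # Load the Delta table as of the timestamp; pass version_as_of=<int> instead
    # of timestamp_as_of to read a specific table version.
    tbl = mltable.from_delta_lake(
        delta_table_uri=data_asset.path,
        timestamp_as_of=current_timestamp,
    )
    df = tbl.to_pandas_dataframe()
    df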
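
The question also asks for a way to switch versions. Here is a minimal sketch of one way to do that, assuming `mltable.from_delta_lake` accepts `version_as_of` as an alternative to `timestamp_as_of`. The helper names `delta_read_options` and `load_delta` are hypothetical, and treating the two options as mutually exclusive is a choice made in this sketch.

```python
import time


def delta_read_options(version=None, timestamp=None):
    """Build keyword arguments for mltable.from_delta_lake (hypothetical helper)."""
    if version is not None and timestamp is not None:
        raise ValueError("pass either version or timestamp, not both")
    if version is not None:
        # Pin the read to one specific table version.
        return {"version_as_of": int(version)}
    if timestamp is not None:
        # Read the table as it was at the given ISO 8601 timestamp.
        return {"timestamp_as_of": timestamp}
    # Default: read as of the current UTC time, i.e. the latest version.
    now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    return {"timestamp_as_of": now}


def load_delta(delta_table_uri, version=None, timestamp=None):
    """Load a Delta table into pandas; requires the mltable package."""
    import mltable  # imported lazily so delta_read_options works without it

    options = delta_read_options(version=version, timestamp=timestamp)
    tbl = mltable.from_delta_lake(delta_table_uri=delta_table_uri, **options)
    return tbl.to_pandas_dataframe()
```

For example, `load_delta(data_asset.path)` would read the latest version, while `load_delta(data_asset.path, version=0)` would read the table as first written.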