For the Python API for tabular dataset of AzureML (azureml.data.TabularDataset
), there are two experimental methods which have been introduced:
download(stream_column, target_path=None, overwrite=False, ignore_not_found=True)
mount(stream_column, mount_point=None)
Parameter stream_column
has been defined as The stream column to mount or download.
What is the actual meaning of stream_column
? I don't see any example any where?
Any pointer will be helpful.
The stack trace:
Method download: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_11561/3904436543.py in <module>
----> 1 tab_dataset.download(target_path="../data/tabular")
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/_docstring_wrapper.py in wrapped(*args, **kwargs)
50 def wrapped(*args, **kwargs):
51 module_logger.warning("Method {0}: {1} {2}".format(func.__name__, _method_msg, _experimental_link_msg))
---> 52 return func(*args, **kwargs)
53 return wrapped
54
/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
130 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
131 try:
--> 132 return func(*args, **kwargs)
133 except Exception as e:
134 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):
TypeError: download() missing 1 required positional argument: 'stream_column'
Update on 5th March, 2022
I posted this as a support ticket with Azure. Following is the answer I have received:
As you can see from our documentation of TabularDataset Class, the “stream_column” parameter is required. So, that error is occurring because you are not passing any parameters when you are calling the download method. The “stream_column” parameter should have the stream column to download/mount. So, you need to pass the column name that contains the paths from which the data will be streamed.
Please find an example here.