
Azure ML Tabular Dataset: missing 1 required positional argument: 'stream_column'


The AzureML Python API for tabular datasets (azureml.data.TabularDataset) has introduced two experimental methods:

  1. download(stream_column, target_path=None, overwrite=False, ignore_not_found=True)
  2. mount(stream_column, mount_point=None)

The documentation defines the stream_column parameter only as "The stream column to mount or download."

What does stream_column actually mean? I can't find an example anywhere.

Any pointers would be helpful.

Calling download with only target_path produces this stack trace:

Method download: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_11561/3904436543.py in <module>
----> 1 tab_dataset.download(target_path="../data/tabular")

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/_base_sdk_common/_docstring_wrapper.py in wrapped(*args, **kwargs)
     50     def wrapped(*args, **kwargs):
     51         module_logger.warning("Method {0}: {1} {2}".format(func.__name__, _method_msg, _experimental_link_msg))
---> 52         return func(*args, **kwargs)
     53     return wrapped
     54 

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
    130             with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
    131                 try:
--> 132                     return func(*args, **kwargs)
    133                 except Exception as e:
    134                     if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

TypeError: download() missing 1 required positional argument: 'stream_column'
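
The error itself is an ordinary Python signature mismatch: stream_column has no default value, so a call that supplies only target_path fails before any SDK logic runs. Here is a minimal sketch of that behaviour, using a stand-in function that copies only the documented parameter list. The function body and the column name "image_path" are hypothetical, not taken from azureml.

```python
# Stand-in with the same parameter list as the documented signature.
# This is NOT the azureml implementation; the body is a placeholder.
def download(stream_column, target_path=None, overwrite=False, ignore_not_found=True):
    return {"stream_column": stream_column, "target_path": target_path}


# Same call shape as in the traceback: only target_path is supplied.
try:
    download(target_path="../data/tabular")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Supplying the required argument removes the TypeError.
# "image_path" is a hypothetical column name used for illustration.
result = download("image_path", target_path="../data/tabular")
```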

Solution

  • Update on 5 March 2022

    I raised this as a support ticket with Azure and received the following answer:

    As you can see from our documentation of TabularDataset Class, the “stream_column” parameter is required. So, that error is occurring because you are not passing any parameters when you are calling the download method. The “stream_column” parameter should have the stream column to download/mount. So, you need to pass the column name that contains the paths from which the data will be streamed.
    Please find an example here.
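
    Based on that answer, a corrected call might look like the sketch below. It assumes tab_dataset is an existing TabularDataset and that one of its columns holds the paths to stream. The helper name download_stream_column and the column name "image_path" are hypothetical; only the keyword names come from the signature quoted in the question.

```python
def download_stream_column(tab_dataset, stream_column, target_path):
    """Call TabularDataset.download with the required stream_column argument.

    tab_dataset is assumed to be an azureml.data.TabularDataset; the keyword
    names below come from the documented signature
    download(stream_column, target_path=None, overwrite=False, ignore_not_found=True).
    """
    return tab_dataset.download(
        stream_column=stream_column,  # name of the column that contains the paths
        target_path=target_path,      # local directory to download into
        overwrite=True,               # replace files that already exist locally
    )
```

    Usage would then be download_stream_column(tab_dataset, "image_path", "../data/tabular"), with "image_path" replaced by the real name of the path column in the dataset. The methods are marked experimental, so the signature may change between SDK versions.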