Tags: python, mlflow, microsoft-fabric, tsfresh

Fabric notebook running into errors while using tsfresh for feature extraction


When I try to run tsfresh's extract_features in a Microsoft Fabric notebook, I keep running into the same set of errors.

Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction:   0%|          | 0/20 [00:00<?, ?it/s]2024-03-26:07:49:13,37 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,36 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,95 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,103 ERROR    [synapse_mlflow_utils.py:348] 'c'
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 345, in set_envs
    config = MLConfig(sc)
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 128, in __init__
    self.env_configs = self.get_mlflow_configs()
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 163, in get_mlflow_configs
    region = self._get_spark_config("spark.cluster.region")
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/synapse_mlflow_utils.py", line 135, in _get_spark_config
    value = self.sc.getConf().get(key, "")
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 2375, in getConf
    conf.setAll(self._conf.getAll())
  File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in getAll
    return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 238, in <listcomp>
    return [(elem._1(), elem._2()) for elem in cast(JavaObject, self._jconf).getAll()]
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py", line 342, in get_return_value
    return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
KeyError: 'c'
2024-03-26:07:49:13,192 ERROR    [synapse_mlflow_utils.py:349] ## Not In PBI Synapse Platform ##
2024-03-26:07:49:13,336 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,341 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,342 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,344 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,346 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,347 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,350 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,351 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,357 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,348 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,360 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,361 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>._call_endpoint exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,362 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,364 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'mlflow.store.tracking.rest_store.RestStore'>.create_run exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,343 ERROR    [tracking_store.py:67] get_host_credentials fatal error
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/synapse/ml/mlflow/tracking_store.py", line 64, in get_host_credentials
    url_base = get_mlflow_env_config(False).workload_endpoint
AttributeError: 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,371 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 WARNING  [synapse_mlflow_utils.py:360] Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.
2024-03-26:07:49:13,369 ERROR    [synapse_mlflow_utils.py:420] [fabric mlflow plugin]: <class 'synapse.ml.mlflow.tracking_store.TridentMLflowTrackingStore'>.get_host_credentials exception 'NoneType' object has no attribute 'workload_endpoint'
2024-03-26:07:49:13,372 ERROR    [tracking_store.py:67] get_host_credentials fatal error

The last few errors keep repeating if I just let the program run.

It also keeps referring to MLflow, a package I know is integrated into Fabric's notebooks but which I am not actively calling. I have tried to use set_mlflow_env_config as the error suggests, but could not find any such function in the documentation.

The example code below reproduces my exact issue (from https://tsfresh.readthedocs.io/en/latest/text/quick_start.html):

import pandas as pd
import numpy as np

import tsfresh
from tsfresh import extract_features
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute


# Example dataset from tsfresh
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures
download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()

# Extract features, following the quick-start documentation: https://tsfresh.readthedocs.io/en/latest/text/quick_start.html
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
impute(extracted_features)
features_filtered = select_features(extracted_features, y)

How do I resolve this and stop MLflow from interfering?

I have also already tried importing MLflow and setting an experiment to see if that resolved the issue; it did not. It created many ML models with data I couldn't trace back to anything, and it still didn't extract any of my features.
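
For reference, that attempt was roughly the following standard MLflow experiment setup; the experiment name here is only a placeholder:

import mlflow

# Placeholder experiment name; setting an explicit experiment did not
# stop the errors or the untraceable model runs.
mlflow.set_experiment("tsfresh-feature-extraction")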

My current best guess is that tsfresh uses scikit-learn or something similar to fit its features, which MLflow then thinks it should track.


Solution

  • As it turns out, this code does in fact run. MLflow throwing continuous errors does not seem to affect tsfresh's ability to do the feature extraction; my data set was just so large that extraction took a while, and all the errors obscured the progress bar.

    If you want to turn off MLflow, however (which I highly recommend), the following works (Source):

    import mlflow
    mlflow.autolog(disable=True)
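
    For reference, a minimal sketch of how I ended up ordering things, with autologging disabled before the first tsfresh call (reusing the reproduction code from the question):

    import mlflow
    from tsfresh import extract_features

    # Disable Fabric's MLflow autologging first, so the warnings and
    # tracking-store errors above no longer flood the output.
    mlflow.autolog(disable=True)

    # Same call as in the reproduction code; it completes, just slowly on large data.
    extracted_features = extract_features(timeseries, column_id="id", column_sort="time")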