I've set up a script that gets data from my DagsHub repo, trains some models on it, and uses MLflow to log the training and evaluation data to the MLflow server associated with that repo. This works locally, but when I run it through GitHub Actions it only logs some training parameters and none of the model-specific parameters or test metrics.
Looking through the logs, I noticed an exception and a warning that I'm not sure how to fix. Any advice would be greatly appreciated.
Exception: The following failures occurred while performing one or more logging operations: [MlflowException('Failed to perform one or more operations on the run with ID f72e708e6f7b43c49e88769357547b54. Failed operations: [RestException("INVALID_PARAMETER_VALUE: Response: {\'error_code\': \'INVALID_PARAMETER_VALUE\'}")]')]
2024-03-16T11:05:16.7804154Z 2024/03/16 11:05:16 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: The following failures occurred while performing one or more logging operations: [MlflowException('Failed to perform one or more operations on the run with ID f72e708e6f7b43c49e88769357547b54. Failed operations: [RestException("INVALID_PARAMETER_VALUE: Response: {\'error_code\': \'INVALID_PARAMETER_VALUE\'}")]')]
2024-03-16T11:05:16.5074723Z 2024/03/16 11:05:16 WARNING mlflow.models.model: Logging model metadata to the tracking server has failed. The model artifacts have been logged successfully under mlflow-artifacts:/44ab2167890f4d81a6a74d258b2e05f0/f72e708e6f7b43c49e88769357547b54/artifacts. Set logging level to DEBUG via `logging.getLogger("mlflow").setLevel(logging.DEBUG)` to see the full traceback.
2024-03-16T11:05:16.5092754Z 2024/03/16 11:05:16 DEBUG mlflow.models.model:
2024-03-16T11:05:16.5093642Z urllib3.exceptions.ResponseError: too many 500 error responses
2024-03-16T11:05:16.5094263Z
2024-03-16T11:05:16.5094676Z The above exception was the direct cause of the following exception:
2024-03-16T11:05:16.5095344Z
2024-03-16T11:05:16.5095554Z Traceback (most recent call last):
2024-03-16T11:05:16.5104112Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
2024-03-16T11:05:16.5104940Z resp = conn.urlopen(
2024-03-16T11:05:16.5105853Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5106931Z return self.urlopen(
2024-03-16T11:05:16.5107877Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5108643Z return self.urlopen(
2024-03-16T11:05:16.5109452Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 948, in urlopen
2024-03-16T11:05:16.5110215Z return self.urlopen(
2024-03-16T11:05:16.5110524Z [Previous line repeated 2 more times]
2024-03-16T11:05:16.5111375Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/connectionpool.py", line 938, in urlopen
2024-03-16T11:05:16.5112261Z retries = retries.increment(method, url, response=response, _pool=self)
2024-03-16T11:05:16.5113237Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
2024-03-16T11:05:16.5114221Z raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
2024-03-16T11:05:16.5116015Z urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
2024-03-16T11:05:16.5117136Z
2024-03-16T11:05:16.5117395Z During handling of the above exception, another exception occurred:
2024-03-16T11:05:16.5117767Z
2024-03-16T11:05:16.5117889Z Traceback (most recent call last):
2024-03-16T11:05:16.5118782Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 108, in http_request
2024-03-16T11:05:16.5119578Z return _get_http_response_with_retries(
2024-03-16T11:05:16.5120873Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/request_utils.py", line 212, in _get_http_response_with_retries
2024-03-16T11:05:16.5121907Z return session.request(method, url, allow_redirects=allow_redirects, **kwargs)
2024-03-16T11:05:16.5122998Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
2024-03-16T11:05:16.5123746Z resp = self.send(prep, **send_kwargs)
2024-03-16T11:05:16.5124557Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
2024-03-16T11:05:16.5125264Z r = adapter.send(request, **kwargs)
2024-03-16T11:05:16.5126058Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/requests/adapters.py", line 510, in send
2024-03-16T11:05:16.5126785Z raise RetryError(e, request=request)
2024-03-16T11:05:16.5128220Z requests.exceptions.RetryError: HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
2024-03-16T11:05:16.5129323Z
2024-03-16T11:05:16.5129563Z During handling of the above exception, another exception occurred:
2024-03-16T11:05:16.5129939Z
2024-03-16T11:05:16.5130056Z Traceback (most recent call last):
2024-03-16T11:05:16.5131031Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/models/model.py", line 625, in log
2024-03-16T11:05:16.5131858Z mlflow.tracking.fluent._record_logged_model(mlflow_model, run_id)
2024-03-16T11:05:16.5132906Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 1413, in _record_logged_model
2024-03-16T11:05:16.5133789Z MlflowClient()._record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5134816Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1831, in _record_logged_model
2024-03-16T11:05:16.5135744Z self._tracking_client._record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5136843Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 524, in _record_logged_model
2024-03-16T11:05:16.5137764Z self.store.record_logged_model(run_id, mlflow_model)
2024-03-16T11:05:16.5138799Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 344, in record_logged_model
2024-03-16T11:05:16.5139668Z self._call_endpoint(LogModel, req_body)
2024-03-16T11:05:16.5140608Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 60, in _call_endpoint
2024-03-16T11:05:16.5141619Z return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
2024-03-16T11:05:16.5142714Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 219, in call_endpoint
2024-03-16T11:05:16.5143500Z response = http_request(**call_kwargs)
2024-03-16T11:05:16.5144376Z File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 130, in http_request
2024-03-16T11:05:16.5145312Z raise MlflowException(f"API request to {url} failed with exception {e}")
2024-03-16T11:05:16.5147853Z mlflow.exceptions.MlflowException: API request to https://dagshub.com/***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model failed with exception HTTPSConnectionPool(host='dagshub.com', port=443): Max retries exceeded with url: /***/Dublin-property-prices.mlflow/api/2.0/mlflow/runs/log-model (Caused by ResponseError('too many 500 error responses'))
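The DEBUG traceback above appears because MLflow's logger was raised to DEBUG, as the earlier warning recommends. A minimal sketch of doing that at the top of the training script, using only the standard `logging` module (the `basicConfig()` call is a precaution so log records have a handler to go to):

```python
import logging

# Make sure a root handler exists so DEBUG records are actually emitted.
logging.basicConfig()

# The call suggested by the MLflow warning: surface full tracebacks from
# mlflow.* loggers, such as the failed runs/log-model request.
logging.getLogger("mlflow").setLevel(logging.DEBUG)
```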
It looks like this is an MLflow version compatibility issue.
At the time of writing, DagsHub runs MLflow v2.7, and v2.10 introduced a compatibility-breaking feature, "Model Signature Supports Objects and Arrays": https://github.com/mlflow/mlflow/releases/tag/v2.10.0
It seems likely that your working local environment uses an MLflow client older than 2.10, so pinning the same version in your GitHub Actions workflow should solve the issue for now.
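Besides pinning the dependency (for example `mlflow<2.10` in the requirements file the workflow installs), a cheap guard is to fail the CI job early if a newer client slips in. This is a minimal sketch, assuming the 2.10 cutoff from this answer is the relevant one; `parse_major_minor`, `is_compatible` and `check_installed_mlflow` are hypothetical helpers, not part of MLflow or DagsHub.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption (from this answer): the DagsHub-hosted MLflow server (v2.7 at
# the time of writing) does not accept what clients >= 2.10 send.
MAX_EXCLUSIVE = (2, 10)


def parse_major_minor(ver):
    """Return (major, minor) from a version string such as '2.9.2' or '2.10.0rc0'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(ver, max_exclusive=MAX_EXCLUSIVE):
    """True if the client version is strictly below max_exclusive."""
    return parse_major_minor(ver) < max_exclusive


def check_installed_mlflow():
    """Exit with an error message if the installed mlflow client is too new."""
    try:
        installed = version("mlflow")
    except PackageNotFoundError:
        raise SystemExit("mlflow is not installed in this environment")
    if not is_compatible(installed):
        raise SystemExit(
            f"mlflow {installed} is installed; pin mlflow<2.10 to match the local setup"
        )
    print(f"mlflow {installed} is below {MAX_EXCLUSIVE[0]}.{MAX_EXCLUSIVE[1]}")
```

Calling `check_installed_mlflow()` at the start of the training script (or in a separate workflow step) turns a silent logging failure into an explicit, early CI error.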
For fast support on DagsHub, I'd recommend joining the community Discord to get direct support from the team: https://discord.com/invite/9gU36Y6