apache-spark databricks hyperopt

Hyperopt spark 3.0 issues


I am running Databricks Runtime 8.1 (includes Apache Spark 3.1.1, Scala 2.12) and trying to get Hyperopt working as described in

https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/hyperopt-spark-mlflow-integration.html

but I get the following error:

py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist

when I try to run

spark_trials = SparkTrials()

Is there anything special I need to do to get this working?

Here is the cluster I am using:

{
    "autoscale": {
        "min_workers": 1,
        "max_workers": 2
    },
    "cluster_name": "mlops_tiny_ml",
    "spark_version": "8.2.x-cpu-ml-scala2.12",
    "spark_conf": {},
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "zone_id": "us-west-2b",
        "instance_profile_arn": "arn:aws:iam::112437402463:instance-profile/databricks_instance_role_s3",
        "spot_bid_price_percent": 100,
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 3,
        "ebs_volume_size": 100
    },
    "node_type_id": "m4.large",
    "driver_node_type_id": "m4.large",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {},
    "autotermination_minutes": 120,
    "enable_elastic_disk": false,
    "cluster_source": "UI",
    "init_scripts": [],
    "cluster_id": "0xxxxxt404"
}

This is the code I am using: https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/hyperopt-model-selection.html


Solution

  • Hyperopt is included only in the DBR ML runtimes, not in the stock runtimes. You can verify this in the release notes for each runtime: DBR 8.1 vs. DBR 8.1 ML.

    And from the docs:

    Databricks Runtime for Machine Learning incorporates MLflow and Hyperopt, two open source tools that automate the process of model selection and hyperparameter tuning.
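
    As a rough way to confirm which kind of runtime a cluster is on, something like the sketch below may help. It rests on two assumptions: that ML runtime version keys carry an "ml" segment (as in the "8.2.x-cpu-ml-scala2.12" value from the cluster JSON in the question), and that checking whether the hyperopt package is importable is a fair proxy for it being bundled. The helper names is_ml_runtime and hyperopt_available are made up for illustration and are not part of any Databricks or Hyperopt API.

```python
import importlib.util


def is_ml_runtime(spark_version: str) -> bool:
    """Heuristic (assumption): ML runtime keys contain an 'ml' segment,
    e.g. '8.2.x-cpu-ml-scala2.12', while stock keys do not."""
    return "ml" in spark_version.lower().split("-")


def hyperopt_available() -> bool:
    """Return True if the hyperopt package can be found in this environment."""
    return importlib.util.find_spec("hyperopt") is not None


# Value taken from the "spark_version" field of the cluster JSON in the question.
print("ML runtime key:", is_ml_runtime("8.2.x-cpu-ml-scala2.12"))
print("hyperopt importable:", hyperopt_available())
```

    If the second check reports False in a notebook attached to the cluster, the cluster is most likely on a stock runtime and should be recreated with an ML runtime version.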