I am using the databricks-connect module to run my code on my Databricks cluster from within PyCharm on my local machine. This works fine as long as I'm outside the company network, i.e. working from home. But in the office I have to use the company network, and there I can't get databricks-connect to work.
For other services that don't work on the company network (such as pip, or git with Azure DevOps), I set the HTTPS_PROXY environment variable to a proxy URL like http://proxy.company.com:port, and then they work fine.
But I can't figure out which settings databricks-connect requires. I have tried setting http_proxy and https_proxy, either as environment variables or in the .databrickscfg file, but it keeps failing when I try to create my Spark session:
from databricks.connect import DatabricksSession # type: ignore[attr-defined]
spark = DatabricksSession.builder.profile("myprofile").getOrCreate()
ValueError: default auth: databricks-cli: cannot get access token: Error: oidc: fetch .well-known: Get "https://myworkspace/oidc/.well-known/oauth-authorization-server": authenticationrequired. Config: host=https://myworkspace, profile=myprofile, auth_type=databricks-cli, cluster_id=mycluster
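One way to narrow this down is to check which proxy variables the Python process actually sees, since values set in a shell or an IDE run configuration do not always reach the interpreter. The helper below is only a sketch: `visible_proxy_settings` is a hypothetical name, not part of databricks-connect, and it masks any password so the output can be shared safely.

```python
import os
from urllib.parse import urlsplit

# Common spellings of the proxy variables; tools differ in which case they read.
PROXY_VARS = (
    "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
    "http_proxy", "https_proxy", "no_proxy",
)


def visible_proxy_settings(env=None):
    """Return the proxy-related variables found in env, with passwords masked."""
    env = os.environ if env is None else env
    result = {}
    for name in PROXY_VARS:
        value = env.get(name)
        if value is None:
            continue
        # urlsplit exposes the password part of user:password@host, if present.
        password = urlsplit(value).password
        if password:
            value = value.replace(":" + password + "@", ":***@", 1)
        result[name] = value
    return result


if __name__ == "__main__":
    for name, value in visible_proxy_settings().items():
        print(f"{name}={value}")
```

If this prints nothing when run from the same PyCharm run configuration, the proxy variables are not reaching the process at all.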
I managed to get it to work by adding my username and password to the HTTPS_PROXY environment variable. I was sure I had tried this before, but it must have failed for other reasons.
import os
os.environ["HTTPS_PROXY"] = "http://username:password@proxy.company.com:port"
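One possible reason an earlier attempt failed: if the username or password contains characters such as `@`, `:`, `/` or `\`, they have to be percent-encoded before being embedded in the URL. The sketch below shows one way to do that. The helper names and the port 8080 in the usage note are placeholders of mine, not anything databricks-connect defines, and I am assuming the variable has to be set before the session is created.

```python
import os
from urllib.parse import quote


def build_proxy_url(username, password, host, port):
    """Build an http:// proxy URL with percent-encoded credentials."""
    # safe="" also encodes "/" so that nothing in the credentials
    # can be mistaken for URL structure.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def set_https_proxy(username, password, host, port, env=None):
    """Store the proxy URL in HTTPS_PROXY (os.environ by default) and return it."""
    env = os.environ if env is None else env
    url = build_proxy_url(username, password, host, port)
    env["HTTPS_PROXY"] = url
    return url
```

Usage would look like `set_https_proxy("username", "password", "proxy.company.com", 8080)`, called before `DatabricksSession.builder.profile("myprofile").getOrCreate()`. Reading the credentials from a secret store or a prompt avoids hard-coding them in the script.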