python-3.x · databricks · saprfc · pyrfc

Is it possible to install pyRFC onto a Databricks Spark cluster?


There is a PyPI package for pyRFC, but like other Python libraries built on C extensions, it has a number of native dependencies and requires environment variables to be set, etc.

Is it possible to install a C-extension library like pyRFC onto a Databricks cluster? If so, how would you go about including the SDK dependencies?

Perhaps someone has already tried this with the Java version?


Solution

  • Yes, it's possible. This is usually done by attaching a cluster init script to the cluster. The init script's job is to set up all the necessary dependencies on every cluster node: compiling libraries, installing packages, and so on. Typically, people download their packages and SDKs, put them on DBFS, and then access them from inside the init script via the /dbfs mount.

    The script could look like this (just an example); a sketch of how to create and attach such a script follows below:

    #!/bin/bash
    
    # Unpack the SAP NW RFC SDK from DBFS into a fixed location on the node
    # (the archive name and layout are examples -- adjust to your own upload)
    mkdir -p /usr/local/sap
    tar zxvf /dbfs/FileStore/SAP-SDK.tar.gz -C /usr/local/sap
    
    # Point the pyrfc build at the SDK and make its shared libraries resolvable
    export SAPNWRFC_HOME=/usr/local/sap/nwrfcsdk
    echo "$SAPNWRFC_HOME/lib" > /etc/ld.so.conf.d/nwrfcsdk.conf
    ldconfig
    
    # Install the package on every node
    pip install pyrfc
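
    A minimal sketch of how the script might be created and attached, assuming you write it to DBFS from a notebook with dbutils.fs.put (the DBFS path and file name below are only illustrative):

    # Write the init script to DBFS from a notebook cell.
    # The destination path is an example; reference the same path afterwards
    # in the cluster's "Init Scripts" configuration (or via the Clusters API).
    script_body = """#!/bin/bash
    mkdir -p /usr/local/sap
    tar zxvf /dbfs/FileStore/SAP-SDK.tar.gz -C /usr/local/sap
    export SAPNWRFC_HOME=/usr/local/sap/nwrfcsdk
    echo "$SAPNWRFC_HOME/lib" > /etc/ld.so.conf.d/nwrfcsdk.conf
    ldconfig
    pip install pyrfc
    """
    
    dbutils.fs.put(
        "dbfs:/databricks/init-scripts/install-pyrfc.sh",  # example path
        script_body,
        overwrite=True,
    )

    Once the cluster is restarted with the script attached, import pyrfc should work in any notebook running on that cluster.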