odbcrstudiodatabricksdesktopdatabricks-community-edition

Connecting RStudio Desktop to Databricks Community Edition on Mac OS Ventura(13.4) with M1 chip via ODBC


Has anyone been successful in using/configuring ODBC for Rstudio desktop with databricks community edition on MAC OS Ventura (13.4) and M1 chip?

Tha link to databricks community edition is here --> link

I downloaded the MAC OS driver here --> link

I tried using the ODBC Manager here --> link

Rstudio/Posit seems to suggest we need to use unixODBC, which I got here --> link. I tried using home-brew to install the unixODBC but ran into path problems and decided to manually download.

I tried to configure the driver/dsn according to these parameters -->link. I additionally included UID and PWD key:value because community edition does not allow for the creation of the personal token.

Rstudio repeatedly provides Error: nanodbc/nanodbc.cpp:1118: 00000: [unixODBC][Driver Manager]Data source name not found and no default driver specified

When I go to new connection I only see Spark and Livy with no Simba Spark Odbc Driver. When I used unixODBC from home-brew with ODBCSYSINI=/opt/homebrew/etc specified in ~/.Renviron as specified here link, I could at least see Simba Spark Odbc Driver in the new connection window from Rstudio connection tab.

I know I have not given much detail here (i.e. reprex) but trying to see how to use /test Rstudio desktop with databricks for larger than RAM datasets. Testing locally to see if it can be replicated in my office environment. Working with these ODBC drivers etc are new to me. Thank you for any resources/examples/help that folks can provide.


Solution

  • Databricks just released a new ARM64-compatabile ODBC driver for MacOS at the link you shared. A major issue was the lack of ARM compilation, creating issues with RStudio and R packages searching for the package. The rough steps I went through:

    1. Install ODBC manager: brew install unixodbc
    2. Check to ensure installation and paths for .ini files with odbcinst -j
    3. Install the latest version of the Databricks ODBC driver. By default, it will be installed at the path /Library/simba/spark
    4. Copy and paste the following command in your CLI:
    echo "\n" >> /opt/homebrew/etc/odbcinst.ini
    echo "[databricks_spark_sql]" >> /opt/homebrew/etc/odbcinst.ini
    echo "Driver=/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib" >> /opt/homebrew/etc/odbcinst.ini
    echo "ODBCInstLib=/opt/homebrew/opt/unixodbc/lib/libodbcinst.2.dylib" >> /Library/simba/spark/lib/simba.sparkodbc.ini
    

    These commands add the databricks_spark_sql driver location to the “connector configuration files” used by the unixODBC driver manager. It also makes unixODBC utilities available to the databricks_spark_sql driver. Once complete, close your Terminal and open RStudio.

    1. Add an entry to your .Rprofile that sets an environment variable that tells R where to look for odbc's configuration files. This is where the odbcinst.ini file is stored. From within a project in RStudio, you can do this by opening your .Rprofile with the following command:
    usethis::edit_r_profile(scope = 'project')
    

    Then, in the .Rprofile, paste the following:

    Sys.setenv(ODBCSYSINI='/opt/homebrew/etc')
    

    Restart your R Session and check that the new ODBC driver is recognized/being listed correctly:

    odbc::odbcListDrivers()
    

    Go through the process of connecting to Databricks via ODBC using their documentation. From dbplyr, it may look something like this:

    db_odbc <- DBI::dbConnect(
        drv = odbc::odbc(),
        driver          = 'databricks_spark_sql',
        server          = 'adb-[...].azuredatabricks.net',
        host            = 'adb-[...].azuredatabricks.net',
        port            = 443,
        httppath        = '/sql/1.0/warehouses/01d71fe07e891bb6',
        catalog         = '...',
        schema          = '...',
        thrifttransport = 2,
        ssl             = 1,
        authmech        = 3,
        uid             = 'token',
        pwd             = keyring::key_get(service = 'simba_connector_token')
    )