Has anyone been successful in using/configuring ODBC for RStudio Desktop with Databricks Community Edition on macOS Ventura (13.4) and an M1 chip?
The link to Databricks Community Edition is here --> link
I downloaded the macOS driver here --> link
I tried using the ODBC Manager here --> link
RStudio/Posit seems to suggest we need to use unixODBC, which I got here --> link. I tried installing unixODBC with Homebrew but ran into path problems and decided to download it manually.
I tried to configure the driver/DSN according to these parameters --> link. I additionally included UID and PWD key:value pairs because Community Edition does not allow the creation of a personal access token.
RStudio repeatedly returns:
Error: nanodbc/nanodbc.cpp:1118: 00000: [unixODBC][Driver Manager]Data source name not found and no default driver specified
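That message generally means the unixODBC driver manager cannot find a matching driver or DSN in the .ini files it is actually reading. The odbc package has helpers to list what the driver manager can see, which may help narrow down the mismatch:

# What drivers does the driver manager know about?
odbc::odbcListDrivers()
# What data source names (DSNs) does it know about?
odbc::odbcListDataSources()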
When I go to New Connection, I only see Spark and Livy, with no Simba Spark ODBC Driver. When I used unixODBC from Homebrew with ODBCSYSINI=/opt/homebrew/etc set in ~/.Renviron, as described here (link), I could at least see the Simba Spark ODBC Driver in the New Connection window of the RStudio Connections tab.
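Concretely, that approach amounts to adding one line to ~/.Renviron; a minimal sketch, assuming the usethis package is installed and using the Homebrew path from above:

# Open the user-level ~/.Renviron for editing, then restart R afterwards
usethis::edit_r_environ()
# Add this line to the file:
# ODBCSYSINI=/opt/homebrew/etc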
I know I have not given much detail here (i.e., a reprex), but I am trying to see how to use/test RStudio Desktop with Databricks for larger-than-RAM datasets, testing locally to see if it can be replicated in my office environment. Working with these ODBC drivers is new to me. Thank you for any resources/examples/help that folks can provide.
Databricks just released a new ARM64-compatible ODBC driver for macOS at the link you shared. A major issue was the lack of an ARM build, which caused problems for RStudio and R packages searching for the driver. The rough steps I went through:
1. Install unixODBC with Homebrew: brew install unixodbc
2. Locate unixODBC's .ini configuration files with odbcinst -j.
3. Install the Databricks ODBC driver for macOS; it installs to /Library/simba/spark.
4. Register the driver with the unixODBC driver manager by appending to the configuration files:
echo "\n" >> /opt/homebrew/etc/odbcinst.ini
echo "[databricks_spark_sql]" >> /opt/homebrew/etc/odbcinst.ini
echo "Driver=/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib" >> /opt/homebrew/etc/odbcinst.ini
echo "ODBCInstLib=/opt/homebrew/opt/unixodbc/lib/libodbcinst.2.dylib" >> /Library/simba/spark/lib/simba.sparkodbc.ini
These commands add the databricks_spark_sql driver location to the “connector configuration files” used by the unixODBC driver manager, and make the unixODBC utilities available to the databricks_spark_sql driver. Once complete, close your Terminal and open RStudio.
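If the appends worked, /opt/homebrew/etc/odbcinst.ini should now end with an entry like the following (this is just what the echo commands above write out):

[databricks_spark_sql]
Driver=/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib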
5. Create a .Rprofile that sets an environment variable telling R where to look for odbc's configuration files (this is where the odbcinst.ini file is stored). From within a project in RStudio, you can open your .Rprofile with the following command:
usethis::edit_r_profile(scope = 'project')
Then, in the .Rprofile, paste the following:
Sys.setenv(ODBCSYSINI='/opt/homebrew/etc')
6. Restart your R session and check that the new ODBC driver is recognized and listed correctly:
odbc::odbcListDrivers()
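odbcListDrivers() returns a data frame of name/attribute/value rows; if the registration worked, it should include a row for the new entry. An illustrative (not verbatim) result, with the values coming from the odbcinst.ini entry above:

#>                   name attribute                                                       value
#> 1 databricks_spark_sql    Driver /Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib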
7. Go through the process of connecting to Databricks via ODBC using their documentation. From dbplyr, it may look something like this:
db_odbc <- DBI::dbConnect(
  drv = odbc::odbc(),
  # name of the driver entry registered in odbcinst.ini above
  driver = 'databricks_spark_sql',
  server = 'adb-[...].azuredatabricks.net',
  host = 'adb-[...].azuredatabricks.net',
  port = 443,
  # HTTP path of the SQL warehouse, from its connection details
  httppath = '/sql/1.0/warehouses/01d71fe07e891bb6',
  catalog = '...',
  schema = '...',
  thrifttransport = 2, # 2 = HTTP transport
  ssl = 1,
  authmech = 3,        # 3 = user name and password authentication
  uid = 'token',       # literal string 'token' when authenticating with a PAT
  pwd = keyring::key_get(service = 'simba_connector_token')
)
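Once the connection is established, dbplyr keeps computation on the warehouse and only pulls results back into R, which is the point for larger-than-RAM data. A minimal sketch, where the table and column names are placeholders rather than values from the post:

library(dplyr)

# Lazily reference a remote table; no rows are downloaded yet
remote_tbl <- dplyr::tbl(db_odbc, dbplyr::in_catalog('...', '...', 'my_table'))

# The aggregation is translated to SQL and runs on Databricks;
# collect() brings back only the small summary result
remote_tbl |>
  dplyr::count(some_column) |>
  dplyr::collect()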