python pyspark databricks feature-store

Databricks (10.2): Undocumented Case Sensitivity in Feature Store Database/Table Access


I created an input table intended to feed Databricks Feature Store, mounted it (in Linux), and loaded it as prescribed in the Databricks documentation (following their "RawDatasets" code example):

SourceDataFrameName_df = spark \
  .read \
  .format('delta') \
  .load("dbfs:/mnt/path/dev/version/database_name.tablename_extension")

However, this call fails with a "not-found"/"doesn't exist" error saying that the resource "database_name.tablename_extension" cannot be located. Yet that is exactly how the name is displayed everywhere in the Databricks GUI: entirely in lower case.

I have spent a lot of time reviewing the Databricks documentation, SO, and my own Databricks setup, but I cannot find the cause of this error. Please assist.


Solution

  • This is an as-yet undocumented issue that stems from how Databricks Feature Store operates. Since Databricks is largely pass-through (it uses registered views rather than storing the source data), the mount is the key issue here.

    The issue may not be highlighted adequately in their documentation because it is actually a Linux matter: that operating system is case-sensitive, whereas Databricks appears to be largely case-agnostic and displays names in lower case. In this example, the original database/Linux engineer created the table/mount as:

    database_name.TableName_Extension
    

    Since the mount references a Linux path, the path is case-sensitive, too. So, the proper way to load this source dataset from such a mount would be:

    SourceDataFrameName_df = spark \
      .read \
      .format('delta') \
      .load("dbfs:/mnt/path/dev/version/database_name.TableName_Extension")
    

    The problem is that the exact casing may be unknown to the Databricks developer/engineer if the database/Linux developer/engineer is a different person, since the Databricks GUI shows the name in lower case only. For example, the table might have been named "database_name.Tablename_extension" or "database_name.TableName_EXTENSION" or any other combination of upper and lower case.

    The exact name is not difficult to find, provided you know to look for it. Beware.
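
If you only know the lower-case name from the GUI, one way to recover the real casing is to list the mount's parent directory and compare names with case ignored. The following is a minimal sketch, not a Databricks-provided utility: the helper name `match_ignoring_case` and the sample listing are hypothetical, and the listing is hard-coded so the snippet runs anywhere.

```python
from typing import Iterable, List


def match_ignoring_case(wanted: str, candidates: Iterable[str]) -> List[str]:
    """Return every candidate that equals `wanted` when case is ignored.

    A trailing "/" (as directory listings often include) is stripped first.
    A list is returned because a case-sensitive store can hold several
    names that differ only in case.
    """
    target = wanted.casefold()
    cleaned = (name.rstrip("/") for name in candidates)
    return [name for name in cleaned if name.casefold() == target]


# Hypothetical directory listing, hard-coded for illustration.
# Assumption: on Databricks, names like these could be obtained with
#   [f.name for f in dbutils.fs.ls("dbfs:/mnt/path/dev/version/")]
listing = ["database_name.TableName_Extension/", "database_name.other_table/"]

# The lower-case name as shown in the GUI.
matches = match_ignoring_case("database_name.tablename_extension", listing)
print(matches)  # ['database_name.TableName_Extension']
```

If exactly one name comes back, that is the casing to use in the `.load(...)` path. If more than one comes back, the mount really does contain names that differ only in case, and you will need to ask whoever created them which one is intended.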