python-3.x · databricks · azure-databricks · dbf

Reading a .dbf file within a Databricks notebook using Python


I'm quite new to Databricks and Python, and one thing in particular has really been stumping me. I'd be grateful if someone could point me in the right direction.

I am trying to read a really simple DBF file within a Databricks notebook, using the dbfread library.

The file I’m trying to read is “people.dbf” (from here), which is used in many of the examples in the dbfread docs.

I have put this DBF file in the root of my DBFS (screenshot: file_in_dbfs).

But after importing the dbfread module, I get an error when I try to read the .dbf file (screenshot: dbfread_error).
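
The failing line is in the screenshot, but it boils down to something like this (the exact path string may differ slightly; the point is I've been pointing at the file in the DBFS root):

    from dbfread import DBF

    # the constructor is where the error is raised
    table = DBF("/people.dbf")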

The file definitely exists: I can see it with dbutils.fs.ls, and if I pretend it's a CSV I can see the contents with spark.read.csv (screenshot: works_ok_with_dbutils).
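
Roughly, these are the checks I ran (standard dbutils and Spark calls; the CSV read obviously gives nonsense columns, but it proves the path resolves):

    # listing the DBFS root shows people.dbf is there
    display(dbutils.fs.ls("/"))

    # reading it as if it were a CSV "works", so the path itself is fine
    spark.read.csv("dbfs:/people.dbf").show()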

I've tried a few other DBF-reading modules too (dbf, geopandas, simpledbf) and get exactly the same error message. I've also tried copying the file to the local filesystem and to an external location, with the same error each time.
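
The copy to the local filesystem was a standard dbutils call along these lines (the /tmp destination here is just illustrative):

    # copy the file from DBFS to the driver node's local disk
    dbutils.fs.cp("dbfs:/people.dbf", "file:/tmp/people.dbf")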

Does anyone know what I'm doing wrong please?


Solution

  • When working with files on Databricks, the way you access them on DBFS depends on the context; a more detailed explanation can be found here. I'd guess that, under the hood, dbfread uses the os module (plain local file I/O) to open files, so I suggest you try:

    DBF("/dbfs/people.dbf")