I am trying to open en transform several DBF files to a dataframe. Most of them worked fine, but for one of the files I receive the error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte"
I have read this error on some other topics such as opening csv and xlsx and other files. The proposed solution was to include encoding = 'utf-8'
in the reading the file part. I haven't found a solution for DBF files unfortunately and I have very limited knowledge on DBF files.
What I have tried so far:
1)
from dbfread import DBF
dbf = DBF('file.DBF')
dbf = pd.DataFrame(dbf)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>
2)
from simpledbf import Dbf5
dbf = Dbf5('file.DBF')
dbf = dbf.to_dataframe()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte
3)
# this block of code copied from https://gist.github.com/ryan-hill/f90b1c68f60d12baea81
import pysal as ps
def dbf2DF(dbfile, upper=True): #Reads in DBF files and returns Pandas DF
db = ps.table(dbfile) #Pysal to open DBF
d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
#pandasDF = pd.DataFrame(db[:]) #Convert to Pandas DF
pandasDF = pd.DataFrame(d) #Convert to Pandas DF
if upper == True: #Make columns uppercase if wanted
pandasDF.columns = map(str.upper, db.header)
db.close()
return pandasDF
dfb = dbf2DF('file.DBF')
AttributeError: module 'pysal' has no attribute 'open'
And last, if I try to install the dbfpy
module, I receive:
SyntaxError: invalid syntax
Any suggestions on how to solve this?
Try using my dbf
library:
import dbf
table = dbf.Table('file.DBF')
Print it to see if an encoding is present in the file:
print table # print(table) in Python 3
One of my test tables looks like this:
Table: tempy.dbf
Type: dBase III Plus
Codepage: ascii (plain ol ascii)
Status: DbfStatus.CLOSED
Last updated: 2019-07-26
Record count: 1
Field count: 2
Record length: 31
--Fields--
0) name C(20)
1) desc M
The important line being the Codepage
line -- it sounds like that is not properly set for your DBF
file. If you know what it should be, you can either open it with that codepage (temporarily) with:
table = dbf.Table('file.DBF', codepage='...')
Or you can change it permanently (updates the DBF
file) with:
table.open()
table.codepage = dbf.CodePage('cp1252') # for example
table.close()