I've read a .xlsb file and parsed date columns using a code below:
dateparser = lambda x: pd.to_datetime(x)
data = pd.read_excel(r"test.xlsb", engine="pyxlsb",
parse_dates=["start_date","end_date"],
date_parser=dateparser
)
My input columns in the .xlsb file have format DD/MM/YYYY (e.g. 26/01/2008). As an output of the above-mentioned code I get, for example: 1970-01-01 00:00:00.000038840. Only the last 5 digits changes.
If I read the same file without parsing dates, the same columns are of float64 type and containing only the last 5 digits of output before (e.g. 38840.0).
I assume this is a problem associated with date encoding itself. Does anyone know how to fix this issue?
I am not sure if you were able to figure out the answer to this problem. But, below is how I resolved it:
from pyxlsb import convert_date
self.data: pd.DataFrame = pd.read_excel(self.file, sheet_name=self.sheet, engine='pyxlsb', header=0)
self.data["test"] = self.data.apply(lambda x: convert_date(x.SomeStupidDate), axis=1)
More details can be found here: https://pypi.org/project/pyxlsb/ by doing ctrl+F for "convert_date".