I have an Excel spreadsheet connected with an ACE.OLEDB.12.0 connection manager.
I'm having an issue with a column that should contain a date, but there is invalid data in the column.
Most of the entries consist of a date or blank.
The other entries are either a SPACE, "na", or "Dispute".
I found that I can get around the error by using the Input and Output Properties from the Excel source\Advanced Editor, by going to the Output Columns, select my column and change the DataType to DT_WSTR.
The SPACE, "na", and "Dispute" values do not get imported.
It's as if the Excel source is acting as a data converter and just dropping the non-date values. I was fully expecting to have to filter out the non-date values in a derived column, but I don't have to.
Does someone know why it does that?
Found the answer, it took a while to find, but here is the reason the Excel Source does what it does.
"The Excel driver reads a certain number of rows (by default, eight rows) in the specified source to guess at the data type of each column. When a column appears to contain mixed data types, especially numeric data mixed with text data, the driver decides in favor of the majority data type, and returns null values for cells that contain data of the other type. (In a tie, the numeric type wins.) Most cell formatting options in the Excel worksheet do not seem to affect this data type determination."
This is a very useful feature to have, if you don't want to change the perceived data type of the Excel spreadsheet.