I'm dealing with a legacy application written in VB6 which reads in Excel spreadsheets and inserts them into a database.
It works for the most part, but if the data in the spreadsheet does not start on the first row, the first data row gets duplicated.
Say for example the first 3 rows in the spreadsheet are blank, the top four rows of data look like the following:
_| A | B | C | D | E | F | G |
1| | | | | | | |
2| | | | | | | |
3| | | | | | | |
4| 99 |Text1|Text2|Text3|Text4|Text5| 77 |
The application connects to the Excel spreadsheet and reads it in using the following code:
Public Function obtainConnectionExcel(sql_conn, uid) As Variant
Dim cn As Object
Set cn = CreateObject("ADODB.Connection")
On Error Resume Next
cn.Provider = "Microsoft.ACE.OLEDB.12.0"
cn.Properties("Extended Properties").Value = "Excel 12.0;ReadOnly=True;HDR=No;IMEX=1"
If (Err <> 0) Then
cn.Provider = "Microsoft.Jet.OLEDB.4.0"
cn.Properties("Extended Properties").Value = "Excel 8.0;ReadOnly=True;HDR=No;IMEX=1"
End If
On Error Resume Next
cn.open getSpreadsheetPath(sql_conn, uid)
Set obtainConnectionExcel = cn
Exit Function
End Function
.....
Public Function extractAllData(parameters) As String
..... 'Variable declarations etc
On Error Resume Next
Set dbo_conn = obtainConnectionExcel(sql_conn, uid)
If Err <> 0 Then
....'logs error, goes to error handler
End If
On Error GoTo ErrorHandler
If (dbo_conn.State = 1) Then
rownumber = 1
Do While rownumber <= numberOfRowsToGet
For x = lettercount To lettercount + lettercount_offset
letter = Chr(x)
sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
On Error Resume Next
Set rs = dbo_conn.execute(sSql)
If (Not rs.EOF) Then
'inserts the data into the db
End If
Next x
rownumber = rownumber + 1
Loop
.... 'Post processing
Exit Function
....'Error handlers
End Function
That should be the relevant code. The issue occurs at the lines:
sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
On Error Resume Next
Set rs = dbo_conn.execute(sSql)
When the data is read in, regardless of whether we are using JET or ACE, the data is returned this way:
_| A | B | C | D | E | F | G |
1| 99 | | | | | | 77 |
2| 99 | | | | | | 77 |
3| 99 |Text1|Text2|Text3|Text4|Text5| 77 |
4| 99 |Text1|Text2|Text3|Text4|Text5| 77 |
I have tried connecting to the spreadsheet and getting the data in a multitude of ways, but nothing seems to work - either the connection would fail, or the data would be nothing but nulls.
I have found some work-arounds - for example, if I enter a space character into cell A1, the issue no longer occurs. However, I would like a program-based solution, rather than having to tell users to do extra steps to avoid this.
It only copies the first row of data. If the data in the cell is a number, then it copies the data into every cell above it in that column, if it is text then it only goes up by one level.
An interesting note is that, if I alter the spreadsheet to say that all of the data is text, it then copies every cell as if they are numbers (i.e. into every single cell above, rather than one single row)
All in all this is rather irritating - since I don't have any luck when searching for this issue, I can only conclude that we are doing something wrong, or very few people have bothered with this type of test data.
[edit] After some investigation I have made some headway in solving this - "The provider assumes that your table of data begins with the upper-most, left-most, non-blank cell on the specified worksheet" (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ). This is confirmed if I use a statement to select the entire worksheet - it only returns the block of data.
So when I select any cell that is outside of that range, the provider, rather than doing something sensible like returning null, returns the data from the upper-most non-empty cell from that particular column.
I could hypothetically change the system so that it simply grabs all of the data and assumes that the upper-most left-most cell is cell A1, but that would break compatibility with data that is already out there.
What I now need is a way to get the cell-references for the data that is returned, so I can treat it appropriately, or a way to force this to no longer happen.
To answer my own question: It looks like you can't. You can, however, try to code around the issue.
The JET and ACE providers both take the upper-most left-most non-empty cell as being the start of the dataset (http://support.microsoft.com/default.aspx?scid=kb;en-us;257819 ) So, when you try to get the value from a cell that occurs before the start of that dataset, rather than doing something sensible and returning null, the providers return a guess based upon the top row of data.
I could find no way to get the cell-references of the returned dataset from the provider - it labelled everything as F1, F2 etc. ("Field 1", "Field 2")
So there were two remaining solutions:
1) Grab the entire dataset at once, assume the data starts at A1, and insert it into the DB using that assumption. This unfortunately would result in breaking compatability with pre-existing data.
2) Programmatically figure out the cell references, and input the data correctly. I did this using the following abbreviated code
sSql = "SELECT * FROM [" & worksheet & "$]"
Set rs = dbo_conn.execute(sSql)
rownumber = 1
If Not rs.EOF Then
oledata_array = rs.GetRows()
Else
ReDim oledata_array(0, 0)
End If
Do While rownumber <= numberOfRowsToGet
col_number = 1
For x = lettercount To lettercount + lettercount_offset
letter = Chr(x)
sSql = "SELECT * FROM [" & worksheet & "$" & letter & rownumber & ":" & letter & rownumber & "]"
On Error Resume Next
Set rs = dbo_conn.execute(sSql)
If Not rs.EOF Then
If rs(0) <> "" Then
If x < furthest_left Then
furthest_left = x
End If
If x > furthest_right Then
furthest_right = x
End If
If rownumber > bottom_of_set Then
bottom_of_set = rownumber
End If
Else
End If
End If
col_number = col_number + 1
Next x
rs.MoveNext
rownumber = rownumber + 1
Loop
rs.Close
top_of_set = bottom_of_set - UBound(oledata_array, 2)
If CLng(UBound(oledata_array, 1)) <> CLng(furthest_right - furthest_left) Then
'log the fact that there is a discrepency, and continue
End If
'now have the co-ords of the "square" of data as it occurs in the spreadsheet
rownumber = 1
row_index = 0
Do While rownumber <= numberOfRowsToGet
col_number = 1
For x = lettercount To lettercount + lettercount_offset
letter = Chr(x)
'construct the first chunk of the sql insert string
If (x <= furthest_right) And (x >= furthest_left) And (rownumber <= bottom_of_set) And (rownumber >= top_of_set) Then
sSql = sSql & "'" & oledata_array(col_number - 1, row_index) & "'"
col_number = col_number + 1
Else
sSql = sSql & "''"
End If
'finish the sql string and execute
Next x