While Excel is not a proper database, it is possible to connect to Excel files using ADO. This can be a convenient tool for importing Excel data into your database, or in situations where your first thought might be to use Excel automation.
However, Excel is not a true database, and Excel used as a data source has some quirks that may bite you if you don’t watch out – and they range from annoying (wrongly formatted data) to pretty bad (memory leaks). In this tutorial I will explain some of the most common issues that are particular to using ADO with Excel.
I assume that you already know the basics of ADO and SQL (if not, there’s an ADO tutorial in Tutor's Corner). I will ignore many of the finer points of ADO (cursor types, lock types etc) just to keep things simple and brief.
Using ADO can be safer, faster and simpler than automation – and unlike automation, it will work even if Excel is not installed on the computer. ADO is faster than automation, because it runs in-process, as opposed to out-of-process automation. ADO is also preferable for web applications since it is much more stable than server-side automation of Excel.
You can also use ADO from Excel VBA or in combination with Excel automation, to get data from other data sources and put it in Excel. You can use this for nice tricks such as worksheet functions that can read from closed Excel files, filtering data based on user-defined criteria, etc.
You can connect to an Excel file using either the OLE DB provider for Jet or ODBC. Personally, I have always used Jet.
The connection string looks like this (broken across several lines for ease of reading).
Data Source=Z:\MyFiles\Data source.xls;
Extended Properties="Excel 8.0;HDR=Yes"
The first part is fixed. The second part is the full path to your Excel file. The third part (extended properties) provides additional details about the file. Note the extra quotation marks around the Extended Properties part of the connection string.
Excel 8.0 is the Excel file version. This is not the same as the version of Excel that’s installed on the computer. Excel 8.0 applies to all workbooks created in any version of Excel from Excel97 onwards. (Should you find a workbook created in Excel 95, change that to 7.0.) To repeat, do not change the 8.0 to 10.0 just because you’re using Excel 2002.
HDR=Yes tells whether the first row in the Excel table has column headers (field names). Set this to Yes if the first row should be used as field names, and to No if the first row should be included as data. The fields will then be named F1, F2 etc.
Here’s a simple wrapper function for connecting to Excel:
Private Function GetExcelConnection(ByVal Path As String, _
Optional ByVal Headers As Boolean = True) As Connection
Dim strConn As String
Dim objConn As ADODB.Connection
Set objConn = New ADODB.Connection
strConn = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & Path & ";" & _
"Extended Properties=""Excel 8.0;HDR=" & _
IIf(Headers, "Yes", "No") & """"
Set GetExcelConnection = objConn
You will need version 4.0 of the Jet Provider to connect to an Excel file. If only version 3.51 is installed, you will get an error: “Couldn’t find installable ISAM”.
Excel files are not real databases, so they do not support multiple users. You cannot open multiple connections to the same file. And as long as your connection is open, you will not be able to open the file in Excel.
It is not possible to connect to Excel files that are password-protected; the Password parameter of the Open method will not help. This includes both files that require a password to be opened, and files where it's the workbook structure that's password-protected.
The simplest and most common thing you will likely want to do is to read data from Excel files. This can be useful in many situations as an alternative to automation – if Excel is not installed on the machine, if you want to read data from a closed workbook without opening it (in a worksheet function, for example), or if you simply want to be able to use SQL to filter your data.
In databases, data is organized in tables. In Excel workbooks, data are organized in worksheets and ranges. ADO considers both of these to be tables, and you can read both of them. To differentiate sheets from ranges, sheet names have to be followed by a $ sign. Ranges can be specified either by name or by address. You can also refer to worksheet-level named ranges (as opposed to workbook-level names).
'To read a sheet:
strRequest = "SELECT * FROM Sheet1$"
'To refer to a range by its address:
strRequest = "SELECT * FROM Sheet1$A1:D10"
'To refer to a single-cell range, pretend it's a multi-cell range
'and specify both the top-left and bottom-right cells:
strRequest = "SELECT * FROM Sheet1$A1:A1"
'To read a named range:
strRequest = "SELECT * FROM MyDataRange"
'To read a worksheet-level named range
strRequest = "SELECT * FROM Sheet1$MyData"
'and then just open a recordset
objRS.Open strRequest, objConn
All of this will work on visible as well as hidden and xlVeryHidden sheets.
Note that if you specify a range address, and that range extends outside the used range of the sheet (i.e. the range that would be returned if you pulled in the whole sheet), you will not get blank records for the empty unused rows. For example, if there are data in the first 15 rows in Sheet2, with headers in the first row, then [Sheet2$] would return 14 records and [Sheet2$A1:B100] will also return the same 14 records.
Excel is not a database, so it does not provide ADO with information about the data types for each column / field. When you open an Excel table, Jet will scan the first 8 rows of the table and attempt to guess at the data types.
This works fine as long as the data are not mixed, i.e. as long as each column only has data of one type. If the column mixes data types, the majority decides, and the minority are killed off. For example, if the data mixes strings and numbers, and there are more strings than numbers, then the strings will be returned properly, but the numbers will be replaced with null values.
You can override this behaviour, so that all data are read as text, by setting another extended property: IMEX=1. That tells the driver to use Import mode, which in turn forces it to follow the registry setting ImportMixedTypes=Text.
You may also have trouble if you have string data of varied length. If the first 8 rows contain short strings (< 255 characters), the driver will identify the field as a text field. If there are longer strings later on, after the first 8 rows, those will not fit into the space allocated. This can be fixed by changing the TypeGuessRows registry setting. Read more here: KB 189897: Data Truncated to 255 Characters with Excel ODBC Driver
Getting a list of available tables (worksheets and ranges)
What if you don’t know in advance which tables (worksheets) the workbook will contain? For example, you might want to get all the worksheets, but not know how many there are. No problem – you can get a list of tables using ADO.
You can take advantage of the naming convention for worksheets to separate worksheets from named ranges. If the table name ends with $, it's a worksheet, otherwise it's a named range.
Set objConn = GetExcelConnection("c:\temp\data source.xls")
Set objRS = objConn.OpenSchema(adSchemaTables)
Do While Not objRS.EOF
strTable = objRS.Fields("table_name").Value
If Right$(strTable, 1) = "$" Then
strWorksheetList = strWorksheetList & vbCrLf & strTable
strRangeList = strRangeList & vbCrLf & strTable
MsgBox "Worksheets:" & strWorksheetList & vbCrLf & vbCrLf & "Ranges:" & strRangeList
The contents of the list appears to vary depending on whether the workbook is open (not particularly logically):
For a closed workbook, this list will not include hidden sheets. However it will include named ranges in hidden sheets, and hidden names.
For a workbook that is already open in Excel (on the same computer), the list will include hidden sheets as well.
The usual SQL commands like UPDATE and INSERT INTO work nicely with Excel tables. Obviously the Excel file should not be open elsewhere if you try to do this, or you will get some rather bizarre behaviour, in addition to the memory leak mentioned above. (The changes will be applied to a read-only copy which gets opened in a hidden instance of Excel. Definitely not something I’d recommend.)
For INSERT INTO, the table should be set up beforehand – the column headers should be in place before you try to add data to the table. The data type for each field will be read from existing data in the table (see above under “reading data from Excel”). If there is no data (only column headers), then all fields will be treated as text.
This is where Excel really stops working like a database. Effectively you cannot delete records from an Excel file with ADO.
A simple “DELETE * FROM” will simply raise an error. With some creativity (such as using DROP TABLE) you can get to a point where it looks almost like you’ve deleted the records, but at best you can clear the cells’ contents. The cells will still be in use, so if you add records to the sheet, they will be added below the blank lines in the worksheet.
My recommendation is to not use ADO to delete data from an Excel workbook.
You can add worksheets to existing Excel files using the SELECT * INTO command.
The following will create a new sheet named Sheet2 in the file named data source.xls, and copy the contents of Sheet1 into that new sheet:
strRequest = "SELECT * INTO [C:\temp\data source.xls].[Sheet2] FROM [Sheet1$]"
Important: note how the name of the existing worksheet is followed by a $ sign, but the name of the new sheet is not. If the sheet already exists, this will raise an error.
With slight modifications, the same approach works for creating new workbooks as well. We just need to add a bit that tells ADO that the new database should be an Excel workbook:
strRequest = "SELECT * INTO [Excel 8.0;DATABASE=c:\temp\data copy.xls].[Sheet1] FROM [Sheet1$]"
Since this file creation method completely bypasses Excel itself, the newly-created file may not look like the files you create manually in Excel. In particular this will ignore the “Sheets in new workbook” and “Standard font size” settings in Excel.
Instead of creating new sheets every time, you may have an Excel report that you want to refresh regularly by replacing old data with new data. This would allow you to refer to the data in formulas in other sheets etc. Most of the time this can be done more easily from the Excel side (pulling instead of pushing), but ADO may be faster because you don’t have to automate Excel.
Because we cannot delete records in Excel, you won’t be able to refresh the table by deleting the old records and then inserting new ones in their place. The best approach in that case seems to be to use two recordsets. Open the source and the target, loop through both recordsets and transfer the contents of the source to the target, field by field. If your result set may change in size, watch out for recordsets of different sizes – remember, you cannot make the initial target recordset larger than the current used range in the sheet, so you may have to switch from replacing to adding when you reach the end of the “available space”.
Until now, all I’ve talked about is how your application can manipulate Excel files from the outside. Excel itself has one very useful method for working with recordsets: Range.CopyFromRecordset. You can use ADO to get data from an external source, and then put the entire recordset in your worksheet with a single command, with no tedious record-by-record, field-by-field looping. Field names have to be copied one-by-one, though.
Here’s a simple example of a VBA procedure that puts the recordset and (optionally) the field names in a worksheet range:
Public Sub PutRecordsetInSheet(ByVal RS As Recordset, ByVal TopLeft As Range, _
Optional ByVal Headers As Boolean = True)
'Lists field names in the top row if Headers=True, and pastes the records
Dim objField As ADODB.Field
Dim i As Integer
If Headers Then
For Each objField In RS.Fields
i = i + 1
TopLeft.Cells(1, i).Value = objField.Name
TopLeft.Cells(2, 1).CopyFromRecordset RS
TopLeft.Cells(1, 1).CopyFromRecordset RS
The ImportMixedTypes registry key is always read (whether it is
honored is discussed later). You can test this by changing the key to
ImportMixedTypes=OneDayWhen and trying to use the ISAM: you get the
error, 'Invalid setting in Excel key of the Engines section of the
Windows Registry.' The only valid values are:
Data type is determined column by column. 'Majority Type' means a
certain number of rows (more on this later) in each column are scanned
and the data types are counted. Both a cell's value and format are
used to determine data type. The majority data type (i.e. the one with
the most rows) decides the overall data type for the entire column.
There's a bias in favor os numeric in the event of a tie. Rows from
any minority data types found that can't be cast as the majority data
type will be returned with a null value.
For ImportMixedTypes=Text, the data type for the whole column will be:
Jet (MS Access UI): 'Text' data type
ADO: adWChar ('a null-terminated Unicode character string')
Note that this is distinct from:
Jet (MS Access UI): 'Memo' data type
ADO: adLongVarWChar ('a long null-terminated Unicode string value')
ImportMixedTypes=Text will curtail text at 255 characters as 'Memo' is
cast as 'Text'. For a column to be recognized as 'Memo', majority type
must be detected, meaning the majority of rows detected must contain
256 or more characters.
But how many rows are scanned for each column before is decided that
mixed and/or what the majority type is? There is a second registry
Key, TypeGuessRows. This can be a value from 0-16 (decimal). A value
from 1 to 16 inclusive is the number of rows to scan. A value of zero
means all rows will be scanned.
There is one final twist. A setting of IMEX=1 in the connection
string's extended property determines whether the ImportMixedTypes
value is honored. IMEX refers to IMport EXport mode. There are three
possible values. IMEX=0 and IMEX=2 result in ImportMixedTypes being
ignored and the default value of 'Majority Types' is used. IMEX=1 is
the only way to ensure ImportMixedTypes=Text is honored. The resulting
connection string might look like this:
Data Source=C:\ db.xls;
Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'
Finally, although it is mentioned in MSDN articles that MAXSCANROWS
can be used in the extended properties of the connection string to
override the TypeGuessRows registry keys, this seems to be a fallacy.
Using MAXSCANROWS=0 in this way never does anything under any
circumstances. Put another way, is has just the same effect as putting
ONEDAYWHEN=0 in the extended properties, being none (not even an
error!). The same applied to ImportMixedTypes i.e. can't be used in
the connection string to override the registry setting.
In summary, use TypeGuessRows to get Jet to detect whether a 'mixed
types' situation exists or use it to 'trick' Jet into detecting a
certaint data type as being the majority type. In the event of a
'mixed types' situation being detected, use ImportMixedTypes to tell
Jet to either use the majority type or coerce all values as 'Text'
(max 255 characters).
Thanks to 'onedaywhen' for distributing this information.
"He's not the Messiah. He's a very naughty boy!" - Brian's mum