Often we find ourselves doing directory scans and accumulating the results in an array. As often as not though this snowballs on us. Once we have our list of filenames we want to sort them. Once we have the sorted filenames we realize we need their sizes in bytes or other properties.
As we work through this we're faced with growing complexity. Our quick solution starts to bite us as every enhancement leads to another round of squashing subtle bugs.
Perhaps we should have paused to reflect on some decisions before diving in and coding:
How should we store these files' properties in memory? A number of options come to mind quickly:Alternative: ADO Recordset
A couple of obvious approaches to sorting come to mind as well:
- A two-dimensional Variant array.
- Concatenate the properties together as a String value and store it in a String array.
- An array of UDTs.
- A Collection or Dictionary of UDTs.
- Code and debug a sort.
- Write out a file, Shell a sort, and read it back.
- What if I want to sort on different keys at different times in my program?
- What if I need a list of all of the files in some places and just the .TXT files in others?
An ADO Recordset can help with all of these issues. This may seem a bit unconventional at first. We're used to thinking of ADO as a database connector technology, but Recordsets can be used in several powerful ways that don't involve an active database connection.
The strategy I'll discuss here is to use an constructed, unconnected Recordset. Like disconnected Recordsets where you make and break the database connections the work is done with client-side cursors. Unlike a disconnected Recordset, a constructed Recordset is never connected to a data source Provider.
A Recordset used in this manner can be thought of as a sort of Super Collection object. Unlike a Collection, each entry (or Record) can hold multiple values (or Fields). A Recordset also offers sorting, searching, and filtering with quite decent performance.
If you need to display your data, several standard controls can be data bound to an ADO Recordset.
Creating and Populating the Recordset
Creating such a Recordset is actually pretty easy.
Begin by creating an empty instance of the ADO Recordset class. Then add fields to the Recordset using the Append() method of the Recordset's Fields collection. If you haven't set the Recordset's CursorLocation property before your first Append(), CursorLocation is automatically set to adUseClient! Finally Open the Recordset, making it available for use.
At this point you can add one or more persistent indexes on your Recordset's fields. Persistent indexes, created by setting the field's "'Optimize" property to True, cause an index to be built for the field specified. This will greatly improve sort and search performance against the field in question, at the cost of some additional memory overhead:
Set oRS = New Recordset
.Append "FullName", adLongVarChar, 260
.Append "Size", adInteger
.Append "IsFolder", adBoolean
.Fields("FullName").Properties("Optimize") = True
.Fields("Size").Properties("Optimize") = True
Now we're ready to do a directory scan, recording the file and subdirectory properties as we go:
Working With the Recordset
Sub StoreFileInfo(ByVal FullPath As String)
!FullName = FullPath
!Size = FileLen(FullPath)
!IsFolder = (GetAttr(FullPath) And vbDirectory) <> 0
After loading data into the Recordset we can perform sorting, seaching, and filtering using the standard properties and methods. The Recordset can be navigated as we might any database Recordset. Here we make a single pass over the records and then we Close and dispose of the Recordset:
.Filter = "(IsFolder = False And FullName Like '*.TXT') Or " _
& "(IsFolder = False And FullName Like '*.LOG')"
.Sort = "Size DESC"
Do Until .EOF
Set oRS = Nothing
Note the quirky Filter expression used here. There are some limitations on how these expressions can be formed and you'll want to consult the ADO documentation for full details of the expression syntax available for Filter, Find, and Sort.
This example doesn't require sorting, but it was included as an illustration. Notice how trivial it was to add the sort! No awkward sorting logic had to be written or debugged.
Overhead and Limitations
The Recordset isn't free, but for most programs the overhead isn't serious compared to something like an array of UDTs. For specific situations you could probably hand-craft a better solution but the Recordset gives you a lot of power in an easy drop in package.
It does add another dependency in your program if it doesn't already use ADO for something else. This is probably not as bad as you might think.
Since ADO 2.1 the MDAC suite has included an "ADO 2.0" Type Library. If your program references this typelib it will work on any machine with ADO 2.0 or later installed. You need at least ADO 2.0 to get the required functionality we're looking at here and few machines do not have ADO 2.0 or later. Unless you are deploying to a fresh install of Windows 95 this approach adds nothing new to distribute with your program.
By referencing this typelib your program uses the newest MDAC installed on the target machine because each MDAC release installs a new ADO 2.0 Type Library that "points to" the new ADODB Library. One limitation is that new ADO functionality added since ADO 2.0 is not available, mostly features related to the Record and Stream objects introduced in ADO 2.5.
Of course you do not have SQL at your disposal when working with Recordsets this way.
Some field types are impractical in a Recorset even though an array, Collection, or Dictionary can handle them. Obvious cases include Variants and Objects.
Except for special cases it probably isn't worth trying to create a faster sort than that offered by ADO Recordsets. For general use the Recordset Sort, Filter, and Find operations are so quick it wouldn't matter if you could replace them with custom logic.
Some operations are simply slower done through ADO though, such as creating and then iterating through every record in the Recordset. This can be significantly faster for arrays and Collections.
Rob Macdonald has published several benchmarks. Here is one:
Data Structures Compared
Figures below are based on a 5000 row, 4 column data structure.
* Times are normalised on Variant Array performance
* Array sorting (Quicksort) is 400 times slower than iteration
Array RecordSet RecordSet Collection
Iterate 100 2,120 2,160 120
Find 1(1) 100 308 2.6 0.9/212(2)
Find n(1) 100 422 17 393
Sort(1) 100 3.2 3.0 7,076
(1) - timings include iterating through the results
(2) - the faster time is achieved if searching by collection
key is possible
This table shows a number of tradeoffs to consider. In summary, the Recordset's performance is strongest when sorting or searching (using an indexed recordset,) and weakest when you need to sequentially touch every record.
Those figures don't say anything about the performance and coding advantages of using data binding to present Recordset data. It can be significantly better than writing logic to pull every data element of interest out of memory, format the data, and insert it into a visible form control.
The attached VB6 sample program is based on one of mine, with several improvements and the addition of data binding by loquin and a little more of my own fiddling. It demonstrates how one might extract filesystem data into a Recordset and display it using data binding as well as some simple sorting and filtering.
ADO's Recordset can be used for many things besides database access. Here we've looked at how it can be used to replace arrays, Collections, or Dictionaries to process, organize, and present lists of information.
When your program needs to sort or search a list of data you might consider using a Recordset. It can save you the time required to write and debug sorts and searches written in VB. In many cases it offers a significant performance improvement over hand-coded logic. For multi-field data items a Recordset offers sort and seach versatility that would require even more complicated hand-written code.