Go Back  Xtreme Visual Basic Talk > Legacy Visual Basic (VB 4/5/6) > Knowledge Base > Tutors' Corner > Processing Text Databases


Reply
 
Thread Tools Display Modes
  #1  
Old 04-19-2008, 10:34 AM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default Processing Text Databases


We often find ourselves working with tabular information presented to us in the form of text files. Each file has a series of lines of text, with each line having the same type of columns in each row. Something like a dump of a database table: for each different table we have a different file, each row having the same fields.

Data in this form can be processed by using the standard data access components we normally associate with databases. There is a standard Microsoft ODBC driver for Text as well as a Text Installable ISAM for the Jet OLE DB provider that work very similarly. They offer a good deal of flexibility, and within limits and with care it is possible for them to process a wide range of both delimited and fixed-field text.

I haven't found any single reference covering the nuances of this and I constantly find huge gaps in what I know. I'll try to share some of what I do know here though.

Text Databases

These text processing tools provide a view of text files as a sort of database. By doing this they make it possible for ADO and DAO to work with text data.

The text equivalent of a database is a directory. The equivalent of a database table is a text file within this directory. Beyond this we need metadata, or "data about the data" which is also called the schema.

Text Schemas

The standard text data access components can obtain schema information in three ways:
  • By scanning a text file for certain patterns.
  • Through hints in the extended properties passed from the data access connection string or extended property settings.
  • By loading a Schema.ini file stored in the same directory alongside the text data files.
The most flexible and comprehensive of these is the Schema.ini file.

Collected Notes

I have collected some of my own notes and a lot of excerpts from MSDN pages and various related Help files into a rather disorganized blob of text as an RTF file. It covers some of the details of working with MDAC Text components and in particular collects a lot more detail on schema.ini file syntax and semantics. Most of this is Microsoft's material though, and not mine.

I hope you find it useful (attached).

Script For Testing and Demos

One easy way to experiment with these things is via Windows scripts. We don't always appreciate how lucky we are to have such a powerful scripting system offering both the ability to use many of the same system components as Visual Basic as well as a language that so strongly parallels Visual Basic itself. You can develop and debug logic in VBScript that can be copied and pasted into a Visual Basic program and run with few if any changes.

By using the new .WSF file format for WSH scripts instead of the older "naked .VBS" format you can use declarative syntax to create many objects as well as referencing libraries to load their typelibs. The latter gets around manually defining constants already defined within these libraries, and is much like defining a library reference in VB.

I define a function ScriptPath() in most of the examples presented here. Think of it as the equivalent of VB's App.Path property.

Running the examples is as simple as double-clicking on the script file's icon in Windows Explorer.

Example 1

Note: The attached zip archive contains the sample code and data presented in this series of posts.

Common tabular text files we might want to process include simple Comma Separated Value (CSV) files. These are fairly easy to read in VB simply by using native file I/O with Input# statements. But by using data access components we can apply SQL to the data instead of writing lots of procedural logic.

The MDAC Text components can be used with SQL Server, ODBC, and other Providers. The examples here will make use of the Jet Provider which is easy to use in a standalone manner.

Here the data files (input and output) are stored in the same directory as the script.

Customers.txt
Code:
James,Raymond,,4263 Evergreen Terrace,TallTree,VA,23666
Williams,Sandra,Eversharp Paper,8 Corporate Plaza,Dallas,TX,75299
Ash,Random,Spork Industries,2121 Industrial Way,Boise,ID,83707
Crurae,Julie,,1 Fashionable Place,Beverly Hills,CA,90210
Williams,John,,1311 Main,Dallas,TX,75013
CreateML.wsf
Code:
<JOB id="CreateML">
  <REFERENCE object = "ADODB.Recordset"/>
  <OBJECT id = rsCust progId = "ADODB.Recordset"/>
  <REFERENCE object = "Scripting.FileSystemObject"/>
  <OBJECT id = FSO progId = "Scripting.FileSystemObject"/>
  <SCRIPT language = "VBScript">
    Option Explicit
    Private tsLabels, blnCompany

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    rsCust.Open "SELECT * FROM [Customers.txt] ORDER BY F1, F2 ASC", _
                "Provider=Microsoft.Jet.OLEDB.4.0;" _
              & "Data Source='" & ScriptPath() & "';" _
              & "Extended Properties='Text;FMT=Delimited;HDR=No'", _
                adOpenStatic
    Set tsLabels = FSO.CreateTextFile(ScriptPath() & "\Labels.txt", _
                                      True, False)
    Do Until rsCust.EOF
      With tsLabels
        .WriteLine rsCust("F2").Value & " " & rsCust("F1").Value
        blnCompany = Len(rsCust("F3").Value) > 0
        If blnCompany Then .WriteLine rsCust("F3").Value
        .WriteLine rsCust("F4").Value
        .WriteLine rsCust("F5").Value & ", " & rsCust("F6").Value _
                 & "  " & rsCust("F7").Value
        If Not blnCompany Then .WriteLine
        .WriteLine
      End With
      rsCust.MoveNext
    Loop

    tsLabels.Close
    MsgBox CStr(rsCust.RecordCount) & " labels formatted"
    rsCust.Close
  </SCRIPT>
</JOB>
Labels.txt
Code:
Random Ash
Spork Industries
2121 Industrial Way
Boise, ID  83707

Julie Crurae
1 Fashionable Place
Beverly Hills, CA  90210

Raymond James
4263 Evergreen Terrace
TallTree, VA  23666

John Williams
1311 Main
Dallas, TX  75013

Sandra Williams
Eversharp Paper
8 Corporate Plaza
Dallas, TX  75299
This is the simplest case. As your requirements become more complicated you may need to get slightly more sophisticated.
Attached Files
File Type: rtf SchemaNotes.rtf (66.2 KB, 48 views)
File Type: zip Examples.zip (13.3 KB, 44 views)
Reply With Quote
  #2  
Old 04-19-2008, 10:43 AM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default

So far we've looked at the simplest case of using Microsoft Data Access Components (MDAC) to work with text files containing tabular data. We had a simple CSV file and treated it as a sort of database table.

There are many common variations on this theme however, and here I'll go into some of the simple ones.

Header Row (Example 2-1)

It is pretty typical for an exported CSV file to contain a header row. This is the first line of the file and in place of column data each column contains a field name instead.

Customers.txt
Code:
Last Name,First Name,Company,Street,City,State,Zip
James,Raymond,,4263 Evergreen Terrace,TallTree,VA,23666
Williams,Sandra,Eversharp Paper,8 Corporate Plaza,Dallas,TX,75299
Ash,Random,Spork Industries,2121 Industrial Way,Boise,ID,83707
Crurae,Julie,,1 Fashionable Place,Beverly Hills,CA,90210
Williams,John,,1311 Main,Dallas,TX,75013
It is possible to use the Extended Properties to inform MDAC about this embedded bit of metadata. You just set HDR=Yes in the connection string. Of course now we won't get the default column (field) names of Fn anymore, so we need to update every reference to the field names in the script:

CreateML.wsf
Code:
<JOB id="CreateML">
  <REFERENCE object = "ADODB.Recordset"/>
  <OBJECT id = rsCust progId = "ADODB.Recordset"/>
  <REFERENCE object = "Scripting.FileSystemObject"/>
  <OBJECT id = FSO progId = "Scripting.FileSystemObject"/>
  <SCRIPT language = "VBScript">
    Option Explicit
    Private tsLabels, blnCompany

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    rsCust.Open "SELECT * FROM [Customers.txt]" _
              & " ORDER BY [Last Name], [First Name] ASC", _
                "Provider=Microsoft.Jet.OLEDB.4.0;" _
              & "Data Source='" & ScriptPath() & "';" _
              & "Extended Properties='Text;FMT=Delimited;HDR=Yes'", _
                adOpenStatic
    Set tsLabels = FSO.CreateTextFile(ScriptPath() & "\Labels.txt", _
                                      True, False)
    Do Until rsCust.EOF
      With tsLabels
        .WriteLine rsCust("First Name").Value & " " _
                 & rsCust("Last Name").Value
        blnCompany = Len(rsCust("Company").Value) > 0
        If blnCompany Then .WriteLine rsCust("Company").Value
        .WriteLine rsCust("Street").Value
        .WriteLine rsCust("City").Value & ", " & rsCust("State").Value _
                 & "  " & rsCust("Zip").Value
        If Not blnCompany Then .WriteLine
        .WriteLine
      End With
      rsCust.MoveNext
    Loop

    tsLabels.Close
    MsgBox CStr(rsCust.RecordCount) & " labels formatted"
    rsCust.Close
  </SCRIPT>
</JOB>
By default the text of a header or data column can also be quoted with quotation marks. This allows data to contain commas even when the comma is otherwise treated as the field delimiter.

Other Formatting

At this point we have just about reached the limit of the metadata that can be supplied via the connection string however. If your text files vary much from the patterns already described it is time to look at providing a schema metadata file.

Schema.ini

As the name suggests, our schema information takes the form of an INI file. More than that the schema file must be called exactly schema.ini but as usual in Windows the name is not case-sensitive.

This file must be present in the same folder as the text file (or files) we want to process. When this isn't the case it may be a simple matter to create a temporary schema.ini file in the target folder while processing the text data and then clean it up afterward. One way to make this less painful might be to store your schema.ini text (or a template for it) as a custom resource in your program.

When your text source is a read-only location such as a CD you may need to get more clever, perhaps copying the data to a temporary folder location on a read/write drive.

Schema Sections

Each INI Section in schema.ini is the simple filename of the data text file being described. As always Section headers in INI files are bracketed, as in:

[customers.txt]

Section Contents

Each schema.ini Section contains a series of lines of text consisting of a Key Name, an equals sign, and a Value String. Once again this is the generic form of Key lines in any INI file.

The set of Keys and Values used is pretty consistent across the MDAC Text components (Text ODBC Driver, Jet Text IISAM, etc.). Some of the common ones include Format, MaxScanRows, and ColNameHeader. Example:
Code:
Format=CSVDelimited
MaxScanRows=1
ColNameHeader=True
MaxScanRows declares how many lines of the text data file should be examined for content in trying to guess at what type each column contains: Text, DateTime, Long, Single, etc. If ColNameHeader=True the column names row is not included in this count. Setting MaxScanRows to 0 tells the MDAC component to scan every row in the file to make this determination, which could be costly for large files. I haven't found any value that means "don't scan."

The purpose of these settings is to override default values, which are stored in the registry. Programs could alter the registry settings to accomplish the desired results but the risk of impact on other applications becomes quite high. Schema.ini gives you a safer and easier way to specify how you want the MDAC Text components to operate.

Comments

Schema.ini can also contain comment lines, signified by a leading semicolon ";" character.

What you can't do is use ";" comments at the ends of Key/Value lines.

Wrong: MaxScanRows=1 ;Scan only first row.

Tab Delimited Data (Example 2-2)

So one of the things we can handle now is text data delimited by symbols other than a comma.

While you could alter the default for delimited text via a registry change we can use a schema.ini file to set the delimiter type to the horizontal tab character. If our text looks like:
Code:
Last Name{TAB}First Name{TAB}Company{TAB}Street{TAB}City{TAB}State{TAB}Zip
James{TAB}Raymond{TAB}{TAB}4263 Evergreen Terrace{TAB}TallTree{TAB}VA{TAB}23666
Williams{TAB}Sandra{TAB}Eversharp Paper{TAB}8 Corporate Plaza{TAB}Dallas{TAB}TX{TAB}75299
Ash{TAB}Random{TAB}Spork Industries{TAB}2121 Industrial Way{TAB}Boise{TAB}ID{TAB}83707
Crurae{TAB}Julie{TAB}{TAB}1 Fashionable Place{TAB}Beverly Hills{TAB}CA{TAB}90210
Williams{TAB}John{TAB}{TAB}1311 Main{TAB}Dallas{TAB}TX{TAB}75013
... where {TAB} is a tab character.

We can place a schema.ini file such as:
Code:
[Customers.txt]
;We have Tab delimiters.
Format=TabDelimited
MaxScanRows=1
ColNameHeader=True
... next to the file in its folder and change our connection string's Extended Properties substring just slightly as in:
Code:
    rsCust.Open "SELECT * FROM [Customers.txt]" _
              & " ORDER BY [Last Name], [First Name] ASC", _
                "Provider=Microsoft.Jet.OLEDB.4.0;" _
              & "Data Source='" & ScriptPath() & "';" _
              & "Extended Properties='Text'", _
                adOpenStatic
This results in the same sort of simple mailing label output as in Example 1 and Example 2-1.
Reply With Quote
  #3  
Old 04-19-2008, 10:50 AM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default

We've used extra parameters in the connection string's Extended properties substring to gain some control over the processing of tabular text data files using the MDAC Text components. We've also seen how a schema.ini file can offer us a little more control, such as selecting a non-default column delimiter symbol (tab instead of comma).

We can do a lot more using schema.ini files though.

Alternate Delimiters

Comma and tab are fine, but what if we have some really arbitrary column delimiter in our text files? Within limits the schema.ini definitions can help us here too.

You can define almost any normal "printable" character except the double quote mark as your column delimiter by using the Format string value. Besides the special values TabDelimited and CSVDelimited the Format string value can be Delimited(x) where x is the desired symbol. Example:

Format=Delimited(|)

Going Beyond MaxScanRows (Example 3-1)

The automatic "typing" of columns by scanning text rows can be convenient but may present headaches at times. There are instances where you need your data more strongly typed than this might result in too.

Another important consideration is that you can use the MDAC Text components for output as well as input and you might want more control over what happens during the conversion.

When relying on row scanning, types are chosen based on the "best fit" type found in the "majority of the rows" included in the first MaxScanRows of data rows in the text file. If this gets set to a numeric type then subsequent rows with non-numeric strings typically get set to Null or Error values.

Consider the following case based on the previous tab-delimited example:

Customers.txt
Code:
Last Name{TAB}First Name{TAB}Company{TAB}Street{TAB}City{TAB}State{TAB}Zip
Filbert{TAB}Euronymous{TAB}{TAB}123 Main St{TAB}Plaistow{TAB}NH{TAB}03865
James{TAB}Raymond{TAB}{TAB}4263 Evergreen Terrace{TAB}TallTree{TAB}VA{TAB}23666
Williams{TAB}Sandra{TAB}Eversharp Paper{TAB}8 Corporate Plaza{TAB}Dallas{TAB}TX{TAB}75299
Ash{TAB}Random{TAB}Spork Industries{TAB}2121 Industrial Way{TAB}Boise{TAB}ID{TAB}83707
Crurae{TAB}Julie{TAB}{TAB}1 Fashionable Place{TAB}Beverly Hills{TAB}CA{TAB}90210
Williams{TAB}John{TAB}{TAB}1311 Main{TAB}Dallas{TAB}TX{TAB}75013
Labels.txt
Code:
Random Ash
Spork Industries
2121 Industrial Way
Boise, ID  83707

Julie Crurae
1 Fashionable Place
Beverly Hills, CA  90210

Euronymous Filbert
123 Main St
Plaistow, NH  3865

Raymond James
4263 Evergreen Terrace
TallTree, VA  23666

John Williams
1311 Main
Dallas, TX  75013

Sandra Williams
Eversharp Paper
8 Corporate Plaza
Dallas, TX  75299
What went wrong here?

Mr. Filbert's Zip Code has the wrong number of digits in the Labels output because in the input it has a leading zero. We want to preserve the exact value.

Explicit Column Typing (Example 3-2)

By relying on the scan the Zip field got typed as Long, not Text.

We can fix this by providing explicit typing via our schema.ini file:

Schema.ini
Code:
[Customers.txt]
;We have Tab delimiters.
Format=TabDelimited
MaxScanRows=1
ColNameHeader=True
;We require defined field types.
;Quotes are only req'd around names with spaces.
Col1="Last Name" Text
Col2="First Name" Text
Col3="Company" Text
Col4="Street" Text
Col5="City" Text
Col6="State" Text
Col7="Zip" Text
Labels.txt
Code:
Random Ash
Spork Industries
2121 Ind
Boise, ID  83707

Julie Crurae
1 Fashio
Beverly Hills, CA  90210

Euronymous Filbert
123 Main
Plaistow, NH  03865

Raymond James
4263 Eve
TallTree, VA  23666

John Williams
1311 Mai
Dallas, TX  75013

Sandra Williams
Eversharp Paper
8 Corpor
Dallas, TX  75299
What about MaxScanRows? Well I still set it to 1 myself. Leaving it out or setting it to 0 (or nothing after the equals sign) risks a full-file scan and who wants that overhead? I'd like to think that defining the Coln Key values eliminates this, but I'm cautious and suspicious.

Note that the field definitions do not need to be in sequence, but they can't be defined sparsely or you'll find either they are all ignored or a runtime exception will be raised! In other words if you have seven fields be sure to define all seven, and personally I always arrange them in ascending order.

If there are additional fields (i.e. columns) in the text data beyond those you define they will be scan-typed without error. This is why I suspect the type-scan always takes place.

Other Goodies

Beyond this you can use schema.ini to select an alternate text delimiter. These normally default to the double quote, to allow commas within text fields in a CSV file. This example overrides the default " to an apostrophe ' instead:

TextDelimiter='

I often find it useful when creating output files to override the use of text delimiters entirely:

TextDelimiter=none

You can also define fixed-field text data. It can be very handy for importing things like mainframe data files. There are settings for both "ragged" and "true" fixed-field files.

You can set the output formatting for DateTime values, Currency values, non-integer numeric values, etc. (i.e. handling thousands-commas in numeric values or selecting date and time formatting). Many of these modify the text-to-type conversion process on input as well.

The Width modifer of a column definition isn't just for fixed-field data. It can also be used to truncate Text type data on input:

Col1="Last Name" Text Width 16

Output Issues

I've mentioned output in addition to input a few times. The most important consideration here is that when you use ADO with the Text components you normally can only append new rows to an existing file. So often you'll want to create an empty file before creating output rows... but there is an exception to this!
Reply With Quote
  #4  
Old 04-19-2008, 10:53 AM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default

As I mentioned previously, you can only append rows to an existing text file. Thus it is usually necessary to start with (or create) an empty file before inserting output.

One exception is creating a new "table" file via SELECT INTO statements in Jet SQL.

Reformat Text Data (Example 4-1)

Here is a sample showing how a simple reformat operation might be done. Notice that you can easily rearrange the fields as well as change the formatting.

Reformat.wsf
Code:
<JOB id="Reformat">
  <REFERENCE object = "ADODB.Connection"/>
  <OBJECT id = cnCust progId = "ADODB.Connection"/>
  <SCRIPT language = "VBScript">
    Option Explicit
    Dim lngRecords

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    cnCust.Open    "Provider=Microsoft.Jet.OLEDB.4.0;" _
                 & "Data Source='" & ScriptPath() & "';" _
                 & "Extended Properties='Text'"
    cnCust.Execute "SELECT * INTO [FixedCustomers.txt]" _
                 & " FROM [Customers.txt]", _
                   lngRecords
    cnCust.Close

    MsgBox CStr(lngRecords) & " records reformatted"
  </SCRIPT>
</JOB>
Schema.ini
Code:
[Customers.txt]
Format=TabDelimited
MaxScanRows=1
ColNameHeader=True
Col1="Last Name" Text
Col2="First Name" Text
Col3="Company" Text
Col4="Street" Text
Col5="City" Text
Col6="State" Text
Col7="Zip" Text

[FixedCustomers.txt]
Format=FixedLength
FixedFormat=TrueFixedLength
MaxScanRows=1
ColNameHeader=True
Col1="Company" Text Width 16
Col2="Street" Text Width 18
Col3="City" Text Width 12
Col4="State" Text Width 2
Col5="Zip" Text Width 6
Col6="Last Name" Text Width 12
Col7="First Name" Text Width 12
If you run this you'll find there is one flaw in the concept: when you specify Format=FixedLength for an output file ColNameHeader=True will not respect it. In other words you will usually get a header row with field names that are quoted and comma delimited!

FixedCustomers.txt
Code:
"Company","Street","City","State","Zip","Last Name","First Name"
                123 Main St       Plaistow    NH03865 Filbert     Euronymous  
                4263 Evergreen TerTallTree    VA23666 James       Raymond     
Eversharp Paper 8 Corporate Plaza Dallas      TX75299 Williams    Sandra      
Spork Industries2121 Industrial WaBoise       ID83707 Ash         Random      
                1 Fashionable PlacBeverly HillCA90210 Crurae      Julie       
                1311 Main         Dallas      TX75013 Williams    John
As far as I know this is a bug and not a feature, but it has never been changed or fixed so we must live with the limitation. The best thing is to avoid using header rows in Fixed files.

Aside from that it works great.

One more limitation: if you run the script twice it'll raise an exception. When you use SELECT INTO the result table (file) must not already exist. This is an odd mirror image of the problem of trying to open a non-existant file as a table and then appending rows to it.

Jet Expression Service (Example 4-2)

Don't forget that Jet SQL processing can invoke the Expression Service which makes a large number of VBA functions available within Jet SQL. This makes it handy to perform a number of simple data transformations as required without writing code to iterate over a Recordset.

Here is an example with the requirements:
  • Combine the [Last Name] and [First Name] into a single [Name] field on output.
  • New [Name] field must be limited to 15 characters.
  • First and last names must be separated by an "@" symbol.
  • Truncate the [First Name] as required to make it fit within 15 characters.
  • But if the [Last Name] itself is more than 13 characters, use the first letter of [First Name] and the first 13 characters of [Last Name].
Contrived example? You wish! I've had some funny things come along as requirements and I'll bet you've seen a few yourself.

To make the code slightly easier to read I've put the SQL into a .WSF string resource. You might find that SQL is often more easily written correctly when you don't have to force it into a bunch of concatenated quoted literals in VB6 too.

Note that due to the complexity of this text I had to use an XML CDATA section within the resource block. The use of a < symbol caused the CDATA block to be necessary here.

You could just as easily use the other method of building up a SQL string though.

Reformat.wsf
Code:
<JOB id="Reformat">
  <REFERENCE object = "ADODB.Connection"/>
  <OBJECT id = cnCust progId = "ADODB.Connection"/>
  <RESOURCE id = SQLText>
    <![CDATA[
      SELECT Company, Street, City, State, Zip,
        IIf(Len([Last Name]) <= 13,
            Left$([First Name], 14 - Len([Last Name])) & "@" & [Last Name],
            Left$([First Name], 1) & "@" & Left$([Last Name], 13)
           ) AS Name
        INTO [FixedCustomers.txt] FROM [Customers.txt]
    ]]>
  </RESOURCE>
  <SCRIPT language = "VBScript">
    Option Explicit
    Dim lngRecords

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    cnCust.Open    "Provider=Microsoft.Jet.OLEDB.4.0;" _
                 & "Data Source='" & ScriptPath() & "';" _
                 & "Extended Properties='Text'"
    cnCust.Execute getResource("SQLText"), lngRecords
    cnCust.Close

    MsgBox CStr(lngRecords) & " records reformatted"
  </SCRIPT>
</JOB>
Schema.ini
Code:
[Customers.txt]
Format=TabDelimited
MaxScanRows=1
ColNameHeader=True
Col1="Last Name" Text
Col2="First Name" Text
Col3="Company" Text
Col4="Street" Text
Col5="City" Text
Col6="State" Text
Col7="Zip" Text

[FixedCustomers.txt]
Format=FixedLength
FixedFormat=TrueFixedLength
MaxScanRows=1
ColNameHeader=False
Col1="Company" Text Width 16
Col2="Street" Text Width 18
Col3="City" Text Width 12
Col4="State" Text Width 2
Col5="Zip" Text Width 6
Col6="Name" Text Width 15
Reply With Quote
  #5  
Old 04-19-2008, 10:57 AM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default

One more option that seems widely overlooked is the Text IISAM's ability to import and export HTML. Sadly these abilities are even less well documented than processing more conventional delimited and fixed text data.

HTML Table Reporting (Example 5-1)

Simple HTML output is easily created using Jet on a table by table basis. It's a little bit like using a Recordset bound to a grid control. Someday I want to experiment with HTML output of hierarchical (shaped) Recordsets too. Who knows what capabilities are buried inside of these tools?

Report.wsf
Code:
<JOB id="Report">
  <REFERENCE object = "ADODB.Connection"/>
  <OBJECT id = cnCust progId = "ADODB.Connection"/>
  <SCRIPT language = "VBScript">
    Option Explicit
    Dim lngRecords

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    cnCust.Open    "Provider=Microsoft.Jet.OLEDB.4.0;" _
                 & "Data Source='" & ScriptPath() & "';" _
                 & "Extended Properties='Text'"
    cnCust.Execute "SELECT * " _
                 & " INTO [Customers.htm] FROM [Customers.txt]", _
                   lngRecords
    cnCust.Close

    MsgBox CStr(lngRecords) & " records reformatted"
  </SCRIPT>
</JOB>
Schema.ini
Code:
[Customers.txt]
Format=TabDelimited
MaxScanRows=1
ColNameHeader=True
Col1="Last Name" Text
Col2="First Name" Text
Col3="Company" Text
Col4="Street" Text
Col5="City" Text
Col6="State" Text
Col7="Zip" Text

[Customers.htm]
ColNameHeader=True
Format=HTML
Col1="First Name" Text
Col2="Last Name" Text
Col3="Company" Text
Col4="Street" Text
Col5="City" Text
Col6="State" Text
Col7="Zip" Text
Note the Format=HTML line used above! I didn't find that documented anywhere and I haven't even added it to my notes in SchemaNotes.rtf yet.

It was discovered by using an exotic Jet SQL IN clause and observing the resulting Schema.ini that Jet created.

Importing Text into an MDB (Example 5-2)

No discussion of using Jet with text data would be complete without a little bit about text import and export from a native Jet database.

This can be done in a number of ways, most of which involve the tricky Jet SQL IN clause. The approach I'll demonstrate here involves opening the MDB as the primary database and the text directory as an external database.

Import.wsf
Code:
<JOB id="Report">
  <REFERENCE object = "ADODB.Connection"/>
  <OBJECT id = cnCust progId = "ADODB.Connection"/>
  <SCRIPT language = "VBScript">
    Option Explicit
    Dim lngRecords

    Function ScriptPath()
      ScriptPath = Left(WScript.ScriptFullName, _
                        InStrRev(WScript.ScriptFullName, "\") - 1)
    End Function

    cnCust.Open    "Provider=Microsoft.Jet.OLEDB.4.0;" _
                 & "Data Source='" & ScriptPath() & "\sample.mdb'"
    cnCust.Execute "SELECT * " _
                 & " INTO [Customers]" _
                 & " FROM [Customers.txt] IN '" & ScriptPath() & "' 'Text;'", _
                   lngRecords
    cnCust.Close

    MsgBox CStr(lngRecords) & " records imported"
  </SCRIPT>
</JOB>
This example is built along the same lines as the previous ones. Both sample.mdb and Customers.txt are in the same directory as the script itself. The semicolon after the Text keyword is required here. In cases where we didn't have a Schema.ini file we might add modifiers such as HDR=Yes.

Wrapping Up

I hope you found this introduction to processing tabular text data files with ADO and Jet useful.

In special cases one might resort to contortions involving reading the text into Strings and using Split() on them instead. But this approach can be more concise, and more importantly allows you to apply some of the power of Jet's SQL engine to the processing of the data. Many of Jet SQL's aggregation capabilities work on text table data as easily as native tables, and you also have access to the Jet Expression Service's ability to perform simple transformations.

Simple changes to the Schema.ini file make it very easy to produce customized results.

While the examples are written in VBScript almost all of the syntax is the same in VB 5 or 6. I doubt you'll have any problem translating the concepts into your own real programs.

SchemaNotes.rtf should prove a useful reference when you need to look up some of the details of using the Text IISAM.
Reply With Quote
  #6  
Old 03-13-2009, 04:30 PM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default

Reply With Quote
  #7  
Old 04-05-2009, 10:35 PM
dilettante's Avatar
dilettante dilettante is offline
Underclocked lifestyle

Forum Leader
* Guru *
 
Join Date: Feb 2005
Location: Michigan, USA
Posts: 4,184
Default Crosstab Queries

I came across another good example recently.

I local VB6 user was struggling with what he thought was a two day project. He had some weather measurement data from reporting stations around the area and wanted to view it with simple line graphs using the MSChart control.

The raw accumulated data looked something like:
Code:
247,2007-01-01,16
163,2007-01-01,21
412,2007-01-01,19
247,2007-02-01,21
163,2007-02-01,24
412,2007-02-01,27
163,2007-03-01,45
247,2007-03-01,33
412,2007-03-01,41
In this stripped down example there are just three fields: reporting Station, SampleDate, and Temp (averaged to a monthly value).

He simply wanted to be able to graph this with a line for each different Station, with station numbers changed to names and month names as "MMM YY" format. After coming up with over 200 lines of convoluted Line Input#, Split(), loops, and array logic he found it awkward to get the results he wanted.


This turned out to be easy given a second file as a lookup table for the station names he wanted (basically City names):
Code:
23,Albion
163,Detroit
203,Grand Rapids
247,Houghton
412,Lansing
Then a Schema.ini to define everything:
Code:
[Samples.txt]
MaxScanRows=1
ColNameHeader=False
CharacterSet=1252
DateTimeFormat=yyyy-mm-dd
TextDelimiter=none
Format=CSVDelimited
Col1=StationID Integer
Col2=SampleDate Date
Col3=Temp Integer

[Stations.txt]
MaxScanRows=1
ColNameHeader=False
CharacterSet=1252
TextDelimiter=none
Format=CSVDelimited
Col1=StationID Integer
Col2=City Char Width 255
Then one Jet SQL statement did the rest:
Code:
TRANSFORM First(Temp)
  SELECT Format$(SampleDate, 'MMM YY') FROM
    (SELECT City, SampleDate, Temp FROM [Samples.txt] AS SAMPS
      INNER JOIN [Stations.txt] AS STAS ON SAMPS.StationID = STAS.StationID)
  GROUP BY SampleDate ORDER BY SampleDate, City
  PIVOT City
The resulting Recordset was ready to be databound to an MSChart control!


By the way, if you have a late version of Office or Access installed, you might look in the Jet SQL help for other tidbits like this. It can usually be found as:

C:\Program Files\Common Files\microsoft shared\OFFICE11\1033\JETSQL40.CHM

Your Office version number will vary, as will your locale (1033 is US English).
Attached Images
File Type: png sshot.png (12.6 KB, 27 views)
Attached Files
File Type: zip Crosstab Chart.zip (4.6 KB, 29 views)
Reply With Quote
Reply


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Advertisement:





Free Publications
The ASP.NET 2.0 Anthology
101 Essential Tips, Tricks & Hacks - Free 156 Page Preview. Learn the most practical features and best approaches for ASP.NET.
subscribe
Programmers Heaven C# School Book -Free 338 Page eBook
The Programmers Heaven C# School book covers the .NET framework and the C# language.
subscribe
Build Your Own ASP.NET 3.5 Web Site Using C# & VB, 3rd Edition - Free 219 Page Preview!
This comprehensive step-by-step guide will help get your database-driven ASP.NET web site up and running in no time..
subscribe
 
 
-->