View Single Post
 
Old 04-13-2004, 07:54 AM
Mike Rosenblum's Avatar
Mike Rosenblum Mike Rosenblum is offline
Microsoft Excel MVP

Forum Leader
* Guru *
 
Join Date: Jul 2003
Location: New York, NY, USA
Posts: 7,848
Default Automating Office Programs with VB.Net / COM Interop

Automating Office Programs with VB.NET

Introduction

This tutorial is designed to cover the basics required to control Microsoft Office programs when automated from a VB.NET application. The examples utilized here involve Microsoft Excel, but the principles are the same when automating other MS Office programs such as Access or Word.

Using .NET affords a number of advantages:
  1. The main advantage for the MS Office developer is the strong typing that the language affords. The .NET languages such as VB.NET and C# enforce strong-typing and dissuade the use of weak typed Object or Variant data types. Similarly, the use of late-binding is not encouraged, although is natively enabled by VB.NET. (C# can also perform late binding, but does so through reflection code; it is not a C# language feature.) By enforcing strong typing and discuraging the use of late binding, many more errors can be caught at compile-time, instead of at run-time, when it is often too late.
  2. Managed memory (garbage collection) afforded by the .NET Framework is another key reason why a programmer should consider using .NET languages, although in the case of automating Microsoft Office applications, this can actually cause some complexities. Helping the developer overcome the issues encountered when releasing COM objects from .NET is a key goal of this tutorial.
  3. The .NET Framework also supports advanced programming concepts such as inheritance, delegates, generics, collection classes and reflection coding. Although perhaps not of immediate need by the typical MS Office developer, these features become of greater value as the programmer's skills improve and as a project's complexity increases.
For these reasons and more the VBA or Visual Basic 6.0 developer should consider the .NET language platform.

Do be aware, however, that execution speed when using .NET Automation will be similar to, or slightly slower than, it was under Visual Basic 6.0 Automation, which is considerably slower than VBA executes natively. We discuss some ways of dealing with this within the "Execution Speed" section of the tutorial. Another issue to consider is deployment; VB.NET can be tricky to implement for MS Office versions below Office 2002 (version 10.0). We discuss some deployment options within the "Deployment Issues" section later in the tutorial.

Ok, with the background and caveats behind us, let's get started...


Adding The Project Reference

The first step in Automating a Microsoft Office Program with VB.NET is to have the required Project Library referenced by your VB.NET project.

Here we will be Automating Excel, so one would need to add a reference to the Microsoft Excel Object Library. The procedure to do this is almost the same as it was in legacy VB 6.0:

(1) Within the Visual Studio .NET IDE, choose the Project > Add Reference... menu item. Next you will want to click on the COM tab at the top because the Excel object model is a Component Object Model (COM) library. Now scroll down until you find the "Microsoft Excel 11.0 Object Library" (which could be 10.0, 9.0, 8.0 or even 12.0, depending on your system).

(2) Next, click the [Select] button and then click [OK]. The reference to the Excel Object Library is now loaded.


Basic Shell

The next step is to write your code. The basic shell for Automating Excel could look something like this:
Code:
' Include the following at the top of the Module: Imports Excel = Microsoft.Office.Interop.Excel
Ok, then our routine:
Code:
Sub MySub() Dim oApp As New Excel.Application Dim oWB As Excel.Workbook = oApp.Workbooks.Add() Dim oWS As Excel.Worksheet = CType(oWB.Worksheets(1), Excel.Worksheet) ' Your Code Goes Here ' Your Code Goes Here ' Your Code Goes Here ' Your Code Goes Here ' Cleanup: GC.Collect() GC.WaitForPendingFinalizers() GC.Collect() GC.WaitForPendingFinalizers() Marshal.FinalReleaseComObject(oWS) oWB.Close(SaveChanges:=False) Marshal.FinalReleaseComObject(oWB) oApp.Quit() Marshal.FinalReleaseComObject(oApp) End Sub
The code above serves as a basic outline for creating and then safely disposing of your Excel Application object. What one might place in the "Your Code Goes Here" block is up to you, but is also discussed in the "Your Code Goes Here" section later on in this tutorial. But before we move on to that, let's discuss the 'cleanup' section first...

Cleanup

Cleanup, as it pertains to .NET Automation, can be a tricky business. The difficulty derives from the fact that .NET is calling the Excel object model through the COM Interop, which is an interconnection between two technically incompatible worlds: (1) the Component Object Model (COM) and (2) the .NET Framework. COM project libraries are created by programs such as C++ or Visual Basic 6.0 and rely on an object reference counting scheme for maintaining and releasing object references. When an object's reference count reaches zero, it is immediately released. The .NET Framework however, uses a managed memory scheme whereby live references are periodically traced from the roots, to other referenced objects, in a tree like structure; when completed, all unreachable objects are released. While the .NET managed memory approach is generally superior and prevents memory leaks caused by circularity-references (with which the COM reference counting approach can have great difficulties), the fact that .NET uses non-deterministic finalization adds complexity for us when we are trying to get our MS Office application to release.

There are many complex techniques for releasing a COM Object in .NET, but the example shell, above, is showing what we believe is the cleanest arrangement. Let's repeat the cleanup section here:
Code:
' Cleanup: GC.Collect() GC.WaitForPendingFinalizers() GC.Collect() GC.WaitForPendingFinalizers() Marshal.FinalReleaseComObject(oWS) oWB.Close(SaveChanges:=False) Marshal.FinalReleaseComObject(oWB) oApp.Quit() Marshal.FinalReleaseComObject(oApp) End Sub
Note that the Marshal.FinalReleaseComObject() method is available when using Framework 2.0 and above, and therefore is accessible by Visual Studio versions 2005 and higher. If using the .NET 1.0 or 1.1 Framework (for example, VB.NET or C# 2002/2003) then you have to make use of the Marshal.ReleaseComObject() method instead, calling it in a repeated loop until the reference count reaches zero.

Admittedly, this all looks a bit complicated, for this much effort was not required in VB 6.0 and is certainly not necessary when using VBA. But the issue is that .NET uses non-deterministic Garbage Collection, which is incompatible with COM's reference-counting approach. In short, we need to have .NET tell COM explicitly that we are done with the COM object and there are basically only two ways to do this:

(1) Set the variable = Nothing and then call GC.Collect(). This works 100% fine, but calling GC.Collect() is a time consuming process. You would think, then, that you can set all your variables = Nothing and then call GC.Collect() at the end, and this does work sometimes. However, Microsoft Office applications are sensitive to the order in which the objects are released and, unfortunately, setting your variables = Nothing and then calling GC.Collect() does not guarantee the release order of the objects. (That is, unless you made a separate GC.Collect() call after every single = Nothing setting -- which would be very expensive.)

(2) Call Marshal.FinalReleaseComObject() or Marshal.ReleaseComObject(). This allows us to explicitly control the order in which our COM objects are released. However, this approach does not allow us to release objects to which we do not have a named variable. For example, when programming in Excel, there are usually loops where one iterates through the cells of a Range. It might not be obvious, but the result of this is that many Range objects are created and discarded rapidly. In VBA or VB 6.0, these objects are garbage collected immediately; however, in .NET, they sit, awaiting Garbage Collection.

So the result is that for proper cleanup we must call GC.Collect() and GC.WaitForPendingFinalizers() first in order to release minor objects that we are not referencing by variables we control. In fact need to call GC.Collect() and GC.WaitForPendingFinalizers() twice because the runtime callable wrapper ("RCW") for each COM object has a finalizer that does the actual work of releasing the COM object from .NET. And .NET objects that have finalizers (the RCW in this case) need a second round of garbage collection to be fully removed from memory.

The last step is to go through our named Excel object variables in reverse-order of importance calling Marshal.FinalReleaseComObject() or Marshal.ReleaseComObject(). It is important that the Excel.Application variable gets released last, or else the application's process is likely to hang.

[Continued next post...]

Last edited by webbone; 04-30-2007 at 04:06 PM. Reason: updated
Reply With Quote