Unless your document is very large, I suggest the DOM-based parsing capabilities of the XmlDocument class. XmlTextReader supports SAX-style parsing in .NET, and I find it really unwieldy and overly verbose. (The name sounds silly, but SAX stands for "Simple API for XML", one of the first parsing toolkits for XML; anything that follows the same forward-only streaming pattern is known as a "SAX-like" parser. Technically .NET isn't truly following the SAX model, because SAX is supposed to be event-based and XmlTextReader makes you pull each node yourself. I've done SAX parsing in Java and I'm very glad MS chose to stay away from an event-based parsing interface. More information about this and why you might want a SAX parser can be found on Wikipedia and other internet sites.)
The only problem is DOM-based parsers must store the file (or, more accurately, a parse tree) in memory to work. If this is a multi-hundred-megabyte file it's going to hurt to use a DOM-based parser.
Let's look at how you might implement this both ways, because it's just as bad to only know the DOM-based parsing as it is to only know the SAX-based parsing. Seeing implementations in both styles will also help you understand what using each one is like.
First, I'm not certain what your question is. I'm assuming that by "I wanna pass "Account_Name" and get "my account #1"" you mean, "I want to pass the name of an element and get the value of that element." This is easy to do in DOM and much tougher in SAX. We'll start by defining the interface for a function that might do this; it's going to take some changes in either method, but this is basically what we want:
Code:
Function GetElementValue(ByVal elementName As String) As String
To simplify parameters to the function, let's make XML parsing the responsibility of a class. We'll start with DOM because it's my favorite and it's the easiest.
NOTE: Some people are having issues with my example style lately, so I feel the need to explain something. If my code calls a method that isn't implemented, it probably means I'm saying, "You need to implement a method that does something here." The code samples in this post come from classes that do the work, so if you see a variable named like _variable, that means it's an instance field of the class, and odds are its name will tell you what its type is. This saves several thousand characters per week with the way I'm posting.
DOM-based Parsing
DomParser will be the name of the class, and its operation is fairly straightforward. It loads the document into an XmlDocument. The methods SelectNodes and SelectSingleNode are used to pull certain nodes out of the file using a special query language named XPath. You can also use methods such as GetElementsByTagName and properties such as FirstChild to manually iterate over nodes.
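To make the "manual iteration" option a bit more concrete, here's a minimal sketch of walking nodes without XPath. The XML string and the module name are made up for illustration; I'm only guessing at what your real file looks like.

```vbnet
Imports System
Imports System.Xml

Module ManualIterationSketch
    Sub Main()
        ' Hypothetical sample data; the real file layout is an assumption.
        Dim document As New XmlDocument()
        document.LoadXml("<Accounts><Account><Account_Name>my account #1</Account_Name></Account></Accounts>")

        ' GetElementsByTagName returns every element with this name, in document order.
        For Each node As XmlNode In document.GetElementsByTagName("Account_Name")
            ' FirstChild is the text node inside the element, if it has one.
            Dim textNode As XmlNode = node.FirstChild
            If textNode IsNot Nothing Then
                Console.WriteLine(textNode.Value)
            End If
        Next
    End Sub
End Module
```

This works, but you end up writing the loop and the null checks yourself, which is why I prefer the XPath methods when I only want one node.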
With the document loaded into an XmlDocument, implementing GetElementValue is fairly easy: we just need to write an XPath query to find the first element that matches the element name that is specified:
Code:
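Public Function GetElementValue(ByVal elementName As String) As String
    ' "//name" matches elements with this name anywhere in the document.
    Dim xPathQuery As String = String.Format("//{0}", elementName)
    ' SelectSingleNode returns the first match, or Nothing if there is none.
    Dim matchingNode As XmlNode = _document.SelectSingleNode(xPathQuery)
    If matchingNode Is Nothing Then
        Return Nothing
    End If
    ' InnerText gives the text contained in the element.
    Return matchingNode.InnerText
End Function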
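Since the function above leans on a _document field that isn't shown, here's a rough sketch of how the surrounding class might be put together. The constructor, the "accounts.xml" file name, and the demo module are my assumptions for illustration, not necessarily what's in the attached files.

```vbnet
Imports System
Imports System.Xml

Public Class DomParser
    ' The whole document is parsed into memory once, up front.
    Private ReadOnly _document As New XmlDocument()

    Public Sub New(ByVal fileName As String)
        _document.Load(fileName)
    End Sub

    Public Function GetElementValue(ByVal elementName As String) As String
        ' Same logic as above: first element with this name, anywhere in the document.
        Dim matchingNode As XmlNode = _document.SelectSingleNode("//" & elementName)
        If matchingNode Is Nothing Then
            Return Nothing
        End If
        Return matchingNode.InnerText
    End Function
End Class

Module DomParserDemo
    Sub Main()
        ' "accounts.xml" is a hypothetical file name.
        Dim parser As New DomParser("accounts.xml")
        Console.WriteLine(parser.GetElementValue("Account_Name"))
    End Sub
End Module
```

Loading in the constructor means the file is parsed once and every later call to GetElementValue just queries the tree that's already in memory.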
It's straightforward (if you understand XPath, unfortunately). When we pass "Account_Name", the query ends up as "//Account_Name". In XPath, this means, "Match all elements with the name 'Account_Name' no matter where they are in the document." I call SelectSingleNode because I only want the first node that matches. The only tricky part comes when I call InnerText. My instinct is to go for the Value property, but this is wrong. Technically in XML DOM, <Account_Name>test</Account_Name> looks like this:
Code:
(Element) Account_Name
(TextNode) Value="test"
(Element) /Account_Name
Basically, elements do not have values. However, this element does have a child TextNode with a value. I could go to more trouble to extract the text node and use its value, but it's easier to just use the element's InnerText property. This is going to be plenty obvious when working with the SAX parser. Speaking of which...
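If you want to see the difference between Value and InnerText for yourself, a small sketch like this should do it. The module name is just something I made up for the example.

```vbnet
Imports System
Imports System.Xml

Module ValueVersusInnerText
    Sub Main()
        Dim document As New XmlDocument()
        document.LoadXml("<Account_Name>test</Account_Name>")

        Dim element As XmlNode = document.DocumentElement

        ' Elements have no value of their own, so Value is Nothing here.
        Console.WriteLine("Element value is Nothing: " & (element.Value Is Nothing).ToString())

        ' The text lives in a child node whose NodeType is Text.
        Console.WriteLine("Child node type: " & element.FirstChild.NodeType.ToString())
        Console.WriteLine("Child node value: " & element.FirstChild.Value)

        ' InnerText collects the text of the element's descendants for you.
        Console.WriteLine("InnerText: " & element.InnerText)
    End Sub
End Module
```

Going through FirstChild gets you the same text here, but it breaks as soon as the element is empty or its first child isn't a text node, so InnerText is the safer habit.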
SAX-based Parsing
(I haven't touched this in something like 3 years! It was fun to look it up again.)
XmlTextReader provides forward-only, read-only access to the XML. At a high level, here's how it works. First, you call one of the Read methods, but there's a catch to this. Some Read methods return content, and others don't. Methods like Read and ReadStartElement don't return content (Read only returns a Boolean that tells you whether there was another node); you use these to move the reader to different nodes. Methods like ReadString and ReadInnerXml return content from the current node, and they also advance the reader past what they read. There are also properties such as Value and Name that get information about the current node without moving the reader.
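Here's a small sketch of what "move with Read, inspect with the properties" looks like in practice. It reads from a string instead of a file to stay self-contained; the module name is made up.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module ReaderWalkSketch
    Sub Main()
        Using reader As New XmlTextReader(New StringReader("<Account_Name>test</Account_Name>"))
            ' Read() moves to the next node and returns False at the end of the input.
            While reader.Read()
                ' NodeType, Name and Value describe the current node without moving the reader.
                Console.WriteLine("{0}: Name='{1}', Value='{2}'", reader.NodeType, reader.Name, reader.Value)
            End While
        End Using
    End Sub
End Module
```

For that one-element input you should see three nodes go by: the Element, the Text node that holds "test", and the EndElement. Notice that the element itself has a Name but no Value, which is the same thing we ran into with the DOM.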
There are really two approaches to this kind of parsing, but only one will work in this case. If you're reading an entire file, you'll usually want to use the ReadStartElement overloads that take a string so you can validate that you're reading the nodes you expect (they throw an exception if the current node isn't the element you named). This won't work in the case you asked about because you're looking for an arbitrary element; the code would get too long. The other method is to use Read to advance over every single node, then examine the NodeType property to determine if we're interested in the node. That's the approach the SaxParser class takes:
Code:
Public Function GetElementValue(ByVal elementName As String) As String
    Using file As New FileStream(FileName, FileMode.Open, FileAccess.Read)
        Using reader As New XmlTextReader(file)
            ' Visit every node in document order.
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        If reader.Name = elementName Then
                            ' Step onto the child text node (assumes the element contains text).
                            reader.Read()
                            Return reader.Value
                        End If
                End Select
            End While
        End Using
    End Using
    ' No element with that name was found.
    Return Nothing
End Function
Note that when an element with a matching name is found, we advance the reader to its text node, then return the value of the text node.
Personally, I think SAX-based code looks ugly and tough to maintain. This method displays the "arrowhead" shape of deeply nested control structures that tends to be a sign that your logic is complicated and hard to follow. In fact, in the attached code files I altered the logic to reduce the levels of nesting, at the cost of an increased chance that I will forget to close the file stream when I'm done with it. Still, the actual logic is 5 levels of indentation deep; that's annoying.
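For what it's worth, here's one way the nesting could be flattened. This is only a sketch and not the code in the attached files: the FlatSaxParser name is made up, and it swaps in a guard clause plus ReadString in place of the Select Case and the extra Read call.

```vbnet
Imports System
Imports System.Xml

Public Class FlatSaxParser
    Private ReadOnly _fileName As String

    Public Sub New(ByVal fileName As String)
        _fileName = fileName
    End Sub

    Public Function GetElementValue(ByVal elementName As String) As String
        ' XmlTextReader can open the file itself, which removes one Using block.
        Using reader As New XmlTextReader(_fileName)
            While reader.Read()
                ' Guard clause: skip anything that is not the element we want.
                If reader.NodeType <> XmlNodeType.Element OrElse reader.Name <> elementName Then
                    Continue While
                End If

                ' ReadString returns the element's text, or an empty string for an empty element.
                Return reader.ReadString()
            End While
        End Using

        ' No element with that name was found.
        Return Nothing
    End Function
End Class
```

This version keeps the Using block, so the reader still gets closed on every path, and the interesting logic sits three levels deep. It's still more ceremony than the DOM version.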
I recommend using a DOM parser in practically all cases for small XML files.
I've attached enough files to make a console application to play around with these classes. I somehow suspect that a question you didn't ask is, "What's the best way to get the information for an entire account out of the XML file?" That example's going to be fun, but I'll wait until the question is asked because I'm already over 7500 characters and I cannot adequately explain myself in what remains of my limit.