Xtreme Visual Basic Talk > Visual Basic .NET (2002/2003/2005/2008, including Express editions) > .NET General > Read XML, Newbie
  #1  
Old 10-17-2008, 12:31 PM
SIMIN SIMIN is offline
Centurion
 
Join Date: Mar 2008
Posts: 103
Default Read XML, Newbie

Hello all,
This is the first time I've tried to read from an XML file!
I have an XML file like this:
Code:
<?xml version="1.0" encoding="utf-16" ?>
<MessageAccount>
    <Account_Name type="SZ">my account #1</Account_Name>
    <Connection_Type type="DWORD">00000003</Connection_Type>
    <POP3_Server type="SZ">pop.domain.com</POP3_Server>
    <POP3_User_Name type="SZ">info</POP3_User_Name>
    <POP3_Password2 type="BINARY">gfdjkgdfghdfjkghdfgjk</POP3_Password2>
    <POP3_Use_Sicily type="DWORD">00000000</POP3_Use_Sicily>
    <POP3_Prompt_for_Password type="DWORD">00000000</POP3_Prompt_for_Password>
    <SMTP_Server type="SZ">smtp.domain.com</SMTP_Server>
    <SMTP_Display_Name type="SZ">My Name</SMTP_Display_Name>
    <SMTP_Email_Address type="SZ">info@domain.com</SMTP_Email_Address>
</MessageAccount>
Well, I know I should use XMLReader for this:
Code:
Dim reader As XmlTextReader = New XmlTextReader (URLString)
Also, I know I can read it in a loop:
Code:
Do While (reader.Read())
    Console.WriteLine(reader.Name)
Loop
But the data comes back in a strange order, even when I check reader.NodeType.
I am new to this, though.

I just wanna pass the TEXT and get the VALUE. And don't know how. Please help me. For example in this XML file:

<Account_Name type="SZ">my account #1</Account_Name>

I wanna pass Account_Name and get my account #1.
...

I searched and worked a lot but could not find it.
Please help me,
Thank you.
  #2  
Old 10-17-2008, 01:36 PM
AtmaWeapon AtmaWeapon is online now
Ultimate Contributor

Forum Leader
* Guru *
 
Join Date: Feb 2004
Location: Austin, TX
Posts: 7,598
Default

Unless your document is very large, I suggest the DOM-based parsing capabilities of the XmlDocument class. XmlTextReader supports SAX-style parsing in .NET, and I find it to be really unwieldy and overly verbose. (The name is silly sounding, but I believe it's based on one of the first parsing toolkits for XML; it used the same kind of model, and anything that follows the pattern is known as a "SAX-like" parser. Technically .NET isn't truly following the SAX model, because SAX is supposed to be event-based. I've done SAX parsing in Java and I'm very glad MS chose to stay away from an event-based parsing interface. More information about this and why you might want a SAX parser can be found on Wikipedia and other internet sites.)

The only problem is DOM-based parsers must store the file (or, more accurately, a parse tree) in memory to work. If this is a multi-hundred-megabyte file it's going to hurt to use a DOM-based parser.

Let's look at how you might implement this both ways, because it's just as bad to only know the DOM-based parsing as it is to only know the SAX-based parsing. Seeing implementations in both styles will also help you understand what using each one is like.

First, I'm not certain what your question is. I'm assuming that by "I wanna pass Account_Name and get my account #1" you mean, "I want to pass the name of an element and get the value of that element." This is easy to do in DOM and much tougher in SAX. We'll start by defining the interface for a function that might do this; it's going to take some changes in either method but this is basically what we want:
Code:
Function GetElementValue(ByVal elementName As String) As String
To simplify parameters to the function, let's make XML parsing the responsibility of a class. We'll start with DOM because it's my favorite and it's the easiest.

NOTE: Some people are having issues with my example style lately, so I feel the need to explain something. If my code calls a method that isn't implemented, it probably means I'm saying, "You need to implement a method that does something here." The code samples in this post come from classes that do the work, so if you see a variable named like _variable that means it's an instance variable of the class, and odds are its name will tell you what its type is. This saves several thousand characters per week with the way I'm posting.

DOM-based parsing
DomParser will be the name of the class, and its operation is fairly straightforward. It loads the document into an XmlDocument. The methods SelectNodes and SelectSingleNode are used to pull certain nodes out of the file using a special query language named XPath. You can also use methods such as GetElementsByTagName and properties such as FirstChild to manually iterate over nodes.
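
A minimal sketch of those three access styles, assuming a trimmed copy of the sample XML held in a string rather than a file. NodeWalkSketch, SampleXml, and ChildNames are illustrative names, not part of the attached files.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module NodeWalkSketch
    ' Hypothetical sample data, trimmed from the XML in the question.
    Public Const SampleXml As String = _
        "<MessageAccount>" & _
        "<Account_Name type=""SZ"">my account #1</Account_Name>" & _
        "<POP3_Server type=""SZ"">pop.domain.com</POP3_Server>" & _
        "</MessageAccount>"

    ' Collects the names of the root element's children by walking
    ' FirstChild / NextSibling manually.
    Public Function ChildNames(ByVal document As XmlDocument) As List(Of String)
        Dim names As New List(Of String)
        Dim current As XmlNode = document.DocumentElement.FirstChild
        While current IsNot Nothing
            names.Add(current.Name)
            current = current.NextSibling
        End While
        Return names
    End Function

    Sub Main()
        Dim document As New XmlDocument()
        document.LoadXml(SampleXml)

        ' XPath: every element that is a direct child of the root.
        For Each node As XmlNode In document.SelectNodes("/MessageAccount/*")
            Console.WriteLine("{0} = {1}", node.Name, node.InnerText)
        Next

        ' Lookup by tag name, without XPath.
        Dim servers As XmlNodeList = document.GetElementsByTagName("POP3_Server")
        Console.WriteLine("POP3_Server elements: {0}", servers.Count)

        ' Manual iteration over the same children.
        For Each name As String In ChildNames(document)
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

XPath is usually the shortest route; the manual walk is mostly useful when you need to visit every node anyway.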

With the document loaded into an XmlDocument, implementing GetElementValue is fairly easy: we just need to write an XPath query to find the first element that matches the element name that is specified:
Code:
Public Function GetElementValue(ByVal elementName As String) As String
    Dim xPathQuery As String = String.Format("//{0}", elementName)
    Dim matchingNode As XmlNode = _document.SelectSingleNode(xPathQuery)

    If matchingNode Is Nothing Then
        Return Nothing
    End If

    Return matchingNode.InnerText
End Function
It's straightforward (if you understand XPath, unfortunately). When we pass "Account_Name", the query ends up as "//Account_Name". In XPath, this means, "Match all elements with the name 'Account_Name' no matter where they are in the document." I call SelectSingleNode because I only want the first node that matches. The only tricky part comes when I call InnerText. My instinct is to go for the Value property, but this is wrong. Technically in XML DOM, <Account_Name>test</Account_Name> looks like this:
Code:
(Element) Account_Name
    (TextNode) Value="test"
(Element) /Account_Name
Basically, elements do not have values. However, this element does have a child TextNode with a value. I could go to more trouble to extract the text node and use its value, but it's easier to just use the element's InnerText property. This is going to be plenty obvious when working with the SAX parser. Speaking of which...
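
To make the Value versus InnerText point concrete, here is a sketch of what a complete DomParser might look like. It is an assumption on my part that the constructor takes the XML text directly (so the sketch needs no file on disk); the attached XmlDomParser.vb may well differ.

```vb
Imports System
Imports System.Xml

' Hypothetical minimal DomParser; not necessarily the attached class.
Public Class DomParser
    Private ReadOnly _document As New XmlDocument()

    ' Assumption: the XML is passed in as text rather than as a file name.
    Public Sub New(ByVal xml As String)
        _document.LoadXml(xml)
    End Sub

    Public Function GetElementValue(ByVal elementName As String) As String
        Dim matchingNode As XmlNode = _document.SelectSingleNode("//" & elementName)
        If matchingNode Is Nothing Then
            Return Nothing
        End If
        Return matchingNode.InnerText
    End Function
End Class

Module DomParserDemo
    Sub Main()
        Dim xml As String = _
            "<MessageAccount><Account_Name type=""SZ"">my account #1</Account_Name></MessageAccount>"

        Dim parser As New DomParser(xml)
        Console.WriteLine(parser.GetElementValue("Account_Name"))

        ' The same lookup done by hand, to show where the text lives.
        Dim document As New XmlDocument()
        document.LoadXml(xml)
        Dim element As XmlNode = document.SelectSingleNode("//Account_Name")
        Console.WriteLine(element.Value Is Nothing)     ' the element itself has no value
        Console.WriteLine(element.FirstChild.NodeType)  ' its child is a text node
        Console.WriteLine(element.FirstChild.Value)     ' the text node holds the value
    End Sub
End Module
```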

SAX-based Parsing
(I haven't touched this in something like 3 years! It was fun to look it up again.)

XmlTextReader provides forward-only, read-only access to the XML. At a high level, here's how it works. You call one of the Read methods to move through the document, but there's a catch: they don't all behave the same way. Read moves the reader to the next node and returns a Boolean telling you whether there was one. ReadStartElement and ReadEndElement return nothing; they check that the current node is what you expect and then move past it. Methods like ReadString, ReadElementString, and ReadInnerXml return content as a string, and they also move the reader forward as they consume it. Properties such as Name, Value, and NodeType get information about the current node without moving the reader.
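
A small sketch that may help show what the reader actually visits: it records NodeType, Name, and Value for every node as Read advances. It assumes the XML comes from a string via StringReader; NodeDumpSketch and DumpNodes are made-up names for illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module NodeDumpSketch
    ' Returns one "NodeType|Name|Value" line per node the reader visits.
    Public Function DumpNodes(ByVal xml As String) As List(Of String)
        Dim lines As New List(Of String)
        Using reader As New XmlTextReader(New StringReader(xml))
            While reader.Read()
                lines.Add(String.Format("{0}|{1}|{2}", _
                                        reader.NodeType, reader.Name, reader.Value))
            End While
        End Using
        Return lines
    End Function

    Sub Main()
        Dim xml As String = "<Account_Name type=""SZ"">my account #1</Account_Name>"
        For Each line As String In DumpNodes(xml)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

For that one-element input the reader visits three nodes: the Element (which has a name but no value), the Text node (a value but no name), and the EndElement. This is why printing only reader.Name in a loop shows each name twice with blanks in between.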

There are really two approaches to this kind of parsing, but only one will work in this case. If you're reading an entire file, you'll usually want to use ReadStartElement overloads that take a string so you can validate that you're reading the nodes you expect. This won't work in the case you asked about because you're looking for an arbitrary element; the code would get too long. The other method is to use Read to advance over every single node, then examine the NodeType property to determine if we're interested in the node. That's the approach the SaxParser class takes:
Code:
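Public Function GetElementValue(ByVal elementName As String) As String
    ' FileName is assumed to be a member of the SaxParser class; open it read-only.
    Using file As New FileStream(FileName, FileMode.Open, FileAccess.Read)
        Using reader As New XmlTextReader(file)
            ' Read() returns False once the end of the document is reached.
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        If reader.Name = elementName Then
                            ' Step from the element onto its child text node.
                            reader.Read()
                            Return reader.Value
                        End If
                End Select
            End While
        End Using
    End Using

    ' No element with that name was found.
    Return Nothing
End Function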
Note that when an element with the name that matches is found, we advance the reader to its text node then return the value of the text node.
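
For comparison, here is a rough sketch of the first approach mentioned earlier, where the layout is known in advance and each call names the element it expects. It assumes a two-field document in a string with no XML declaration; ExpectedOrderSketch and ReadAccountName are illustrative names only.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ExpectedOrderSketch
    ' Reads a document whose layout is known in advance. Each call throws
    ' an XmlException if the next element is not the one that was named.
    Public Function ReadAccountName(ByVal xml As String) As String
        Using reader As New XmlTextReader(New StringReader(xml))
            reader.ReadStartElement("MessageAccount")

            ' ReadElementString returns the element's text and moves the
            ' reader past the element's end tag.
            Dim accountName As String = reader.ReadElementString("Account_Name")
            Dim server As String = reader.ReadElementString("POP3_Server")

            reader.ReadEndElement() ' closes MessageAccount
            Return accountName
        End Using
    End Function

    Sub Main()
        Dim xml As String = _
            "<MessageAccount>" & _
            "<Account_Name type=""SZ"">my account #1</Account_Name>" & _
            "<POP3_Server type=""SZ"">pop.domain.com</POP3_Server>" & _
            "</MessageAccount>"
        Console.WriteLine(ReadAccountName(xml))
    End Sub
End Module
```

This style doubles as validation, but every element has to be listed in order, which is why it gets long for a lookup by arbitrary name.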

Personally, I think SAX-based code looks ugly and tough to maintain. This method displays the "arrowhead" shape of deeply nested control structures that tend to be a sign that your logic is complicated and hard to follow. In fact, in the attached code files I altered the logic to reduce the levels of nesting, at the cost of an increased chance that I will forget to close the file stream when I'm done with it. Still, the actual logic is 5 levels of indentation deep; that's annoying.
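
One way to flatten the nesting without giving up Using, sketched under my own assumptions (this is not the version in the attached files): put both resources in a single Using statement and replace the Select Case with a guard clause. The fileName parameter and the FlatterSaxSketch module are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module FlatterSaxSketch
    Public Function GetElementValue(ByVal fileName As String, _
                                    ByVal elementName As String) As String
        ' A single Using statement can own both resources.
        Using source As New FileStream(fileName, FileMode.Open, FileAccess.Read), _
              reader As New XmlTextReader(source)
            While reader.Read()
                ' Guard clause: skip anything that is not the element we want.
                If reader.NodeType <> XmlNodeType.Element _
                   OrElse reader.Name <> elementName Then
                    Continue While
                End If

                reader.Read() ' move onto the element's text node
                Return reader.Value
            End While
        End Using

        Return Nothing
    End Function

    Sub Main()
        ' Write a throwaway file so the sketch runs on its own.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, _
            "<MessageAccount><SMTP_Server type=""SZ"">smtp.domain.com</SMTP_Server></MessageAccount>")
        Try
            Console.WriteLine(GetElementValue(tempFile, "SMTP_Server"))
        Finally
            File.Delete(tempFile)
        End Try
    End Sub
End Module
```

The stream and reader are still disposed when the function returns from inside the loop, so nothing is traded away for the shallower indentation.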

I recommend using a DOM parser in practically all cases for small XML files.

I've attached enough files to make a console application to play around with these classes. I somehow suspect that a question you didn't ask is, "What's the best way to get the information for an entire account out of the XML file?" That example's going to be fun, but I'll wait until the question is asked because I'm already over 7500 characters and I cannot adequately explain myself in what remains of my limit.
Attached Files
File Type: vb XmlDomParser.vb (1,002 Bytes, 4 views)
File Type: vb SaxParser.vb (976 Bytes, 3 views)
__________________
.NET Resources
My FAQ threads | Tutor's Corner | Code Library
I would bet money 2/3 of .NET questions are already answered in one of these three places.
  #3  
Old 10-17-2008, 04:23 PM
SIMIN SIMIN is offline
Centurion
 
Join Date: Mar 2008
Posts: 103
Default

Hello AtmaWeapon,
How can I thank you for your very great help and taking your valuable time to provide me such a good article?
In fact, I am going to import and read Windows Mail email accounts in Windows Vista, which are stored in XML files in each user's folder.
But I chose to use SAX-based code.
Just one thing I cannot figure out is that in my real data:
Code:
<?xml version="1.0" encoding="utf-16" ?>
<MessageAccount>
    <Account_Name type="SZ">my account #1</Account_Name>
    <Connection_Type type="DWORD">00000003</Connection_Type>
    <POP3_Server type="SZ">pop.domain.com</POP3_Server>
    <POP3_User_Name type="SZ">info</POP3_User_Name>
    <POP3_Password2 type="BINARY">gfdjkgdfghdfjkghdfgjk</POP3_Password2>
    <POP3_Use_Sicily type="DWORD">00000000</POP3_Use_Sicily>
    <POP3_Prompt_for_Password type="DWORD">00000000</POP3_Prompt_for_Password>
    <SMTP_Server type="SZ">smtp.domain.com</SMTP_Server>
    <SMTP_Display_Name type="SZ">My Name</SMTP_Display_Name>
    <SMTP_Email_Address type="SZ">info@domain.com</SMTP_Email_Address>
</MessageAccount>
Some fields are type="SZ", or string.
While some others are in this format:

<SMTP_Port type="DWORD">00000019</SMTP_Port>

I am not sure what format this is, but I think it's hexadecimal?

So I should convert it to decimal to get the real value?
In this sample, the SMTP port is definitely 25!

Do you have any idea how I can do this?
Thank you again for your kindness.
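
For reference, a minimal sketch of the conversion being asked about, assuming the DWORD fields really are hexadecimal text, as the 00000019 and port 25 example suggests. DwordSketch and ParseDword are hypothetical names.

```vb
Imports System

Module DwordSketch
    ' Parses an eight-digit hexadecimal string such as "00000019" into an
    ' unsigned 32-bit integer. The second argument to Convert.ToUInt32 is
    ' the number base.
    Public Function ParseDword(ByVal text As String) As UInteger
        Return Convert.ToUInt32(text, 16)
    End Function

    Sub Main()
        Console.WriteLine(ParseDword("00000019")) ' 1 * 16 + 9 = 25
        Console.WriteLine(ParseDword("00000003")) ' 3
    End Sub
End Module
```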
  #4  
Old 10-18-2008, 12:05 AM
Roger_Wgnr Roger_Wgnr is offline
CodeASaurus Hex

Forum Leader
* Expert *
 
Join Date: Jul 2006
Location: San Antonio TX
Posts: 2,335
Default

Quote:
Originally Posted by AtmaWeapon View Post
XmlTextReader supports the SAX-based parsing in .NET, and I find it to be really unwieldy and overly verbose (the name is silly sounding, but I believe it's based on one of the first parsing toolkits for XML; it used the same kind of model and anything that follows the pattern is known as a "SAX-like" parser. Technically .NET isn't truly following the SAX model because it is supposed to be event-based.
Just thought I would add a comment about the SAX parser. SAX stands for Simple API for XML, and in its original form it was very easy to use since it was an event-based parser. However, as AtmaWeapon pointed out, it is a bit unwieldy when used in .NET, since XmlTextReader does not raise events, which leads to the deeply nested structure.
__________________
Code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. ~Martin Golding
The user is a peripheral that types when you issue a read request. ~Peter Williams
MSDN Visual Basic .NET General FAQ
  #5  
Old 10-18-2008, 10:01 AM
AtmaWeapon AtmaWeapon is online now
Ultimate Contributor

Forum Leader
* Guru *
 
Join Date: Feb 2004
Location: Austin, TX
Posts: 7,598
Default

Oh yeah, much easier to use in its event-based form.

Compare a select-case to:

Code:
Private Sub HandleReadElement(...)

Private Sub HandleReadTextNode(...)
 
Private Sub HandleReadCData(...)

Private Sub HandleReadAttribute(...)

' and so on...
That's why I said I was glad MS strayed from the event-based approach. I did some work with the Java SAX parser and it is basically impossible unless you implement a class due to the fact that "ReadAttribute" needs to know what element's attributes it is reading, but you don't have a method to call so you have to use class-level variables.

The problem's similar with .NET's method-based approach (any non-DOM approach will have the problem), but it can be contained in a local block without the need for class-level variables.
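
A sketch of what "contained in a local block" can look like: the name and type attribute of the element being read are held in locals inside the loop's scope, where an event-based handler would need class-level fields. It assumes the XML arrives as a string; LocalStateSketch and ReadTypedValues are illustrative names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module LocalStateSketch
    ' Maps each element name to "type:text", e.g. "SZ:pop.domain.com".
    Public Function ReadTypedValues(ByVal xml As String) As Dictionary(Of String, String)
        Dim values As New Dictionary(Of String, String)
        Using reader As New XmlTextReader(New StringReader(xml))
            ' These locals play the role that class-level fields would
            ' play in an event-based handler.
            Dim currentName As String = Nothing
            Dim currentType As String = Nothing

            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        currentName = reader.Name
                        currentType = reader.GetAttribute("type")
                    Case XmlNodeType.Text
                        values(currentName) = currentType & ":" & reader.Value
                End Select
            End While
        End Using
        Return values
    End Function

    Sub Main()
        Dim xml As String = _
            "<MessageAccount>" & _
            "<POP3_Server type=""SZ"">pop.domain.com</POP3_Server>" & _
            "<Connection_Type type=""DWORD"">00000003</Connection_Type>" & _
            "</MessageAccount>"
        For Each pair As KeyValuePair(Of String, String) In ReadTypedValues(xml)
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```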