Trying to Understand This Code Segment (XML)

Agrajag27
11-21-2009, 08:19 PM
Guys, I'm just getting back into VB and was never a pro. The last environment I was in, briefly, was VB6, but I spent a good amount of time coding in earlier versions of VB, going back to QB.

I'm trying to read RSS feeds and found a nice demo reader, but now I want to understand the code I'm seeing:


' Requires Imports System.Xml and Imports System.Xml.XPath at the top of the code file.
'Check to see if the user put anything in the text box. If not, then just go back to the main form.
If tbxFeedURL.Text <> "" Then

'Go and verify the feed and retrieve its title.

'Create a new XML doc that has all the elements we need from the one provided
Dim doc As New XmlDocument()
doc.Load(tbxFeedURL.Text)

'Need a check here to be sure the above URL takes us to a valid feed.

Dim navigator As XPathNavigator = doc.CreateNavigator()

' look for the path to the RSS item titles, then navigate through the nodes to get all titles
Dim nodes As XPathNodeIterator = navigator.Select("/rss/channel/item/title")

While nodes.MoveNext

' clean up the text for display
Dim node As XPathNavigator = nodes.Current
Dim tmp As String = node.Value.Trim()
tmp = tmp.Replace(ControlChars.CrLf, "")
tmp = tmp.Replace(ControlChars.Lf, "")
tmp = tmp.Replace(ControlChars.Cr, "")
tmp = tmp.Replace(ControlChars.FormFeed, "")
tmp = tmp.Replace(ControlChars.NewLine, "")

End While

' set a position counter
Dim position As Integer = 0

' Get the links from the RSS feed
Dim nodesLink As XPathNodeIterator = navigator.Select("/rss/channel/item/link")

While nodesLink.MoveNext

' clean up the link
Dim node As XPathNavigator = nodesLink.Current
Dim tmp As String = node.Value.Trim()
tmp = tmp.Replace(ControlChars.CrLf, "")
tmp = tmp.Replace(ControlChars.Lf, "")
tmp = tmp.Replace(ControlChars.Cr, "")
tmp = tmp.Replace(ControlChars.FormFeed, "")
tmp = tmp.Replace(ControlChars.NewLine, "")


' increment the position counter
position += 1

End While

End If


1. Why does the demo create a new XML document to read through an existing XML file? Or is this all in memory, since I can't find an actual file? Is this even needed? What I want to do is read the RSS feed, grab some info from it (source, title of each story, date stamp, author and the story text) and then process that in various ways (not actually view it... I want to e-mail specific stories that fit my search criteria directly to myself). Or is this just opening the XML out on the web? That would explain why sending it "test" gives me an error saying it can't find "test" in the bin folder: without an HTTP address it looks locally.

2. doc.Load... what check can I perform here to be certain that the data entered by the user is actually a valid feed? It should, for example, have a reliable header on the first line, but is that the best method?

3. Where or how are the elements stored, so that I can refer back to them and pull out the data I want?

4. I don't understand how this part works: navigator.Select("/rss/channel/item/title"). Can someone shed some light on the 4 pieces at the end?

5. I'm assuming the clean-up isn't really needed for e-mail but I didn't remove it as it doesn't seem to harm anything.

Thanks for any help on this.

Again, ultimately I want to read through a given feed, grab only items that fit my criteria and then send those to an e-mail address (I'll code all that later and will likely have more questions then).

AtmaWeapon
11-21-2009, 09:26 PM
1) It's not creating a new XML file. XmlDocument is a class that holds an in-memory representation of an XML document, arranged so that it's easy to search for specific XML elements. The Load() method, as used in this snippet, reads the XML from a URL. When you pass it "test", it assumes you mean "open a file named 'test'" and looks for it relative to the program's current folder, which is why the error points at your bin folder.

For reading RSS feeds XmlDocument is a pretty good choice.
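To make point 1) concrete, here is a minimal sketch, assuming you only want to accept web feeds, that checks the text box contents with Uri.TryCreate before calling Load(), so input like "test" is rejected instead of being looked up as a local file. IsWebAddress is a hypothetical name; Uri.TryCreate and XmlDocument.Load are standard .NET calls.

```vb
Imports System
Imports System.Xml

Module FeedAddressCheck
    ' Hypothetical helper: True only for absolute http:// or https:// addresses.
    ' Anything else (such as "test") would make XmlDocument.Load treat the
    ' string as a local file path instead of a web address.
    Public Function IsWebAddress(input As String) As Boolean
        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(input, UriKind.Absolute, parsed) Then Return False
        Return parsed.Scheme = Uri.UriSchemeHttp OrElse parsed.Scheme = Uri.UriSchemeHttps
    End Function

    Sub Main()
        Dim input As String = "test" ' stands in for tbxFeedURL.Text
        If IsWebAddress(input) Then
            Dim doc As New XmlDocument()
            doc.Load(input) ' downloads the feed and parses it into memory
            Console.WriteLine("Loaded " & doc.DocumentElement.Name)
        Else
            Console.WriteLine("Please enter a full http:// or https:// feed address.")
        End If
    End Sub
End Module
```

Nothing is written to disk either way; the parsed document only lives in the doc variable.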

2) There's not much you can do unless you want to take a different, more difficult approach to reading XML, or there's a schema you can validate your feed against. If the XML is *really* wrong, Load() will throw an XmlException you can catch, but it takes something like missing end tags, missing angle brackets, or invalid characters to trigger that. If it's something as minor as a <foo> tag where you expect <bar>, you'll need something more precise. That's a big enough topic for its own thread; start one if it's important.
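The two cheap checks from 2) can be sketched like this. It is a minimal example, not real feed validation: it catches the XmlException that malformed XML produces, then checks that the root element is <rss> with a <channel> child. TryParseRss is a hypothetical name, and it calls LoadXml on a string so it runs without a network; in the reader you would call doc.Load(tbxFeedURL.Text) instead, where network or file problems typically surface as other exception types (for example an IOException or WebException), so catch those separately.

```vb
Imports System
Imports System.Xml

Module FeedSanityCheck
    ' Hypothetical helper: returns the parsed document if the text is well-formed
    ' XML whose root is <rss> with a <channel> child; otherwise returns Nothing.
    ' This is a quick sanity check, not validation against an RSS schema.
    Public Function TryParseRss(xmlText As String) As XmlDocument
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(xmlText) ' throws XmlException for malformed XML
        Catch ex As XmlException
            Return Nothing
        End Try

        Dim root As XmlElement = doc.DocumentElement
        If root Is Nothing OrElse root.Name <> "rss" Then Return Nothing
        If root.SelectSingleNode("channel") Is Nothing Then Return Nothing
        Return doc
    End Function

    Sub Main()
        Dim samples() As String = {
            "<rss><channel><title>Demo</title></channel></rss>",
            "<rss><channel>",
            "<html><body/></html>"
        }
        For Each sample As String In samples
            Console.WriteLine(TryParseRss(sample) IsNot Nothing)
        Next
    End Sub
End Module
```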

3) I don't see anywhere in the code that stores information for later. Somewhere after each of the "tmp.Replace()" calls is where you'd want to do so. It looks like the first one stores article titles, and the second one stores links to the articles.
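One way to do the storing mentioned in 3), sketched under assumptions: FeedItem, ReadItems and GetChildText are hypothetical names, and this swaps the two XPathNavigator loops for XmlDocument.SelectNodes/SelectSingleNode, so each story's title and link are read from the same <item> and kept together in a List(Of FeedItem) instead of in two separate loops.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module FeedStorage
    ' Hypothetical container for the pieces of each story we care about.
    Public Class FeedItem
        Public Property Title As String = ""
        Public Property Link As String = ""
    End Class

    ' Visits every <item> once and keeps its title and link together.
    Public Function ReadItems(doc As XmlDocument) As List(Of FeedItem)
        Dim items As New List(Of FeedItem)()
        For Each itemNode As XmlNode In doc.SelectNodes("/rss/channel/item")
            Dim entry As New FeedItem()
            entry.Title = GetChildText(itemNode, "title")
            entry.Link = GetChildText(itemNode, "link")
            items.Add(entry)
        Next
        Return items
    End Function

    ' Returns the trimmed text of a child element, or "" if it is missing.
    Public Function GetChildText(parent As XmlNode, childName As String) As String
        Dim child As XmlNode = parent.SelectSingleNode(childName)
        If child Is Nothing Then Return ""
        Return child.InnerText.Trim()
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<rss><channel><title>Demo</title>" &
                    "<item><title>First</title><link>http://example.com/1</link></item>" &
                    "<item><title>Second</title><link>http://example.com/2</link></item>" &
                    "</channel></rss>")
        For Each entry As FeedItem In ReadItems(doc)
            Console.WriteLine(entry.Title & " -> " & entry.Link)
        Next
    End Sub
End Module
```

Other RSS 2.0 item fields, such as pubDate or description, could be added to FeedItem the same way with GetChildText(itemNode, "pubDate").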

4) Select accepts a query written in XPath, a special language for addressing parts of an XML document, much like a path addresses files in a filesystem. I'm not an expert, but it looks like "/rss/channel/item/title" would select all "title" elements in a document like this:
<rss>
<channel>
<item>
<title></title>
...
</item>
...
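To see how the path in 4) drills down, here is a small sketch that runs three XPath expressions against a made-up feed held in a string (SampleFeed and CountMatches are hypothetical names). Each "/" steps one level down from the root, so "/rss/channel/item/title" matches only the story titles and not the channel's own <title>, while "//title" matches a <title> at any depth.

```vb
Imports System
Imports System.Xml
Imports System.Xml.XPath

Module XPathDrillDown
    ' Made-up feed: one channel title plus two story titles.
    Const SampleFeed As String = "<rss><channel><title>Feed title</title>" &
        "<item><title>Story one</title></item>" &
        "<item><title>Story two</title></item>" &
        "</channel></rss>"

    ' Hypothetical helper: how many nodes does an XPath expression select?
    Public Function CountMatches(xpath As String) As Integer
        Dim doc As New XmlDocument()
        doc.LoadXml(SampleFeed)
        Dim navigator As XPathNavigator = doc.CreateNavigator()
        Return navigator.Select(xpath).Count
    End Function

    Sub Main()
        Console.WriteLine(CountMatches("/rss/channel/item/title")) ' 2: story titles only
        Console.WriteLine(CountMatches("/rss/channel/title"))      ' 1: the channel's own title
        Console.WriteLine(CountMatches("//title"))                 ' 3: every <title> at any depth
    End Sub
End Module
```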

5) You're correct; the cleanup code isn't vital. It will make the strings more predictable but I doubt it matters in the end.
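On 5): if you keep a cleanup step, a single regular expression is a compact alternative to the chain of Replace() calls. This is a sketch with a hypothetical CleanText function, and it deliberately behaves differently from the original: line breaks become a single space instead of being deleted, so words on either side of a break don't run together.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TextCleanup
    ' Collapses every run of whitespace (spaces, tabs, CR, LF, form feeds)
    ' into a single space and trims both ends.
    Public Function CleanText(raw As String) As String
        If raw Is Nothing Then Return ""
        Return Regex.Replace(raw, "\s+", " ").Trim()
    End Function

    Sub Main()
        Dim sample As String = "  Breaking" & ControlChars.CrLf & "news " & ControlChars.Tab & "today  "
        Console.WriteLine(CleanText(sample)) ' Breaking news today
    End Sub
End Module
```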

Agrajag27
11-22-2009, 10:52 AM
Quote:
3) I don't see anywhere in the code that stores information for later. Somewhere after each of the "tmp.Replace()" calls is where you'd want to do so. It looks like the first one stores article titles, and the second one stores links to the articles.

That's correct, but what in the code told you the first one was the title contents? With the second one I can see it from nodesLINK.Current. The first one is just nodes.Current, so I'm trying to understand why it's the title. Is it the "Dim nodes As XPathNodeIterator = navigator.Select("/rss/channel/item/title")" line above the loop?

As far as storing it, I saw something about using a special XML array type to do that. I need to look around some more for that.

Quote:
I'm not an expert, but it looks like "/rss/channel/item/title" would select all "title" elements in a document like this:

Ah, now it's becoming clear. This is essentially a drill-down string. It's not asking for all titles; it's asking only for the titles that appear under those preset levels. Gotcha.

The next big challenge for me is that I need the author of the document, and that's tricky because it's not required. What's more, it's given with EITHER an <author> tag or a <dc:creator> tag. I'll have to check whether one of the tags is there, if not then the other, and if neither is there, just fake it by using the title of the feed as the author.
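The fallback described above could look roughly like this. It is a sketch, assuming the feed binds the dc: prefix to the usual Dublin Core namespace URI (http://purl.org/dc/elements/1.1/); only the URI is matched, so the prefix in the feed may differ, but a feed using another URI would need that URI instead. GetAuthor and DublinCoreNs are hypothetical names. The XmlNamespaceManager is what lets SelectSingleNode understand "dc:creator".

```vb
Imports System
Imports System.Xml

Module AuthorFallback
    ' Namespace URI usually bound to the "dc:" prefix (Dublin Core) in RSS feeds.
    Public Const DublinCoreNs As String = "http://purl.org/dc/elements/1.1/"

    ' Hypothetical helper: <author>, else <dc:creator>, else the feed title.
    Public Function GetAuthor(item As XmlNode, feedTitle As String,
                              ns As XmlNamespaceManager) As String
        Dim node As XmlNode = item.SelectSingleNode("author")
        If node Is Nothing OrElse node.InnerText.Trim() = "" Then
            node = item.SelectSingleNode("dc:creator", ns)
        End If
        If node Is Nothing OrElse node.InnerText.Trim() = "" Then
            Return feedTitle
        End If
        Return node.InnerText.Trim()
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<rss xmlns:dc=""http://purl.org/dc/elements/1.1/""><channel>" &
                    "<title>Demo Feed</title>" &
                    "<item><title>A</title><author>jane@example.com (Jane)</author></item>" &
                    "<item><title>B</title><dc:creator>John</dc:creator></item>" &
                    "<item><title>C</title></item>" &
                    "</channel></rss>")

        ' Map the "dc" prefix for XPath; without it, "dc:creator" throws an XPathException.
        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("dc", DublinCoreNs)

        Dim feedTitle As String = doc.SelectSingleNode("/rss/channel/title").InnerText.Trim()
        For Each item As XmlNode In doc.SelectNodes("/rss/channel/item")
            Console.WriteLine(item.SelectSingleNode("title").InnerText & ": " &
                              GetAuthor(item, feedTitle, ns))
        Next
    End Sub
End Module
```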
