Using webclient to check if url exists

#1
01-06-2009, 06:38 AM
Sm00t
Newcomer
Join Date: Jan 2009
Posts: 2

Hello everybody.

I am using this function to load a web page's source into my textbox.
Code:
Public Function OpenURL(ByVal strURL As String) As String
    Dim client As New System.Net.WebClient
    Dim data As System.IO.Stream = client.OpenRead(strURL)
    Dim reader As New System.IO.StreamReader(data)
    Dim s As String = reader.ReadToEnd()
    data.Close()
    reader.Close()
    Return s
End Function
The function works fine, but if I pass it a URL that does not exist, Visual Basic reports a "WebException was unhandled" error. How can I check whether a URL exists before I open it for reading?

#2
01-06-2009, 08:29 AM
AtmaWeapon
Fabulous Florist
Forum Leader
* Guru *
Join Date: Feb 2004
Location: Austin, TX
Posts: 9,500

There's not really a way around it; you're going to have to handle the exception. I thought that perhaps you could use the HttpWebRequest class since HttpWebResponse has a StatusCode property, but it turns out that HttpWebRequest.GetResponse() can still throw WebException.

It turns out it wouldn't even be useful to know whether the page exists in advance. The internet is volatile, just like the file system. Imagine that you somehow had code that looked like this:
Code:
If PageExists(url) Then
    Console.WriteLine(GetPageSource(url))
End If
Now, imagine what happens if, after PageExists() returns True, there's a lightning strike that causes a power outage that brings down the server on the site you're looking for. GetPageSource() is going to fail even though you checked PageExists() first! It doesn't even have to be a lightning strike; sometimes your route to a site just gets blocked and you can't access it. So what I'm trying to say is that it's basically useless to know whether the page exists; it's better to just download the page and handle the exception yourself.

In this case, I'd recommend rewriting OpenURL to look more like this:
Code:
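' Requires "Imports System.Net" at the top of the file.
Private Function GetPageSource(ByVal url As String) As String
    ' WebClient is disposable, so release it once the download finishes.
    Using client As New WebClient()
        Return client.DownloadString(url)
    End Using
End Function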
Now, when you call it, be prepared for a WebException:
Code:
Dim pageSource As String

Try
    pageSource = GetPageSource("http://www.example.com/doesnotexist.html")
    Console.WriteLine(pageSource)
Catch ex As WebException
    Console.WriteLine("Could not fetch page.  Status: {0}", ex.Status)
End Try
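
If you'd rather not repeat the Try/Catch at every call site, one option is to fold it into a helper. This is only a minimal sketch: TryGetPageSource and the FetchDemo module are hypothetical names, not anything in the framework, and it treats every WebException as "could not fetch" without telling you why.

```vb
Imports System
Imports System.Net

Module FetchDemo
    ' Hypothetical helper (not part of the framework): returns True and fills
    ' pageSource on success, returns False when the download raises WebException.
    Function TryGetPageSource(ByVal url As String, ByRef pageSource As String) As Boolean
        Try
            Using client As New WebClient()
                pageSource = client.DownloadString(url)
            End Using
            Return True
        Catch ex As WebException
            ' The page was missing, the server was unreachable, and so on.
            pageSource = Nothing
            Return False
        End Try
    End Function

    Sub Main()
        Dim source As String = Nothing
        If TryGetPageSource("http://www.example.com/", source) Then
            Console.WriteLine(source)
        Else
            Console.WriteLine("Could not fetch page.")
        End If
    End Sub
End Module
```

Note that this still downloads the page in one step; it does not check first and fetch later, so it avoids the race described above.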

#3
01-06-2009, 09:03 AM
Sm00t
Newcomer
Join Date: Jan 2009
Posts: 2

It works, but I noticed a strange problem when I tested your example.
When I pass the URL "http://www.google.com" to the function, it shows the page source, but when I try "http://www.youtube.com", it throws an error.

I guess that is because YouTube is a bigger site and takes more time to load, right? And when loading takes too long, the timeout expires. If I am right, how can I fix this?

#4
01-06-2009, 10:56 AM
AtmaWeapon
Fabulous Florist
Forum Leader
* Guru *
Join Date: Feb 2004
Location: Austin, TX
Posts: 9,500

If you investigate the exception that is thrown, you'll see that YouTube is returning "400 Bad Request". Odds are it's doing some kind of user-agent validation to ensure that a web browser and not a robot is accessing it, since automating YouTube is against their terms of service.
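
One way to investigate the exception is to look at its Response property, which holds the server's reply when there was one. The sketch below is a rough illustration, assuming a hypothetical helper named DescribeFailure in a hypothetical module; only WebException.Response, WebException.Status, and the HttpWebResponse members are framework names.

```vb
Imports System
Imports System.Net

Module WebFailureDemo
    ' Hypothetical helper: summarizes why a web request failed.
    Function DescribeFailure(ByVal ex As WebException) As String
        ' Response is only populated when the server actually answered;
        ' for DNS failures, timeouts and the like it is Nothing.
        Dim response As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
        If response IsNot Nothing Then
            Return String.Format("HTTP {0} {1}", CInt(response.StatusCode), response.StatusDescription)
        End If
        Return String.Format("No HTTP response ({0})", ex.Status)
    End Function

    Sub Main()
        Try
            Using client As New WebClient()
                Console.WriteLine(client.DownloadString("http://www.example.com/doesnotexist.html"))
            End Using
        Catch ex As WebException
            Console.WriteLine(DescribeFailure(ex))
        End Try
    End Sub
End Module
```

When the server rejects the request, the message carries the numeric status and its description; when the request never reached a server, you only get the WebExceptionStatus value.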

Compare this GET request, which is what Firefox sends to YouTube:
Code:
GET / HTTP/1.1
Host: www.youtube.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: UTF-8,*
Keep-Alive: 300
Connection: keep-alive
Cookie: (cookie info redacted)
to the request sent by the application:
Code:
GET / HTTP/1.1
Host: www.youtube.com
Connection: Keep-Alive
You can hack at the request and trick YouTube into thinking you aren't a robot; it's quite easy but it's not the best way to go about things. When you're working with the source of a site you don't control, you're jumping into a pit where you're constantly changing your application every time they make a minor change to their web site. If your robot misbehaves and they happen to take notice, you might find your account disabled or your ISP's IP range banned. Take a look at the YouTube developer API and work with YouTube the correct way.