Getting <Title> out of HTML source code

mlarsen
04-24-2008, 08:14 AM
Hello

I'm fairly new to VBA and need some tips on how to get the text between the <Title>...</Title> tags of a text file containing HTML source code into column 3. The VBA code searches through many folders and subfolders, finds every .html file, and reads it. I tried parsing but can't seem to get it to work. Any ideas? Here is the code that I have (that doesn't work):

Sub CheckTextFilesForHREFs()
    MsgBox "Press OK to begin report"
    Dim WholeLine As String
    Dim myPath As String
    Dim workfile As String
    Dim myR As Long

    myPath = "C:\Exelon\"
    workfile = Dir(myPath & "*.html")
    'sLine = WholeLine

    Set fs = Application.FileSearch
    With fs
        .LookIn = "C:\Exelon"
        .Filename = ".html"
        .SearchSubFolders = True
        '.FileType = msoFileTypeAllFiles
        If .Execute(SortBy:=msoSortByFileName, _
                    SortOrder:=msoSortOrderAscending) > 0 Then
            MsgBox "There were " & .FoundFiles.Count & _
                   " file(s) found."
            For i = 1 To .FoundFiles.Count
                ParseURL .FoundFiles(i)
                ParseTitle .FoundFiles(i)
                ParseLink .FoundFiles(i)
            Next i

        Else
            MsgBox "There were no files found."
        End If
    End With
End Sub

Sub ParseTitle(strFile As String)
    Dim strTxt As String, lngTxt As Long, i As Long, oMatches
    Dim ws As Worksheet, j As Long, k As Long, m As Long, oMatches2
    Dim reg, oMatches3, reg2
    i = FreeFile
    'strFile = "c:\Users\Richard\Documents\Htmltest.html"
    lngTxt = FileLen(strFile)
    strTxt = Space(lngTxt)
    Open strFile For Binary Access Read As #i
    Get #i, , strTxt
    Close #i
    Debug.Print strTxt
    With CreateObject("vbscript.regexp")
        .Global = True
        .ignorecase = True
        .Pattern = vbCrLf & ".*?title.*?(?=" & vbCrLf & ")"
        If .test(strTxt) Then
            Set oMatches = .Execute(strTxt)
            For i = 0 To oMatches.Count - 1
                Set reg = CreateObject("vbscript.regexp")
                With reg
                    .Global = True
                    .ignorecase = True
                    .Pattern = "<title>\""(.*?)\"""
                    k = Cells(Rows.Count, 1).End(xlUp).Offset(1).Row
                    Cells(k, 1).Value = strFile
                    If .test(oMatches(i)) Then
                        Set oMatches2 = .Execute(oMatches(i))
                        For j = 0 To oMatches2.Count - 1
                            Cells(k, j + 3) = .Replace(oMatches2(j), "$1")
                        Next j
                    End If
                End With
            Next i
        End If
    End With
End Sub

the master
04-24-2008, 08:33 AM
Regular expressions can be very confusing. There is an alternative way to do what you want: use InStr() to find the tags, then Left() and Right(), or Mid(), to cut out the text between them.

Something like this should work

Dim temp As String

'For this example your HTML would be in the variable 'temp'

'Drop everything up to and including the opening <title> tag
temp = Right(temp, Len(temp) - (InStr(1, temp, "<title>", vbTextCompare) + Len("<title>") - 1))
'Keep everything before the closing </title> tag
temp = Left(temp, InStr(1, temp, "</title>", vbTextCompare) - 1)
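
One thing to watch with the InStr approach: if a file has no <title> tag, InStr returns 0 and the Left/Right calls raise a run-time error. Below is a minimal sketch of the same idea wrapped in a function that returns an empty string instead. The name GetTitle and its variable names are made up for illustration, and it assumes the opening tag is written as a plain <title> with no attributes.

```vba
' Hypothetical helper (GetTitle is not from the thread): returns the text
' between <title> and </title>, or "" when either tag is missing.
Function GetTitle(ByVal html As String) As String
    Dim startPos As Long, endPos As Long

    ' Find the opening tag, ignoring case (<Title>, <TITLE>, ...)
    startPos = InStr(1, html, "<title>", vbTextCompare)
    If startPos = 0 Then Exit Function

    ' Move past the opening tag to the first character of the title text
    startPos = startPos + Len("<title>")

    ' Find the closing tag, searching only after the opening tag
    endPos = InStr(startPos, html, "</title>", vbTextCompare)
    If endPos = 0 Then Exit Function

    ' Cut out the title and strip leading/trailing spaces
    GetTitle = Trim$(Mid$(html, startPos, endPos - startPos))
End Function
```

Calling GetTitle("<html><head><Title>My Page</Title></head></html>") should give back My Page. Mid is used here instead of Right and Left so the string is only cut once.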

mlarsen
04-24-2008, 09:36 AM
Thanks very much. How would I then get the result of the code you provided into column C?

the master
04-24-2008, 03:32 PM
That depends. The way I do it is to address each cell by number: Sheet1.Cells(RowNumber, ColumnNumber).Value. You would put 3 for the column number (C = 3); I'm not sure about the row number. You can put 1 if you want, but then the value will always go into C1, and it sounds like you want a list. If you have a variable that tracks which row you are on, use that as the row number.



'Assuming the variable 'intRowNum' is the current row...

Sheet1.Cells(intRowNum, 3).Value = temp
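
Putting the pieces together, here is a rough sketch of a routine that reads one file, cuts out the title, and writes it to column C on the next free row. The name WriteTitleToSheet is made up for illustration. It assumes the file name goes in column A (as in the original ParseTitle), that the next free row is judged by column A, and that the opening tag is a plain <title> with no attributes.

```vba
' Hypothetical routine (WriteTitleToSheet is not from the thread).
Sub WriteTitleToSheet(ByVal strFile As String)
    Dim fileNum As Integer
    Dim temp As String
    Dim startPos As Long, endPos As Long
    Dim intRowNum As Long

    ' Read the whole file into temp
    fileNum = FreeFile
    Open strFile For Binary Access Read As #fileNum
    temp = Space$(LOF(fileNum))
    Get #fileNum, , temp
    Close #fileNum

    ' Locate the text between <title> and </title>, ignoring case
    startPos = InStr(1, temp, "<title>", vbTextCompare)
    If startPos > 0 Then
        startPos = startPos + Len("<title>")
        endPos = InStr(startPos, temp, "</title>", vbTextCompare)
    End If
    If startPos > 0 And endPos > 0 Then
        temp = Mid$(temp, startPos, endPos - startPos)
    Else
        temp = ""   ' no complete <title>...</title> pair found
    End If

    ' Next free row, judged by the last used cell in column A
    intRowNum = Sheet1.Cells(Sheet1.Rows.Count, 1).End(xlUp).Row + 1

    Sheet1.Cells(intRowNum, 1).Value = strFile   ' file name in column A
    Sheet1.Cells(intRowNum, 3).Value = temp      ' title in column C
End Sub
```

In the search loop this would be called as WriteTitleToSheet .FoundFiles(i). On an empty sheet the first entry lands in row 2, which leaves row 1 free for headers.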
