jensludwig
09-28-2009, 08:48 AM
Hi guys,
I would like to load XML files of around 1 GB into SQL. Basically, I would like to know which approach gives the best performance, especially in terms of speed, for processing them (transferring the values and tags into one table).
Does anyone know anything about this?
Thanks!
Jens
vb5prgrmr
09-28-2009, 08:08 PM
SQL itself can process properly formatted XML files, but if these are informational files you will have to pull the information out yourself. This can be done via the DOM or the XML ?.? objects, which you can find under References.
Good Luck
jensludwig
09-29-2009, 10:14 AM
Hey,
I have to do the processing in VB, extracting the values and so on...
I heard that SAX is one way to do this, but I don't actually know whether it is any good. I did not really understand how it works.
With DOM, I think it will create a big temporary file, but I'm not sure.
How do I do it with the XML objects?
Thanks!
dilettante
09-29-2009, 08:32 PM
Using a SAX parser is much more practical than building a DOM if you are processing large XML documents and only need to make a single pass over them to extract information. MSXML contains a SAX parser, and some or all of it is probably used internally when populating the relatively heavyweight DOM object in the same library.
Since most XML use involves very short documents, most of the copy/paste code you'll see on the Web is DOM-centric. Although XML documents usually contain 50% to 90% overhead (markup), this penalty isn't a problem when the documents are brief. The same is true of the tremendous overhead of a DOM hierarchy, which eats memory prodigiously.
So most people using SAX tend to rely on the documentation, and few publish example code or tutorials. The SAX process is more fine-grained so writing a generalized tutorial is more work.
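To make the single-pass idea concrete, here is a minimal sketch. It is written in VB.NET, since the code posted later in this thread uses .NET string methods, and it uses System.Xml.XmlReader, which is a forward-only pull parser rather than MSXML's SAX interfaces; the memory behaviour is similar because no DOM tree is built. The module name, the ReadParameters function, and the sample document are all hypothetical, and the <p name="..."> layout is only an assumption about what the real files look like.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamingXmlSketch

    ' Collects (name, value) pairs from <p name="...">value</p> elements
    ' in one forward-only pass. Nothing but the current node is kept in memory.
    Function ReadParameters(ByVal source As TextReader) As List(Of KeyValuePair(Of String, String))
        Dim rows As New List(Of KeyValuePair(Of String, String))()
        Dim currentName As String = Nothing      ' Nothing means "not inside a <p>"
        Dim currentValue As New StringBuilder()

        Using reader As XmlReader = XmlReader.Create(source)
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        If reader.Name = "p" Then
                            currentName = reader.GetAttribute("name")
                            currentValue.Length = 0
                            ' <p name="x"/> produces no EndElement node, so emit it here.
                            If reader.IsEmptyElement AndAlso currentName IsNot Nothing Then
                                rows.Add(New KeyValuePair(Of String, String)(currentName, ""))
                                currentName = Nothing
                            End If
                        End If
                    Case XmlNodeType.Text, XmlNodeType.CDATA
                        If currentName IsNot Nothing Then
                            currentValue.Append(reader.Value)
                        End If
                    Case XmlNodeType.EndElement
                        If reader.Name = "p" AndAlso currentName IsNot Nothing Then
                            rows.Add(New KeyValuePair(Of String, String)(currentName, currentValue.ToString()))
                            currentName = Nothing
                        End If
                End Select
            End While
        End Using

        Return rows
    End Function

    Sub Main()
        ' Hypothetical sample; a real file would be opened with New StreamReader(path).
        Dim sample As String = "<managedObject class=""RNC""><p name=""RncId"">12</p><p name=""Name"">North</p></managedObject>"
        For Each row As KeyValuePair(Of String, String) In ReadParameters(New StringReader(sample))
            Console.WriteLine(row.Key & " = " & row.Value)
        Next
    End Sub

End Module
```

For a file of several hundred megabytes you would probably not collect everything into a list as this sketch does, but hand each pair to the database code as it is read. The parsing loop itself stays the same.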
Roger_Wgnr
09-29-2009, 08:50 PM
There is a sample of a SAX Parser for VB6 in post 27 of this thread (http://www.xtremevbtalk.com/showthread.php?t=307417&page=2)
jensludwig
10-14-2009, 07:28 AM
Apologies for the late reply...
I read post 27, but I didn't understand it well. :whoops:
I wrote some code, but it is too slow: it takes around 6 minutes to process a file. My XML is around 500 MB.
'------------- Get the current date/time of the XML being imported ---------------
If Trim(myTextLine) = "<header>" Then
    isHeader = True
End If
If Trim(myTextLine) = "</header>" Then
    isHeader = False
End If
If isHeader Then
    If Trim(myTextLine) <> "<header>" And Trim(myTextLine) <> "</header>" Then
        ' Example line: <log dateTime="2007-12-26T06:00:11"
        Dim v_myIndex As Integer = myTextLine.Trim().IndexOf("dateTime")
        Dim v_myTime As String
        If v_myIndex <> -1 Then
            ' Skip the 10 characters of dateTime=" and take the 19-character timestamp
            v_myTime = myTextLine.Trim().Substring(v_myIndex + 10, 19).Replace("T", " ")
            If IsDate(v_myTime) Then
                xmlDate = CDate(v_myTime)
                v_myDate = xmlDate.ToString("yyyy-MM-dd")
                MainForm.SetAddMsg_lbl_main("Importing parameters for day " & v_myDate)
            End If
        End If
    End If
End If
'------------- End of Get the current date/time of the XML being imported ---------------
'--------------------------- New Managed Object Found ------------------------------
If InStr(myTextLine, "<managedObject") >= 1 Then
    MainForm.g_moRead = MainForm.g_moRead + 1
    isItem = False
    isList = False
    isListStoredInSeparatedTable = False
    '--------- Designer Code --------
    'MainForm.g_command.CommandText = "INSERT INTO _nokiaOSS_managedClasses (ManagedClass) VALUES ('" & myTextLine & chr(34) & ")"
    'MainForm.g_command.ExecuteNonQuery()
    If getManagedObject(myTextLine, myManagedObject) Then
        nbrManagedObjects = nbrManagedObjects + 1
        Call getManagedObjectID(myManagedObject(3), myManagedObjectIDname, myManagedObjectIDvalue, myNbrIDs)
        myParameterID = 0
    End If
    Select Case myManagedObject(1)
        Case "COCO"
            v_insertNames(0) = "RncId,COCOId,"
            v_insertValues(0) = Chr(34) & Val(myManagedObjectIDvalue(0)) & Chr(34) & "," & _
                                Chr(34) & Val(myManagedObjectIDvalue(1)) & Chr(34) & ","
        Case "RNC"
            'A_RNC
            v_insertNames(IVI_RNC) = "RncId,"
            v_insertValues(IVI_RNC) = Chr(34) & Val(myManagedObjectIDvalue(0)) & Chr(34) & ","
            'A_RNC_AC
            v_insertNames(IVI_RNC_AC) = "RncId,"
            v_insertValues(IVI_RNC_AC) = Chr(34) & Val(myManagedObjectIDvalue(0)) & Chr(34) & ","
            'A_RNC_PS
            v_insertNames(IVI_RNC_PS) = "RncId,"
            v_insertValues(IVI_RNC_PS) = Chr(34) & Val(myManagedObjectIDvalue(0)) & Chr(34) & ","
            'A_RNC_SIB - indexes 3,4,5 are already in use by IurItem, IuItemCS, IuItemPS
            v_insertNames(IVI_RNC_SIB) = "RncId,"
            v_insertValues(IVI_RNC_SIB) = Chr(34) & Val(myManagedObjectIDvalue(0)) & Chr(34) & ","
There is a "Case" for every tag, which is why I think it is so slow. I attached the XML with just these 2 cases, to make it easier to evaluate. What I would like to do is make each p name a column of the table for its class, and store the values there. I have actually done this part already, and I think the problem is in parsing the XML, not in writing the values to the table.
Thanks!