Ok, the idea is simple: I want to edit the contents of some .txt files, but some of them use vbLf for newlines. So the first thing I have to do is replace vbLf with vbCrLf. Leaving aside the unnecessary code, I do this:
Code:
Open App.Path & "/" & txtPath.Text & ".txt" For Binary As #1 'I open the file as binary
Dim strBuff As String: strBuff = Space(LOF(1)) 'I create a string variable and put as many spaces as the size of the .txt
Get #1, , strBuff 'I read the whole file into the string
For m = 1 To Len(strBuff) 'I go through the string char-by-char
    DoEvents
    If (Mid(strBuff, m, 1) = vbLf) Then 'finding vbLf
        strBuff = Left(strBuff, m - 1) & vbCrLf & Mid(strBuff, m + 1, Len(strBuff)) 'and trying to replace them. Here is the problem
        m = m + 1
    End If
Next m
Every time I try to do anything with the strBuff variable, I get an Out Of Memory error.
I have tried the Replace() function:
Code:
strBuff = Replace(strBuff, vbLf, vbCrLf)
I have tried using different variables to store the processed data.
I have tried a different approach to the replacement: taking the text to the left and to the right of the vbLf (skipping the vbLf itself) and putting vbCrLf between them:
Code:
strBuff = Left(strBuff, m - 1) & vbCrLf & Mid(strBuff, m + 1, Len(strBuff))
or
strTemp = Left(strBuff, m - 1) & vbCrLf & Mid(strBuff, m + 1, Len(strBuff))
strBuff = strTemp
Everything gives me an Out Of Memory error. Any ideas why this is happening?
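A likely culprit (an educated guess, not confirmed in this thread): every Left(...) & vbCrLf & Mid(...) builds brand-new strings, and VB6 strings hold 2 bytes per character, so a 136 MB file becomes roughly 272 MB per copy. Doing that once per vbLf inside a 32-bit process (2 GB of user address space by default) can run out of the large contiguous blocks those copies need. Below is a minimal sketch of an alternative, assuming the file fits in memory as a Byte array and is single-byte (ANSI) text: count the bare LFs first, allocate the output once, then copy. LfToCrLf, IsBareLf and ConvertFileToCrLf are hypothetical names.

```vb
Option Explicit

' True if bIn(i) is an LF (10) that is NOT already preceded by a CR (13)
Private Function IsBareLf(bIn() As Byte, ByVal i As Long) As Boolean
    If bIn(i) = 10 Then
        If i = LBound(bIn) Then
            IsBareLf = True
        Else
            IsBareLf = (bIn(i - 1) <> 13)
        End If
    End If
End Function

' Converts bare LF line ends to CR LF in one pass (assumes bIn is not empty)
Public Function LfToCrLf(bIn() As Byte) As Byte()
    Dim i As Long, j As Long, lExtra As Long
    Dim bOut() As Byte

    ' First pass: count bare LFs so the output is allocated only once
    For i = LBound(bIn) To UBound(bIn)
        If IsBareLf(bIn, i) Then lExtra = lExtra + 1
    Next i
    ReDim bOut(0 To UBound(bIn) - LBound(bIn) + lExtra)

    ' Second pass: copy every byte, inserting a CR before each bare LF
    For i = LBound(bIn) To UBound(bIn)
        If IsBareLf(bIn, i) Then
            bOut(j) = 13
            j = j + 1
        End If
        bOut(j) = bIn(i)
        j = j + 1
    Next i
    LfToCrLf = bOut
End Function

' Usage sketch: reads sPath as bytes, converts it and writes sOutPath
Public Sub ConvertFileToCrLf(ByVal sPath As String, ByVal sOutPath As String)
    Dim FF As Integer
    Dim bData() As Byte, bNew() As Byte

    FF = FreeFile
    Open sPath For Binary Access Read As FF
    If LOF(FF) = 0 Then Close FF: Exit Sub
    ReDim bData(0 To LOF(FF) - 1)
    Get FF, , bData
    Close FF

    bNew = LfToCrLf(bData)

    If Len(Dir$(sOutPath)) > 0 Then Kill sOutPath   ' Binary mode does not truncate
    FF = FreeFile
    Open sOutPath For Binary Access Write As FF
    Put FF, , bNew
    Close FF
End Sub
```

Working on Byte arrays also avoids the ANSI-to-Unicode conversion that Get does when reading into a String, so the data stays at one byte per character.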
This should work as long as you have enough memory to hold the entire file.
Code:
Dim sLines() As String
Dim AllText As String
Dim FF As Integer
FF = FreeFile()
Open "MyFile.txt" For Binary As FF
AllText = Space(LOF(FF))
Get FF, , AllText
Close FF
' Create a zero based array of strings.
sLines = Split(AllText, vbLf)
' Clear AllText
AllText = ""
' Show first line of text.
MsgBox sLines(0)
' Show last line of text
MsgBox sLines(UBound(sLines))
__________________ Burn the land and boil the sea
You can't take the sky from me
Thank you very much. From a quick test, it seemed to work (although it took 5-10 minutes to split a 136 MB .txt into 14,344,391 array elements on an i7 @ 4.5 GHz with 32 GB RAM). There are still 2 problems:
1) I can't make Split() call DoEvents, so the program shows as Not Responding while it runs. I would also like to add a progress bar, since the process takes so long.
2) When I stop the program and re-run it, I get an Out Of Memory error on the line where I store the entire .txt in a string:
Code:
Dim strBuff As String: strBuff = Space(LOF(1))
Get #1, , strBuff   ' <---- HERE
If I close Visual Studio and open it again, I don't get the Out Of Memory error. Any ideas why that is happening?
No idea. Perhaps 20-year-old VB6 has memory limitations when using Get to read in Binary mode.
You are not going to have much luck using DoEvents while reading the file.
Your main thread is fully occupied reading the file.
You do not say what you are doing with the file data once you read it.
Are you in control of populating the file?
Are you processing the line and outputting it to another file?
Are you trying to store the data in a control on a form?
Are you just searching for particular lines?
Does your file grow and grow continuously?
What is the purpose of your program?
Depending on the above questions there may be better solutions.
As far as the lock-up with your current code goes, you could read your file in chunks and process each chunk before you load the next one.
This may take a bit longer than loading the entire file, but it should alleviate the lock-up and the Out Of Memory errors. You could also have a working progress bar in this scenario.
Calculate your chunk size and set your progress bar's Min and Max based on how many chunks are in the total file size. After reading each chunk, update the progress bar's Value.
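The chunk/progress-bar arithmetic above might look roughly like this. It is a sketch under a few assumptions: ChunkCount and ReadInChunks are hypothetical names, the chunk size is whatever you choose, and the progress bar (for example a ProgressBar from Microsoft Windows Common Controls) is passed in late-bound as an Object so the code does not depend on a particular form.

```vb
Option Explicit

' Number of chunks for lFileSize bytes, rounding up so a final,
' shorter chunk is counted too
Public Function ChunkCount(ByVal lFileSize As Long, ByVal lChunkSize As Long) As Long
    ChunkCount = lFileSize \ lChunkSize
    If lFileSize Mod lChunkSize > 0 Then ChunkCount = ChunkCount + 1
End Function

' Reads sPath in chunks of lChunkSize bytes, updating the optional
' progress bar pb and calling DoEvents after each chunk.
' Returns the number of chunks read.
Public Function ReadInChunks(ByVal sPath As String, ByVal lChunkSize As Long, _
                             Optional pb As Object) As Long
    Dim FF As Integer
    Dim lFileSize As Long, lChunks As Long, lChunk As Long, lLen As Long
    Dim sChunk As String

    FF = FreeFile
    Open sPath For Binary Access Read As FF
    lFileSize = LOF(FF)
    lChunks = ChunkCount(lFileSize, lChunkSize)

    If Not pb Is Nothing Then
        pb.Min = 0
        pb.Max = IIf(lChunks > 0, lChunks, 1)   ' Max must be greater than Min
        pb.Value = 0
    End If

    For lChunk = 1 To lChunks
        ' Every chunk is lChunkSize bytes except possibly the last one
        lLen = lFileSize - (lChunk - 1) * lChunkSize
        If lLen > lChunkSize Then lLen = lChunkSize
        sChunk = Space$(lLen)
        Get FF, , sChunk
        ' ... process sChunk here ...
        If Not pb Is Nothing Then pb.Value = lChunk
        DoEvents                                 ' keep the form responsive
    Next lChunk

    Close FF
    ReadInChunks = lChunks
End Function
```

From a form you would call something like ReadInChunks(sPath, 1048576, ProgressBar1). Updating the bar once per chunk rather than once per line keeps the overhead of the UI updates small.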
Ok. I am collecting .txt files containing lists of names of various things, like all the names from Shakespeare's books, names of cities, towns, drinks, etc. In the .txt files I have collected so far, I have noticed some things that I want to change:
1) remove comments (most of the time they give the source of the file, and they all start with a # symbol)
2) remove records that contain white space (for example, the drink Sex on the Beach should be removed)
3) remove white space from the beginning or the end of a record
4) sort records
5) remove duplicates
6) save the records into a new .txt file
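Once the lines are in an array, steps 1 to 5 could look roughly like the sketch below. Assumptions worth flagging: CleanRecords and QuickSort are hypothetical names; lines were split on vbLf, so a trailing vbCr is stripped; tabs count as white space; empty lines are dropped; and records are trimmed (step 3) before the "contains white space" test (step 2), so only interior spaces get a record removed. Sorting uses the default Option Compare Binary. Step 6 would then be writing Join(result, vbLf) to the new file.

```vb
Option Explicit

Public Function CleanRecords(sLines() As String) As String()
    Dim sOut() As String, sRec As String
    Dim i As Long, n As Long, k As Long

    If UBound(sLines) < LBound(sLines) Then CleanRecords = Split(""): Exit Function
    ReDim sOut(0 To UBound(sLines) - LBound(sLines))

    For i = LBound(sLines) To UBound(sLines)
        sRec = sLines(i)
        If Right$(sRec, 1) = vbCr Then sRec = Left$(sRec, Len(sRec) - 1)
        sRec = Trim$(Replace(sRec, vbTab, " "))     ' step 3: trim (tabs -> spaces)
        If Len(sRec) = 0 Then
            ' skip empty lines
        ElseIf Left$(sRec, 1) = "#" Then
            ' step 1: skip comments
        ElseIf InStr(sRec, " ") > 0 Then
            ' step 2: skip records with interior white space
        Else
            sOut(n) = sRec
            n = n + 1
        End If
    Next i

    If n = 0 Then CleanRecords = Split(""): Exit Function
    ReDim Preserve sOut(0 To n - 1)
    QuickSort sOut, 0, n - 1                        ' step 4: sort

    ' Step 5: after sorting, duplicates are adjacent; keep the first of each run
    k = 0
    For i = 1 To n - 1
        If sOut(i) <> sOut(k) Then
            k = k + 1
            sOut(k) = sOut(i)
        End If
    Next i
    ReDim Preserve sOut(0 To k)
    CleanRecords = sOut
End Function

' In-place quicksort of s(lo..hi), middle element as pivot
Private Sub QuickSort(s() As String, ByVal lo As Long, ByVal hi As Long)
    Dim i As Long, j As Long
    Dim sPivot As String, sTmp As String
    i = lo
    j = hi
    sPivot = s((lo + hi) \ 2)
    Do While i <= j
        Do While s(i) < sPivot
            i = i + 1
        Loop
        Do While s(j) > sPivot
            j = j - 1
        Loop
        If i <= j Then
            sTmp = s(i): s(i) = s(j): s(j) = sTmp
            i = i + 1
            j = j - 1
        End If
    Loop
    If lo < j Then QuickSort s, lo, j
    If i < hi Then QuickSort s, i, hi
End Sub
```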
So the idea is:
open .txt
store each line of .txt into an array
do all the edits mentioned above to the array
save array into a new .txt
All my problems are in step 2.
My program works perfectly with almost all my .txt files, no matter whether they use vbCrLf or vbLf. All the problems arise when I try to edit a 136 MB file which uses vbLf. I created a 150 MB .txt with random data but with vbCrLf as the line separator, and my program worked perfectly.
vbLf is my arch enemy!
I use Vim to open large .txt files and it has a find-and-replace function. I used it to replace Lf (\n) with CrLf (\r\n) and it gave me an Out Of Memory error too!!!
If Flyguy can find the library he is talking about, you should be able to read items line by line just as you would with a file that uses vbCrLf line ends.
Otherwise the chunk method I mentioned earlier would definitely work.
One option would be to read in a chunk, process its lines of text, output the cleaned-up chunk to your new file, read the next chunk, and so on.
Basically, once you read a chunk, you split it by vbLf and process all but the last line.
Then read the next chunk and append it to that last line. This handles a partial line at the end of the chunk.
Code:
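Dim Hold As String
Dim Chunk As String
Dim sLines() As String
Dim i As Long
Dim FF As Integer
Dim ChunkSize As Long
Dim Remaining As Long
FF = FreeFile
Open "MyFile.txt" For Binary Access Read As FF
ChunkSize = 1048576                 ' 1 MB per chunk (adjust to taste)
Remaining = LOF(FF)
Do While Remaining > 0
    ' Read the next chunk; the last one may be shorter
    If Remaining < ChunkSize Then ChunkSize = Remaining
    Chunk = Space$(ChunkSize)
    Get FF, , Chunk
    Remaining = Remaining - ChunkSize
    Hold = Hold & Chunk
    sLines = Split(Hold, vbLf)
    For i = 0 To UBound(sLines) - 1
        ' Process lines
    Next i
    ' Keep the (possibly partial) last line for the next pass
    Hold = sLines(UBound(sLines))
Loop
Close FF
' Process last line (Hold) here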
BTW you might want to consider writing your final output to a true database.
That way you could slice, dice, sort, search your end data with ease using simple Queries.
I wish Flyguy would find the code! The output has to be .txt because it is going to be used as input for another program. Funny thing is that the other program runs on Linux, so vbLf would be perfect!
At some point I considered the chunk method, because I found a function in FSO that can read a specific amount of data from a file. The first problem I faced when I tried it: if I read chunks of, let's say, 1000 chars and the .txt uses vbCrLf (i.e. Chr(13) & Chr(10)), what guarantees that the 1000th char isn't Chr(13), so that I accidentally chop a vbCrLf in two? The length of the chunks was hard-coded, so I couldn't increase/decrease it or put if-statements in it.
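For what it's worth, a chunk boundary that falls between the Chr(13) and the Chr(10) is harmless if you split on vbLf, carry the last (unfinished) piece over to the next chunk, and strip a trailing vbCr from each finished line. A minimal sketch; TakeLines is a hypothetical name, and the Collection is only there to make the result easy to inspect (it would be slow for 14 million lines).

```vb
Option Explicit

' Adds the complete lines found in sCarry & sChunk to colLines and leaves
' the unfinished tail in sCarry, to be prefixed to the next chunk.
' Works for both LF and CR LF files, even if a CR LF is cut in two.
Public Sub TakeLines(ByRef sCarry As String, ByVal sChunk As String, _
                     ByVal colLines As Collection)
    Dim sParts() As String, sLine As String
    Dim i As Long

    If Len(sCarry & sChunk) = 0 Then Exit Sub
    sParts = Split(sCarry & sChunk, vbLf)
    For i = 0 To UBound(sParts) - 1
        sLine = sParts(i)
        ' In a CR LF file every line still ends with vbCr here: drop it
        If Right$(sLine, 1) = vbCr Then sLine = Left$(sLine, Len(sLine) - 1)
        colLines.Add sLine
    Next i
    ' The last piece has no LF after it yet
    sCarry = sParts(UBound(sParts))
End Sub
```

After the last chunk, whatever is left in sCarry (minus a trailing vbCr) is the final line of the file, if it did not end with a line break.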
This is so frustrating I'm considering learning a new programming language just to make it happen! I'm using VB just because it's easy to draw the form...
'---------------------------------------------------------------------------------------
' Module : clsFile
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (pDst As Any, pSrc As Any, ByVal ByteLen As Long)
Private m_bIsUnix As Boolean
Private m_lBufferSize As Long
Private m_iID As Integer
Private m_bFileOpen As Boolean
Private m_lFileSize As Long
Private m_lBytesRead As Long
Private m_bEOF As Boolean
Private m_lBlockPointer As Long
Private m_lNofBlocks As Long
Private m_lBlock As Long
Public Property Get Progress() As Double
    ' Guard against a division by zero (empty file, or size not known)
    If m_lFileSize > 0 Then
        Progress = 100# * m_lBytesRead / m_lFileSize
        If Progress > 100 Then Progress = 100
    End If
End Property
'---------------------------------------------------------------------------------------
' Procedure : EndOfFile
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Public Property Get EndOfFile() As Boolean
    If m_bFileOpen Then
        If m_bIsUnix Then
            EndOfFile = m_bEOF
        Else
            EndOfFile = EOF(m_iID)
        End If
    End If
End Property
'---------------------------------------------------------------------------------------
' Procedure : CloseTextStream
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Public Sub CloseTextStream()
    If m_bFileOpen Then
        Close m_iID
        m_bFileOpen = False
    End If
End Sub
'---------------------------------------------------------------------------------------
' Procedure : OpenTextStream
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Public Function OpenTextStream(sFilename As String, Optional bUnix As Boolean) As Boolean
    On Error GoTo errHandler
    If Not m_bFileOpen Then
        m_bIsUnix = bUnix
        If Len(Dir$(sFilename)) > 0 Then
            If bUnix Then
                OpenTextStream = OpenUnixFile(sFilename)
            Else
                OpenTextStream = OpenDosFile(sFilename)
            End If
        End If
        If Not OpenTextStream Then
            Close #m_iID
            m_bFileOpen = False
        End If
    End If
    Exit Function
errHandler:
End Function
'---------------------------------------------------------------------------------------
' Procedure : ReadLine
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Public Function ReadLine() As String
    If m_bFileOpen Then
        If m_bIsUnix Then
            ReadLine = ReadUnixLine
            m_lBytesRead = m_lBytesRead + Len(ReadLine) + 1
        Else
            ReadLine = ReadDosLine
            m_lBytesRead = m_lBytesRead + Len(ReadLine) + 2
        End If
    End If
End Function
'---------------------------------------------------------------------------------------
' Procedure : OpenDosFile
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Private Function OpenDosFile(sFilename As String) As Boolean
    On Error GoTo errHandler
    m_iID = FreeFile
    Open sFilename For Input As m_iID
    m_lFileSize = LOF(m_iID)    ' needed by the Progress property
    m_bFileOpen = True
    OpenDosFile = m_bFileOpen
    Exit Function
errHandler:
End Function
'---------------------------------------------------------------------------------------
' Procedure : OpenUnixFile
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Private Function OpenUnixFile(sFilename As String) As Boolean
    On Error GoTo errHandler
    m_iID = FreeFile
    Open sFilename For Binary As m_iID
    m_lFileSize = LOF(m_iID)
    ' Index of the last block (it may be shorter than m_lBufferSize).
    ' FileSize \ BufferSize would point one block past the end when the
    ' file size is an exact multiple of the buffer size.
    m_lNofBlocks = (m_lFileSize - 1) \ m_lBufferSize
    m_lBlock = 0
    m_lBlockPointer = -1
    m_bEOF = (m_lFileSize = 0)
    m_bFileOpen = True
    OpenUnixFile = True
    Exit Function
errHandler:
End Function
'---------------------------------------------------------------------------------------
' Procedure : ReadDosLine
' DateTime : 18-11-2005
' Author : Will Barden
' Purpose :
'---------------------------------------------------------------------------------------
Private Function ReadDosLine() As String
    Dim sLine As String
    If Not EOF(m_iID) Then
        Line Input #m_iID, sLine
        ReadDosLine = sLine
    End If
End Function
Private Function ReadUnixLine() As String
    Dim bFound As Boolean
    Dim lUbound As Long, lStart As Long, lLen As Long
    Static bBuffer() As Byte
    ' Get a new block of data
    If m_lBlockPointer = -1 Then
        If m_bEOF Or m_lBlock > m_lNofBlocks Then
            m_bEOF = True
            Exit Function
        End If
        bBuffer = ReadBlock(m_lBlock)
        m_lBlock = m_lBlock + 1
    End If
    ' Start of the new string
    lStart = m_lBlockPointer + 1
    lUbound = UBound(bBuffer)
    Do Until m_lBlockPointer = lUbound Or bFound
        m_lBlockPointer = m_lBlockPointer + 1
        bFound = (bBuffer(m_lBlockPointer) = 10)
    Loop
    If bFound Then
        ' End of line found, build string
        lLen = m_lBlockPointer - lStart
        If lLen > 0 Then ReadUnixLine = BytesToString(bBuffer, lStart, lLen)
    Else
        ' No EOL, 1st part from current buffer, 2nd part from second buffer
        m_lBlockPointer = -1
        lLen = lUbound - lStart + 1
        If lLen > 0 Then
            ReadUnixLine = BytesToString(bBuffer, lStart, lLen) & ReadUnixLine()
        Else
            ReadUnixLine = ReadUnixLine()
        End If
    End If
End Function
Private Function ReadBlock(lBlock As Long) As Byte()
    Dim lLen As Long
    Dim bBuffer() As Byte
    If lBlock = m_lNofBlocks Then
        ' Last block: only the bytes that are left (1 to m_lBufferSize)
        lLen = m_lFileSize - lBlock * m_lBufferSize
        ReDim bBuffer(lLen - 1)
    Else
        ReDim bBuffer(m_lBufferSize - 1)
    End If
    Get #m_iID, , bBuffer
    ReadBlock = bBuffer
End Function
'---------------------------------------------------------------------------------------
' Procedure : BytesToString
' DateTime : 24/7/02
' Author : Will Barden
' Purpose : converts a part of a byte array to a string
'---------------------------------------------------------------------------------------
Private Function BytesToString(ByRef bArr() As Byte, ByVal StartIndex As Long, ByVal Length As Long) As String
    BytesToString = Space$(Length)
    CopyMemory ByVal BytesToString, bArr(StartIndex), Length
End Function
Private Sub Class_Initialize()
    m_lBufferSize = 102400
End Sub
Private Sub Class_Terminate()
    ' Just to be sure
    If m_bFileOpen Then Close m_iID
End Sub
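One possible way to drive the class above, assuming it is saved in the project as a class module named clsFile. ConvertUnixFile is a hypothetical helper, not part of the class; pb is an optional ProgressBar with Min = 0 and Max = 100, passed as Object so the sketch does not depend on a particular form. Output lines end with vbLf, since the file is headed for Linux.

```vb
Option Explicit

' Copies an LF-terminated file to sOut line by line and returns the
' number of lines written.
Public Function ConvertUnixFile(ByVal sIn As String, ByVal sOut As String, _
                                Optional pb As Object) As Long
    Dim oFile As clsFile
    Dim FOut As Integer
    Dim sLine As String
    Dim lCount As Long

    Set oFile = New clsFile
    If Not oFile.OpenTextStream(sIn, True) Then Exit Function   ' True = LF mode

    FOut = FreeFile
    Open sOut For Output As FOut
    Do Until oFile.EndOfFile
        sLine = oFile.ReadLine
        ' The class can return one extra empty line at the end of the
        ' file; this sketch simply skips all empty lines.
        If Len(sLine) > 0 Then
            ' ... clean up / filter sLine here ...
            Print #FOut, sLine & vbLf;   ' trailing ";" suppresses Print's own CrLf
            lCount = lCount + 1
            If lCount Mod 10000 = 0 Then
                If Not pb Is Nothing Then pb.Value = oFile.Progress
                DoEvents                  ' keep the form responsive
            End If
        End If
    Loop
    Close FOut
    oFile.CloseTextStream

    If Not pb Is Nothing Then pb.Value = 100
    ConvertUnixFile = lCount
End Function
```

Calling DoEvents only every 10,000 lines is an arbitrary choice; calling it on every line would noticeably slow down a 14-million-line file.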
When you're processing a text file in chunks (assuming that every line is shorter than the chunk size), the final array element resulting from the SPLIT (i.e. the last line of the first chunk) could contain
the entire line
an empty line
the entire line plus the vbCr character
just the vbCr character
The first line in the subsequent chunk (and subsequent array) could be
the remainder of the prior line (the most likely occurrence)
an empty string, meaning that the vbNewLine was the first two characters in the chunk and that the preceding text line was, in fact, a complete line
vbLf (the chunk boundary fell between the vbCr and the vbLf)
and, in all three of these cases, IF you have saved the prior partial line (in all cases except the first chunk) and you concatenate the prior chunk's last line to the subsequent chunk's first line, you have a complete line to work with. And on the first chunk, it doesn't matter if you concatenate the prior line, because it would be, by default, an empty string anyway.
What you do is add a variable (call it sTemp for this description) to hold the final array element of the chunk. Of course, initially it will be an empty string, which is what you want. Inside your chunk loop, read the chunk, concatenate the new string variable to the beginning of the chunk (sChunk = sTemp & sChunk), then split the chunk on vbNewLine.
After the chunk is read, split and processed (except for the final array element), assign that last array element to the temp variable, then set the last array element equal to an empty string. (You now JOIN the array into a single string variable using vbLf as your delimiter, since it's going to a Linux box for further processing, and save the chunk out to your target output file.)
The next chunk is read, and the [usually] partial trailing line from the prior chunk is inserted at the beginning of the next chunk (which will reconstruct the broken line, if there was one). Now, Split on vbNewLine and repeat.
On the final chunk, after (and outside) the chunk loop, the chunk size is reduced to fit the remaining characters left in the source file, and the last line in the resulting array is processed rather than being saved in sTemp.
At the beginning, I like to calculate the number of chunks to process and the remainder chunk size by, respectively, an integer division of file size by chunk size and the Mod of file size by chunk size. That way, you can use a For/Next loop for processing all but the final (partial) chunk. Obviously, if the remainder chunk size is 0, sTemp contains the entire last line of the file, so only it needs to be saved to the end of the target file.
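The steps above, put together as a sketch. Assumptions: ConvertInChunks is a hypothetical name; no line is longer than lChunkSize; the output gets vbLf line ends. The sLineEnd parameter is an addition (defaulting to vbCrLf, which is what vbNewLine is on Windows) so the same routine can also split an LF-only file by passing vbLf.

```vb
Option Explicit

Public Sub ConvertInChunks(ByVal sSource As String, ByVal sTarget As String, _
                           ByVal lChunkSize As Long, _
                           Optional ByVal sLineEnd As String = vbCrLf)
    Dim FIn As Integer, FOut As Integer
    Dim lChunks As Long, lRemainder As Long, lChunk As Long
    Dim sChunk As String, sTemp As String, sOut As String
    Dim sLines() As String

    FIn = FreeFile
    Open sSource For Binary Access Read As FIn
    If Len(Dir$(sTarget)) > 0 Then Kill sTarget   ' Binary mode does not truncate
    FOut = FreeFile
    Open sTarget For Binary Access Write As FOut

    ' Number of whole chunks, plus the size of the final partial chunk
    lChunks = LOF(FIn) \ lChunkSize
    lRemainder = LOF(FIn) Mod lChunkSize

    For lChunk = 1 To lChunks
        sChunk = Space$(lChunkSize)
        Get FIn, , sChunk
        ' Prefix the partial line left over from the previous chunk
        sLines = Split(sTemp & sChunk, sLineEnd)
        ' Hold back the last (possibly partial) line for the next chunk
        sTemp = sLines(UBound(sLines))
        sLines(UBound(sLines)) = ""
        ' ... process sLines(0 To UBound(sLines) - 1) here ...
        ' The empty last element makes Join end the output with vbLf
        sOut = Join(sLines, vbLf)
        If Len(sOut) > 0 Then Put FOut, , sOut
    Next lChunk

    ' Final chunk, outside the loop: here the last line is kept, not held back
    If lRemainder > 0 Then
        sChunk = Space$(lRemainder)
        Get FIn, , sChunk
        sLines = Split(sTemp & sChunk, sLineEnd)
        ' ... process all of sLines here, including the last one ...
        sOut = Join(sLines, vbLf)
    Else
        ' The file ended on a chunk boundary: only sTemp is left
        sOut = sTemp
    End If
    If Len(sOut) > 0 Then Put FOut, , sOut

    Close FIn
    Close FOut
End Sub
```

With a chunk size of 3, the test below deliberately cuts a vbCrLf between two chunks; prefixing sTemp to the next chunk rebuilds it before the Split.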
__________________ Lou
"I have my standards. They may be low, but I have them!" ~ Bette Middler
"It's a book about a Spanish guy called Manual. You should read it." ~ Dilbert
"To understand recursion, you must first understand recursion." ~ unknown
Last edited by loquin; 04-07-2014 at 10:52 AM.
Reason: clarification