Calculating a chunk size

Jason88
01-05-2008, 09:23 AM
Hi, I'm creating a small tool that takes a chunk of data from a file, reverses the data and writes it to a new file, and so on until the end of the file.

This all works perfectly fine, but I don't want to hardcode the chunk size. I'd like to create a function that calculates a chunk size based on the total file size, but I'm not really sure how to do this.

For example, if the file size is above 5 MB, the chunk size has to be between 40,000 and 70,000 bytes.

If a file is 32,471,657 bytes (roughly 31 MB), then how can I calculate a number between 40000 and 70000 from that file size?

OnErr0r
01-05-2008, 09:40 AM
I really think a set chunk size is the better way to go. 64K is a good round number in that range and just happens to be pretty optimized code-wise.

But, to answer your question.. I suppose you could just subtract the low limit from the high. Then just calculate the ratio of the size to the max size.

I suppose you'd be handling sizes from 1 to 2GB (larger requires API). Let's say you are, for argument's sake.

2GB - 1 = 2147483647 bytes (the largest value a Long can hold)


dblScale = lSize / 2147483647
lChunk = 40000 + (30000 * dblScale)


Only problem with that algo is, average size files might be relatively small, which means you probably won't have chunks that approach the maximum. You might consider a logarithmic scale.
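
A logarithmic version might look something like the sketch below. It is only an illustration: the function name LogChunkSize and the constant names are made up here, and the 40000/70000 limits and the 2147483647 ceiling are simply carried over from the linear example above.

```vb
' Sketch only: LogChunkSize and its constants are illustrative names,
' not code from the thread.
Private Function LogChunkSize(ByVal lSize As Long) As Long
    Const MIN_CHUNK As Long = 40000
    Const MAX_CHUNK As Long = 70000
    Const MAX_SIZE As Double = 2147483647#   ' largest value a Long can hold
    Dim dblScale As Double

    If lSize < 1 Then lSize = 1              ' Log() needs a positive argument

    ' Ratio of logarithms: 0 for a 1-byte file, 1 for a MAX_SIZE file.
    ' The base of the logarithm cancels out, so the natural Log() is fine.
    dblScale = Log(lSize) / Log(MAX_SIZE)

    LogChunkSize = MIN_CHUNK + CLng((MAX_CHUNK - MIN_CHUNK) * dblScale)
End Function
```

For the 32,471,657-byte file from the first post this gives a chunk of roughly 64,000 bytes, where the linear formula gives only about 40,450, so everyday file sizes are spread over much more of the range.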

dilettante
01-05-2008, 10:15 AM
It looks like a simple scaling problem:
Option Explicit

Private Sub Form_Activate()
    Const MinLength As Single = 0#
    Const MaxLength As Single = 35000000#
    Const MinChunk As Single = 40000#
    Const MaxChunk As Single = 70000#
    Const Ratio As Single = (MaxChunk - MinChunk) / (MaxLength - MinLength)
    Dim Length As Long, Chunk As Long

    'Test some Lengths:

    For Length = 10000 To 35000000 Step 1500000
        Chunk = (CSng(Length) * Ratio) + MinChunk
        Print Length, Chunk
    Next
End Sub

As already mentioned however, large or small files may result in values you don't really want.

I think there must be more information we don't have.

dilettante
01-05-2008, 10:17 AM
If the goal is I/O optimization I'd also have to agree that sizes between 16K and 64K are usually optimal. Lately I opt for 32K after much experimentation some time ago.

Jason88
01-05-2008, 12:09 PM
Thanks. Perhaps I can add some more constants with other numbers for smaller files.

And what about this?

Private Sub Command1_Click()
    Dim lFileSize As Long

    lFileSize = FileLen("C:\testfile.bin")

    If lFileSize >= 1024 And lFileSize <= 102400 Then 'Between 1kb and 100kb
        MsgBox GetChunkSize(lFileSize, 1, 50)
    ElseIf lFileSize >= 102401 And lFileSize <= 1048576 Then 'Between 100kb and 1mb
        MsgBox GetChunkSize(lFileSize, 1000, 7000)
    ElseIf lFileSize >= 1048577 And lFileSize <= 5242880 Then 'Between 1mb and 5mb
        MsgBox GetChunkSize(lFileSize, 20000, 35000)
    ElseIf lFileSize >= 5242881 Then 'More than 5mb
        MsgBox GetChunkSize(lFileSize, 40000, 70000)
    End If

End Sub


Private Function GetChunkSize(ByVal filesize As Long, ByVal min As Long, ByVal max As Long) As Long

    'Rnd with a negative argument followed by Randomize makes the sequence
    'repeatable, so the same file size always gives the same chunk size
    Rnd -1
    Randomize filesize

    'Scale Rnd (0 <= Rnd < 1) into the range min..max
    GetChunkSize = CLng(min + Rnd() * (max - min))

End Function

OnErr0r
01-05-2008, 12:41 PM
Quote (Jason88): "Thanks. Perhaps I can add some more constants with other numbers for smaller files."

I suppose you could, but it's not really necessary. Chunking code should handle all full chunks (if any) first and then a partial chunk (if any) last. If the file is < 64K (or chosen power of two) then it is a single partial chunk.
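
The "full chunks first, then one partial chunk" pattern could be sketched roughly as follows. The names CopyInChunks and CHUNK_SIZE are hypothetical, the 64K value is just the size suggested earlier in the thread, and the sketch assumes the destination file does not already exist (Open ... For Binary does not truncate an existing file).

```vb
' Sketch only: CopyInChunks, CHUNK_SIZE and the parameters are
' illustrative names, not code from the thread.
Private Sub CopyInChunks(ByVal sSource As String, ByVal sDest As String)
    Const CHUNK_SIZE As Long = 65536          ' 64K
    Dim hIn As Integer, hOut As Integer
    Dim lFull As Long, lPartial As Long, i As Long
    Dim bytBuf() As Byte

    hIn = FreeFile
    Open sSource For Binary Access Read As #hIn
    hOut = FreeFile
    Open sDest For Binary Access Write As #hOut   ' assumes sDest is a new file

    lFull = LOF(hIn) \ CHUNK_SIZE             ' number of full chunks (may be 0)
    lPartial = LOF(hIn) Mod CHUNK_SIZE        ' bytes left over for the last chunk

    If lFull > 0 Then
        ReDim bytBuf(0 To CHUNK_SIZE - 1)
        For i = 1 To lFull
            Get #hIn, , bytBuf
            ' ... process bytBuf here ...
            Put #hOut, , bytBuf
        Next
    End If

    If lPartial > 0 Then
        ReDim bytBuf(0 To lPartial - 1)       ' buffer sized to the remainder
        Get #hIn, , bytBuf
        ' ... process bytBuf here ...
        Put #hOut, , bytBuf
    End If

    Close #hOut
    Close #hIn
End Sub
```

With this layout a file smaller than CHUNK_SIZE never enters the loop and is handled entirely by the partial-chunk branch, so no special table of sizes for small files is needed.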

dilettante
01-05-2008, 06:22 PM
I had assumed there was something more to the problem that required scaled chunks. Perhaps it's just an I/O optimization issue then after all?

Jason88
01-05-2008, 08:07 PM
I'm corrupting binary data by reading a chunk, reversing the data and writing it to a new file, etc and in the end an encrypted password is written to the file. The file is unusable then, but later when entering the correct password, the corrupt file is read in chunks and the data reversed in order to get the original file again.

Instead of using a hardcoded chunk size, I'd like to use a chunk size based on the file size, but in such a way there are always at least 100 or so 'reversed chunks'. I don't really care about speed, but I'm not going to corrupt a 200mb file by reversing 20 byte chunks and a 1mb file by reversing 400kb chunks.
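
One way to get "at least 100 or so" chunks without a table of size ranges is to divide the file size by the wanted chunk count and then cap the result. The sketch below is only an assumption about how that could look: ChunkSizeFor, MIN_CHUNKS and MAX_CHUNK are made-up names, and the 64K cap is borrowed from the fixed size suggested earlier in the thread.

```vb
' Sketch only: ChunkSizeFor and its constants are illustrative names,
' not code from the thread.
Private Function ChunkSizeFor(ByVal lFileSize As Long) As Long
    Const MIN_CHUNKS As Long = 100       ' "at least 100 or so" reversed chunks
    Const MAX_CHUNK As Long = 65536      ' cap so large files don't get huge chunks
    Dim lChunk As Long

    lChunk = lFileSize \ MIN_CHUNKS      ' rounds down, so the chunk count is >= 100
    If lChunk > MAX_CHUNK Then lChunk = MAX_CHUNK
    If lChunk < 2 Then lChunk = 2        ' reversing a 1-byte chunk changes nothing

    ChunkSizeFor = lChunk
End Function
```

Under these assumptions a 20 KB file (20480 bytes) gets 204-byte chunks, a 1 MB file gets 10485-byte chunks, and a 200 MB file is capped at 65536 bytes, which is 3200 chunks. Files smaller than about 200 bytes end up with fewer than 100 chunks because of the 2-byte floor.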

DougT
01-05-2008, 09:12 PM
Have you thought that through?

Basing the chunk size on the size of the file, corrupting it and then adding something to it (i.e. a password) will change the size of the new file. So when you come to de-corrupt it you may pick up a different 'chunk size' and end up reversing (some of) the password as part of the data.

dilettante
01-05-2008, 09:22 PM
Why not just RC4 the file (or better)? Use the CryptoAPI or CAPICOM? Or roll your own ARC4, it isn't hard and the algorithm is all over the web in many programming languages.

Jason88
01-06-2008, 07:16 AM
DougT, yes I've thought that through. I'm already keeping the size difference in mind when de-corrupting the file. It doesn't matter whether I use a fixed chunk size or a chunk size based on the file size. In both cases I need to keep in mind that there's 32 bytes of extra data (MD5 password hash) and ignore it when reading the file and reversing the data.
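
A minimal sketch of that idea, assuming the hash is always exactly 32 bytes appended at the end: derive the chunk size from the data length rather than the raw file length, so corrupting and de-corrupting see the same number. DataLength and HASH_LEN are hypothetical names used only for illustration.

```vb
' Sketch only: DataLength and HASH_LEN are illustrative names.
' HASH_LEN is the 32 bytes of extra data mentioned above.
Private Const HASH_LEN As Long = 32

Private Function DataLength(ByVal sFile As String, ByVal bCorrupted As Boolean) As Long
    DataLength = FileLen(sFile)

    ' A corrupted file carries the appended hash, which is not part of the data
    If bCorrupted Then DataLength = DataLength - HASH_LEN
End Function
```

Whatever function calculates the chunk size would then be fed DataLength(file, False) when corrupting and DataLength(file, True) when restoring, and both calls return the same value for the same original file.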


dilettante, personally I think it's much more fun to write something myself, than to use encryption made by someone else or make my own encryption based on another encryption.

The app is already more or less finished and working fine. I was only looking for a simple (and a bit more secure) function that calculates a chunk size much smaller than the file size, because there's no point in reversing the data of a 20kb file with a 64kb chunk size.
