Pulling bytes from a file

tw607
08-15-2007, 12:19 PM
Hi everyone,
I have a seemingly simple problem which has been driving me crazy. I have a file consisting of a number of bytes and I want to take out eight bytes at a time, combine all eight bytes into a string, convert the new string to decimal and output the decimal to a text file. Here is my code that I've been working with:




Dim index As Integer = 8
For i = 0 To s1.Length() - 1
bytes(i) = s1.ReadByte
If (i < index) Then
s2.WriteByte(bytes(i))
End If
index = index + 8
Next



Here s1 is my filestream reader and s2 is my filestream writer. So far this prints out to the writer file the same thing as the reader file. I am not familiar with working with bytes and this has been driving me crazy. Can someone please help me out. Thanks!

AtmaWeapon
08-15-2007, 12:37 PM
Well there's some design questions I have first but I think I can give you a push in the right direction.

First, what is the purpose of index? It starts at 8 and grows by 8 on every pass through the loop, so when it is compared it is always 8 * (i + 1), and the condition i < index is always true. I see no need for the If statement and can't figure out why index is even needed.

Second, where's the attempt to do anything other than write what you just read? Byte has a ToString() method that does the string conversion, then Decimal's Parse or TryParse methods could be used to go from string to decimal. From there, one of StreamWriter.Write's many overloads accepts a decimal parameter, so after putting those pieces together you should be done.

tw607
08-15-2007, 12:49 PM
Hi, thanks for the quick response. I should have never included the index loop in my code, that was my failed attempt at breaking s1 (streamreader) down into 8 byte sections.
My main problem is pulling out eight bytes (which are in hex) and placing them into a string then converting them into decimal (using Val("&H" & HEXstring)) and outputting them to s2 (streamwriter). I'm pretty sure I can do everything else, my problem is getting eight bytes into a string variable without any spaces between so I can use the conversion method mentioned earlier.
Earlier today I was using the ToString() method, which was pretty cool, but it only converted each single hex value into a decimal, whereas I need every sequence of eight because of the negative sign (e.g. ffffffe6 is a negative number), which the Val("&H" & ) method takes care of. If there is any other information that may help you help me, please let me know; most times I do not explain myself thoroughly. Thanks again for all your help!

AtmaWeapon
08-15-2007, 02:24 PM
Byte is an implicitly unsigned type so handling negative values isn't going to be built in without some kind of hack. My guess is the Val method you mentioned actually upcasts to Integer or some other type before returning a negative value.

This was a booger to test since code like:
For i As Byte = 0 To 10
writer.Write(-i)
Next
was actually upcasting i to a Short before writing it, since the negative value wasn't allowed. This is a 16-bit value and I didn't notice it right away so my test file was all wrong. Finally I just used the binary editor to make it myself, though Byte.MaxValue - i would probably have done the trick.

No matter what you do it's kind of ugly but I wrote this method that converts a Byte to a Decimal, assuming a signed range from -128 to 127. I'm pretty sure it just does what Two's Complement does in binary but I haven't exactly error checked it either.

Private Function GetDecimalFromByte(ByVal byteValue As Byte) As Decimal
Dim result As Decimal = byteValue

If (result > Byte.MaxValue / 2) Then
result -= Byte.MaxValue + 1
End If

Return result
End Function

tw607
08-15-2007, 02:38 PM
Thank you so much for trying to help me out with the function but what I'm really trying to do is just read in 8 bytes from a file using streamreader and somehow input these bytes into a string without any spaces between the bytes so I can convert them. I guess what I'm trying to say is that each byte is not a decimal number, every eight bytes comprises one decimal number. I need some sort of a loop to read a file containing the bytes, place them into a string, convert them, place them into a text file line by line then end the loop and do the same to the next eight bytes within the file until the file ends. I'm sorry if I did not explain that well enough earlier. Thank you very much for trying to help me. I'll still be trying to work it out so if you have any more ideas please let me know, this thing is driving me crazy!

AtmaWeapon
08-15-2007, 03:22 PM
I believe one of us is confused by terminology.

A bit, or Binary digIT, is represented as a 0 or a 1.

A byte is 8 bits. The number "1" is represented as a byte in binary as 0000 0001, and hex 0x01.

My interpretation based on your original code was that you want to read 8 bits, store them in a Byte value, then convert this value to a Decimal. One problem you mentioned was negative values, which the Byte type does not intrinsically support. I posted a method that takes a single Byte value and returns the appropriate Decimal value, since the string conversion is not really necessary.

The source of confusion here is you keep mentioning 8 bytes; this corresponds to a 64-bit data value. Decimal itself has an unusual structure: it is a 128-bit type made up of a 1-bit sign, a 96-bit integer, and a scaling factor (a power of 10 with an exponent from 0 to 28).

In this case, I'd suggest using BinaryReader.ReadInt64 to read the 8-byte integer from the file, then simply cast this value to Decimal using CType. This should handle negative values. The string conversion is completely unnecessary and really would end up being a source of errors in my opinion.

The following snippet reads 8-byte integer values from a file, then stores them in a list. I believe from this point it should be trivial to output the list as you wish.

Dim decimals As New List(Of Decimal)()

Using reader As New BinaryReader(File.OpenRead("test.bin"))
Dim input As Int64
Dim finalValue As Decimal

While Not reader.PeekChar() = -1
input = reader.ReadInt64()
finalValue = CType(input, Decimal)

decimals.Add(finalValue)
End While
End Using
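
A note on the loop condition: PeekChar decodes the next bytes as a character in the reader's encoding, so on raw binary data it can, as far as I know, misreport the end of the file or throw an ArgumentException. A minimal sketch of an alternative, assuming the file holds nothing but 8-byte integers, compares the stream position with its length instead. ReadAllInt64 and the temporary file are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ReadLoopSketch
    ' Hypothetical helper: read every complete 8-byte integer from a file.
    Public Function ReadAllInt64(ByVal fileName As String) As List(Of Long)
        Dim values As New List(Of Long)()

        Using reader As New BinaryReader(File.OpenRead(fileName))
            ' Stop once fewer than 8 bytes remain, so ReadInt64 cannot run past the end.
            While reader.BaseStream.Length - reader.BaseStream.Position >= 8
                values.Add(reader.ReadInt64())
            End While
        End Using

        Return values
    End Function

    Sub Main()
        ' Write two values with BinaryWriter, then read them back in the same byte order.
        Dim fileName As String = Path.GetTempFileName()

        Using writer As New BinaryWriter(File.Create(fileName))
            writer.Write(-26L)
            writer.Write(2489L)
        End Using

        For Each value As Long In ReadAllInt64(fileName)
            Console.WriteLine(value)
        Next

        File.Delete(fileName)
    End Sub
End Module
```

Any trailing bytes that do not make up a full 8-byte value are ignored by this loop rather than raising an error; whether that is the right behaviour depends on the file format.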

tw607
08-16-2007, 05:43 AM
That is why I have been having so much trouble with this; I'm not really sure which data types to use at certain times. Also, I have been using a hex editor to view my data, which makes the data seem as though it is split up into sections. For example, the last 2 lines of one of my files look like this in the hex editor:

ff ff fc 0d ff ff f3 94 00 00 03 9f ff ff f2 2d
00 00 09 b9 00 01 06 4c 00 00 0d 49 00 01 06 52

Here ff ff fc 0d is one negative number and 00 00 09 b9 is one positive number. I was looking at it all wrong. I'm beginning to understand what I should have understood from the beginning.
I used the snippet you posted to see what I would get and when I place a MsgBox just outside of the loop it gives me the number 1376125184 from the file that I posted the hex from (the last two lines of the file). Any idea where this number comes from or what I would do to get 2489 from 00 00 09 b9. I'm sorry to not have explained this thoroughly before, this is the first time I have had to work with raw data like this and the data types and conversions are pretty hard to swallow but I have learned an incredible amount over the past few days and it seems that I will learn so much more today. Thanks again for helping me out!
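
The arithmetic behind that number can be checked by hand. A small sketch, assuming the value shown was the last 4-byte read (bytes 00 01 06 52): combining the bytes with the last one as the most significant gives 1376125184, while combining them in the order they appear in the hex editor gives the intended kind of value. BigEndianValue and LittleEndianValue are hypothetical helpers, not part of the .NET library.

```vbnet
Imports System

Module ByteOrderArithmetic
    ' Hypothetical helper: the first byte is the most significant (the order shown in the hex editor).
    Public Function BigEndianValue(ByVal b As Byte()) As Integer
        Return (CInt(b(0)) << 24) Or (CInt(b(1)) << 16) Or (CInt(b(2)) << 8) Or CInt(b(3))
    End Function

    ' Hypothetical helper: the first byte is the least significant.
    Public Function LittleEndianValue(ByVal b As Byte()) As Integer
        Return CInt(b(0)) Or (CInt(b(1)) << 8) Or (CInt(b(2)) << 16) Or (CInt(b(3)) << 24)
    End Function

    Sub Main()
        Dim lastFour As Byte() = {&H0, &H1, &H6, &H52}
        Console.WriteLine(LittleEndianValue(lastFour)) ' 1376125184, the number from the MsgBox

        Dim positive As Byte() = {&H0, &H0, &H9, &HB9}
        Console.WriteLine(BigEndianValue(positive))    ' 2489

        Dim negative As Byte() = {&HFF, &HFF, &HFC, &HD}
        Console.WriteLine(BigEndianValue(negative))    ' -1011
    End Sub
End Module
```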

tw607
08-16-2007, 06:02 AM
Actually, I changed the Int64 to Int32, with Int64 it gives me the number 38444252242180.

tw607
08-16-2007, 07:35 AM
I placed the MsgBox(finalValue) within the loop and ran through the entire file, and I have found that the number of results displayed is equal to how many numbers are located within the file. For example, if I took the two lines I posted before and ran the loop through a file that consisted of just those two lines, then 8 numbers would be displayed, each in its own MsgBox. So the loop is finding the numbers; I think it's only the conversion that misrepresents the hex number in decimal.
I have tried changing the finalValue variable from Decimal to every other number data type and the same results are displayed. I'm not sure, but I think the conversion goes wrong at the reader; when I attempt to read in something other than .ReadInt32, it gives me some type of error depending on what I change it to.
I just thought maybe this would help to figure out why the actual Decimal representation is not being displayed.

jo0ls
08-16-2007, 07:49 AM
"I want to take out eight bytes at a time, combine all eight bytes into a string, convert the new string to decimal and output the decimal to a text file"

So, original plan (fixed a bit)

Read 4 bytes, convert to hex string, convert hex string to Integer, convert Integer to a String that represents the value in decimal notation, Write String to file.

There was some confusion as you thought FFFFFFE6 was 8 bytes, where it is really 4. Also you said you wanted to write decimals to the file, and there is a Decimal Type, but really you want to write the decimal representation of the value rather than the hexadecimal representation of the value.

Instead of that plan, all you need to do is:

Read 1 Integer into a variable. Write the variable.ToString into a text file.


What do the numbers represent? Are they supposed to be signed or unsigned? You can read the same 32 bits into an Integer and a UInteger and get different values.

Maybe:

'
Dim values As New List(Of Integer)

Using reader As New BinaryReader(File.OpenRead("test.bin"))
While Not reader.PeekChar() = -1
values.Add(reader.ReadInt32)
End While
End Using

' Write the decimal representation to a text file.
' One per line.
Using writer As New StreamWriter("test.txt")
For Each int As Integer In values
writer.WriteLine(int.ToString)
Next
End Using

If you don't want negatives then use UInteger and UInt32 where appropriate.

If it is still wrong, then the file was probably written on a system that writes files using a different Endianness (http://en.wikipedia.org/wiki/Endianness) to the way we are reading them. If that is the case, then you will need to use another approach.
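
To illustrate the signed versus unsigned point, here is a minimal sketch using the ffffffe6 value mentioned earlier in the thread. It assumes the four bytes are stored most significant byte first; ToMachineOrder is a hypothetical helper that puts them into the order BitConverter expects on the current machine.

```vbnet
Imports System

Module SignedUnsignedSketch
    ' Hypothetical helper: copy big-endian bytes and reorder them for this machine.
    Public Function ToMachineOrder(ByVal bigEndianBytes As Byte()) As Byte()
        Dim copy As Byte() = CType(bigEndianBytes.Clone(), Byte())

        If BitConverter.IsLittleEndian Then
            Array.Reverse(copy)
        End If

        Return copy
    End Function

    Sub Main()
        Dim raw As Byte() = {&HFF, &HFF, &HFF, &HE6}
        Dim ordered As Byte() = ToMachineOrder(raw)

        ' The same 32 bits, interpreted two ways.
        Console.WriteLine(BitConverter.ToInt32(ordered, 0))  ' -26
        Console.WriteLine(BitConverter.ToUInt32(ordered, 0)) ' 4294967270
    End Sub
End Module
```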

tw607
08-16-2007, 08:01 AM
Maybe another problem I have is that I do not understand the declaration
Dim values As New List(Of Integer)
From the previous code I changed it to
Dim decimals As New ListItemCollection()

Is List() a datatype?

jo0ls
08-16-2007, 08:06 AM
Are you using 2005? It's a Generic List (http://msdn2.microsoft.com/en-us/library/6sh2ey19.aspx), which is like an ArrayList that only stores Integers. An ArrayList (http://msdn2.microsoft.com/en-us/library/system.collections.arraylist.aspx) is like an Array, only it can grow as you add items - there is no need to use ReDim all the time.

If you have an earlier version of VB.Net, then you could store the values in an ArrayList:

Dim values As New ArrayList()

tw607
08-16-2007, 08:20 AM
The ArrayList worked but I still get the same results as the previous code. I have attached 2 files: Test1.txt is my source file and Test.txt is the output from the program. I think my source file is in a different format than normal. Maybe I need to switch the encoding for it? I have been using a hex editor to view the contents.

AtmaWeapon
08-16-2007, 08:38 AM
The trouble is endian-ness; it seems like the .NET classes work in little-endian but your file is big-endian.

I'll try and write this to be compatible with .NET 1.x code, but you should really consider moving to .NET 2.0 since it is free (unless you want the tools that come with the Professional version or better). The ability to use generic lists will help you greatly, since ArrayList has to box the integer values we are storing and this greatly affects performance. In a high-performance scenario, the .NET 1.1 developer would have no choice but to implement their own collection in this scenario, but the .NET 2.0 developer can use Generic collections and avoid this roadblock.

(I thought it was going to be trouble, but actually conversion won't be that bad; I just discovered the BitConverter class!)

Dim numbers As New ArrayList()

Using reader As New BinaryReader(File.OpenRead("test.bin"))
Dim input As Int32
Dim byteArray As Byte()

While Not reader.PeekChar() = -1
byteArray = reader.ReadBytes(4)
If BitConverter.IsLittleEndian Then
Array.Reverse(byteArray)
End If
input = BitConverter.ToInt32(byteArray, 0)

numbers.Add(input)
End While
End Using

Using writer As New StreamWriter("test.txt")
For Each value As Integer In numbers
writer.WriteLine(value)
writer.WriteLine("---")
Next
End Using

It's a real shame you can't instantiate a BitConverter and use IsLittleEndian to specify whether the class should do a little-endian or big-endian conversion, but this is just as good. The program could be made more generic by using a method like this:

Private Sub FormatArray(ByVal desiredFormat As Endian, ByRef byteArray As Byte())
Dim machineFormat As Endian = Endian.Big

If (BitConverter.IsLittleEndian) Then
machineFormat = Endian.Little
End If

If machineFormat <> desiredFormat Then
Array.Reverse(byteArray)
End If
End Sub

It's actually cleaner to implement in C# due to the ternary operator and case-sensitivity but oh well. You'd just have to remember to always pass an array to this method after reading or before writing (if this is done in a lot of places it's almost worth deriving a new BinaryWriter/BinaryReader that does this automatically.)
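
The post does not show the Endian type, so here is a self-contained sketch that assumes it is a simple two-value enum. The enum, the fileFormat parameter name, and the ReadInt32In helper are assumptions made for illustration, not something from the .NET library.

```vbnet
Imports System
Imports System.IO

Module EndianSketch
    ' Assumed definition: the original post uses Endian without declaring it.
    Public Enum Endian
        Little
        Big
    End Enum

    ' Reverse the array in place when the file's byte order differs from the machine's.
    Public Sub FormatArray(ByVal fileFormat As Endian, ByVal byteArray As Byte())
        Dim machineFormat As Endian = Endian.Big

        If BitConverter.IsLittleEndian Then
            machineFormat = Endian.Little
        End If

        If machineFormat <> fileFormat Then
            Array.Reverse(byteArray)
        End If
    End Sub

    ' Hypothetical helper: read one 4-byte integer stored in the given byte order.
    Public Function ReadInt32In(ByVal reader As BinaryReader, ByVal fileFormat As Endian) As Integer
        Dim byteArray As Byte() = reader.ReadBytes(4)

        If byteArray.Length < 4 Then
            Throw New EndOfStreamException()
        End If

        FormatArray(fileFormat, byteArray)
        Return BitConverter.ToInt32(byteArray, 0)
    End Function

    Sub Main()
        ' Two big-endian values taken from the hex dump earlier in the thread.
        Dim data As Byte() = {&HFF, &HFF, &HFC, &HD, &H0, &H0, &H9, &HB9}

        Using reader As New BinaryReader(New MemoryStream(data))
            Console.WriteLine(ReadInt32In(reader, Endian.Big)) ' -1011
            Console.WriteLine(ReadInt32In(reader, Endian.Big)) ' 2489
        End Using
    End Sub
End Module
```

ByVal is enough for the array parameter here, since Array.Reverse changes the contents of the array the caller already holds.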

tw607
08-16-2007, 08:49 AM
That worked!! Wow, that was difficult. I need to spend some time reading up on the endian. Thank you guys so much, this really really made my day!

AtmaWeapon
08-16-2007, 09:00 AM
I just learned a neat trick after years of not understanding endianness, I want to share it with you.

I couldn't figure out what the name means, but I figured it out today and will highlight the important part. The "end" in endian is a hint that the name has to do with which end of the number comes first. There is a "big" end and a "little" end, usually known as "most significant" and "least significant". In general, our number system is big-endian; that is, the most significant digit comes first. For multi-byte values, big-endian means the most significant byte comes first, and little-endian means the least significant byte comes first.

So, if we consider the number 48879, we can see that:

Big-endian (most significant byte first): BE EF
Little-endian (least significant byte first): EF BE
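
The same example can be seen from code. A small sketch, assuming a 16-bit unsigned value: BitConverter.GetBytes returns the bytes in the machine's own order, and BigEndianBytes is a hypothetical helper that always returns them most significant byte first.

```vbnet
Imports System

Module BeefSketch
    ' Hypothetical helper: bytes of a 16-bit value, most significant byte first.
    Public Function BigEndianBytes(ByVal value As UShort) As Byte()
        Dim bytes As Byte() = BitConverter.GetBytes(value)

        If BitConverter.IsLittleEndian Then
            Array.Reverse(bytes)
        End If

        Return bytes
    End Function

    Sub Main()
        Dim value As UShort = 48879

        ' Machine order: EF-BE on a little-endian machine.
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(value)))

        ' Always BE-EF, regardless of the machine.
        Console.WriteLine(BitConverter.ToString(BigEndianBytes(value)))
    End Sub
End Module
```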

tw607
08-16-2007, 09:36 AM
Thanks so much for the explanation. It would have taken me a year to figure all that out by myself. I think I'm going to have to use this type of data very much in the near future so you made my life much easier with this explanation of dealing with endian. Thanks Again!
