Append character to string

Slow-Mo
12-30-2004, 09:23 AM
I have run into a problem when appending a character to a string.

First of all, it took a while to find the bug in my program, because this code worked fine before I switched the system's language for non-Unicode programs (a different default text encoding).

So here is a piece of code that is causing problems:

str &= Chr(h)

If "h" is, for example, 209, the character with ASCII code 78 is appended to my string. I can see this by running ?Asc(str.Chars(str.Length - 1)) in the Immediate window after the character has been appended.

So what could be causing that? The only workaround I can think of is to use a byte array instead of a string, but maybe there is a proper solution to my problem?

AFterlife
12-30-2004, 09:35 AM
I haven't used it before, but try messing around with

System.Text.Encoding


I've heard you can convert text with this, although I haven't tried it out yet.

AFterlife
12-30-2004, 12:29 PM
Did you try my idea? I think that's what you're looking for.
Try this:
http://dotgnu.org/pnetlib-doc/System/Text/Encoding.html

Slow-Mo
12-30-2004, 01:49 PM
Well, I'm not sure how to use it in my situation.

If I append the character using ChrW, then I can get the correct code back using the AscW function. But this is just a temporary solution, because I want to convert this string back to a byte array using System.Text.Encoding.Default.GetBytes(stringtoconvert).

So I have determined that Asc(Chr(209)) returns 78 instead of 209.

lebb
12-30-2004, 02:36 PM
As you pointed out, AscW and ChrW are needed instead of Asc and Chr in this situation. I don't understand what you mean about it being a temporary solution, however. When I test this with GetBytes, it works fine. Can you give some more details about what you are trying to do?

Slow-Mo
12-30-2004, 06:57 PM
What I am trying to do is build a string from another string using a certain algorithm. I'm building this string character by character, and the characters are not only alphanumeric. In fact, they can be almost any ANSI character (0..255). But when I try to convert this string to a byte array using
StringToBytes = System.Text.Encoding.Default.GetBytes(Data)
bytes with wrong values are returned.
As I said, the string is built using str &= Chr(h), but Chr(h) appends a character with the wrong ANSI code to the string. Chr(209) behaves like Chr(78), and some other characters are swapped as well. This has only happened since I changed the default ANSI encoding of my system. The code worked perfectly before.

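So now I use ChrW(h) instead of Chr(h) to build the string, and instead of System.Text.Encoding.Default.GetBytes(Data) I use a loop like this to get the byte array:
Dim i As Integer
' VB array declarations take the upper bound, not the length
Dim resarr(Data.Length - 1) As Byte
For i = 0 To (Data.Length - 1)
    ' AscW returns an Integer; this assumes every character code is in 0..255
    resarr(i) = CByte(AscW(Data.Chars(i)))
Next
Return resarr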

So I am trying to understand why converting a byte to a character with Chr(byt) and then back to a byte with Asc(ch) gives a different value from the one I started with.

lebb
12-30-2004, 07:15 PM
You'll definitely need to use ChrW() instead of Chr() to build the string, but it seems to me that System.Text.Encoding.Default.GetBytes should still work. If not, have you tried System.Text.Encoding.Unicode.GetBytes instead?

Slow-Mo
12-31-2004, 02:40 AM
Yes, I have tried that, but .Unicode.GetBytes returns a byte array that is twice as big as it should be, because every character is converted to two bytes in that encoding.

I'll try to test the code on some other machines, and on my PC with other default code pages, to see what happens.
Or maybe I should get rid of the Chr and ChrW functions, build a byte array instead of a string, and only then convert the whole array to a string. I'll test whether that works.

Mike Rosenblum
12-31-2004, 09:00 AM
It sounds like the Encoding that you want, then, is 'Text.Encoding.ASCII'.

To what, may I ask, is your current 'System.Text.Encoding.Default' set?

Iceplug
12-31-2004, 08:54 PM
How about using Convert.ToChar and Convert.ToInt16 instead of Chr and Asc for a full .NET solution? :)
Or, try .UTF8 instead of Default.

lebb
12-31-2004, 09:03 PM
Whoops, thanks for pointing out the obvious, Iceplug. :o

Slow-Mo
01-01-2005, 07:52 PM
Still no luck :(
Convert.ToChar does exactly the same as ChrW, and
System.Text.Encoding.Default.GetBytes(Convert.ToChar(209)) returns 78
System.Text.Encoding.UTF8.GetBytes(Convert.ToChar(209)) returns two bytes, 195 and 145
System.Text.Encoding.ASCII.GetBytes(Convert.ToChar(209)) returns 63, because 209 is in the upper range, outside 7-bit ASCII

As I now understand it, string variables in VB are Unicode, and that's why I have this problem. When a Unicode string is converted to a byte array, a conversion table is used, so some characters change their code.

Is there some kind of non-Unicode string variable in VB, as in good old C/C++, where a string is treated as an array of bytes?

I now use ChrW to append a character to the string and a loop with AscW to get the character codes back from the built string. The code works fine, but I would rather use System.Text.Encoding.Default.GetBytes() instead of the loop I have created. I have run out of ideas.
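
For reference, the results listed in this post can be reproduced with a small console sketch along these lines. The module name and the ShowBytes helper are made up for illustration. The ASCII, UTF8 and Unicode lines are fixed by those encodings, while the Default line depends on the machine: on the .NET Framework it follows the system ANSI code page, and on newer .NET runtimes Encoding.Default is UTF-8.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodingComparison
    ' Hypothetical helper: formats a byte array as "{b0, b1, ...}".
    Function ShowBytes(ByVal bytes() As Byte) As String
        Dim sb As New StringBuilder("{")
        For i As Integer = 0 To bytes.Length - 1
            If i > 0 Then sb.Append(", ")
            sb.Append(bytes(i).ToString())
        Next
        Return sb.Append("}").ToString()
    End Function

    Sub Main()
        ' ChrW(209) is always the Unicode character U+00D1,
        ' whatever the system code page is.
        Dim s As String = ChrW(209)

        ' 7-bit ASCII has no U+00D1, so it is replaced by "?" (63).
        Console.WriteLine("ASCII:   " & ShowBytes(Encoding.ASCII.GetBytes(s)))   ' {63}
        ' UTF-8 needs two bytes for U+00D1.
        Console.WriteLine("UTF8:    " & ShowBytes(Encoding.UTF8.GetBytes(s)))    ' {195, 145}
        ' Encoding.Unicode is UTF-16 little-endian: two bytes per character.
        Console.WriteLine("Unicode: " & ShowBytes(Encoding.Unicode.GetBytes(s))) ' {209, 0}
        ' Machine-dependent, so no expected output is given here.
        Console.WriteLine("Default: " & ShowBytes(Encoding.Default.GetBytes(s)))
    End Sub
End Module
```

The point of the comparison is that the string itself always holds U+00D1; only the encoding chosen for GetBytes decides which byte values come out.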

Mike Rosenblum
01-01-2005, 09:28 PM
Hmmm... no, you are doing it right. It *should* be working. VB6 uses 8-bit extended ASCII, with Chars 0 through 255. Surprisingly (and a bit distressingly), .Net seems to be using 7-bit ASCII. Anything above 127 seems to be mapped to Char #63.

And what is Char #63?

Well, it's the "?". :(

To prove it, I ran this in VB.Net:

Sub Tester_DotNet()
    Dim str As String = "Hello_" & Convert.ToChar(209)
    Dim aryString As String
    Dim bytes() As Byte

    bytes = System.Text.Encoding.ASCII.GetBytes(str)
    aryString = "{"
    For i As Integer = 0 To bytes.Length - 1
        aryString &= bytes(i).ToString
        If i < bytes.Length - 1 Then aryString &= ", "
    Next i
    aryString &= "}"

    MessageBox.Show(aryString) ' <-- Returns {72, 101, 108, 111, 95, 63}
End Sub

Notice that the final character is getting mapped to #63 (which you had already shown us). But then running the equivalent in VB6 gives the correct result:

Sub Tester()
    Dim str As String
    Dim aryString As String
    Dim bytes() As Byte
    Dim i As Integer

    str = "Hello_" & Chr(209)
    str = VBA.StrConv(str, vbFromUnicode)
    bytes = str

    aryString = "{"
    For i = LBound(bytes) To UBound(bytes)
        aryString = aryString & CStr(bytes(i))
        If i < UBound(bytes) Then aryString = aryString & ", "
    Next i
    aryString = aryString & "}"

    MsgBox aryString ' <-- Returns {72, 101, 108, 111, 95, 209}
End Sub

Notice that this time Char 209 remains 209.

So how the heck do we do this in .Net? I don't know. I tried ASCII, UTF7, and UTF8, and all failed. You could, of course, use Unicode and just ignore every other byte (those being zero).

The other way to deal with it is to loop through a Character array instead of a Byte array. You are effectively working in Unicode this way, but you do not really have to worry about the unused byte. Well, you don't have to worry about it as long as you don't go over 255. Use Mod 256 to protect yourself. The following will add one to each character, including correctly changing Char 209 to Char 210:

Dim str As String = "Hello_" & Convert.ToChar(209)
Dim chars() As Char = str.ToCharArray
For i As Integer = 0 To chars.Length - 1
    ' Mod 256 wraps 256 back to 0, so every code stays in 0..255
    chars(i) = Convert.ToChar((Convert.ToInt32(chars(i)) + 1) Mod 256)
Next i
str = chars
MessageBox.Show(str) ' <-- Returns "Ifmmp`Ò"

OK, the final result "Ifmmp`Ò" looks pretty funny, but it is the result you get when you add one to each character in the string "Hello_Ñ". (Yes, char 209 is "Ñ".)

Hopefully someone else can do better. I can't help but feel (as you do) that there must be some sort of String --> Byte encoding style that works correctly. But, alas, I could not figure that out. Iterating through the Character array is the best that I could come up with.

-- Mike
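
One candidate for the "String --> Byte" encoding asked about here, which nobody in the thread tried, is ISO-8859-1 (Latin-1, code page 28591). It maps each byte value n directly to the Unicode character with code n, so it behaves like ChrW/AscW over the range 0..255. A minimal sketch under that assumption (the module and variable names are illustrative only):

```vbnet
Imports System
Imports System.Text

Module Latin1RoundTrip
    Sub Main()
        ' Assumption: code page 28591 (ISO-8859-1) is used instead of Encoding.Default.
        Dim latin1 As Encoding = Encoding.GetEncoding(28591)

        ' Every possible byte value, 0..255 (the bound 255 gives 256 elements).
        Dim original(255) As Byte
        For i As Integer = 0 To 255
            original(i) = CByte(i)
        Next

        ' Bytes -> String -> Bytes.
        Dim s As String = latin1.GetString(original)
        Dim roundTripped() As Byte = latin1.GetBytes(s)

        ' Compare the two arrays element by element.
        Dim same As Boolean = (roundTripped.Length = original.Length)
        For i As Integer = 0 To 255
            If same AndAlso roundTripped(i) <> original(i) Then same = False
        Next

        Console.WriteLine(same)                         ' True
        Console.WriteLine(Convert.ToInt32(s.Chars(209))) ' 209
    End Sub
End Module
```

This only helps when the goal is to carry raw byte values through a String unchanged. If the bytes are real text in some ANSI code page, the characters shown for values above 127 may not be the ones that code page intends.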

Iceplug
01-02-2005, 09:47 AM
Well, apparently none of the conversions are obvious.

I tried out this example:

Dim B As Byte() = {255, 254, 255, 254}
Dim S As String = System.Text.Encoding.UTF7.GetString(B)
MessageBox.Show(S)
Dim C As Byte() = System.Text.Encoding.UTF7.GetBytes(S)
If C(0) = 255 Then
    MessageBox.Show("Success.")
End If
Where, of course, 255 is ÿ and 254 is þ :p
The conversion to string goes well: ÿþÿþ is the result.
The oddities happen when this string is converted back to an array of bytes...
C now contains 13 elements!
None of these elements are any higher than 128, for some reason.
C = {43, 65, 80, 56, 65, 47, 103, 68, 47, 65, 80, 52, 45}

Surprisingly, this same byte array also converts to ÿþÿþ!

None of this stuff happens when I use .Default instead of UTF7. :(

Mike Rosenblum
01-02-2005, 09:55 AM
Yeah, that's pretty crazy. And I also just tested using .Default, which seems to use 8-bit ASCII and works 100% fine.

Yet none of the choices seem to be 8-bit ASCII? Only 7-bit ASCII and Unicode. So what is "Default" actually using? And how does one change it? (And more importantly, if you change it, how do you change it back!?!) Slow-Mo, do you know what code was used to change the PC's default encoding?

Slow-Mo
01-02-2005, 02:41 PM
".Default" uses the system's ANSI code page. A conversion table is applied to high-ASCII characters, because the same character can exist in more than one code page under different character codes.
To use a different code page, System.Text.Encoding.GetEncoding() should be used. This is very useful when, for example, reading a text file created in some DOS code page and saving it as a Unicode file.

I finally managed to append a character with the code I want to a string, but I'm not sure if that's the best way to do it.
Here's what I have done:

Dim oneByte(0) As Byte
oneByte(0) = 209
str &= System.Text.Encoding.Default.GetString(oneByte)

And now System.Text.Encoding.Default.GetBytes(str) returns the correct array of bytes. So does the Asc function.

I don't know why, but Chr(h) and Convert.ToChar(h) are screwing things up, so this is a workaround for that.
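
The same workaround can be applied to the whole string at once, which is the "build a byte array first, then convert" idea mentioned earlier in the thread. The sketch below is only an illustration: the module name, the sample byte values and the use of a MemoryStream as the buffer are assumptions, not code from the original program.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ByteBufferSketch
    Sub Main()
        ' Use one encoding object for both directions so the two
        ' conversions cannot disagree about the code page.
        Dim enc As Encoding = Encoding.Default

        ' Collect the raw byte values first instead of appending characters.
        Dim buffer As New MemoryStream()
        buffer.WriteByte(72)    ' "H"
        buffer.WriteByte(105)   ' "i"
        buffer.WriteByte(209)   ' meaning depends on the code page

        Dim raw() As Byte = buffer.ToArray()

        ' Convert the whole array to a String in one call...
        Dim text As String = enc.GetString(raw)
        ' ...and back to bytes with the same encoding.
        Dim back() As Byte = enc.GetBytes(text)

        ' On a single-byte ANSI code page this is expected to print the
        ' original values; the result is machine-dependent, so none is shown.
        For Each b As Byte In back
            Console.Write(b.ToString() & " ")
        Next
        Console.WriteLine()
    End Sub
End Module
```

Whether the round trip is exact depends on what Encoding.Default is on the machine. Multi-byte code pages, and runtimes where the default is UTF-8, will not preserve an isolated byte such as 209.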
