The CoCreateGUID API returns a 128 bit unique identifier in this type:
Private Type GUID
Data1 As Long
Data2 As Integer
Data3 As Integer
Data4(7) As Byte
End Type
There are code examples out there on how to turn this number into a CLSID (32 characters of Base16 Hex (0-9a-f)). It seems to me that we could represent this in less characters and still keep the result usable as a filename (for example) by converting to Base36 (i.e. 0-9a-z). Do any of you have any idea how I might go about this?
Thanks in advance!
JPB
loquin 02-20-2008, 11:10 PM convert it to a series of longs.
Each (8 hex character) long can be converted to a 7 character base36 character string. (36^6 = 2176782336, but 2^32 = 4294967296)
Oddly enough, if you break the characters down to a series of integer, you can guarantee that the 4 hex characters can be represented by 3 base36 characters.
Plus, by first converting to an array of integer, you can modify the functions I posted here for base 26 and base34 conversion. Do a site search, using Base34 as the search criteria. (Base34 is used more than base36, as the I and O are easily confused with 1 and 0...)
Hi loquin, thanks for the reply.
I've been able to cobble something together to convert GUIDs to Base32 (based on your code examples in another thread). I chose Base32 because I could then eliminate 0, 1, I and O from the list of allowable characters, so there would never be any ambiguity. I now realize though that there is still potential for problems because Base32 still includes vowels...meaning you could end up with a potentially embarrassing situation when a user is reading out a code :)
Using base32 means that the maximum string length for the GUID is now 16 characters instead of 32 (if my calculations/experiments are correct). Hey, we just doubled our GUID input/output efficiency :). In order to make this work, here's what I did:
1) Call CoCreateGUID with an array of 8 Integers
2) Since the returned Integers could be negative, I convert each pair of Integers to an unsigned currency value (which will have a maximum value of an unsigned long) so that they would work with Mod
3) Using your code to convert bases, I then convert the unsigned currency value to a characters from the base32 string.
4) Return the result.
Now I hope the unsigned conversions are the correct approach, but I am not entirely sure. Here is the code I am using:
Option Explicit
' Base Conversion code based on code by loquin at http://www.xtremevbtalk.com forum
Private Type GUID
Ints(7) As Integer
End Type
Private Declare Sub CoCreateGuid Lib "ole32.dll" (ByRef pguid As GUID)
Private Const mcstrReadableBaseChars As String = "23456789ABCDEFGHJKLMNPQRSTUVWXYZ"
Public Function BaseChr(pBaseChars As String, ByVal pChar As Long) As String
' This function returns the Base34 Conversion conversion of a Base10 number
Dim N As Integer
pChar = pChar Mod Len(pBaseChars) ' Make sure erronious data won't hose you
BaseChr = Mid$(pBaseChars, pChar + 1, 1)
End Function
Private Function GetReadableGUID() As String
Dim udtGUID As GUID ' Integer array for the Globally Unique ID
Dim i As Long ' Just a counter
Dim strFileBase As String ' The GUID represented in legal file characters
Dim strReadableBase As String ' The GUID represented in readable characters (2-9A-HJ-NP-Z)
Dim lngMod As Long ' Remainder of the base calculations
Dim curBase10(3) As Currency ' To store unsigned integer pairs to simulate unsigned longs
CoCreateGuid udtGUID ' Get a Globally Unique ID in an Integer array
For i = 0 To 3
' Convert integer pair to unsigned long
' Store in a currency array element because it has enough
' room for all usigned longs
curBase10(i) = UnsignedInt(udtGUID.Ints(i * 2))
curBase10(i) = curBase10(i) + UnsignedInt(udtGUID.Ints((i * 2) + 1))
' Convert the unsigned long to readable base
Do While curBase10(i) > 0
lngMod = curBase10(i) Mod Len(mcstrReadableBaseChars)
curBase10(i) = curBase10(i) \ Len(mcstrReadableBaseChars)
strFileBase = BaseChr(mcstrReadableBaseChars, lngMod) & strFileBase
Loop
Next i
GetReadableGUID = strFileBase
End Function
Private Function UnsignedInt(ByVal pSignedInt As Integer) As Long
' Convert signed integer (16-bit) to unsigned long (32-bit)
If pSignedInt < 0 Then
UnsignedInt = pSignedInt + 65536
Else
UnsignedInt = pSignedInt
End If
End Function
Private Sub Command1_Click()
MsgBox GetReadableGUID
End Sub
Now this is great for readable GUIDs, but I've been thinking it would be great to be able to convert the GUIDs to an even shorter representation. This representation would be for storing unique filenames that the user would never have to interact with. As such, the requirements would only be that the available base characters be usable in a Windows filename. Here's a routine I use to get these characters:
Private Function IllegalChars() As String
Static sstrIllegal As String
Dim i As Long
If LenB(sstrIllegal) = 0 Then
' Generate string of unprintable characters & space (so no leading or trailing spaces)
For i = 0 To 32
sstrIllegal = sstrIllegal & Chr$(i)
Next i
sstrIllegal = sstrIllegal & Chr$(127) ' Delete character (unprintable)
' Unusable/reserved and problematic filename characters
sstrIllegal = sstrIllegal & "?[]/\=+<>:;"",.*|^"
End If
IllegalChars = sstrIllegal
End Function
Private Function FileBaseChars() As String
Static sstrFileBaseChars As String
Dim i As Long
If LenB(sstrFileBaseChars) = 0 Then
For i = 0 To 255
If InStr(1, IllegalChars, Chr$(i)) < 1 Then
sstrFileBaseChars = sstrFileBaseChars & Chr$(i)
End If
Next i
End If
FileBaseChars = sstrFileBaseChars
End Function
Note that I don't allow some legal characters such as [Space] and "." to avoid potential problems (like leading and trailing spaces, bad code for extensions). What I end up with are 205 characters (or base205). Unfortunately, by representing the GUID as longs, I don't think I am getting the most out of the routine to convert bases. I *think* i could use CDec with a variant to get large unsigned numbers, but they are only 96-bit (and I don't know how they work with Mod) so I don't know if this is the best approach anyway. Just wondering if you had any more insights while I continue experimenting?
Thanks again,
JPB
Ok, I think there is a problem with my code to convert 2 unsigned integers to a currency....it's only adding them together which won't be using the full range of 32 bit numbers...I guess I should be doing something like LoWord/HiWord conversion?
Now that I am using properly converted unsigned longs in a currency variable, I am getting an overflow on Mod...I guess it only supports up to signed long values...Back to the drawing board!
loquin 02-21-2008, 12:57 PM Yes- that was the real issue. IF you handle the data an integer at a time, you end up with a "compression" rate of 75% - that is, the resulting string is 75% the length of a hexadecimal string. If you implemented a arbritary number length math library, you could achieve the 50% compaction that you describe.
IMO, so long as you've clearly defined which of the two characters that you are not using, base34 is fine. The default is dropping the letters I and O. (It actually pre-dates the computer era.)
Hi loquin, thanks again for your help. I have been able to find some arbitrary math routine online by a programmer name Andrija Radovic. I've still can't seem to get the compression rate higher for Base34, so I think I'm missing something.
What I am doing is getting the GUID as a 16 element byte array. I then use the Base conversion code I found to convert the base 256 byte array to a base34 number. The result is a 25-26 character string. I was just wondering if you had any more advice on how I might go about compacting the result further?
Here's the code I am using...GetGUIDString(basecharsReadableAll) returns the GUID in Base34.
Option Explicit
Private Type GUIDBytes
Data(15) As Byte
End Type
Public Enum BaseChars
' Standard bases character groups
basecharsBin = 2
basecharsOct = 8
basecharsDec = 10
basecharsHex = 16
' Special bases character groups
' Note: enum value does not match number of chars in special bases.
basecharsAll = 255
basecharsReadableSafe = 256
basecharsReadableAll = 257
basecharsFilename = 258
End Enum
Private Declare Function CoCreateGuid Lib "ole32.dll" (ByRef pGuid As Any) As Long
Private Const mcstrStandardBaseChars As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Public Function GetBaseChars(pBase As BaseChars) As String
Dim strBaseChars As String
Dim i As Long
Select Case pBase
Case 2 To 36
strBaseChars = Left$(mcstrStandardBaseChars, pBase)
Case basecharsReadableSafe
' No 0 or 1 to prevent confusion with O and I
' No vowels to prevent obscene words
strBaseChars = "23456789BCDFGHJKLMNPQRSTVWXYZ"
Case basecharsReadableAll
' No 0 or 1 to prevent confusion with O and I
strBaseChars = "23456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Case basecharsFilename
' No illegal or problematic characters.
' No uppercase characters
strBaseChars = GetFileBaseChars
Case basecharsAll
strBaseChars = GetAllChars
Case Else
Err.Raise 5
End Select
GetBaseChars = strBaseChars
End Function
Public Function GetAllChars() As String
Dim i As Long
Static strAllChars As String
If LenB(strAllChars) = 0 Then
For i = 0 To 255
strAllChars = strAllChars & Chr$(i)
Next i
End If
GetAllChars = strAllChars
End Function
Public Function GetFileBaseChars() As String
Static sstrFileBaseChars As String
Dim i As Long
If LenB(sstrFileBaseChars) = 0 Then
For i = 0 To 255
If InStr(1, IllegalFilenameChars, Chr$(i)) < 1 Then
sstrFileBaseChars = sstrFileBaseChars & Chr$(i)
End If
Next i
End If
GetFileBaseChars = sstrFileBaseChars
End Function
Private Function IllegalFilenameChars() As String
Static sstrIllegal As String
Dim i As Long
If LenB(sstrIllegal) = 0 Then
' Generate string of unprintable characters & space (so no leading or trailing spaces)
For i = 0 To 32
sstrIllegal = sstrIllegal & Chr$(i)
Next i
For i = 97 To 117
' a-z
sstrIllegal = sstrIllegal & Chr$(i)
Next i
For i = 127 To 255
sstrIllegal = sstrIllegal & Chr$(i) ' Delete character (unprintable)
Next i
' Unusable/reserved and problematic filename characters
sstrIllegal = sstrIllegal & "?[]/\=+<>:;"",.*|^"
End If
IllegalFilenameChars = sstrIllegal
End Function
Public Function GetGUIDString(ByVal pBase As BaseChars) As String
Dim udtGUID As GUIDBytes
Dim strGUID As String
Dim i As Long
If (CoCreateGuid(udtGUID) = 0) Then
GetGUIDString = BaseToBase(StrConv(udtGUID.Data, vbUnicode), GetBaseChars(basecharsAll), GetBaseChars(pBase))
End If
End Function
Function BaseToBase(ByVal pOrigNumber As String, ByVal pOrigBaseChars As String, ByVal pDestBaseChars As String) As String
'Author: Andrija Radovic, with minor modifications by Jason Peter Brown
Dim sg As String, s As String, m As String, i As Long, ppp As String
ReDim n(0 To Len(pOrigBaseChars)) As String
'If Left$(pOrigNumber, 1) = "-" Then
' sg = "-"
' pOrigNumber = Mid$(pOrigNumber, 2)
'Else
' sg = ""
'End If
'a = NoLead0(UCase$(a), da)
m = Mid$(pDestBaseChars, 2, 1)
s = Left$(pDestBaseChars, 1)
n(0) = s
For i = 1 To Len(pOrigBaseChars)
n(i) = ADDS(m, n(i - 1), pDestBaseChars)
Next
For i = Len(pOrigNumber) To 2 Step -1
s = ADDS(s, MULS(n(InStr(pOrigBaseChars, Mid$(pOrigNumber, i, 1)) - 1), m, pDestBaseChars), pDestBaseChars)
m = MULS(n(Len(pOrigBaseChars)), m, pDestBaseChars)
Next
BaseToBase = sg & ADDS(s, MULS(n(InStr(pOrigBaseChars, Mid$(pOrigNumber, i, 1)) - 1), m, pDestBaseChars), pDestBaseChars)
End Function
Function ADDS(ByVal b As String, ByVal c As String, d As String) As String
'Author: Andrija Radovic
Dim n As Long, of As Long, i As Long, f As Long, f1 As Long, a As String, ppp As String
n = Len(d)
of = Len(c) - Len(b)
If of < 0 Then
SWAP b, c
of = -of
End If
a = ""
f = 0
For i = Len(b) To 1 Step -1
f1 = f + InStr(d, Mid$(b, i, 1)) + InStr(d, Mid$(c, of + i, 1)) - 2
If f1 >= n Then
f = 1
f1 = f1 - n
Else
f = 0
End If
a = Mid$(d, 1 + f1, 1) & a
Next
If of Then
For i = of To 1 Step -1
If f Then
f1 = f + InStr(d, Mid$(c, i, 1)) - 1
If f1 >= n Then
f = 1
f1 = f1 - n
Else
f = 0
End If
a = Mid$(d, 1 + f1, 1) & a
Else
a = Mid$(c, 1, i) & a
Exit For
End If
Next
End If
If f Then ADDS = Mid$(d, 2, 1) & a Else ADDS = a
End Function
Function MULS(ByVal b As String, ByVal c As String, d As String) As String
'Author: Andrija Radovic
Dim i As Long, j As Long, n As Long, f As Long, f1 As Long
Dim m As Long, a As String, p As String, nul As String
n = Len(d)
If Len(b) > Len(c) Then SWAP b, c
a = ""
nul = ""
For i = Len(b) To 1 Step -1
m = InStr(d, Mid$(b, i, 1)) - 1
p = ""
f = 0
For j = Len(c) To 1 Step -1
f1 = f + m * (InStr(d, Mid$(c, j, 1)) - 1)
If f1 >= n Then
f = f1 \ n
f1 = f1 Mod n
Else
f = 0
End If
p = Mid$(d, 1 + f1, 1) + p
Next
If f Then p = Mid$(d, 1 + f, 1) + p
p = p + nul
nul = nul + Left$(d, 1)
a = ADDS(a, p, d)
Next
MULS = a
End Function
Sub SWAP(a As Variant, b As Variant)
'Author: Andrija Radovic
Dim c As Variant
c = a
a = b
b = c
End Sub
Thanks again for your insights...
I guess that 26-27 characters is the best I can hope for for Base34... I found this on Wikipedia about base 64 encoding to 22ish characters:
When printing fewer characters is desired, GUIDs are sometimes encoded into a base64 string of 22 to 24 characters (depending on padding). For instance:
7QDBkvCA1+B9K/U0vrQx1A
So I can't imagine base34 getting any smaller!
|