String Manipulation Question

JDYoder
02-28-2005, 02:34 PM
Does anyone have a function where, if I passed it any of these strings, I'd get the resulting output?

VendorID... Vendor ID
ThisIsAField... This Is A Field
HowAbout888... How About 888
76WasAGoodYear... 76 Was A Good Year
WhoI55Left... Who I 55 Left
ILoveNASASpace... I Love NASA Space
TheUSA2005Program... The USA 2005 Program

In other words, it puts logical spaces before words, with new words defined as starting with a capital letter or a numeric character. The rest of the word is then either all lowercase letters (until you reach a non-lowercase letter), all numeric characters (until you hit a non-numeric character), or all capital letters (until you hit a numeric character or the first capital letter of a new word).

To the eye it's easy to see, but coding it is another thing. I thought there'd be a fairly elegant answer, but I'm not seeing it. Any ideas?

OnErr0r
02-28-2005, 02:40 PM
Loop through one letter at a time and if the letter is capital or number, then insert a space. When a number is encountered you will need to go into a second loop (or function) to find the next letter, then return back to the outer loop. There is no pre-written function to do this.
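
A minimal sketch of the single-loop idea, assuming the word rules from the first post. It goes a little beyond the plain loop described above: it also peeks at the next character so that a run of capitals is only broken before a capital that starts a lowercase word. The names SpaceOutWords and CharKind are hypothetical, and the code only knows about A-Z, a-z and 0-9.

```vb
Option Explicit

' Hypothetical helper: classifies one character as
' "U" (upper), "L" (lower), "D" (digit) or "O" (other / end of string).
Private Function CharKind(ByVal ch As String) As String
    CharKind = "O"
    If Len(ch) = 0 Then Exit Function
    Select Case Asc(ch)
        Case 65 To 90: CharKind = "U"
        Case 97 To 122: CharKind = "L"
        Case 48 To 57: CharKind = "D"
    End Select
End Function

' Hypothetical function: inserts a space wherever a new "word" starts.
Public Function SpaceOutWords(ByVal s As String) As String
    Dim i As Long
    Dim prevKind As String, curKind As String, nextKind As String
    Dim result As String

    For i = 1 To Len(s)
        curKind = CharKind(Mid$(s, i, 1))
        nextKind = CharKind(Mid$(s, i + 1, 1))  ' Mid$ returns "" past the end
        If i > 1 Then
            Select Case curKind
                Case "D"    ' a digit run starts after any letter
                    If prevKind = "U" Or prevKind = "L" Then result = result & " "
                Case "U"
                    If prevKind = "L" Or prevKind = "D" Then
                        result = result & " "   ' capital after lowercase or digit
                    ElseIf prevKind = "U" And nextKind = "L" Then
                        result = result & " "   ' last capital of a run starts a word
                    End If
                Case "L"    ' lowercase word directly after a number
                    If prevKind = "D" Then result = result & " "
            End Select
        End If
        result = result & Mid$(s, i, 1)
        prevKind = curKind
    Next i
    SpaceOutWords = result
End Function
```

For example, SpaceOutWords("ILoveNASASpace") returns "I Love NASA Space" and SpaceOutWords("Love8you") returns "Love 8 you".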

meteo
02-28-2005, 02:57 PM
Quote from OnErr0r:
Loop through one letter at a time and if the letter is capital or number, then insert a space. When a number is encountered you will need to go into a second loop (or function) to find the next letter, then return back to the outer loop. There is no pre-written function to do this.

This is going to be your best bet for sure, but it won't take care of all problems (just look at what it would do with your NASA example: NASA = N A S A). This is some pretty high-end logic you're looking for, and it will take time to make perfect, or even efficient for that matter.

waits77
02-28-2005, 02:58 PM
I think this would be pretty easy with the .Execute method and .SubMatches of Regular Expressions. The pattern would be something like:
(\d+)|([A-Z]+)|([A-Z][a-z]+)
The data matching the patterns would be stored in a collection of match objects that you could then loop through and build strings by placing spaces between the elements. Check out:
http://www.xtremevbtalk.com/showthread.php?t=203791
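
A rough sketch of the .Execute idea, not the code from the linked tutorial. The helper name JoinMatches is hypothetical, and the RegExp object is created late bound through CreateObject("VBScript.RegExp") so that no project reference is assumed. Alternatives are tried left to right, so a bare [A-Z]+ listed before [A-Z][a-z]+ would win on the first capital of "Vendor". The usage below therefore lists the capital-plus-lowercase alternative first.

```vb
Option Explicit

' Hypothetical helper: runs strPattern against strText and
' joins every match with a single space.
Public Function JoinMatches(ByVal strText As String, ByVal strPattern As String) As String
    Dim re As Object
    Dim matches As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = strPattern
    re.Global = True            ' return every match, not just the first

    Set matches = re.Execute(strText)
    For Each m In matches
        result = result & m.Value & " "
    Next m

    JoinMatches = RTrim$(result)    ' drop the trailing space
End Function

' Usage (Immediate window):
'   ? JoinMatches("HowAbout888", "\d+|[A-Z][a-z]+|[A-Z]+")   ' How About 888
'   ? JoinMatches("VendorID", "\d+|[A-Z][a-z]+|[A-Z]+")      ' Vendor ID
```

This simplified pattern does not yet handle an acronym running straight into a word, as in "NASASpace".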

JDYoder
02-28-2005, 03:18 PM
OnErr0r, I think you tried to oversimplify the solution, as meteo touched upon. I know there isn't a pre-written function in VB, but I hoped someone might have already done this. Again, it seems like there should be a fairly elegant solution, but maybe not.

Waits77, what you mentioned seems different than what I'm talking about. Maybe not, but I'm not quite seeing it.

tomc2506
02-28-2005, 03:38 PM
Just a little For loop..

Function StringManip(strString As String) As String
    ' blFlag: previous character was a capital or a digit
    ' Number: the current run is a run of digits
    Dim x As Integer, blFlag As Boolean, strFinal As String, Number As Boolean
    For x = 1 To Len(strString)
        If (Asc(Mid(strString, x, 1)) >= Asc("A") And Asc(Mid(strString, x, 1)) <= Asc("Z")) Then
            ' Capital letter: continue a capital run or start a new word
            If blFlag = True And Number = False Then
                strFinal = strFinal & Mid(strString, x, 1)
            Else
                blFlag = True
                Number = False
                strFinal = strFinal & " " & Mid(strString, x, 1)
            End If
        ElseIf (Asc(Mid(strString, x, 1)) >= Asc("0") And Asc(Mid(strString, x, 1)) <= Asc("9")) Then
            ' Digit: continue a digit run or start a new word
            If blFlag = True And Number = True Then
                strFinal = strFinal & Mid(strString, x, 1)
            Else
                blFlag = True
                Number = True
                strFinal = strFinal & " " & Mid(strString, x, 1)
            End If
        Else
            ' Anything else is appended as-is
            blFlag = False
            strFinal = strFinal & Mid(strString, x, 1)
        End If
    Next
    StringManip = Trim(strFinal)
End Function

stevo
02-28-2005, 03:47 PM
I would love to see a solution to this myself, because it looks pretty much impossible to determine whether "I" or "A" is part of an abbreviation or not.

JDYoder
02-28-2005, 03:56 PM
Not bad, tomc2506, but it doesn't account for single characters. Example...

ILoveNASASpace should be "I Love NASA Space" rather than "ILove NASASpace"

Nor does it account for lowercase letters after a "numeric word." Example...

Love8you should be "Love 8 you" rather than "Love 8you"

waits77
02-28-2005, 04:19 PM
Quote from JDYoder:
Waits77, what you mentioned seems different than what I'm talking about. Maybe not, but I'm not quite seeing it.

I don't have VB here to test it, but I made a few edits to the example from the tutorial:
Option Explicit

Const FORMAT_1 = "(\d+)|([A-Z]+)|([A-Z][a-z]+)"
Const FORMAT_2 = "CIDist[^\n]+ +dx= +([\d\.\-]+)[,\.] +dy= +([\d\.\-]+)[,\.] +dz= +([\d\.\-]+)[,\.]"
Const FORMAT_3 = " +X +([\d\.\-]+)\n +Y +([\d\.\-]+)\n +Z +([\d\.\-]+)\n(?:[^\n]{0,}\n){4}[^\n]+ RESULT [A-Z\d]{5}"

'Private Type dblPoint
'    X As Double
'    Y As Double
'    Z As Double
'End Type

'Dim Points() As dblPoint

Dim regXYZ As RegExp

Private Sub Command1_Click()
    Dim intFNum As Integer
    Dim intFlag As Integer
    Dim i As Long
    Dim j As Integer
    Dim strFName As String
    Dim strFile As String
    Dim strTemp As String
    Dim varXYZ As Variant

    CommonDialog1.ShowOpen ' get the filename
    strFName = CommonDialog1.FileName
    intFNum = FreeFile ' next available file number
    Open strFName For Binary As #intFNum
    strFile = Space(LOF(intFNum)) ' size the string to hold the file
    Get #intFNum, , strFile ' read the file
    Close intFNum

    Set regXYZ = New RegExp ' establish the expression
    With regXYZ
        .Pattern = FORMAT_3 ' set the .Pattern
        If .Test(strFile) Then ' test to see if the pattern is present
            ' got a match so FORMAT_3 is the pattern
            intFlag = 1
        Else
            .Pattern = FORMAT_2
            If .Test(strFile) Then
                ' got a match so FORMAT_2 is the pattern
                intFlag = 1
            Else
                .Pattern = FORMAT_1
                If .Test(strFile) Then
                    ' got a match so FORMAT_1 is the pattern
                    intFlag = 1
                Else
                    MsgBox "Did not match pattern."
                    intFlag = 0 ' not needed, but reads better
                End If
            End If
        End If
        .Global = True ' defaults to looking for first match only
    End With
    If intFlag Then ' if we got a pattern match
        Set varXYZ = regXYZ.Execute(strFile) ' extract the data
        Set regXYZ = Nothing ' reclaim the object
        'ReDim Points(varXYZ.Count - 1) ' size the array to hold the data
        For i = 0 To varXYZ.Count - 1 ' loop through the matches
            strTemp = ""
            ' SubMatches is zero-based, so the last index is Count - 1
            For j = 0 To varXYZ(i).SubMatches.Count - 1
                strTemp = strTemp & CStr(varXYZ(i).SubMatches(j)) & " "
            Next j
            strTemp = Left(strTemp, Len(strTemp) - 1) ' drop the trailing space
            Debug.Print strTemp
            'With varXYZ(i)
            '    Debug.Print .SubMatches(0)
            '    Debug.Print .SubMatches(1)
            '    Debug.Print .SubMatches(2)
            '    Points(i).X = CDbl(.SubMatches(0))
            '    Points(i).Y = CDbl(.SubMatches(1))
            '    Points(i).Z = CDbl(.SubMatches(2))
            'End With
        Next i
        MsgBox "Done with " & CStr(varXYZ.Count) & " matches."
        Set varXYZ = Nothing ' reclaim the object
    End If
End Sub
Place your text in a file and try this.

OnErr0r
02-28-2005, 04:50 PM
I really didn't notice NASA and USA in your example. That just adds a little more complexity in checking for multiple upper case letters in a row.

OnErr0r
02-28-2005, 05:41 PM
So, now that you have several possible solutions, or at least good starts, let's work on helping you with your code rather than writing it for you, JDYoder.

waits77
02-28-2005, 06:41 PM
Turned out to be a little more trouble than I thought, but this works:
Option Explicit

Const FORMAT_1 = "(?:([A-Z]{1,})([A-Z][a-z]{1,}))|(?:\d{1,})|(?:[A-Z][a-z]{1,})|(?:[A-Z]{1,})"

Dim regXYZ As RegExp

Private Sub Command1_Click()
    Dim intFNum As Integer
    Dim i As Long
    Dim j As Long
    Dim strFName As String
    Dim strFile As String
    Dim strFileArray() As String
    Dim strTemp As String
    Dim varXYZ As Variant

    CommonDialog1.ShowOpen ' get the filename
    strFName = CommonDialog1.FileName
    intFNum = FreeFile ' next available file number
    Open strFName For Binary As #intFNum
    strFile = Space(LOF(intFNum)) ' size the string to hold the file
    Get #intFNum, , strFile ' read the file
    Close intFNum

    strFileArray = Split(strFile, vbCrLf) ' one input string per line

    Set regXYZ = New RegExp ' establish the expression
    regXYZ.Pattern = FORMAT_1
    regXYZ.Global = True ' defaults to looking for first match only
    For j = 0 To UBound(strFileArray)
        Set varXYZ = regXYZ.Execute(strFileArray(j)) ' extract the data
        strTemp = ""
        For i = 0 To varXYZ.Count - 1
            If varXYZ(i).SubMatches(0) <> "" Then
                ' acronym followed by a word, e.g. "NASASpace"
                strTemp = strTemp & varXYZ(i).SubMatches(0) & " " & varXYZ(i).SubMatches(1) & " "
            Else
                strTemp = strTemp & varXYZ(i) & " "
            End If
        Next i
        ' drop the trailing space; skip lines with no matches (e.g. a blank last line)
        If Len(strTemp) > 0 Then strTemp = Left(strTemp, Len(strTemp) - 1)
        MsgBox strTemp
    Next j
    Set varXYZ = Nothing ' reclaim the object
    Set regXYZ = Nothing ' reclaim the object
End Sub

Vendor ID
This Is A Field
How About 888
76 Was A Good Year
Who I 55 Left
I Love NASA Space
The USA 2005 Program

I keep forgetting that this forum's editor trashes some expressions. See the attached for the proper expression.

MikeJ
02-28-2005, 06:52 PM
If you disable smilies in your post, you won't have that problem.

stevo
03-01-2005, 12:47 AM
Good job, waits77, but back to my point in #7: what if the line is something like this?

I Managed To Get A GCSE
IManagedToGetAGCSE

JDYoder
03-01-2005, 03:54 AM
waits77 -- Good stuff, though I can't say I completely understand what's going on since I've not messed with those particular functions, but it appears to be some powerful stuff. However, it didn't work for "Love8you" to be "Love 8 you" (a lowercase word starting after a numeric word).

Stevo -- a string like "IManagedToGetAGCSE" would actually be "I Managed To Get AGCSE" since I'm not taking real words like "A" into account. By my rules in the first post, "AGCSE" would be its own word.

waits77
03-01-2005, 06:09 AM
Quote from JDYoder:
waits77 -- Good stuff, though I can't say I completely understand what's going on since I've not messed with those particular functions, but it appears to be some powerful stuff. However, it didn't work for "Love8you" to be "Love 8 you" (a lowercase word starting after a numeric word).

Thanks. I got the "Love 8 you", but I think you need a vocabulary to get "I Managed To Get A GCSE":
Option Explicit

Const FORMAT_1 = "(?:([A-Z]+)([A-Z][a-z]+))|(?:(\d+)([a-z]+))|(?:\d+)|(?:[A-Z][a-z]+)|(?:[A-Z]+)"

Dim regXYZ As RegExp

Private Sub Command1_Click()
    Dim intFNum As Integer
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim strFName As String
    Dim strFile As String
    Dim strFileArray() As String
    Dim strTemp As String
    Dim varXYZ As Variant

    CommonDialog1.ShowOpen ' get the filename
    strFName = CommonDialog1.FileName
    intFNum = FreeFile ' next available file number
    Open strFName For Binary As #intFNum
    strFile = Space(LOF(intFNum)) ' size the string to hold the file
    Get #intFNum, , strFile ' read the file
    Close intFNum

    strFileArray = Split(strFile, vbCrLf) ' one input string per line

    Set regXYZ = New RegExp ' establish the expression
    regXYZ.Pattern = FORMAT_1
    regXYZ.Global = True ' defaults to looking for first match only
    For j = 0 To UBound(strFileArray)
        Set varXYZ = regXYZ.Execute(strFileArray(j)) ' extract the data
        strTemp = ""
        For i = 0 To varXYZ.Count - 1
            ' dump every submatch with its index to see which ones are filled
            For k = 0 To varXYZ(i).SubMatches.Count - 1
                Debug.Print varXYZ(i).SubMatches(k), k
            Next k
            If varXYZ(i).SubMatches(0) <> "" Then
                ' acronym followed by a word, e.g. "NASASpace"
                strTemp = strTemp & varXYZ(i).SubMatches(0) & " " & varXYZ(i).SubMatches(1) & " "
            ElseIf varXYZ(i).SubMatches(2) <> "" Then
                ' number followed by a lowercase word, e.g. "8you"
                strTemp = strTemp & varXYZ(i).SubMatches(2) & " " & varXYZ(i).SubMatches(3) & " "
            Else
                strTemp = strTemp & varXYZ(i) & " "
            End If
        Next i
        ' drop the trailing space; skip lines with no matches (e.g. a blank last line)
        If Len(strTemp) > 0 Then strTemp = Left(strTemp, Len(strTemp) - 1)
        MsgBox strTemp
    Next j
    Set varXYZ = Nothing ' reclaim the object
    Set regXYZ = Nothing ' reclaim the object
End Sub

Thanks MikeJ for the tip about smileys.

JDYoder
03-01-2005, 07:51 AM
Thanks waits77. That seems to work. This morning I dabbled with a different version than the one I'd been working on. I think my new one works, but it's quite long and semi-ugly (involving two functions and a nested Select Case). If I run thru 1000 iterations, mine's much faster, but I think I'll use yours since it's much more compact and elegant, and because speed's not an issue with only the few calls I'll be making.

Now I just need to figure out what you did since that's all new to me. Thanks again.

waits77
03-01-2005, 08:56 AM
Quote from JDYoder:
Now I just need to figure out what you did since that's all new to me. Thanks again.

I had a little trouble with this one too.
"(?:([A-Z]+)([A-Z][a-z]+))|(?:(\d+)([a-z]+))|(?:\d+)|(?:[A-Z][a-z]+)|(?:[A-Z]+)"
When looking at this pattern, note that "()" is used both to group items and to delineate SubMatches. "(?:)" is used to group things without establishing a SubMatch. Also, "|" is the logical operator "Or". This pattern is actually 5 patterns, each separated by an "Or". The first 2 patterns have 2 submatches each. The patterns will be tested in the order shown from left to right. Here's where I got confused a little: the submatches are numbered 0 through n starting from the left, independent of the "Or"s. Every time any of the expressions between "Or"s is matched, we get 4 submatches. When one of the 3 rightmost patterns is matched, all the submatches are empty. When the leftmost pattern matches, submatches (0) & (1) are filled, but (2) & (3) are empty. When the second pattern is matched, (0) & (1) are empty, but (2) & (3) are filled.

The first pattern checks for the "ABCDelete" condition. Note that the pattern has two SubMatches: the first is 1 or more upper case letters, the second is an upper case letter followed by 1 or more lower case letters. So the Match would be "ABCDelete", SubMatch(0) = "ABC" and SubMatch(1) = "Delete".

The second pattern checks for the "8love" condition. Once again, it has two SubMatches: the first is 1 or more digits, the second is 1 or more lower case letters. So the Match would be "8love", SubMatch(2) = "8" and SubMatch(3) = "love". Note that the index number is not relative to the match, but to the entire .Pattern.

The third pattern checks for "888", but only after the second pattern checked for "8love".

The fourth pattern checks for "Left", but only after the first pattern checks for "NASASpace". So it would not find "Space" but would find "Left".

The last pattern checks for "USA", but like the fourth pattern, only after the string has been checked for "NASASpace".
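
To make the SubMatch numbering concrete, here is a sketch that wraps the same pattern and branch logic in a function that takes a string instead of reading a file. The name SpaceWords and the constant WORD_PATTERN are hypothetical, and the RegExp object is created late bound through CreateObject("VBScript.RegExp"), which is an assumption rather than what the posted code does.

```vb
Option Explicit

' Same five alternatives as the pattern explained above.
Private Const WORD_PATTERN As String = _
    "(?:([A-Z]+)([A-Z][a-z]+))|(?:(\d+)([a-z]+))|(?:\d+)|(?:[A-Z][a-z]+)|(?:[A-Z]+)"

' Hypothetical wrapper: returns strIn with spaces between its "words".
Public Function SpaceWords(ByVal strIn As String) As String
    Dim re As Object
    Dim matches As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = WORD_PATTERN
    re.Global = True                ' find every match in the string

    Set matches = re.Execute(strIn)
    For Each m In matches
        If m.SubMatches(0) <> "" Then
            ' first alternative: capitals, then a capitalized word
            result = result & m.SubMatches(0) & " " & m.SubMatches(1) & " "
        ElseIf m.SubMatches(2) <> "" Then
            ' second alternative: digits, then a lowercase word
            result = result & m.SubMatches(2) & " " & m.SubMatches(3) & " "
        Else
            ' last three alternatives: no submatches, use the whole match
            result = result & m.Value & " "
        End If
    Next m

    SpaceWords = RTrim$(result)     ' drop the trailing space
End Function

' Usage (Immediate window):
'   ? SpaceWords("ILoveNASASpace")      ' I Love NASA Space
'   ? SpaceWords("Love8you")            ' Love 8 you
'   ? SpaceWords("TheUSA2005Program")   ' The USA 2005 Program
```

In "ILoveNASASpace", the first alternative matches twice: once as "ILove" with SubMatches(0) = "I" and SubMatches(1) = "Love", and once as "NASASpace" with "NASA" and "Space".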
