 |
|

01-24-2003, 07:58 PM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
Sorting Algorithm
|
|
I got bored at work, so I decided to implement a sorting algorithm in VB6 and see how fast I could get it running. I actually implemented a whole bunch of common sorting algorithms, and tried to optimize them all. My optimized version of QuickSort turned out to be the fastest by a long shot. I pasted the code below, I was hoping somebody could try and optimize it a bit more, or come up with a sort algorithm that is faster. I am attempting to sort a large array of random integers.
Code:
Public Sub QuickSort(ByRef MyArray() As Long, ByVal lngLBound As Long, ByVal lngUBound As Long)
Dim lngPivot As Long
Dim k As Long
Dim p As Long
Dim lngTemp As Long
If lngLBound >= lngUBound Then
Exit Sub
End If
k = lngLBound + 1
If lngUBound = k Then
If MyArray(lngLBound) > MyArray(lngUBound) Then
'swap MyArray(lngLBound) and MyArray(lngUBound)
lngTemp = MyArray(lngLBound)
MyArray(lngLBound) = MyArray(lngUBound)
MyArray(lngUBound) = lngTemp
End If
Exit Sub
End If
lngPivot = MyArray(lngLBound)
p = lngUBound
Do Until (MyArray(k) > lngPivot) Or (k >= lngUBound)
k = k + 1
Loop
Do Until MyArray(p) <= lngPivot
p = p - 1
Loop
Do While k < p
'swap MyArray(k) and MyArray(p)
lngTemp = MyArray(k)
MyArray(k) = MyArray(p)
MyArray(p) = lngTemp
Do
k = k + 1
Loop Until MyArray(k) > lngPivot
Do
p = p - 1
Loop Until MyArray(p) <= lngPivot
Loop
'swap MyArray(p) and MyArray(lngLBound)
lngTemp = MyArray(p)
MyArray(p) = MyArray(lngLBound)
MyArray(lngLBound) = lngTemp
QuickSort MyArray, lngLBound, p - 1
QuickSort MyArray, p + 1, lngUBound
End Sub
The times I got for an array of 100,000 random integers (on a PII 233 computer) are:
Selection Sort: 7min 18sec
Insertion Sort: 7min 19sec
Bubble Sort: 15min 55sec
Merge Sort: 3min 28sec
Quick Sort: 0.2sec
|
|

01-24-2003, 08:38 PM
|
|
Senior Contributor
Preferred language: * Expert *
|
|
Join Date: Jun 2001
Location: Illinois
Posts: 865
|
|
wow, this blows away the bubble sort.
I think the bubble sort was always somewhat popular because it was so simple and easy to understand.
diver
|
__________________
Sometimes I do not get notification of your post from the forum, which causes a delay. I am apologizing in advance!
|

01-24-2003, 08:55 PM
|
 |
Contributor
|
|
Join Date: Aug 2002
Location: Kansas
Posts: 502
|
|
|
|
pretty much everything blows away the bubble sort. If you want to get fired from your job quickly, then implement a bubble sort to sort 500,000 items
I have my own quicksort class that i wrote in c++, maybe I'll benchmark yours beside mine.
Edit: Can you give me the driver you used for this so I don't have to write one...I'm very lazy
|
__________________
"As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality."--Albert Einstein
Last edited by noRulez; 01-24-2003 at 09:04 PM.
|

01-24-2003, 09:03 PM
|
|
Promising Talent
Preferred language: Retired Moderator * Guru *
|
|
Join Date: May 2002
Location: Brussels
Posts: 3,598
|
|
The function really is fast for an array of random integers, but it seems to get a whole lot slower when the array is already sorted or almost sorted. Is that normal?
|
__________________
I do not endorse any advertisements that appear in my contribution and detest their placement against my will.
|

01-24-2003, 09:21 PM
|
 |
Jedi Coder
Preferred language: * Expert *
|
|
Join Date: Aug 2002
Location: Abingdon, MD
Posts: 3,438
|
|
Here's some code I had from an old book called Ready-to-Run VB Algorithms.
The CountingSort came out fastest at .16 secs for 100K items on my PIII 600.
|
|

01-24-2003, 10:19 PM
|
 |
Code Meister
Preferred language: Retired Moderator * Guru *
|
|
Join Date: Aug 2000
Location: Vancouver, BC, Canada
Posts: 10,441
|
|
The basic algorithm behind quicksort is to pick a pivot value, then split the array in two parts: those values ABOVE the pivot value and those values BELOW the pivot value.
Ideally, the array should be split into two sub arrays of the same size for efficiency. If the numbers are random, then odds are fairly good that any value will be near the median. If the numbers are sorted, or nearly sorted, then picking a RANDOM number in the array as a pivot value should work well too.
But it seems to me from a quick glance at the code, that you use a value at the end of the array as a pivot value. If the array is already sorted, this will be the worst case performance for Quicksort; ie every split will be one number on one side and all the rest on the other. It would then have performance about equal to insertion sort (which it would resemble).
|
__________________
"I have a plan so cunning you could put a tail on it and call it a weasel!" - Edmund Blackadder
|

01-25-2003, 07:01 AM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
I was trying to squeeze some more speed out of my function, and I thought that by making the 4 local variables (lngPivot, k, p, lngTemp) Static it should speed up performance. Because then each time it recurses there should be 4 less stack allocations. However, it ended up being more than twice as slow. Does anybody know why it would be slower? I also tried making them public in a code module (.bas) and it had the same effect.
|
__________________
There are 10 types of people in this world, those that understand binary, and those that don't.
|

01-25-2003, 07:40 AM
|
 |
Lost Soul
Preferred language: Super Moderator * Guru *
|
|
Join Date: May 2001
Location: Vorlon
Posts: 17,571
|
|
The sort algo you haven't tested is the ShellSort algo.
This is algorithm is also fast when the data is already (almost) sorted, which gives performance problems with Quicksort.
Code:
Public Sub ShellSortNumbers(vArray() As Long)
Dim lLoop1 As Long
Dim lHold As Long
Dim lHValue As Long
Dim lTemp As Long
Dim lUBound As Long, lLBound As Long
lUBound = UBound(vArray)
lLBound = LBound(vArray)
lHValue = lLBound
Do
lHValue = 3 * lHValue + 1
Loop Until lHValue > lUBound
Do
lHValue = lHValue / 3
For lLoop1 = lHValue + lLBound To lUBound
lTemp = vArray(lLoop1)
lHold = lLoop1
Do While vArray(lHold - lHValue) > lTemp
vArray(lHold) = vArray(lHold - lHValue)
lHold = lHold - lHValue
If lHold < lHValue Then Exit Do
Loop
vArray(lHold) = lTemp
Next lLoop1
Loop Until lHValue = LBound(vArray)
End Sub
|
__________________
Ego quoque nescio quid hic scriptum est
|

01-25-2003, 09:00 AM
|
 |
Regular
|
|
Join Date: Jan 2003
Location: Persian Gulf
Posts: 50
|
|
I guess it would be more optimized for space (not sure if it will speed up the sort ) if you tried to implement it in non-recursive (iterative) fashion
also, if you are sorting number, here is a way you SWAB two numbers without the need of a third variable
a = a + b
b = a - b
a = a - b
|
|

01-25-2003, 10:43 AM
|
 |
Obsessive OPtimizer
Preferred language: Administrator * Guru *
|
|
Join Date: Jun 2002
Location: Debug Window
Posts: 13,226
|
|
Another interesting sorting algo is the RadixSort. It takes a little more memory than the other routines as a temporary array the same size as the array to be sorted is needed. Its equally as fast QuickSort in C++ (slightly slower in VB due to lack of various operators)
Random Data (100K items)
QuickSort 33 ms
RadixSort 38 ms
ShellSort 93ms
Sorted Data (100K items)
RadixSort 39ms
ShellSort 48ms
QuickSort 68000ms
One nice thing about RadixSort is it performs almost the same with data in a sorted state. QuickSort ran for several minutes and I eventually killed it when testing with 1M items.
Since VB lacks true shifts, its makes the code a bit longer to be fast. A lookup table could probably be used with decent performance to combine radix0-3 into one prodecure.
Code:
Option Explicit
Public Declare Sub RtlZeroMemory Lib "KERNEL32" (dest As Any, ByVal numBytes As Long)
Const ITEMS As Long = 256
Private count(ITEMS - 1) As Long
Private index(ITEMS - 1) As Long
Private Sub ZeroArrays()
RtlZeroMemory count(0), ITEMS * 4
RtlZeroMemory index(0), ITEMS * 4
End Sub
Public Sub RadixSort(ByRef lSrc() As Long, ByRef lDest() As Long, ByVal lNumber As Long)
radix0 lNumber, lSrc, lDest
radix1 lNumber, lDest, lSrc
radix2 lNumber, lSrc, lDest
radix3 lNumber, lDest, lSrc
End Sub
Public Sub radix0(ByVal lNumber As Long, ByRef lSrc() As Long, ByRef lDest() As Long)
Dim i As Long
ZeroArrays
For i = 0 To lNumber - 1
count(lSrc(i) And &HFF) = count(lSrc(i) And &HFF) + 1
Next i
index(0) = 0
For i = 1 To 255
index(i) = index(i - 1) + count(i - 1)
Next i
For i = 0 To lNumber - 1
lDest(index(lSrc(i) And &HFF)) = lSrc(i)
index(lSrc(i) And &HFF) = index(lSrc(i) And &HFF) + 1
Next i
End Sub
Public Sub radix1(ByVal lNumber As Long, ByRef lSrc() As Long, ByRef lDest() As Long)
Dim i As Long
ZeroArrays
For i = 0 To lNumber - 1
count((lSrc(i) \ 256) And &HFF) = count((lSrc(i) \ 256) And &HFF) + 1
Next i
index(0) = 0
For i = 1 To 255
index(i) = index(i - 1) + count(i - 1)
Next i
For i = 0 To lNumber - 1
lDest(index((lSrc(i) \ 256) And &HFF)) = lSrc(i)
index((lSrc(i) \ 256) And &HFF) = index((lSrc(i) \ 256) And &HFF) + 1
Next i
End Sub
Public Sub radix2(ByVal lNumber As Long, ByRef lSrc() As Long, ByRef lDest() As Long)
Dim i As Long
ZeroArrays
For i = 0 To lNumber - 1
count((lSrc(i) \ 65536) And &HFF) = count((lSrc(i) \ 65536) And &HFF) + 1
Next i
index(0) = 0
For i = 1 To 255
index(i) = index(i - 1) + count(i - 1)
Next i
For i = 0 To lNumber - 1
lDest(index((lSrc(i) \ 65536) And &HFF)) = lSrc(i)
index((lSrc(i) \ 65536) And &HFF) = index((lSrc(i) \ 65536) And &HFF) + 1
Next i
End Sub
Public Sub radix3(ByVal lNumber As Long, ByRef lSrc() As Long, ByRef lDest() As Long)
Dim i As Long
ZeroArrays
For i = 0 To lNumber - 1
count((lSrc(i) \ 16777216) And &HFF) = count((lSrc(i) \ 16777216) And &HFF) + 1
Next i
index(0) = 0
For i = 1 To 255
index(i) = index(i - 1) + count(i - 1)
Next i
For i = 0 To lNumber - 1
lDest(index((lSrc(i) \ 16777216) And &HFF)) = lSrc(i)
index((lSrc(i) \ 16777216) And &HFF) = index((lSrc(i) \ 16777216) And &HFF) + 1
Next i
End Sub
There is good information about RadixSort at:
http://www.cubic.org/~submissive/sourcerer/radix.htm
(I ported his sample code to VB)
|
__________________
Quis custodiet ipsos custodues.
|

01-25-2003, 10:46 AM
|
 |
Obsessive OPtimizer
Preferred language: Administrator * Guru *
|
|
Join Date: Jun 2002
Location: Debug Window
Posts: 13,226
|
|
Btw, if you'd rather not use memory manipulation to clear the arrays, just Dim the count and index arrays in Radix0-3 instead of a module wide scope.
|
__________________
Quis custodiet ipsos custodues.
|

01-25-2003, 10:51 AM
|
|
Iron-Fisted Programmer
Preferred language: Retired Moderator * Guru *
|
|
Join Date: Jul 2001
Location: Fayetteville Arkansas USA
Posts: 18,127
|
|
Quote:
Originally posted by Optikal
I was trying to squeeze some more speed out of my function, and I thought that by making the 4 local variables (lngPivot, k, p, lngTemp) Static it should speed up performance. Because then each time it recurses there should be 4 less stack allocations. However, it ended up being more than twice as slow. Does anybody know why it would be slower? I also tried making them public in a code module (.bas) and it had the same effect.
|
IIRC, static variables are stored on the heap, which is slower
than the local stack. But I didn't think it would be that much
slower.
|
|

01-25-2003, 11:34 AM
|
|
Restricted
|
|
Join Date: Oct 2002
Location: UK
Posts: 682
|
|
how do you time how quick your Algorithm is ?
|
|

01-25-2003, 01:12 PM
|
 |
MetaCenturion
Preferred language: Retired Moderator * Guru *
|
|
Join Date: Aug 2001
Location: Hawaii, USA
Posts: 16,584
|
|
Using your favorite timing API function (whatever that might be)...
Code:
Dim T1 As Long
T1 = timeGetTime
'call whatever you want
Debug.Print timeGetTime - T1
'This will be the time length of whatever you called.
'if you called a function, sub, a function inside of a loop, or just
'decided to put a sleep call in here... this should be okay to some
'accuracy... however, [api]QueryPerformanceCounter[/api] works a bit
'faster.
|
|

01-25-2003, 08:29 PM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
Daedalus: There is a change you can make to my qsort to make it run fast on already sorted (or almost sorted) lists. My array picks the first item in the array as the pivot. If you change it to pick the middle value in the array as the pivot, that will ensure good performance on already sorted arrays (but will add an extra divide operation to each recursion).
I did some benchmarks with OnErrors RadixSort and found that it runs almost twice as fast as my QSort. I made some code optimization to his code (just use some temp vars to precalculate stuff, and juggled around some of the for loops) and managed to squeeze another 30% speed boost out of it.
machaira: I looked at the counting sort in that zip, and the problem with that is, that unless the range of numbers to be sorted is relatively small, it is much too memory intensive. If I wished to use it on an array of longs, where there is no restrictions on the value of the array elements (anywhere from -2^31 to +2^31) then the CountingSort would need approx. 16GB of memory to operate. There are however some sorts in there that I forgot about, such as heapsort.
I'm gonna add HeapSort, and ShellSort to my little benchmark program, and see if I can optimize the hell out of them. I'll post my benchmarks results when I get it all done (yes, I know I'm a dork).
Also, I'm gonna try making a hybrid QSort that will call use another sort algorithm when the subarray size gets small (I looked at the code machaira posted and it does that, and it made me remember my prof mentioning that in my data structs. and algo. class). I'm also gonna try re-implementing my qsort as iterative and see if that is any faster (at the least it will prevent stack errors for large arrays).
|
__________________
There are 10 types of people in this world, those that understand binary, and those that don't.
|

01-25-2003, 08:32 PM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
On a side note, I'm currently just using good ol' GetTickCount() to do my benchmarking. What api's do I need to look into so that I can get a true 'cpu time' for my code rather than just the elapsed time (the same time that taskmon in NT displays)?
|
__________________
There are 10 types of people in this world, those that understand binary, and those that don't.
|

01-25-2003, 08:48 PM
|
 |
Bit Flipper
|
|
Join Date: Feb 2002
Location: The Inner Loop
Posts: 5,550
|
|
|

01-25-2003, 09:20 PM
|
 |
Trust me, I'm an
Preferred language: * Expert *
|
|
Join Date: Apr 2001
Location: In ur base, pwnin d00dz
Posts: 1,961
|
|
If you use the Format function to set your random integers to fixed length text (Number(1) = Format(Number(1), "0000000000")), then this seems to be faster (it's also alot simpler):
Code:
Public Sub SortArray(ByRef iArray() As String)
On Error GoTo errs
Dim temp() As String, GreaterThans As Long
ReDim temp(UBound(iArray))
For i = 0 To UBound(iArray)
GreaterThans = 0
For j = 0 To UBound(iArray)
If StrComp(iArray(i), iArray(j), vbTextCompare) = 1 Then GreaterThans = GreaterThans + 1
Next
Do
If temp(GreaterThans) = "" Then
temp(GreaterThans) = iArray(i)
Exit Do
Else
GreaterThans = GreaterThans + 1
End If
Loop
Next
For i = 0 To UBound(temp)
iArray(i) = temp(i)
Next
errs:
End Sub
|
__________________
To err is human; to debug, divine.
|

01-25-2003, 09:22 PM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
Orbity: yes I know about that, but that does not account for the cpu-time actually spent running my code. For example, if I have another process running at the same time as my benchmark, and this other process sucks up some of the cpu it will affect my benchmarks results. If this other process sucks up cpu-time only during my quicksort benchmark, but not during the bublesort benchmark, it will make it look as if the quicksort was slower than it actually was.
I know its possible to retrieve the actual cpu-time spent executing my code, because the taskmon in NT does just this. I just don't know the appropriate api's...
|
__________________
There are 10 types of people in this world, those that understand binary, and those that don't.
|

01-25-2003, 09:49 PM
|
 |
Codeaholic
Preferred language: Retired Leader * Guru *
|
|
Join Date: Oct 2002
Location: Winnipeg, MB, Canada
Posts: 4,543
|
|
darkforces: I benchmarked that algorithm you posted, and unfortunately it is much slower than the other ones.
With 10,000 random items on a PII 233 machine:
Your string algorithm: 154.2 secs
Bubble Sort: 8.7 secs
Insertion Sort: 4.2 secs
Selection Sort: 4.2 secs
Merge Sort: 2.2 secs
Quick Sort: 0.2 secs
Radix Sort: 0.1 secs
* the time for the string sort does not include the time needed to create and format the string array from the array of longs.
|
__________________
There are 10 types of people in this world, those that understand binary, and those that don't.
|
|
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
|
|
|
| Thread Tools |
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
| |
|