Rockoon

03-08-2004, 02:49 PM

Per request, I am posting this short document I just cooked up describing a method of storing N-ary tree's.

----------------------------------------------------------------------

Static N-ary Trees

by Joseph Koss

The fundamental problem with implimenting dynamic trees in VB6 is its lack of pointers.

You are either forced to use classes (slow) or have a lot of maintenance code involved with allocating/deallocating array indexes in place of pointers. I will not discuss a complete replacement for traditional dynamic trees but instead look at a special case of trees where its structure is a fully populated STATIC tree rather than DYNAMIC tree which can change its depth/structure.

But first I will discuss a dynamic tree implimentation that has a fully populated tree for this is where you discover that you dont need pointers/indexes at all.

The traditional NODE of an array-based binary tree is something like:

type Node

value as WhateverType ' value contained at this node

Parent as long ' pointer/index to parent node

LeftChild as long ' pointer/index to left child

RightChild as long ' pointer/index to right child

end type

public TREE() as Node

With this structure you can navigate around the tree quite easily and this is one of the main powers of the tree structure. Another ability is to re-arrange nodes (often called tree rotation.) When all is said and done, static tree's lack the ability to be rearrange efficiently. This point must be stressed because many tree implimentations require this ability (splay trees and so forth) .. but those implimentation that do not can benefit from static trees.

An array-based dynamic tree implimentation will need a function to allocate a new array index for use in the tree. This function finds the first unused array entry from the pool of all array indexes and reserves it. A discussion of an efficient way to do this (and of freeing up a deleted node) is not going to be detailed here.

Suffice it to say that if you never delete nodes then for each call to AllocateNode() it will return a value equal to the number of times AllocateNode() has previously been called. On the first call it returns 0, on the next call it returns 1, then 2, and so forth.

Now when you build a fully populated tree of depth (D) there are several ways to go about doing it. The method we need, to realize the static case of a fully populated tree, is the one that builds a tree top->bottom rather than left->right or right->left. So Starting with the root we allocate a node for the root, then allocate both of its children (left then right, or right then left.. doesnt matter.. just be consistent):

0 root

/ \

/ \

/ \

/ \

left 1 2 right

The numbers indicate the "pointer" or "index" that has been assigned to the node.

This is a fully populated tree of depth = 2 (fully populated means there are no "missing" nodes)

We then create the next level of the tree by looping through all the existing nodes without children and allocating children to them:

0 root

/ \

/ \

/ \

/ \

1 2

/ \ / \

/ \ / \

3 4 5 6

This is a fully-populated tree of depth = 3.

You see the pattern that is developing. Each level of the tree is sequentialy allocated. You can see that the next level will contain nodes 7, 8, 9, 10, 11, 12, 13, and 14.

The key to STATIC tree's is that this ordered sequence of numbers allows for traversal of this tree without storing pointers or indexes. In the case of binary trees, the left and right childen of a node N can be calculated from the index of N:

LeftChild = N * B + 1

RightChild = N * B + 2

where B = the branching factor of the tree. For binary trees, B = 2. For Quad tree's, B = 4, and so forth.

In general, the i'th child of a node is:

i'thChild = N * B + i

The calculation works the other way as well, the parent of node N is:

Parent = (N - 1) \ B

note: that is integer division and that we did not need to know which child of the parent we were. We just needed to know our own node number and the branching factor of the tree.

Now that we see that we dont need to store pointers/indexes to children and parent, the Node type becomes:

type Node

value as WhateverType

end type

public TREE() as Node

What this means is that we dont really need a Node type at all. All the management involved with node traversal has been removed. If WhateverType was a bounding box, we can just use it as our Node type:

type BoundingBox

TopLeft as Vector

BottomRight as Vector

end type

public TREE() as BoundingBox

One point is that it may become important to know if a node contains a valid value. The tree model is fully populated but during implimentation we may not need/use all the nodes. So it may become important to add a boolean flag to the node type indicating if it is being used. This will allow us to traverse/impliment trees that are not fully populated.

Another point is that we must know how big the tree will get depending on what depth we will allow, so that we can allocate space for it:

NumNodes = (B ^ MaxDepth - 1) \ (B - 1)

ReDim TREE(NumNodes - 1) as Node

We may also want to know all the nodes at a particular depth (this is a big advantage to this tree format - try finding all the nodes at a given depth in a traditional pointer based tree format! You will have to walk the tree to do it)

FirstNode = (B ^ (Depth - 1) - 1) \ (N - 1)

LastNode = (B ^ Depth - 1) \ (B - 1) - 1

---

As you see, you do not need explicit pointers to handle trees.. you can calculate them implicitly. This allows for relatively fast traversal of the tree without storing pointers or the need to allocate nodes individualy when the tree is created.

For fully populated trees this layout is also more efficient (memory-wise) than the traditional tree structure including all its pointers. In the original Node type, 12 bytes were reserved for pointers PER NODE. For a depth=20 tree, this saves over 12 megs of memory.

The limitations of static trees are:

You cannot perform efficient reordering of the tree like you can when using pointers.

Sparsly populated trees are far less efficient memory-wise. Its not possible to store a depth=30 tree (1 billion nodes), for example because this system always allocates/requires space for a fully-populated tree.

---

In the case of quad tree/oct tree space partitioning, a large depth is not a problem at all (at depth 20, your space has been cut in half 20 times.. or approximately 1 millionth the size of the root space.. in practice you simply wont need more than depth 12 or so for standard quad/oct space partitioning)

Applying this to BSP tree's will be a little tricky - because often BSP trees are generated from 'polygon soup' - steps must be taken to try to retain the BSP tree with as little MaxDepth as possible (in english - you want to select a root node and order your other nodes such that the tree when generated is as close to fully populated as possible) - while this is probably a very slow processes to determe the best ordering for the BSP tree, it only needs to be done once for a given game world. You can ship your game with the BSP tree already ordered compactly.

----------------------------------------------------------------------

Static N-ary Trees

by Joseph Koss

The fundamental problem with implimenting dynamic trees in VB6 is its lack of pointers.

You are either forced to use classes (slow) or have a lot of maintenance code involved with allocating/deallocating array indexes in place of pointers. I will not discuss a complete replacement for traditional dynamic trees but instead look at a special case of trees where its structure is a fully populated STATIC tree rather than DYNAMIC tree which can change its depth/structure.

But first I will discuss a dynamic tree implimentation that has a fully populated tree for this is where you discover that you dont need pointers/indexes at all.

The traditional NODE of an array-based binary tree is something like:

type Node

value as WhateverType ' value contained at this node

Parent as long ' pointer/index to parent node

LeftChild as long ' pointer/index to left child

RightChild as long ' pointer/index to right child

end type

public TREE() as Node

With this structure you can navigate around the tree quite easily and this is one of the main powers of the tree structure. Another ability is to re-arrange nodes (often called tree rotation.) When all is said and done, static tree's lack the ability to be rearrange efficiently. This point must be stressed because many tree implimentations require this ability (splay trees and so forth) .. but those implimentation that do not can benefit from static trees.

An array-based dynamic tree implimentation will need a function to allocate a new array index for use in the tree. This function finds the first unused array entry from the pool of all array indexes and reserves it. A discussion of an efficient way to do this (and of freeing up a deleted node) is not going to be detailed here.

Suffice it to say that if you never delete nodes then for each call to AllocateNode() it will return a value equal to the number of times AllocateNode() has previously been called. On the first call it returns 0, on the next call it returns 1, then 2, and so forth.

Now when you build a fully populated tree of depth (D) there are several ways to go about doing it. The method we need, to realize the static case of a fully populated tree, is the one that builds a tree top->bottom rather than left->right or right->left. So Starting with the root we allocate a node for the root, then allocate both of its children (left then right, or right then left.. doesnt matter.. just be consistent):

0 root

/ \

/ \

/ \

/ \

left 1 2 right

The numbers indicate the "pointer" or "index" that has been assigned to the node.

This is a fully populated tree of depth = 2 (fully populated means there are no "missing" nodes)

We then create the next level of the tree by looping through all the existing nodes without children and allocating children to them:

0 root

/ \

/ \

/ \

/ \

1 2

/ \ / \

/ \ / \

3 4 5 6

This is a fully-populated tree of depth = 3.

You see the pattern that is developing. Each level of the tree is sequentialy allocated. You can see that the next level will contain nodes 7, 8, 9, 10, 11, 12, 13, and 14.

The key to STATIC tree's is that this ordered sequence of numbers allows for traversal of this tree without storing pointers or indexes. In the case of binary trees, the left and right childen of a node N can be calculated from the index of N:

LeftChild = N * B + 1

RightChild = N * B + 2

where B = the branching factor of the tree. For binary trees, B = 2. For Quad tree's, B = 4, and so forth.

In general, the i'th child of a node is:

i'thChild = N * B + i

The calculation works the other way as well, the parent of node N is:

Parent = (N - 1) \ B

note: that is integer division and that we did not need to know which child of the parent we were. We just needed to know our own node number and the branching factor of the tree.

Now that we see that we dont need to store pointers/indexes to children and parent, the Node type becomes:

type Node

value as WhateverType

end type

public TREE() as Node

What this means is that we dont really need a Node type at all. All the management involved with node traversal has been removed. If WhateverType was a bounding box, we can just use it as our Node type:

type BoundingBox

TopLeft as Vector

BottomRight as Vector

end type

public TREE() as BoundingBox

One point is that it may become important to know if a node contains a valid value. The tree model is fully populated but during implimentation we may not need/use all the nodes. So it may become important to add a boolean flag to the node type indicating if it is being used. This will allow us to traverse/impliment trees that are not fully populated.

Another point is that we must know how big the tree will get depending on what depth we will allow, so that we can allocate space for it:

NumNodes = (B ^ MaxDepth - 1) \ (B - 1)

ReDim TREE(NumNodes - 1) as Node

We may also want to know all the nodes at a particular depth (this is a big advantage to this tree format - try finding all the nodes at a given depth in a traditional pointer based tree format! You will have to walk the tree to do it)

FirstNode = (B ^ (Depth - 1) - 1) \ (N - 1)

LastNode = (B ^ Depth - 1) \ (B - 1) - 1

---

As you see, you do not need explicit pointers to handle trees.. you can calculate them implicitly. This allows for relatively fast traversal of the tree without storing pointers or the need to allocate nodes individualy when the tree is created.

For fully populated trees this layout is also more efficient (memory-wise) than the traditional tree structure including all its pointers. In the original Node type, 12 bytes were reserved for pointers PER NODE. For a depth=20 tree, this saves over 12 megs of memory.

The limitations of static trees are:

You cannot perform efficient reordering of the tree like you can when using pointers.

Sparsly populated trees are far less efficient memory-wise. Its not possible to store a depth=30 tree (1 billion nodes), for example because this system always allocates/requires space for a fully-populated tree.

---

In the case of quad tree/oct tree space partitioning, a large depth is not a problem at all (at depth 20, your space has been cut in half 20 times.. or approximately 1 millionth the size of the root space.. in practice you simply wont need more than depth 12 or so for standard quad/oct space partitioning)

Applying this to BSP tree's will be a little tricky - because often BSP trees are generated from 'polygon soup' - steps must be taken to try to retain the BSP tree with as little MaxDepth as possible (in english - you want to select a root node and order your other nodes such that the tree when generated is as close to fully populated as possible) - while this is probably a very slow processes to determe the best ordering for the BSP tree, it only needs to be done once for a given game world. You can ship your game with the BSP tree already ordered compactly.