[MUD-Dev] DESIGN: R-Trees (fwd)

clawrenc at cup.hp.com clawrenc at cup.hp.com
Wed Aug 27 17:42:22 New Zealand Standard Time 1997

In <Pine.LNX.3.91.970824165454.19820M-100000 at uni-corn.demon.co.uk>, on
   at 06:14 PM, Greg Munt <greg at uni-corn.demon.co.uk> said:

>I should have known better, than to post this to rgma. I guess I
>won't  try too hard at being constructive there, anymore.

RGMA is a breeding ground for list members.  The list is a refuge from



>The R-Tree is a model for representing spatial data. I'm considering an 
>R-Tree as the basis for some kind of co-ordinate system.
>My understanding of the R-Tree model:

Yup, you've basically got it, and rephrased the definition I'm about
to quote below.  Note that this by no means iterates all the
different forms of trees or data storage structures for spatial data. 
Scour the Stony Brook page for a while for a better grasp on that one.

Note: My first preference is for R*-Trees.  Conceptually I like them. 
They *feel* elegant to me.  Having been working with them for a while
however, as beautiful as the ability is to dynamically generate new
rectangles as queries demand them, I'm starting to dislike the clutter
that inevitably develops.  I've yet to think of a decent approach for
using this feature of R*-Trees for a very dynamic world where the
object density, the query rate and the number of sources of queries
may be all high.

VP-Trees are starting to look attractive especially when a
neighborhood concept is mapped against their partitions.

I too still have to spend my time going thru Stony Brook on this.

From http://www.cs.cuhk.hk/~drsam/methods.html:

What is R-Tree

An R-Tree, proposed by Antonin Guttman[1], is an index structure that
handles point and spatial data at the same time. Insert, delete and
search can be intermixed without periodic reorganization. Each spatial
object in the database is represented by a tuple, and each tuple has a
unique identifier, the tuple-identifier, by which it can be retrieved.
A leaf node of an R-Tree holds index records that reference the
spatial data. An index record is (I, tuple-identifier). I is an
n-dimensional rectangle, the bounding rectangle of the spatial object
indexed, also known as the minimal bounding rectangle, MBR. Each entry
in I gives the bounds, [lower, upper], of the rectangle along one
dimension. Non-leaf nodes contain entries (I, childnode-pointer) where
I is the minimal rectangle bounding all the rectangles in the lower
node's entries. Childnode-pointer is the pointer to a lower node in
the R-Tree. Let M be the maximum number of entries that fit in one
node, and m<=M/2 the minimum.
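The record layout above can be sketched in a few lines of Python. This
is only an illustration, not Guttman's code: the names Node,
intersects, bounding and search are made up here, and a rectangle I is
assumed to be a tuple of (lower, upper) pairs, one per dimension.

```python
from dataclasses import dataclass, field

# A rectangle I: one (lower, upper) interval per dimension (assumed layout).

def intersects(a, b):
    """True if rectangles a and b overlap (touching counts) in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def bounding(rects):
    """Minimal bounding rectangle (MBR) of a non-empty list of rectangles."""
    dims = len(rects[0])
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(dims))

@dataclass
class Node:
    leaf: bool
    # Leaf entries are (I, tuple_identifier);
    # non-leaf entries are (I, child Node).
    entries: list = field(default_factory=list)

def search(node, query):
    """Collect tuple-identifiers whose rectangle intersects the query rectangle."""
    hits = []
    for rect, ref in node.entries:
        if intersects(rect, query):
            if node.leaf:
                hits.append(ref)
            else:
                hits.extend(search(ref, query))  # descend only where I overlaps
    return hits

# A hand-built two-level tree over four 2-D objects.
leaf_a = Node(True, [(((0, 1), (0, 1)), "a"), (((2, 3), (2, 3)), "b")])
leaf_b = Node(True, [(((10, 11), (10, 11)), "c"), (((12, 13), (12, 13)), "d")])
root = Node(False, [
    (bounding([r for r, _ in leaf_a.entries]), leaf_a),
    (bounding([r for r, _ in leaf_b.entries]), leaf_b),
])
```

Calling search(root, ((0.5, 2.5), (0.5, 2.5))) only descends into
leaf_a, because the query does not touch the rectangle stored for
leaf_b in the root.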

Properties of R-Tree

An R-Tree satisfies the following properties:

    An R-Tree is a height-balanced tree and all leaves are on the same
    level.

    The root node has at least two children unless it is a leaf node.

    Every non-leaf node contains between m and M entries unless it is
    the root.

    For each entry (I, childnode-pointer) in a non-leaf node, I is
    the smallest rectangle that spatially contains all rectangles in
    its child node.

    Every leaf node contains between m and M index records unless it
    is the root.

    For each index record (I, tuple-identifier) in a leaf node, I is
    the smallest rectangle that spatially contains the n-dimensional
    data object represented by the indicated tuple.
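The structural properties listed above can be checked mechanically.
The following is a rough sketch under the same assumed layout
(rectangles as tuples of (lower, upper) pairs); Node, mbr_of and check
are hypothetical names. The last property is not checked, since that
would need the data objects themselves.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (I, child Node or identifier)

def mbr_of(node):
    """Smallest rectangle containing every rectangle stored in node."""
    rects = [r for r, _ in node.entries]
    dims = len(rects[0])
    return tuple((min(r[d][0] for r in rects), max(r[d][1] for r in rects))
                 for d in range(dims))

def check(node, m, M, is_root=True):
    """Return the height of the leaves below node; raise AssertionError
    if one of the structural properties does not hold."""
    n = len(node.entries)
    if is_root:
        assert node.leaf or n >= 2, "non-leaf root needs at least two children"
        assert n <= M, "too many entries in root"
    else:
        assert m <= n <= M, "node must hold between m and M entries"
    if node.leaf:
        return 0
    depths = set()
    for rect, child in node.entries:
        assert rect == mbr_of(child), "I is not the minimal bounding rectangle"
        depths.add(check(child, m, M, is_root=False))
    assert len(depths) == 1, "leaves are not all on the same level"
    return 1 + depths.pop()

leaf1 = Node(True, [(((0, 1), (0, 1)), "a"), (((2, 3), (0, 1)), "b")])
leaf2 = Node(True, [(((5, 6), (5, 6)), "c"), (((7, 8), (5, 6)), "d")])
good = Node(False, [(mbr_of(leaf1), leaf1), (mbr_of(leaf2), leaf2)])
# Same tree, but the first rectangle in the root is looser than the MBR.
bad = Node(False, [(((0, 9), (0, 9)), leaf1), (mbr_of(leaf2), leaf2)])
```

With m=2 and M=4, check(good, 2, 4) passes, while check(bad, 2, 4)
fails on the rectangle that is larger than it needs to be.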


The R-Tree is based on a heuristic optimization. The optimization
criterion is to minimize the area of each enclosing rectangle in the
inner nodes. The R*-Tree, which incorporates a combined optimization
of area, margin and overlap of each bounding rectangle in the inner
nodes, was proposed in [6]. For slightly higher implementation cost,
it outperforms the existing R-Tree variants.

    Minimizing the area covered by a bounding rectangle should
    minimize the dead space. This will improve performance since
    decisions about which paths have to be traversed can be taken on
    higher levels.

    Minimizing the overlap between bounding rectangles decreases the
    number of paths to be traversed.

    Minimizing the margin of a bounding rectangle will make the
    rectangle more quadratic (closer to a square). This is because,
    for a fixed area, the object with the smallest margin is the
    square. Quadratic rectangles can be packed more easily, and so
    produce a smaller enclosing rectangle.
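The three quantities being traded off are easy to compute. A small
sketch, again assuming rectangles are tuples of (lower, upper) pairs;
margin is taken here as the sum of the side lengths over the
dimensions, which in 2-D is half the perimeter. That scaling is an
assumption of this sketch, not something the quoted page pins down.

```python
from math import prod

def area(r):
    """Product of the side lengths of rectangle r."""
    return prod(hi - lo for lo, hi in r)

def margin(r):
    """Sum of the side lengths of r, one per dimension (assumed definition)."""
    return sum(hi - lo for lo, hi in r)

def overlap(a, b):
    """Area shared by rectangles a and b, or 0 if they do not overlap."""
    sides = [min(ahi, bhi) - max(alo, blo)
             for (alo, ahi), (blo, bhi) in zip(a, b)]
    return prod(sides) if all(s > 0 for s in sides) else 0

square = ((0, 2), (0, 2))   # 2 x 2
strip = ((0, 4), (0, 1))    # 4 x 1, same area as the square
```

Both rectangles have the same area, but the square has the smaller
margin, which is the point the third item above makes.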


Conventional spatial index structures divide the multi-dimensional
vector space into partitions which have approximately the same number
of data points as each other. This makes it easier to find the nearest
neighbor of a given query point because it is only necessary to touch
a small number of partitions. Most partitioning methods are based on
absolute coordinate values of the vector space. The R-Tree and R*-Tree
described before use this type of partitioning method. Structures
partitioned in this way are useful for queries based on absolute
coordinates, like range queries. However, in general, they do not
maintain any distance information, such as the distance between points
within a partition and the partition's boundaries. Since this
information is critical in pruning the search space for
nearest-neighbor search, index structures using partitioning methods
based on absolute coordinates are thus not so useful for
multi-dimensional nearest-neighbor search.

Nearest-neighbor search by definition means finding the one point with
minimum point-to-point distance from a given query point, so it is
natural to use a partitioning method based on relative distance rather
than absolute coordinate values. The Vantage-Point tree, or VP-Tree,
method was proposed by Peter N. Yianilos. It uses a partitioning
method based on relative distance and is aimed at handling
multi-dimensional nearest-neighbor search.
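To make the distance-based partitioning concrete, here is a minimal
sketch of a vantage-point tree with nearest-neighbor search. It is not
taken from Yianilos' paper: the vantage point is simply picked at
random, the split radius is the median distance to it, and the names
VPNode, build and nearest are invented for this example.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class VPNode:
    vantage: tuple
    radius: float = 0.0                  # split distance from the vantage point
    inside: Optional["VPNode"] = None    # points closer than radius
    outside: Optional["VPNode"] = None   # points at distance >= radius

def build(points, rng):
    """Recursively partition points by distance to a randomly chosen vantage point."""
    if not points:
        return None
    points = list(points)
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage)
    dists = sorted(math.dist(vantage, p) for p in points)
    radius = dists[len(dists) // 2]      # median distance
    inside = [p for p in points if math.dist(vantage, p) < radius]
    outside = [p for p in points if math.dist(vantage, p) >= radius]
    return VPNode(vantage, radius, build(inside, rng), build(outside, rng))

def nearest(node, q, best=None):
    """Return (distance, point) of the stored point closest to q."""
    if node is None:
        return best
    d = math.dist(q, node.vantage)
    if best is None or d < best[0]:
        best = (d, node.vantage)
    # Visit the side that contains q first.
    if d < node.radius:
        first, second = node.inside, node.outside
    else:
        first, second = node.outside, node.inside
    best = nearest(first, q, best)
    # By the triangle inequality the other side can only hold a closer
    # point if the current best distance reaches across the boundary.
    if abs(d - node.radius) <= best[0]:
        best = nearest(second, q, best)
    return best

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(200)]
tree = build(pts, rng)
q = (0.5, 0.5)
```

nearest(tree, q) should agree with a brute-force scan over pts. The
pruning uses only distances to the vantage point and never looks at
absolute coordinates.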

As mentioned before, VP-Tree method bases the partitioning on the
relative distances among the data points, rather than their absolute
coordinate values. It is also organized around a particular vantage
point. A vantage point is nothing special, just a point selected from
the vector space or from the set of data points. However, the choice of vantage
point plays an important role in the performance of indexing


J C Lawrence                           Internet: claw at null.net
(Contractor)                           Internet: coder at ibm.net
---------------(*)               Internet: clawrenc at cup.hp.com
...Honorary Member Clan McFUD -- Teamer's Avenging Monolith...
