What is the difference between a KD-tree and an R-tree?

I looked at the definitions of KD-trees and R-trees, and they seem almost identical to me.

What are the differences between KD-trees and R-trees?


R-trees and kd-trees are based on similar ideas (space partitioning based on axis-aligned regions), but the key differences are:

  • Nodes in kd-trees represent separating planes, whereas nodes in R-trees represent bounding boxes.
  • kd-trees partition the whole of space into regions whereas R-trees only partition the subset of space containing the points of interest.
  • kd-trees represent a disjoint partition (points belong to only one region) whereas the regions in an R-tree may overlap.
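
The first and third bullets can be made concrete with a minimal Python sketch. The class and method names (`KDNode`, `RNode`, `Rect`, `children_containing`) are hypothetical and not from any library. A kd-tree inner node stores only a splitting hyperplane, so a point always goes to exactly one child. An R-tree inner node stores one bounding box per child, so a point may fall into several children (overlap) or into none (uncovered space).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    """Hypothetical kd-tree inner node: the hyperplane x[axis] == value."""
    axis: int
    value: float
    left: Optional["KDNode"] = None   # half-space x[axis] <  value
    right: Optional["KDNode"] = None  # half-space x[axis] >= value

    def children_containing(self, p: Point) -> List[str]:
        # The two half-spaces are disjoint and together cover all of space,
        # so every point is routed to exactly one child.
        return ["left"] if p[self.axis] < self.value else ["right"]


@dataclass
class Rect:
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(a <= x <= b for a, x, b in zip(self.lo, p, self.hi))


@dataclass
class RNode:
    """Hypothetical R-tree inner node: one bounding box per child."""
    boxes: List[Rect] = field(default_factory=list)

    def children_containing(self, p: Point) -> List[int]:
        # Boxes may overlap and need not cover the space, so a point can
        # lie in zero, one, or several children.
        return [i for i, box in enumerate(self.boxes) if box.contains(p)]


kd = KDNode(axis=0, value=5.0)
rn = RNode([Rect((0.0, 0.0), (4.0, 4.0)), Rect((3.0, 3.0), (8.0, 8.0))])

print(kd.children_containing((3.5, 3.5)))  # ['left']
print(rn.children_containing((3.5, 3.5)))  # [0, 1]  (overlapping boxes)
print(rn.children_containing((9.0, 9.0)))  # []      (uncovered space)
```

A search in the R-tree therefore has to descend into every matching box, while a kd-tree search for a single point follows one path.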

(There are many similar tree structures for partitioning space: quadtrees, BSP-trees, R*-trees, etc.)

A major difference not mentioned in this answer is that KD-trees are only efficient when bulk-loaded. Once built, modifying or rebalancing a KD-tree is non-trivial; R-trees do not suffer from this.
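
To illustrate the update problem, here is a minimal Python sketch of a textbook 2-d tree. All names (`insert`, `bulk_load`, `height`) are hypothetical. It compares naive incremental insertion, with no rebalancing, against a median-split bulk load on the same 1000 points arriving in sorted order. It does not model the R-tree side, which stays balanced by splitting overflowing nodes and keeping all leaves on the same level; it only shows how an unmaintained kd-tree can degrade.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point):
        self.point = point
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], p: Point) -> Node:
    """Naive incremental insertion: descend on alternating axes and attach a
    new leaf. Nothing is rebalanced afterwards."""
    new = Node(p)
    if root is None:
        return new
    cur, depth = root, 0
    while True:
        axis = depth % 2
        if p[axis] < cur.point[axis]:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right
        depth += 1


def bulk_load(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Static construction: split at the median on alternating axes
    (assumes distinct coordinates to keep the sketch short)."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda q: q[axis])
    mid = len(pts) // 2
    node = Node(pts[mid])
    node.left = bulk_load(pts[:mid], depth + 1)
    node.right = bulk_load(pts[mid + 1:], depth + 1)
    return node


def height(root: Optional[Node]) -> int:
    """Nodes on the longest root-to-leaf path (iterative, so deep trees are fine)."""
    if root is None:
        return 0
    best = 0
    stack = [(root, 1)]
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, d + 1))
    return best


pts = [(float(i), float(i)) for i in range(1000)]  # arrives already sorted

incremental = None
for p in pts:
    incremental = insert(incremental, p)

print(height(incremental))     # 1000: degenerates into a linked list
print(height(bulk_load(pts)))  # 10: about log2(1000)
```

Restoring the logarithmic height after such updates generally means rebuilding the tree or its affected subtrees.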

They are actually quite different. They serve similar purpose (region queries on spatial data), and they both are trees (and both belong to the family of bounding volume hierarchy indexes), but that is about all they have in common.

  • R-trees are balanced; k-d-trees are not (unless bulk-loaded). This is why R-trees are preferred for changing data, as k-d-trees may need to be rebuilt to re-optimize.
  • R-trees are disk-oriented: they organize the data in areas that map directly to the on-disk representation. This makes them more useful in real databases and for out-of-memory operation. k-d-trees are memory-oriented and are non-trivial to put into disk pages.
  • k-d-trees are elegant when bulk-loaded (kudos to SingleNegationElimination for pointing this out), while R-trees are better for changing data (although they do benefit from bulk loading when used with static data).
  • R-trees do not cover the whole data space; empty areas may remain uncovered. k-d-trees always cover the whole space.
  • k-d-trees split the data space with binary splits, whereas R-trees partition the data into rectangles. The binary splits are obviously disjoint, while the rectangles of an R-tree may overlap (which is actually sometimes good, although one tries to minimize overlap).
  • k-d-trees are a lot easier to implement in memory, which is actually their key benefit.
  • R-trees can store rectangles and polygons; k-d-trees only store point vectors (as overlap is needed for polygons).
  • R-trees come with various optimization strategies: different splits, bulk loaders, insertion and reinsertion strategies, etc.
  • k-d-trees use the one-dimensional distance to the separating hyperplane as bound; R-trees use the d-dimensional minimum distance to the bounding hyperrectangle for bounding (they can also use the maximum distance for some counting queries, to filter true positives).
  • k-d-trees support squared Euclidean distance and Minkowski norms, while R-trees have been shown to also support geodetic distance (for finding nearby points on geodata).
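
The bullet on pruning bounds can be illustrated with a small Python sketch. It assumes Euclidean distance and axis-aligned boxes. `kd_bound`, `mindist` and `maxdist` are illustrative names, not a library API. In a nearest-neighbor search, a subtree is skipped when its lower bound exceeds the distance to the best candidate found so far.

```python
import math
from typing import Sequence


def kd_bound(q: Sequence[float], axis: int, split: float) -> float:
    """kd-tree bound: 1-D distance from q to the hyperplane x[axis] == split."""
    return abs(q[axis] - split)


def mindist(q: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """R-tree lower bound: minimum Euclidean distance from q to the box [lo, hi]
    (0 if q lies inside the box)."""
    s = 0.0
    for x, a, b in zip(q, lo, hi):
        if x < a:
            s += (a - x) ** 2
        elif x > b:
            s += (x - b) ** 2
    return math.sqrt(s)


def maxdist(q: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Upper bound: largest Euclidean distance from q to any point of the box."""
    return math.sqrt(sum(max(abs(x - a), abs(x - b)) ** 2
                         for x, a, b in zip(q, lo, hi)))


q = (0.0, 0.0)
lo, hi = (3.0, 4.0), (5.0, 6.0)  # tight bounding box of some subtree
best_so_far = 4.0                # hypothetical distance of the current best neighbor

print(kd_bound(q, 0, 3.0))                # 3.0
print(mindist(q, lo, hi))                 # 5.0
print(maxdist(q, lo, hi))                 # sqrt(61), about 7.81
print(kd_bound(q, 0, 3.0) > best_so_far)  # False: the kd-tree must still visit
print(mindist(q, lo, hi) > best_so_far)   # True: the R-tree can prune
```

Here the kd-tree only knows that the cell lies beyond x = 3, so its bound (3.0) cannot exclude it. An R-tree that stores the tight box gets a bound of 5.0 and prunes it. `maxdist` is the kind of upper bound the answer mentions for counting queries: if it is no larger than the query radius, every entry in the box is a true positive and needs no further distance checks.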