As research in robot motion planning, biology, data mining, geographic information systems, and other scientific fields progressively addresses problems of unprecedented complexity, the demand for computing nearest-neighbors graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to single machines.
This work address the problem of computing the nearest-neighbors graph utilizing multiple processors communicating via message passing in a cluster system with no-shared memory and where the amount of memory available to each processor is limited. DKNNG supports computation of very large nearest-neighbors graphs consisting of millions of points with hundreds of dimensions. Experimental results show nearly linear speedup on one hundred and forty processors and indicate that similar speedup can be obtained on several hundred processors.
Logged data for one of the benchmarks showing the percentage of time that is spent on useful computations. Since almost all the time is spent on useful computations, DKNNG is able to achieve near linear speedup on hundreds of processors.