*Improving Motion Planning and Nonlinear Dimensionality Reduction by Computing Proximity Relations Faster*

Nearest neighbors, which are usually defined as the k closest points, are used by many motion planning methods to capture the connectivity of the free space. In addition to motion planning, nonlinear dimensionality reduction methods often rely on the nearest-neighbors graph to extract low-dimensional embeddings that reliably capture the underlying structure of high-dimensional data.

Although researchers have developed many nearest-neighbors algorithms, such as kd-tree, R-tree, X-tree, SR-tree, M-tree, VP-tree, GNAT, CoverTree, iDistance, analysis has shown that for certain distance metrics and data distributions, the computational efficiency of nearest-neighbors algorithms decreases rapidly as the dimension of the data increases.

The main contribution of this work is to eliminate the bottleneck imposed by exact nearest neighbors by proposing the use of approximate nearest neighbors as a viable approach in high-dimensional motion planning and nonlinear dimensionality reduction. Two computational methods are developed, Distance-based Projections onto Euclidean Space (DPES) and an improved version Hill-Climbing Distance-based Projections onto Euclidean Space (hcDPES), that compute approximate nearest neighbors by using a distance-based projection of high-dimensional metric spaces onto low-dimensional Euclidean spaces.

As indicated by experimental results, the advantage of the proposed nearest-neighbors approximation is that while it reliably maintains the accuracy of exact nearest neighbors, it significantly reduces the computational time.

Quantitative analysis of proximity algorithms on high-dimensional problems in the context of sampling-based motion planning indicates that the computational efficiency of exact nearest-neighbors methods deteriorates rapidly as the dimension of the problem increases and approaches the performance of the brute-force method. The impracticality of exact nearest-neighbors algorithms motivates the use of approximate algorithms, which trade off accuracy for efficiency, such as DPES and hcDPES. These algorithms reduce the computational time required by motion-planning methods, while maintaining the overall accuracy needed to solve motion planning queries.

DPES-ScIMAP builds upon the recent ScIMAP (Scalable Isomap) method [Das et. al], which, by using proximity relations and dimensionality reduction, has been shown to reliably extract from simulation data a few parameters that capture the main, linear and/or nonlinear, modes of motion of a molecular system. Results on the characterization of protein folding reactions reveal that the folding landscapes emerging from the application of DPES-ScIMAP and ScIMAP are practically indistinguishable. The advantage is that, in many instances, by using DPES-ScIMAP instead of ScIMAP, the computational time required to analyze the simulation data is reduced from several CPU months to just a few CPU hours.

Nonlinear dimensionality reduction methods often rely on the nearest-neighbors graph to extract low-dimensional embeddings that reliably capture the underlying structure of high-dimensional data. Research however has shown that computing nearest neighbors of a point from a high-dimensional data set generally requires time proportional to the size of the data set itself, rendering the computation of the nearest-neighbors graph prohibitively expensive. hcDPES significantly reduces the major computational bottleneck of many nonlinear dimensionality reduction methods by efficiently and accurately approximating the nearest-neighbors graph. The approximation relies on a distance-based projection of high-dimensional data onto low-dimensional Euclidean spaces. As indicated by experimental results, the advantage of the proposed approximation is that while it reliably maintains the accuracy of nonlinear dimensionality reduction methods, it significantly reduces the computational time.