## \(k\)-nearest neighbors search in scikit-learn

\(k\)-nearest neighbors search is at the base of many implementations used within scikit-learn.

For instance, it is used in Affinity Propagation, BIRCH, Mean Shift, OPTICS, Spectral Clustering, TSNE, KNeighborsRegressor and KNeighborsClassifier.

While many libraries implement approximate versions of \(k\)-nearest neighbors search to speed up execution time^{1}, scikit-learn's implementation aims at returning the exact \(k\)-nearest neighbors.

There are two main aspects in scikit-learn's current implementation of \(k\)-nearest neighbors search.

### Computing chunks of the distance matrix

The computational pattern for \(k\)-nearest neighbors search tasks is the following:

- Compute \(\mathbf{D}_d(\mathbf{X}, \mathbf{Y})\), the distance matrix between the vectors of two arrays \(\mathbf{X}\) and \(\mathbf{Y}\)
- Reduce rows of \(\mathbf{D}_d(\mathbf{X}, \mathbf{Y})\) appropriately for the given algorithm: for instance, as covered in the following subsection, the adapted reduction for \(k\)-nn search is \(\text{argkmin}\), which returns the indices of the \(k\) smallest values in an ordered set
- Perform extra operations on the results of the reductions (here, sort values)
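Ignoring chunking for the moment and assuming Euclidean distances, the three steps above can be sketched with NumPy (the function name `knn_pattern` is purely illustrative, not scikit-learn's internal code):

```python
import numpy as np

def knn_pattern(X, Y, k):
    # 1. Compute D(X, Y): pairwise Euclidean distances between rows of X and Y.
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # 2. Reduce each row with argkmin: indices of the k smallest distances
    #    (argpartition avoids fully sorting every row).
    idx = np.argpartition(D, kth=k - 1, axis=1)[:, :k]
    # 3. Extra operation: sort the k retained neighbors by distance.
    vals = np.take_along_axis(D, idx, axis=1)
    return np.take_along_axis(idx, np.argsort(vals, axis=1), axis=1)
```

Note that step 2 deliberately avoids a full sort: only the \(k\) retained candidates need ordering in step 3.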

Generally, one does not compute \(\mathbf{D}_d(\mathbf{X}, \mathbf{Y})\) entirely because its space complexity is \(\Theta(n_x \times n_y)\): in practice, \(\mathbf{D}_d(\mathbf{X}, \mathbf{Y})\) does not fit in RAM for realistically sized datasets.

In practice, one thus computes chunks of this distance matrix, which are themselves reduced in a streaming fashion.
This is what is currently done in scikit-learn^{2}.
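A minimal sketch of that chunked computation, assuming Euclidean distances and a hypothetical `chunked_argkmin` helper (this is not scikit-learn's internal code):

```python
import numpy as np

def chunked_argkmin(X, Y, k, chunk_size=256):
    # Only a (chunk_size, n_y) slice of the distance matrix is ever
    # materialized, so memory stays bounded regardless of n_x.
    n_x = X.shape[0]
    result = np.empty((n_x, k), dtype=np.intp)
    for start in range(0, n_x, chunk_size):
        stop = min(start + chunk_size, n_x)
        # Distance matrix for this chunk of X only.
        D_chunk = np.linalg.norm(X[start:stop, None, :] - Y[None, :, :], axis=-1)
        # Reduce the chunk immediately: keep the k smallest, sorted by distance.
        idx = np.argpartition(D_chunk, kth=k - 1, axis=1)[:, :k]
        vals = np.take_along_axis(D_chunk, idx, axis=1)
        result[start:stop] = np.take_along_axis(idx, np.argsort(vals, axis=1), axis=1)
    return result
```

Each chunk's distances are discarded as soon as the reduction has consumed them, which is what keeps the streaming approach memory-bounded.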

### Reducing distances: `argkmin`

The \(\text{argkmin}\) reduction consists in returning the indices of the \(k\) smallest values of an ordered container.

In the case of the \(k\)-nearest neighbors search, we can apply \(\text{argkmin}\) on each row of the distance matrix to get the indices of the \(k\) nearest neighbors of each query vector.
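For instance, on a small hand-written distance matrix (illustrative values only), row-wise \(\text{argkmin}\) can be expressed with NumPy as:

```python
import numpy as np

distances = np.array([[0.9, 0.1, 0.5, 0.3],
                      [0.2, 0.8, 0.4, 0.6]])
k = 2
# argkmin: indices of the k smallest values in each row.
idx = np.argpartition(distances, kth=k - 1, axis=1)[:, :k]
# Then sort the k retained indices by their actual distances.
vals = np.take_along_axis(distances, idx, axis=1)
argkmin = np.take_along_axis(idx, np.argsort(vals, axis=1), axis=1)
# argkmin → [[1, 3], [0, 2]]
```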

### Current issues

Unfortunately, most of those reductions are implemented using Python functions as reduction functions in a general parallelisation pattern which relies on `joblib.Parallel`.

Technically, this is not the most efficient approach: working at this level with views on NumPy arrays moves large chunks of data back and forth several times between RAM and the CPUs' caches, hardly makes use of the caches, and allocates temporary results.

Moreover, the cost of manipulating those chunks of data in the CPython interpreter causes a non-negligible overhead, because they are associated with Python objects which are bound to the Global Interpreter Lock for their reference counting.

Hence, this does not allow getting proper hardware scalability.
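At the user-facing level, this exact search is exposed through estimators such as `sklearn.neighbors.NearestNeighbors` (a small usage example; the parameter values are arbitrary):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((100, 5))      # indexed vectors
queries = rng.random((3, 5))  # query vectors

# Brute-force mode computes exact distances, chunk by chunk.
nn = NearestNeighbors(n_neighbors=4, algorithm="brute").fit(X)
distances, indices = nn.kneighbors(queries)
# Each row of `distances` is sorted in increasing order.
```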

1. Approximate nearest neighbors search algorithms come in many different flavours, and there's even a benchmark suite comparing them! ↩
2. `KNearestNeighbors.kneighbors` is the interface to look for. ↩