Performance and scikit-learn (0/4)

A series of blog posts.

Published on the: 15.12.2021
Last modified on the: 14.09.2022
Estimated reading time: ~ 2 min.

For more than 10 years, scikit-learn has been bringing machine learning and data science to the world. Throughout that time, the library has always aimed to deliver quality implementations to its users.

This series of blog posts aims to explain the ongoing work of the scikit-learn developers to boost the performance of the library.

This series should be read as follows:

Familiarity with the following topics will help in understanding the blog posts:

  • the main algorithms in machine learning, especially \(k\)-nearest neighbors
  • basic data structures and algorithmic complexity
  • RAM and the hierarchy of CPU caches
  • some elements of linear algebra
  • some elements of object-oriented design (abstract class, template methods)
  • some elements of C programming (allocation on the heap, pointer arithmetic)
  • some elements of OpenMP (static scheduling and parallel for-loop)
  • some elements of Cython


This work has been made possible thanks to the Cython+ project.