Performance and scikit-learn (0/4)

Published on: 15.12.2021
Last modified on: 26.04.2023
Estimated reading time: ~ 2 min.

For more than 10 years, scikit-learn has been bringing machine learning and data science to the world. Throughout that time, the library has always aimed to deliver quality implementations to its users.

This series of blog posts aims to explain the ongoing work of the scikit-learn developers to improve the performance of the library by several orders of magnitude.

This series should be read as follows:

Familiarity with the following topics will help in understanding the blog posts:

  • the main algorithms in machine learning, especially \(k\)-nearest neighbors
  • basic data structures and algorithmic complexity
  • RAM and the hierarchy of CPU caches
  • some elements of linear algebra
  • some elements of object-oriented design (abstract class, template methods)
  • some elements of C programming (allocation on the heap, pointer arithmetic)
  • some elements of OpenMP (static scheduling and parallel for-loop)
  • Cython