Performance and scikit-learn: a series of blog posts

A series of blog posts.

Published on the: 15.12.2021
Last modified on the: 17.01.2022
Estimated reading time: ~ 2 min.

Disclaimer: this series of blog posts is currently a draft.


scikit-learn has been around for more than 10 years.

Yet, there is still room for improvement when it comes to its performance.

This series of blog posts explains the ongoing work the team is doing to improve the performance of the library.

This series should read as follows:

I will try to keep the content accessible, going into technical details progressively. Still, the series covers a lot of information, concepts and technical jargon that I won’t introduce, for the sake of conciseness.

I believe knowing a bit about the following topics can help:

  • the main algorithms in machine learning, especially \(k\)-nearest neighbors
  • basic data structures and algorithmic complexity
  • RAM and CPU caches
  • the basics of linear algebra
  • the basics of object-oriented design (abstract classes, template methods)
  • the basics of C programming (allocation on the heap, pointer arithmetic)
  • the basics of OpenMP (static scheduling and parallel “for-loops”)
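As a rough illustration of how several of these topics meet, here is a minimal NumPy sketch of brute-force \(k\)-nearest neighbors. It is not scikit-learn's implementation; `knn_brute_force` and its `chunk_size` parameter are hypothetical names, and it assumes dense float arrays and squared Euclidean distances. Processing the query points in chunks keeps each block of distances small (a nod to CPU caches), and the identity \(\|x - y\|^2 = \|x\|^2 - 2\,x \cdot y + \|y\|^2\) turns most of the work into a matrix product (linear algebra).

```python
import numpy as np


def knn_brute_force(X, Y, k, chunk_size=256):
    """Hypothetical helper: indices of the k nearest rows of Y for each row of X.

    Assumes 2D float arrays with the same number of columns and k <= len(Y).
    """
    # Squared norms of the rows of Y, computed once and reused for every chunk.
    Y_sq_norms = np.einsum("ij,ij->i", Y, Y)
    indices = np.empty((X.shape[0], k), dtype=np.intp)

    for start in range(0, X.shape[0], chunk_size):
        X_chunk = X[start:start + chunk_size]
        X_sq_norms = np.einsum("ij,ij->i", X_chunk, X_chunk)

        # Squared Euclidean distances for this chunk only: shape (chunk, len(Y)).
        # The dominant cost is the matrix product X_chunk @ Y.T.
        dists = X_sq_norms[:, None] - 2 * X_chunk @ Y.T + Y_sq_norms[None, :]

        # Select the k smallest distances per row without a full sort,
        # then sort only those k candidates.
        candidates = np.argpartition(dists, kth=k - 1, axis=1)[:, :k]
        candidate_dists = np.take_along_axis(dists, candidates, axis=1)
        order = np.argsort(candidate_dists, axis=1)
        indices[start:start + chunk_size] = np.take_along_axis(candidates, order, axis=1)

    return indices


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Y = rng.standard_normal((300, 5))
neighbors = knn_brute_force(X, Y, k=3, chunk_size=32)
```

For comparison, the public scikit-learn entry point for the same query is `NearestNeighbors(n_neighbors=3, algorithm="brute").fit(Y).kneighbors(X)`; how it is implemented internally is beyond this sketch.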