Accelerating Random Forests in Scikit-Learn 
Gilles Louppe 
Université de Liège, Belgium
August 29, 2014 

Random Forests are without contest one of the most robust, accurate and versatile tools for solving machine learning tasks. Implementing this algorithm properly and efficiently remains, however, a challenging task, involving issues that are easily overlooked if not considered with care. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. Algorithmic and technical optimizations that have made this possible include:

- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines;
- A dedicated sorting procedure, taking into account the properties of data;
- Shared pre-computations whenever critical.
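
The first and fifth items above concern how candidate splits are searched during tree induction. The abstract does not spell out the details, so the following is only a minimal pure-Python sketch of the standard idea, under stated assumptions: sort the samples by one feature once, then sweep the candidate thresholds while updating class counts incrementally, so that each threshold is evaluated in constant time per class instead of rescanning the data. The names `gini` and `best_split` are hypothetical and chosen for illustration; this is not Scikit-Learn's Cython implementation.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a node, given its per-class sample counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split
    on a single feature, or (None, impurity) if no split is possible.

    Hypothetical helper for illustration; not part of Scikit-Learn's API.
    """
    n = len(values)
    # Sort sample indices by feature value once: O(n log n).
    order = sorted(range(n), key=values.__getitem__)

    left = Counter()                 # class counts left of the threshold
    right = Counter(labels)          # class counts right of the threshold
    best_threshold = None
    best_impurity = gini(right, n)   # impurity of the unsplit node

    # Sweep thresholds from left to right, moving one sample at a time.
    for rank in range(n - 1):
        i = order[rank]
        left[labels[i]] += 1
        right[labels[i]] -= 1

        lo, hi = values[i], values[order[rank + 1]]
        if lo == hi:
            continue  # cannot split between equal feature values

        n_left = rank + 1
        n_right = n - n_left
        impurity = (n_left * gini(left, n_left)
                    + n_right * gini(right, n_right)) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_threshold = (lo + hi) / 2.0

    return best_threshold, best_impurity
```

For example, `best_split([2.0, 1.0, 4.0, 3.0], ["a", "a", "b", "b"])` returns `(2.5, 0.0)`: the threshold halfway between 2.0 and 3.0 separates the two classes perfectly. In a compiled implementation the same sweep would typically run over contiguous arrays rather than Python lists, which is where the cache and sorting optimizations listed above would come into play.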

Overall, we believe that the lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anyone doing data analysis in Python.

  1. Accelerating Random Forests in Scikit-Learn. Gilles Louppe, Université de Liège, Belgium. August 29, 2014.
  2. Motivation: ... and many more applications!
  3. About. Scikit-Learn: machine learning library for Python; classical and well-established algorithms; emphasis on code quality and usability. Myself: @glouppe; PhD student (Liège, Belgium); core developer on Scikit-Learn since 2011; chief tree hugger.
  4. Outline: 1. Basics; 2. Scikit-Learn implementation; 3. Python improvements.
  5. Machine Learning 101. Data comes as... a set of samples L = {(x_i, y_i) | i = 0, ..., N ...
