4. IPython
Powerful interactive shell
Supports tab completion of just about everything
Inline help system for modules, classes etc. with ?, source
code with ??
Browser based notebook (Jupyter) with support for
(runnable) code, text, mathematical expressions using
LaTeX, inline plots etc.
Can be used as computational lab notes or worksheets
Magic functions to access the shell, run R code etc.
Parallel computing
5. Notes on Jupyter
1. The Jupyter Notebook works with over 40 languages
2. Jupyter Notebooks render on GitHub
Jupyter
Computational Narratives
1. Computers are optimized for producing, consuming and
processing data.
2. Humans are optimized for producing, consuming and
processing narratives/stories.
3. For code and data to be useful to humans, we need tools
for creating and sharing narratives that involve code and
data.
The Jupyter Notebook is a tool for creating and sharing
computational narratives.
6. Jupyter & Data Science
The Jupyter Notebook is a tool that allows us to explore the
fundamental questions of Data Science
with a particular dataset
with code and data
in a manner that produces a computational narrative
that can be shared, reproduced, modified, and extended.
At the end of it all, those computational narratives encapsulate
the goal or end point of Data Science. The character of the
narrative (prediction, inference, data generation, insight, etc.)
will vary from case to case.
The purpose of computing is insight, not numbers.
Hamming, Richard (1962). Numerical Methods for Scientists and
8. NumPy
NumPy is the fundamental package for scientific computing with
Python. It contains among other things:
A powerful N-dimensional array object
Sophisticated (broadcasting) functions
Tools for integrating C/C++ and Fortran code
Useful linear algebra, Fourier transform, and random
number capabilities
Besides its obvious scientific uses, NumPy can also be used as
an efficient multi-dimensional container of generic data.
Arbitrary data-types can be defined. This allows NumPy to
seamlessly and speedily integrate with a wide variety of
databases.
NumPy provides a powerful N-dimensional array object
Methods on these arrays are fast because they rely on
well-optimised libraries for linear algebra (BLAS, ATLAS,
MKL)
NumPy accepts plain Python lists wherever it expects an array
NumPy builds on years of computer-based numerical
analysis problem solving
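The broadcasting and linear algebra points above can be illustrated with a minimal sketch. The array values are made up for illustration and are not from the slides.

```python
import numpy as np

# A 3x2 matrix and a length-2 row vector (illustrative values only).
m = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
row = np.array([10.0, 20.0])

# Broadcasting: the row is stretched across all three rows of m, no loop needed.
shifted = m + row

# Column means, then centre each column (broadcasting again).
centred = m - m.mean(axis=0)

# A small linear system solved with NumPy's LAPACK-backed solver:
# 3x + y = 9 and x + 2y = 8.
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(a, b)

print(shifted)
print(centred)
print(solution)
```

The same expressions work unchanged if the inputs are plain nested Python lists passed through np.array first.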
9. Numpy
import numpy as np
a = np.array([1, 2, 3])             # Create a rank 1 array
print(type(a))                      # Prints "<class 'numpy.ndarray'>"
print(a.shape)                      # Prints "(3,)"
print(a[0], a[1], a[2])             # Prints "1 2 3"
a[0] = 5                            # Change an element of the array
print(a)                            # Prints "[5 2 3]"
b = np.array([[1, 2, 3], [4, 5, 6]])  # Create a rank 2 array
print(b.shape)                      # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])    # Prints "1 2 4"
# -----
a = np.zeros((2, 2))                # Create an array of all zeros
print(a)                            # Prints "[[0. 0.]
                                    #          [0. 0.]]"
b = np.ones((1, 2))                 # Create an array of all ones
print(b)                            # Prints "[[1. 1.]]"
c = np.full((2, 2), 7)              # Create a constant array (integer fill value)
print(c)                            # Prints "[[7 7]
                                    #          [7 7]]"
d = np.eye(2)                       # Create a 2x2 identity matrix
print(d)                            # Prints "[[1. 0.]
                                    #          [0. 1.]]"
e = np.random.random((2, 2))        # Create an array filled with random values
print(e)                            # Might print "[[0.91940167 0.08143941]
                                    #               [0.68744134 0.87236687]]"
Numpy is the core library for scientific computing in Python. It
provides a high-performance multidimensional array object
(MATLAB style), and tools for working with these arrays.
Arrays
A numpy array is a grid of values, all of the same type, and
is indexed by a tuple of nonnegative integers.
The number of dimensions is the rank of the array; the
shape of an array is a tuple of integers giving the size of
the array along each dimension.
We can initialize numpy arrays from nested Python lists,
and access elements using square brackets.
Numpy also provides many functions to create arrays.
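As a small sketch of the array properties described above (rank, shape, a single element type, tuple indexing), the following uses a made-up 2x3 grid. The variable names are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)    # number of dimensions (the "rank" in the text): 2
print(grid.shape)   # (2, 3)
print(grid.dtype)   # one element type for the whole array (a platform-dependent integer)

# Mixing ints and floats in the input yields a single common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Indexing with a tuple of integers, and slicing out a column.
corner = grid[1, 2]        # 6
first_column = grid[:, 0]  # the values 1 and 4

# Basic slices are views: writing through the slice changes the original array.
first_column[0] = 100
print(grid[0, 0])   # 100
```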
11. SciPy
SciPy is a Python-based ecosystem of open-source software for
mathematics, science, and engineering. SciPy core packages:
IPython, NumPy, SciPy Library, SymPy, matplotlib, pandas.
SciPy Library
SciPy is a collection of mathematical algorithms and convenience
functions built on top of NumPy. It includes modules for statistics,
integration & ODE solvers, linear algebra, optimization, FFT, etc.
We use the terms SciPy and SciPy Library interchangeably.
Meaning depends on context.
SciPy is a toolbox for researchers and scientists, and it contains
many hidden treasures for them.
12. SciPy & NumPy
Numpy provides a high-performance multidimensional array
and basic tools to compute with and manipulate these arrays.
SciPy builds on this, and provides a large number of functions
that operate on numpy arrays and are useful for different types
of scientific and engineering applications.
SciPy provides numerous numerical routines that run efficiently
on top of NumPy arrays for optimization, signal processing,
linear algebra and many more. It also provides some convenient
data structures, such as compressed sparse matrices and spatial data
structures. If you have already used a scikit (scikit-learn,
scikit-image), you have already used SciPy extensively.
A few thoughts on SciPy:
Contains linear algebra routines that overlap with NumPy;
SciPy's linear algebra routines are always built against
optimized BLAS/LAPACK libraries (ATLAS, Intel Math
Kernel Library, etc.)
Sparse matrix support
Extends NumPy’s statistical capabilities
Under active development, new toys added constantly!
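A minimal sketch of two of the points above, sparse matrix support and the overlap between scipy.linalg and NumPy. The 3x3 matrix is invented for illustration.

```python
import numpy as np
from scipy import linalg, sparse

dense = np.array([[4.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0],
                  [0.0, 1.0, 3.0]])

# Compressed sparse row storage keeps only the non-zero entries.
compressed = sparse.csr_matrix(dense)
print(compressed.nnz)            # number of stored values: 4

# A sparse matrix-vector product returns an ordinary NumPy array.
v = np.array([1.0, 2.0, 3.0])
product = compressed @ v

# scipy.linalg overlaps with numpy.linalg (for example det and inv) ...
determinant = linalg.det(dense)
# ... and adds routines that numpy.linalg lacks, such as the LU decomposition.
p, l, u = linalg.lu(dense)

print(product, determinant)
```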
13. SciPy
A big box of tools:
Special functions (scipy.special)
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fft; scipy.fftpack is the legacy interface)
Signal Processing (scipy.signal)
Statistics (scipy.stats)
Linear Algebra (scipy.linalg)
File IO (scipy.io)
Sparse Eigenvalue Problems with ARPACK (scipy.sparse.linalg)
Compressed Sparse Graph Routines
(scipy.sparse.csgraph)
Spatial data structures and algorithms (scipy.spatial)
Multi-dimensional image processing (scipy.ndimage)
Weave (scipy.weave; removed from recent SciPy releases)
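import numpy as np
from scipy.stats import linregress
x = np.arange(10.0)                            # assumed sample points (not defined on the slide)
noisy_y = 2 * x + np.random.normal(0, 1, 10)   # assumed noisy line (not defined on the slide)
(slope, intercept, r, p, se) = linregress(x, noisy_y)
# ---
from scipy.stats import spearmanr, pearsonr
x_cubed = x**3
x_cubed += np.random.normal(0, 3, 10)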
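The fragment above imports spearmanr and pearsonr but stops before calling them. One plausible continuation is sketched here; the sample data, the seed and the noise levels are assumptions, not values from the slides.

```python
import numpy as np
from scipy.stats import linregress, pearsonr, spearmanr

rng = np.random.default_rng(0)                       # seeded so the run is repeatable
x = np.linspace(0.0, 9.0, 10)                        # assumed sample points
noisy_y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 10)   # assumed noisy line

# Least-squares line through the noisy points.
fit = linregress(x, noisy_y)
print(fit.slope, fit.intercept, fit.rvalue)

# A monotonic but non-linear relationship.
x_cubed = x**3 + rng.normal(0.0, 3.0, 10)
r_linear, _ = pearsonr(x, x_cubed)    # measures linear association
r_rank, _ = spearmanr(x, x_cubed)     # measures monotonic (rank) association
print(r_linear, r_rank)
```

Pearson's coefficient is computed on the raw values and Spearman's on their ranks, which is why the two are usually compared on data like x against x cubed.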
15. matplotlib
A plotting library that renders high-quality 2D and 3D
plots for Python.
pyplot implements Matlab-style plotting
Object-oriented API for more advanced graphics
The API mimics the MATLAB one in many ways, easing the
transition to Python for MATLAB users
Once again, no surprises: matplotlib is a very stable and
mature project (expect one major release per year)
Inline plots in the notebook:
ipython notebook --pylab inline
(in current Jupyter versions, run %matplotlib inline in a notebook cell instead)
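To make the difference between the two interfaces concrete, here is a minimal sketch that draws one curve with pyplot and one with the object-oriented API. The Agg backend and the in-memory buffer are choices made so the sketch runs without a display; they are not part of the slides.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)

# pyplot (MATLAB-style): functions act on an implicit "current" figure.
plt.figure()
plt.plot(x, np.sin(x), label="sin")
plt.title("pyplot interface")
plt.legend()

# Object-oriented API: the figure and axes are explicit objects.
fig, ax = plt.subplots()
ax.plot(x, np.cos(x), label="cos")
ax.set_xlabel("x")
ax.set_title("object-oriented interface")
ax.legend()

# Render the second figure to PNG bytes in memory instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a notebook with inline plotting enabled, the figures appear below the cell and the explicit savefig call is not needed.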
18. TL;DR
NumPy is the foundation
SciPy is built upon NumPy, with some overlapping
functionality
matplotlib complements both
NumPy, SciPy, matplotlib
NumPy is the foundation of scientific and numerical
computing with Python
SciPy is a collection of mathematical and scientific tools
matplotlib is a technical plotting package
NumPy Arrays
Implemented in C for efficiency
Python indexing and slicing
Elements are strongly typed
Taking advantage of NumPy
Think in parallel!
Replace loops with vector operations
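As a sketch of replacing a loop with a vector operation, the two hypothetical helper functions below compute the same sum of squares, once element by element and once on the whole array.

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)


def sum_of_squares_loop(data):
    """Loop version: one Python-level multiplication and addition per element."""
    total = 0.0
    for v in data:
        total += v * v
    return total


def sum_of_squares_vectorized(data):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    return float(np.sum(data * data))


print(sum_of_squares_loop(values), sum_of_squares_vectorized(values))
```

Both functions return the same number; the vectorized form is typically much faster on large arrays because it avoids per-element interpreter overhead.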
matplotlib
Primarily 2D plotting
Basic 3D plots available with mplot3d (import
mpl_toolkits.mplot3d)
19. Other Notes
NumPy/SciPy/scikit-learn rely on many low-level Fortran/C
libraries such as BLAS, ATLAS, the Intel MKL…
most of these libraries are shipped unoptimized by your
favorite OS (well, maybe not the case for Mac)
you may want to re-compile these libraries or use a
packaged Python distribution (Anaconda, Canopy)
libraries for performance: numba, cython, ...
21. pandas
pandas is an open source, BSD-licensed library providing
high-performance, easy-to-use data structures and data analysis tools
for the Python programming language.
"R for Python"
Provides easy to use data structures & a ton of useful
helper functions for data cleanup and transformations
Fast! (backed by NumPy arrays)
Integrates well with other libs e.g. scikit-learn
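A minimal sketch of the cleanup and transformation helpers mentioned above. The column names and measurements are hypothetical.

```python
import pandas as pd

# Hypothetical measurements, including one missing value.
frame = pd.DataFrame({
    "site": ["a", "a", "b", "b"],
    "temperature": [20.5, None, 18.0, 19.0],
})

# Data cleanup: fill the gap with the column mean (missing values are skipped by mean()).
mean_temperature = frame["temperature"].mean()
cleaned = frame.fillna({"temperature": mean_temperature})

# Transformation: average temperature per site.
per_site = cleaned.groupby("site")["temperature"].mean()
print(per_site)

# The numeric column can be handed to NumPy-based libraries as a plain array.
as_array = cleaned["temperature"].to_numpy()
print(as_array)
```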
24. SymPy
SymPy is a Python library for symbolic mathematics. It aims to
become a full-featured computer algebra system (CAS) while
keeping the code as simple as possible in order to be
comprehensible and easily extensible.
SymPy is written entirely in Python; in current releases its only
required external library is mpmath.
import sympy
sympy.sqrt(8)
# 2*sqrt(2)
from sympy import symbols
x, y = symbols('x y')
expr = x + 2*y
expr
# x + 2*y
expr - x
# 2*y
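Beyond the arithmetic shown above, a few more computer algebra operations can be sketched in the same style. The expressions are chosen only for illustration.

```python
from sympy import diff, expand, factor, integrate, solve, symbols

x, y = symbols("x y")

expanded = expand((x + 2 * y) ** 2)      # x**2 + 4*x*y + 4*y**2
factored = factor(x**2 - 4)              # (x - 2)*(x + 2)
derivative = diff(x**3, x)               # 3*x**2
antiderivative = integrate(3 * x**2, x)  # x**3
roots = solve(x**2 - 4, x)               # the two roots -2 and 2

print(expanded, factored, derivative, antiderivative, roots)
```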
26. scikit-learn
Machine Learning algorithms implemented in Python on
top of NumPy & SciPy
Conveniently maintains the same interface to a wide
range of algorithms
Includes algorithms for: Classification, Regression,
Clustering, Dimensionality reduction
As well as lots of useful utilities (cross-validation,
preprocessing etc.)
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)
digits.target
digits.images[0]
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(digits.data[:-1], digits.target[:-1])
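The fit call above can be followed by prediction and by one of the utilities listed earlier. This is a minimal sketch; the 25% hold-out size, the random seed and the 5 folds are arbitrary assumptions.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score, train_test_split

digits = datasets.load_digits()

# Hold out a quarter of the samples for testing (size and seed are arbitrary choices).
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Every estimator exposes the same fit / predict / score interface.
clf = svm.SVC(gamma=0.001, C=100.0)
clf.fit(x_train, y_train)
predictions = clf.predict(x_test)
accuracy = clf.score(x_test, y_test)

# The shared interface lets utilities such as cross-validation accept any estimator.
scores = cross_val_score(
    svm.SVC(gamma=0.001, C=100.0), digits.data, digits.target, cv=5
)
print(accuracy, scores.mean())
```

Swapping svm.SVC for another classifier leaves the rest of the sketch unchanged, which is the practical benefit of the common interface.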
30. References
1. Brian Granger: Project Jupyter as a Foundation for Open Data Science
2. Juan Luis Cano Rodriguez, IPython: How a notebook is changing science | Python as a real alternative to
MATLAB, Mathematica and other commercial software
3. Olivier Hervieu: Introduction to scientific programming in python
4. CS231n: IPython Tutorial, http://cs231n.github.io/ipython-tutorial/
5. J.R. Johansson: Introduction to scientific computing with Python
6. Introduction to solving biological problems with Python by pycam
7. Jake VanderPlas: The State of the Stack