IPython

Learning IPython for Interactive Computing and
Data Analysis
2015 Summer Data Mining Workshop
August 20, 2015
Kyunghoon Kim
kyunghoon@unist.ac.kr
Department of Mathematical Sciences
Ulsan National Institute of Science and Technology
Republic of Korea

Contents
Git
IPython
Installation 1. Python
Installation 2. pip upgrade and library install
IPython
Big Picture
Numpy
SciPy
pandas
matplotlib

Git
http://backlogtool.com/git-guide/kr/

Git references (links)
▶ An introduction to Git that anyone can easily understand
▶ A simple guide to getting started with git. It's not hard ;)
▶ 생활코딩 > Version control system GIT
Git repository hosting
▶ https://bitbucket.org/ # private
▶ https://github.com/ # public
▶ https://gitlab.com/ # server

Warming up
Installation 1. Python
▶ Download Anaconda from http://continuum.io/downloads.
▶ During installation, choose the All Users option.
Anaconda is a completely free Python distribution. It includes over 195 of the
most popular Python packages for science, math, engineering, and data analysis,
e.g., IPython, NumPy, SciPy, pandas, scikit-learn, and especially NetworkX.

On Windows, press the Windows key, type cmd, and press Enter.
Enter the commands below in the window that opens.
▶ python -m pip install -U pip # upgrade pip
▶ pip install -U ipython # upgrade ipython
▶ pip install <library name> # install
▶ pip install -U <library name> # upgrade
▶ pip uninstall <library name> # remove
On Mac or Linux, enter the same commands in a terminal.
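
After installing or upgrading with pip, it can help to confirm from Python which versions are actually importable. The snippet below is a minimal sketch; the helper name installed_version is hypothetical, and it assumes each package exposes a __version__ attribute, which is a common convention rather than a guarantee.

```python
import importlib

def installed_version(name):
    """Return the version string of an importable package, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # many scientific packages expose __version__; fall back to "unknown" otherwise
    return getattr(module, "__version__", "unknown")

for name in ["IPython", "numpy", "scipy", "pandas", "matplotlib"]:
    print(name, installed_version(name))
```

A result of None means the package could not be imported and probably still needs pip install.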

On Windows, you may get an ASCII error. In that case, make the following change.
▶ Open the following file in a text editor: 'C:\Anaconda\Lib\site.py'
▶ Find the lines
def setencoding():
    encoding = "ascii"
▶ and change them to
def setencoding():
    encoding = "mbcs"
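
To see whether this change is relevant on a given installation, the interpreter's default encoding can be inspected directly. This is a small sketch, not part of the original instructions: a stock Python 2 reports 'ascii', while Python 3 reports 'utf-8' and has no setencoding() function in site.py to edit.

```python
import sys

# major Python version and the default string encoding of this interpreter
major = sys.version_info[0]
default_encoding = sys.getdefaultencoding()
print(major, default_encoding)
```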

IPython
Interactive Computing
IPython
▶ IPython is a command shell for interactive computing in multiple
programming languages, originally developed for the Python
programming language, that offers enhanced introspection, rich
media, additional shell syntax, tab completion, and rich history.
▶ IPython Notebook is a web-based interactive computational
environment for creating IPython notebooks. An IPython
notebook is a JSON document containing an ordered list of
input/output cells which can contain code, text, mathematics,
plots and rich media.
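
To make the "JSON document containing an ordered list of cells" description concrete, the sketch below builds a tiny notebook-like dictionary with the standard json module and reads the cell types back. The key names follow the version 4 notebook format as I understand it; the cell contents are made up for illustration.

```python
import json

# a minimal notebook structure: one markdown cell followed by one code cell
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Logistic map"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "x = 0.2\n4 * x * (1 - x)"},
    ],
}

text = json.dumps(notebook, indent=1)   # the kind of text stored in an .ipynb file
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the cells live in a JSON list, their order in the file is the order shown in the notebook.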

How to run
1. cmd
2. mkdir test
3. cd test
4. ipython notebook
If your notebook shows the older logo, run ‘pip install -U ipython’ to upgrade to version 3 (Jupyter).

Creating a new IPython notebook

Intere
How to learn IPython
▶ https://github.com/jrjohansson/scientific-python-lectures

Big Picture
▶ Python: a programming language
▶ IPython: a tool for interactive computing
▶ NumPy/SciPy: libraries for numerical computation
▶ pandas: a library for building data structures and analyzing data
▶ matplotlib: a library for producing scientific figures

Numpy (Numerical Python)
import numpy as np
import timeit

a = np.arange(1e7)
b = list(a)

# time one vectorized multiplication on the ndarray
tic = timeit.default_timer()
a = a*1.1
toc = timeit.default_timer()
print(toc-tic)
# 0.029629945755

# time the same multiplication element by element on the list
tic = timeit.default_timer()
for index, value in enumerate(b):
    b[index] = value*1.1
toc = timeit.default_timer()
print(toc-tic)
# 1.82178592682

Depending on how it is used, arithmetic on an ndarray is much faster than on a list (about 60 times faster in this run).
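
A single tic/toc measurement like the one above can be noisy. As a hedged alternative, the sketch below uses timeit.repeat and keeps the best of several runs; the array size is reduced to 100000 so that it finishes quickly, and the function names are only illustrative. Actual timings depend on the machine, so no numbers are quoted.

```python
import timeit
import numpy as np

n = 100000                       # smaller than the 1e7 above so the example runs quickly
arr = np.arange(n, dtype=float)
lst = list(range(n))

def scale_array():
    return arr * 1.1             # one vectorized operation in compiled code

def scale_list():
    return [value * 1.1 for value in lst]   # loop executed by the Python interpreter

# best of 3 repetitions, each running the function 10 times
t_array = min(timeit.repeat(scale_array, number=10, repeat=3))
t_list = min(timeit.repeat(scale_list, number=10, repeat=3))
print("ndarray: %.5f s, list: %.5f s" % (t_array, t_list))
```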

np.array([0, 1, 2])
np.zeros(5)                           # array of 5 zeros
np.zeros((2,2))+1                     # 2x2 array of ones
np.zeros((5,5,5)).astype(int)         # array with integer elements
np.zeros((5,5,5)).astype(np.float16)  # 16-bit floating point
np.zeros((5,5,5), dtype=np.float32)   # 32-bit floating point
np.ones(5)                            # array of 5 ones
np.arange(5, 10)                      # range(5, 10) as an ndarray
np.linspace(0, 1, 100)                # 100 evenly spaced points from 0 to 1, endpoints included
np.logspace(0, 1, 100, base=10.)      # 100 log-spaced points from 10**0 to 10**1
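
The constructors above differ mainly in the shape and dtype of what they return. The following sketch inspects those attributes and shows how logspace relates to linspace; the variable names are illustrative only.

```python
import numpy as np

z = np.zeros((5, 5, 5), dtype=np.float32)
print(z.shape, z.dtype, z.nbytes)      # 125 elements, 4 bytes each

lin = np.linspace(0, 1, 5)             # both endpoints are included
log = np.logspace(0, 1, 5, base=10.)   # same as 10 ** lin
print(lin)
print(log)
```

Choosing a smaller dtype such as float32 halves the memory use compared with the default float64.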

a1 = np.arange(9)
a1
# array([0, 1, 2, 3, 4, 5, 6, 7, 8])
a2 = a1.reshape((3,3))   # a2 is a view on the data of a1
a2
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])
a2[2,2]
# 8
a2[2,2] = 1              # writing through the view also changes a1
a1
# array([0, 1, 2, 3, 4, 5, 6, 7, 1])
a3 = np.copy(a2)         # an independent copy
a3[2,2] = 5
a1                       # unchanged by the write to a3
# array([0, 1, 2, 3, 4, 5, 6, 7, 1])
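
Whether two arrays share data can also be checked directly instead of inferring it from side effects. A minimal sketch, using np.shares_memory and the base attribute (the names view and copy are illustrative):

```python
import numpy as np

a1 = np.arange(9)
view = a1.reshape((3, 3))          # same buffer, different shape
copy = a1.reshape((3, 3)).copy()   # separate buffer

print(np.shares_memory(a1, view))   # True
print(np.shares_memory(a1, copy))   # False
print(view.base is a1)              # True: the view points back to a1
```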

l = [[1,2], [3,4]]
a = np.array(l)
a
# array([[1, 2],
#        [3, 4]])
a[0, :]   # first row
# array([1, 2])
a[:, 1]   # second column
# array([2, 4])
a*a       # elementwise product
# array([[ 1,  4],
#        [ 9, 16]])
a**3      # elementwise power
# array([[ 1,  8],
#        [27, 64]])
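
Note that * and ** act element by element on an ndarray. For comparison, the sketch below computes the matrix product and matrix power of the same 2x2 array with a.dot and np.linalg.matrix_power.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

elementwise = a * a                      # [[1, 4], [9, 16]]
matrix_product = a.dot(a)                # [[7, 10], [15, 22]]
cube = np.linalg.matrix_power(a, 3)      # a.dot(a).dot(a)

print(elementwise)
print(matrix_product)
print(cube)
```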

a
# array([[1, 2],
#        [3, 4]])
a>2
# array([[False, False],
#        [ True,  True]], dtype=bool)
np.where(a>2)   # row indices and column indices of the True entries
# (array([1, 1]), array([0, 1]))
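
The boolean array produced by a comparison can be used directly as an index, and np.where also has a three-argument form that chooses between two values. A short sketch with the same array:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
mask = a > 2

selected = a[mask]                 # 1-D array of the entries where mask is True
replaced = np.where(mask, a, 0)    # keep entries where mask is True, else 0

print(selected)
print(replaced)
```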

arr = np.zeros((10, 10))+3
arr[1:-1, 1:-1] = 5   # everything except the outer border
arr[4:-4, 4:-4] = 1   # the central 2x2 block
arr
# array([[ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 1., 1., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 1., 1., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 5., 5., 5., 5., 5., 5., 5., 5., 3.],
#        [ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.]])
arr[arr > 2] = 0      # boolean-mask assignment
arr
# array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
#        ...
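
Since the printed result is long, a compact way to check what the slice and mask assignments did is to count and locate the remaining entries. This sketch repeats the same steps and then summarizes the array; the summary variables are illustrative.

```python
import numpy as np

arr = np.zeros((10, 10)) + 3
arr[1:-1, 1:-1] = 5
arr[4:-4, 4:-4] = 1

mask = arr > 2
n_masked = np.count_nonzero(mask)    # number of entries about to be zeroed
arr[mask] = 0

remaining = np.argwhere(arr == 1)    # (row, column) pairs that kept the value 1
print(n_masked, arr.sum())
print(remaining)
```

Only the central 2x2 block survives, because it is the only region whose value is not greater than 2.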

Numpy : Linear Algebra
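# rows [1,2,3], [4,5,6], [7,8,9] would give a singular matrix (determinant 0),
# which has no inverse, so the last entry is 10 here
a = np.array([[1,2,3], [4,5,6], [7,8,10]])
b = np.array([1,1,1])
np.linalg.inv(a)         # inverse
a.dot(b)                 # dot product (matrix-vector product)
np.linalg.inv(a).dot(b)  # solution x of a.dot(x) == b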
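
np.linalg.inv needs a nonsingular matrix, and the 3x3 matrix with rows [1,2,3], [4,5,6], [7,8,9] has rank 2. The sketch below checks the rank first and then solves a linear system with np.linalg.solve, which is generally preferred over forming the inverse explicitly. The invertible matrix A, with last entry 10, is an assumption chosen for illustration.

```python
import numpy as np

singular = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(np.linalg.matrix_rank(singular))    # 2, so this matrix has no inverse

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])   # invertible variant (assumption)
b = np.array([1, 1, 1])

x = np.linalg.solve(A, b)                 # solves A.dot(x) == b without forming inv(A)
x_via_inverse = np.linalg.inv(A).dot(b)
print(x)
```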

SciPy
import scipy
scipy.__version__
# '0.14.0'

pip install -U scipy   # run in the shell; upgrades to scipy 0.16.0

reload(scipy)          # built in on Python 2; on Python 3 use importlib.reload
scipy.__version__
# '0.16.0'

I(a, b) = \int_0^1 (a x^2 + b) \, dx
from scipy.integrate import quad

def integrand(x, a, b):
    return a * x ** 2 + b

# with a = 3 and b = 1 the exact value is a/3 + b = 2
a = 3
b = 1
I = quad(integrand, 0, 1, args=(a,b))
I   # (value of the integral, estimate of the absolute error)
# (2.0, 2.220446049250313e-14)
http://docs.scipy.org/doc/scipy/reference/tutorial/integrate.html
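
Because the integrand is a polynomial, the numerical result can be compared with the closed form I(a, b) = a/3 + b. The sketch below does this for two parameter pairs; the helper exact is introduced here only for the comparison.

```python
from scipy.integrate import quad

def integrand(x, a, b):
    return a * x ** 2 + b

def exact(a, b):
    # antiderivative a*x**3/3 + b*x evaluated between 0 and 1
    return a / 3.0 + b

for a, b in [(2, 1), (3, 1)]:
    value, abs_err = quad(integrand, 0, 1, args=(a, b))
    print(a, b, value, exact(a, b), abs_err)
```

quad returns a pair, so unpacking it into the value and the error estimate keeps the two from being confused.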

pandas
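Features the pandas developer wanted
▶ Data structures that align data by axis labels, either automatically or
explicitly. This prevents common errors caused by misaligned data and makes it
possible to work with data that comes from different sources and is indexed in
different ways.
▶ Integrated time series functionality
▶ The same data structures for both time series and non-time series data
▶ Arithmetic operations and reductions (such as summing all values along an
axis) that carry metadata such as axis labels along with them
▶ Flexible handling of missing data
▶ Merging data and performing relational operations, as in SQL and other
common databases
Ref: (link) Python for Data Analysis (Korean edition: 파이썬 라이브러리를 활용한 데이터 분석)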
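
Three of the features listed above (label alignment, missing-data handling, and SQL-like merging) can be seen in a few lines. This is a minimal sketch with made-up data; the labels and column names are purely illustrative.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

total = s1 + s2                     # aligned on labels; "a" and "d" become NaN
filled = s1.add(s2, fill_value=0)   # treat a missing label as 0 instead

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})
joined = pd.merge(left, right, on="key", how="inner")   # SQL-style inner join

print(total)
print(filled)
print(joined)
```

The addition never pairs values by position, which is the kind of misalignment error the first bullet refers to.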

▶ http://nbviewer.ipython.org/urls/gist.github.com/wesm/4757075/raw/a72d3450ad4924d0e74fb57c9f62d1d895ea4574/PandasTour.ipynb

Reference: IPython Interactive Computing and Visualization Cookbook
https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook
In [1]: %pylab inline
Populating the interactive namespace from numpy and matplotlib
In [2]: import pandas as pd
        import pandas.io.data  # not used below; later moved out of pandas into the pandas-datareader package
        import datetime
Logistic Map
In [3]: def f(a, x):
            """ Logistic Function """
            return a*x*(1-x)
In [4]: init = x = 0.2
        values = np.array(init)
        for i in range(100):
            x = f(4, x)
            values = np.append(values, x)

In [5]: plt.plot(values, '.-')
Out[5]: [<matplotlib.lines.Line2D at 0x7f847c1e8250>]
Converting to a pandas Series
In [6]: obj = pd.Series(values)
In [7]: obj.values[:5]
Out[7]: array([ 0.2 , 0.64 , 0.9216 , 0.28901376, 0.82193923])
In [8]: obj.index[:5]
Out[8]: Int64Index([0, 1, 2, 3, 4], dtype='int64')
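
The logistic-map orbit above is built by appending to an array in a loop, which copies the array at every step. As a hedged alternative, the sketch below preallocates the array and wraps the result in a Series; the names logistic and orbit are hypothetical, and the first five values should match the output shown above.

```python
import numpy as np
import pandas as pd

def logistic(a, x):
    """One step of the logistic map."""
    return a * x * (1 - x)

def orbit(a, x0, steps):
    """Return x0 followed by `steps` iterates of the logistic map."""
    values = np.empty(steps + 1)
    values[0] = x0
    for i in range(steps):
        values[i + 1] = logistic(a, values[i])
    return values

obj = pd.Series(orbit(4, 0.2, 100))
print(obj.head())
```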

In [9]: obj.plot()
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x7f847c1e8890>
Bifurcation of Logistic Map
In [10]: n = 10000
         r = np.linspace(2.5, 4.0, n)
In [11]: iterations = 1000
         last = 100
In [12]: x = 1e-5 * np.ones(n)
In [13]: lyapunov = np.zeros(n)
In [14]: x = f(3, x)

In [15]: f(3, x)
Out[15]: array([ 8.99964001e-05, 8.99964001e-05, 8.99964001e-05, ...,
                 8.99964001e-05, 8.99964001e-05, 8.99964001e-05])
In [16]: r, x
Out[16]: (array([ 2.5 , 2.50015002, 2.50030003, ..., 3.99969997,
                  3.99984998, 4. ]),
          array([ 2.99997000e-05, 2.99997000e-05, 2.99997000e-05, ...,
                  2.99997000e-05, 2.99997000e-05, 2.99997000e-05]))
In [23]: plt.figure(figsize=(15, 10))
         plt.subplot(211)
         for i in range(iterations):
             x = f(r, x)
             # We compute the partial sum of the
             # Lyapunov exponent.
             lyapunov += np.log(abs(r-2*r*x))
             # We display the bifurcation diagram.
             if i >= (iterations - last):
                 plt.plot(r, x, '.k', markersize=1, alpha=.1)
         plt.xlim(2.5, 4)
         plt.title("Bifurcation diagram")
         # We display the Lyapunov exponent.
         plt.subplot(212)
         plt.plot(r[lyapunov<0], lyapunov[lyapunov<0]/iterations, '.k', markersize=2)
         plt.plot(r[lyapunov>=0], lyapunov[lyapunov>=0]/iterations, '.r', markersize=2)
         plt.xlim(2.5, 4)
         plt.title("Lyapunov exponent")
Out[23]: <matplotlib.text.Text at 0x7f8478f6d190>
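
The quantity accumulated in the loop above is the average of log|f'(x)| along the orbit, where f'(x) = r - 2rx is the derivative of the logistic map. A negative average indicates a stable regime and a positive one indicates chaos. The sketch below computes the same estimate for a single value of r without plotting; the function name lyapunov_exponent is hypothetical.

```python
import numpy as np

def lyapunov_exponent(r, x0=1e-5, iterations=1000):
    """Average of log|f'(x)| along an orbit of the logistic map x -> r*x*(1-x)."""
    x = x0
    total = 0.0
    for _ in range(iterations):
        x = r * x * (1 - x)
        total += np.log(abs(r - 2 * r * x))   # log of |derivative| at the new point
    return total / iterations

stable = lyapunov_exponent(2.5)    # orbit settles on a fixed point
chaotic = lyapunov_exponent(4.0)   # fully chaotic regime
print(stable, chaotic)
```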

Ordinary Differential Equations (ODEs)

In [24]: m = 1.
         k = 1.
         g = 9.81
         v0 = np.zeros(4) # initial position is (0, 0)
         # The initial speed vector is oriented to the top right
         v0[2] = 4.
         v0[3] = 10.
In [25]: def f(v, t0, k):
             # v has four components v=[u, u']
             u, udot = v[:2], v[2:]
             # compute the second derivative u'' of u
             udotdot = -k/m * udot
             udotdot[1] -= g
             # return v'=[u', u'']
             return np.r_[udot, udotdot]
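
The function above describes a particle subject to gravity and a drag force proportional to its velocity, with k the drag coefficient. When k = 0 the motion is the familiar drag-free parabola, which gives a simple way to check the right-hand side. The sketch below integrates that case with scipy.integrate.odeint and compares it with the closed-form trajectory; the name rhs and the time grid are choices made for this example.

```python
import numpy as np
from scipy import integrate

m, g = 1.0, 9.81

def rhs(v, t, k):
    # v = [x, y, x', y']; returns [x', y', x'', y'']
    u, udot = v[:2], v[2:]
    udotdot = -k / m * udot
    udotdot[1] -= g
    return np.r_[udot, udotdot]

v0 = np.array([0.0, 0.0, 4.0, 10.0])    # start at the origin with velocity (4, 10)
t = np.linspace(0.0, 1.0, 11)
v = integrate.odeint(rhs, v0, t, args=(0.0,))   # k = 0: no drag

# closed-form projectile motion without drag
x_exact = 4.0 * t
y_exact = 10.0 * t - 0.5 * g * t ** 2
print(np.abs(v[:, 0] - x_exact).max(), np.abs(v[:, 1] - y_exact).max())
```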

In [26]: from scipy import integrate
         # evaluate the system on 30 linearly spaced times t=[0, 3]
         t = np.linspace(0., 3., 30)
         # simulate the system for different values of k
         for k in np.linspace(0., 1., 5):
             # simulate the system and evaluate v on the given times
             v = integrate.odeint(f, v0, t, args=(k,))
             # plot the particle's trajectory
             plt.plot(v[:,0], v[:,1], 'o-', mew=1, ms=8, mec='w',
                      label='k={0:.1f}'.format(k))
         plt.legend()
         plt.xlim(0, 12)
Out[26]: (0, 12)
Partial Differential Equations (PDEs)
Reaction-diffusion systems and Turing patterns

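The system simulated below is
\frac{\partial u}{\partial t} = a \Delta u + u - u^3 - v + k, \qquad \tau \frac{\partial v}{\partial t} = b \Delta v + u - v
on the domain [-1, 1]^2.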
In [274]: a = 2.8e-4
          b = 5e-3
          tau = .1
          k = -.005
In [275]: size = 80 # size of the 2D grid
          dx = 2./size # space step
          T = 10. # total time
          dt = .9 * dx**2/2 # time step
          n = int(T/dt)
In [276]: U = np.random.rand(size, size)
          V = np.random.rand(size, size)
In [278]: def laplacian(Z):
              Ztop = Z[0:-2, 1:-1]
              Zleft = Z[1:-1, 0:-2]
              Zbottom = Z[2:, 1:-1]
              Zright = Z[1:-1, 2:]
              Zcenter = Z[1:-1, 1:-1]
              return (Ztop + Zleft + Zbottom + Zright
                      - 4 * Zcenter) / dx**2
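
The laplacian function above is the standard five-point finite-difference stencil, evaluated only at interior grid points, so its result is two rows and two columns smaller than its input. A quick sanity check, sketched here with dx passed as an explicit argument (a small change from the version above), is to apply it to z = x^2 + y^2, whose Laplacian is exactly 4.

```python
import numpy as np

def laplacian(Z, dx):
    """Five-point stencil on the interior points of a 2D grid with spacing dx."""
    return (Z[0:-2, 1:-1] + Z[1:-1, 0:-2] + Z[2:, 1:-1] + Z[1:-1, 2:]
            - 4 * Z[1:-1, 1:-1]) / dx**2

size = 80
dx = 2.0 / size
coords = -1.0 + dx * np.arange(size)             # grid points with spacing dx
X, Y = np.meshgrid(coords, coords, indexing="ij")

Z = X**2 + Y**2                                   # analytic Laplacian is 4 everywhere
lap = laplacian(Z, dx)
print(lap.shape, lap.min(), lap.max())
```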

In [281]: for i in range(n):
              # compute the laplacian of u and v
              deltaU = laplacian(U)
              deltaV = laplacian(V)
              # take the values of u and v
              # inside the grid
              Uc = U[1:-1, 1:-1]
              Vc = V[1:-1, 1:-1]
              # update the variables
              U[1:-1, 1:-1], V[1:-1, 1:-1] = (
                  Uc + dt * (a*deltaU + Uc - Uc**3 - Vc + k),
                  Vc + dt * (b*deltaV + Uc - Vc) / tau
              )
              # neumann conditions, derivatives at the edges are null
              for Z in (U, V):
                  Z[0, :] = Z[1, :]
                  Z[-1, :] = Z[-2, :]
                  Z[:, 0] = Z[:, 1]
                  Z[:, -1] = Z[:, -2]
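
The last four assignments in the loop impose the Neumann (zero-derivative) condition by copying each interior neighbor onto the boundary, so the difference across every edge is zero. The sketch below isolates that step on a small array; the function name apply_neumann is hypothetical.

```python
import numpy as np

def apply_neumann(Z):
    """Copy the nearest interior row or column onto each edge, in place."""
    Z[0, :] = Z[1, :]
    Z[-1, :] = Z[-2, :]
    Z[:, 0] = Z[:, 1]
    Z[:, -1] = Z[:, -2]
    return Z

Z = np.arange(25, dtype=float).reshape(5, 5)
apply_neumann(Z)
print(Z)
```

After the call, each edge equals the row or column next to it, so a one-sided finite difference taken at the boundary vanishes.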

In [291]: plt.imshow(U, cmap=plt.cm.copper, extent=[-1,1,-1,1])
Out[291]: <matplotlib.image.AxesImage at 0x7f3dd831e990>
