
Python for Science and Engineering: a presentation to A*STAR and the Singapore Computational Sciences Club, Edward Schofield, Python Charmers, June 2011


An introduction to Python in science and engineering.

The presentation was given by Dr Edward Schofield of Python Charmers (www.pythoncharmers.com) to A*STAR and the Singapore Computational Sciences Club in June 2011.


  1. 1. Python for Science and Engineering Dr Edward Schofield A*STAR / Singapore Computational Sciences Club Seminar June 14, 2011
  2. 2. Scientific programming in 2011 Most scientists and engineers are: programming for 50+% of their work time (and rising) self-taught programmers using inefficient programming practices using the wrong programming languages: C++, FORTRAN, C#, PHP, Java, ...
  3. 3. Scientific programming needs Rapid prototyping Efficiency for computational kernels Pre-written packages! Vectors, matrices, modelling, simulations, visualisation Extensibility; web front-ends; database backends; ...
  4. 4. Ed's story: How I found Python PhD in statistical pattern recognition: 2001-2006 Needed good tools for my research! Discovered Python in 2002 after frustration with C++, Matlab, Java, Perl Contributed to NumPy and SciPy: maxent, sparse matrices, optimization, Monte Carlo, etc. Managed six releases of SciPy in 2005-6
  5. 5. 1. Why Python?
  6. 6. Introducing Python What is it? What is it good for? Who uses it?
  7. 7. What is Python? interpreted strongly but dynamically typed object-oriented intuitive, readable open source, free ‘batteries included’
  8. 8. ‘batteries included’ Python’s standard library is: very large well-supported well-documented
  9. 9. Python’s standard library: data types, strings, networking, threads, operating system, compression, GUI, arguments, complex numbers, CGI, FTP, cryptography, testing, multimedia, databases, CSV files, calendar, email, XML, serialization
  10. 10. What is an efficient programming language? Native Python code executes 10x more slowly than C and FORTRAN
  11. 11. Would you build a racing car ... ... to get to Kuala Lumpur ASAP?
  12. 12. Cost per GFLOPS over time (source: Wikipedia, “FLOPS”)
      Date | Cost per GFLOPS (US $) | Technology
      1961 | $1.1 trillion | 17 million IBM 1620s
      1984 | $15,000,000 | Cray X-MP
      1997 | $30,000 | Two 16-CPU clusters of Pentiums
      2000, Apr | $1000 | Bunyip Beowulf cluster
      2003, Aug | $82 | KASY0
      2007, Mar | $0.42 | Ambric AM2045
      2009, Sep | $0.13 | ATI Radeon R800
  13. 13. Unit labor cost growth Proxy for cost of programmer time
  14. 14. Efficiency When FORTRAN was invented, computer time was more expensive than programmer time. In the 1980s and 1990s that reversed.
  15. 15. Efficient programming Python code is 10x faster to write than C and FORTRAN
  16. 16. What if ... ... you now need to reach Sydney?
  17. 17. Advantages of Python Easy to write Easy to maintain Great standard libraries Thriving ecosystem of third-party packages Open source
  18. 18. ‘Batteries included’ Python’s standard library is: very large well supported well documented
  19. 19. Python’s standard library: data types, strings, networking, threads, operating system, compression, GUI, arguments, complex numbers, CGI, FTP, cryptography, testing, multimedia, databases, CSV files, calendar, email, XML, serialization
  20. 20. Question What is the date 177 days from now?
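The question on this slide can be answered with the standard library alone. The following is a minimal sketch; the helper name `date_after` is hypothetical, and the seminar date (June 14, 2011) is used as the starting point only so that the result is reproducible.

```python
from datetime import date, timedelta


def date_after(days, start=None):
    """Return the calendar date `days` days after `start` (default: today)."""
    if start is None:
        start = date.today()
    # Adding a timedelta to a date handles month lengths and leap years.
    return start + timedelta(days=days)


# 177 days after the seminar date
print(date_after(177, start=date(2011, 6, 14)))  # prints 2011-12-08

# 177 days from whenever this is run
print(date_after(177))
```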
  21. 21. Natural applications of Python Rapid prototyping Plotting, visualisation, 3D Numerical computing Web and database programming All-purpose glue
  22. 22. Python vs other languages
  23. 23. Languages used at CSIRO Python Fortran Java Matlab C VB.net IDL C++ R Perl C# +5-10 others!
  24. 24. Which language do I choose? A different language for each task? A language you know? A language others in your team are using: support and help?
  25. 25. Python vs Matlab
      Feature | Python | Matlab
      Interpreted | Yes | Yes
      Powerful data input/output | Yes | Yes
      Great plotting | Yes | Yes
      General-purpose language | Powerful | Limited
      Cost | Free | $$$
      Open source | Yes | No
  26. 26. Python vs C++
      Feature | Python | C++
      Powerful | Yes | Yes
      Portable | Yes | In theory
      Standard libraries | Vast | Limited
      Easy to write and maintain | Yes | No
      Easy to learn | Yes | No
  27. 27. Python vs C
      Feature | Python | C
      Fast to write | Yes | No
      Good for embedded systems, device drivers and operating systems | No | Yes
      Good for most other high-level tasks | Yes | No
      Standard library | Vast | Limited
  28. 28. Python vs Java
      Feature | Python | Java
      Powerful, well-designed language | Yes | Yes
      Standard libraries | Vast | Vast
      Easy to learn | Yes | No
      Code brevity | Short | Verbose
      Easy to write and maintain | Yes | Okay
  29. 29. Open source Python is open source software Benefits: No vendor lock-in Cross-platform Insurance against bugs in the platform Free
  30. 30. Python success stories Computer graphics: Industrial Light & Magic Web: Google: News, Groups, Maps, Gmail Legacy system integration: AstraZeneca - collaborative drug discovery
  31. 31. Python success stories (2) Aerospace: NASA Research: universities worldwide ... Others: YouTube, Reddit, BitTorrent, Civilization IV,
  32. 32. Industrial Light & Magic Python spread from scripting to the entire production pipeline Numerous reviews since 1996: Python is still the best tool for them
  33. 33. United Space Alliance A common sentiment: “We achieve immediate functioning code so much faster in Python than in any other language that it’s staggering.” - Robin Friedrich, Senior Project Engineer
  34. 34. Case study: air-traffic control Eric Newton, “Python for Critical Applications”: http://metaslash.com/brochure/recall.html Metaslash, Inc: 1999 to 2001 Mission-critical system for air-traffic control Replicated, fault-tolerant data storage
  35. 35. Case study: air-traffic control Python prototype -> C++ implementation -> Python again Why? C++ dependencies were buggy C++ threads, STL were not portable enough Python’s advantages over C++ More portable 75% less code: more productivity, fewer bugs
  36. 36. More case studies See http://www.python.org/about/success/ for lots more case studies and success stories
  37. 37. 2. The scientific Python ecosystem
  38. 38. Scientific software development Small beginnings Piecemeal growth, quirky interfaces ... Large, cumbersome systems
  39. 39. NumPy An n-dimensional array/matrix package
  40. 40. NumPy Centre of Python’s numerical computing ecosystem
  41. 41. NumPy The most fundamental tool for numerical computing in Python Fast multi-dimensional array capability
  42. 42. What NumPy defines: Two fundamental objects: 1. n-dimensional array 2. universal function a rich set of numerical data types nearly 400 functions and methods on arrays: type conversions mathematical logical
  43. 43. NumPy's features Fast. Written in C with BLAS/LAPACK hooks. Rich set of data types Linear algebra: matrix inversion, decompositions, … Discrete Fourier transforms Random number generation Trig, hypergeometric functions, etc.
  44. 44. Elementwise array operations Loops are mostly unnecessary. Operate on entire arrays!
      >>> a = numpy.array([20, 30, 40, 50])
      >>> a < 35
      array([True, True, False, False], dtype=bool)
      >>> b = numpy.arange(4)
      >>> a - b
      array([20, 29, 38, 47])
      >>> b**2
      array([0, 1, 4, 9])
  45. 45. Universal functions NumPy defines 'ufuncs' that operate on entire arrays and other sequences (hence 'universal'). Example: sin()
      >>> a = numpy.array([20, 30, 40, 50])
      >>> c = 10 * numpy.sin(a)
      >>> c
      array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
  46. 46. Array slicing Arrays can be sliced and indexed powerfully:
      >>> a = numpy.arange(10)**3
      >>> a
      array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
      >>> a[2:5]
      array([ 8, 27, 64])
  47. 47. Fancy indexing Arrays can be used as indices into other arrays:
      >>> a = numpy.arange(12)**2
      >>> ind = numpy.array([ 1, 1, 3, 8, 5 ])
      >>> a[ind]
      array([ 1, 1, 9, 64, 25])
  48. 48. Other linear algebra features Matrix inversion: mat(A).I Or: linalg.inv(A) Linear solvers: linalg.solve(A, x) Pseudoinverse: linalg.pinv(A)
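The calls named on this slide can be tried on a small system. This is a sketch under stated assumptions: the 2x2 matrix `A` and right-hand side `b` are made-up values, and plain arrays are used instead of the `mat(A).I` form shown on the slide.

```python
import numpy as np

# A small, well-conditioned system A x = b (illustrative values only)
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Linear solver: usually preferable to forming the inverse explicitly
x = np.linalg.solve(A, b)

# Matrix inverse and Moore-Penrose pseudoinverse
A_inv = np.linalg.inv(A)
A_pinv = np.linalg.pinv(A)

# Residual of the solve; should be close to zero
residual = np.linalg.norm(A @ x - b)

print("x =", x)
print("residual =", residual)
```

For a square, invertible matrix the pseudoinverse coincides with the inverse (up to rounding); `pinv` matters for rectangular or rank-deficient matrices.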
  49. 49. What is SciPy? A community A conference A package of scientific libraries
  50. 50. Python for scientific software Back-end: computational work Front-end: input / output, visualization, GUIs Dozens of great scientific packages exist
  51. 51. Python in science (2) NumPy: numerical / array module Matplotlib: great 2D and 3D plotting library IPython: nice interactive Python shell SciPy: set of scientific libraries: sparse matrices, signal processing, … RPy: integration with the R statistical environment
  52. 52. Python in science (3) Cython: C language extensions Mayavi: 3D graphics, volumetric rendering Nitime, Nipype: Python tools for neuroimaging SymPy: symbolic mathematics library
  53. 53. Python in science (4) VPython: easy, real-time 3D programming UCSF Chimera, PyMOL, VMD: molecular graphics PyRAF: Hubble Space Telescope interface to IRAF astronomical data BioPython: computational molecular biology Natural language toolkit: symbolic + statistical NLP Physics: PyROOT
  54. 54. The SciPy package BSD-licensed software for maths, science, engineering: integration, signal processing, sparse matrices, optimization, linear algebra, maximum entropy, interpolation, ODEs, statistics, n-dim image processing, FFTs, scientific constants, clustering, C/C++ and Fortran integration
  55. 55. SciPy optimisation example Fit a model to noisy data: $y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$
  56. 56. Example: fitting a model with scipy.optimize Task: Fit a model of the form $y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$ to noisy data. Spec: 1. Generate noisy data 2. Choose parameters (a, b, c) to minimize sum squared errors 3. Plot the data and fitted model (next session)
  57. 57. SciPy optimisation example
      import numpy
      import pylab
      from scipy.optimize import leastsq

      def myfunc(params, x):
          # Model: a / x**b * sin(c * x)
          (a, b, c) = params
          return a / (x**b) * numpy.sin(c * x)

      true_params = [1.5, 0.1, 2.]

      def f(x):
          return myfunc(true_params, x)

      def err(params, x, y):  # error function (residuals)
          return myfunc(params, x) - y
  58. 58. SciPy optimisation example
      # Generate noisy data to fit
      n = 30; xmin = 0.1; xmax = 5
      x = numpy.linspace(xmin, xmax, n)
      y = f(x)
      # Uniform noise, scaled to 20% of the data range
      y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())
      v0 = [3., 1., 4.]  # initial param estimate

      # Fitting
      v, success = leastsq(err, v0, args=(x, y), maxfev=10000)
      print('Estimated parameters:', v)
      print('True parameters:', true_params)
      X = numpy.linspace(xmin, xmax, 5 * n)
      pylab.plot(x, y, 'ro', X, myfunc(v, X))
      pylab.show()
  59. 59. SciPy optimisation example Fit a model to noisy data: $y = \frac{a}{x^{b}} \sin(cx) + \varepsilon$
  60. 60. Ingredients for this example numpy.linspace numpy.random.rand for the noise model (uniform) scipy.optimize.leastsq
  61. 61. Sparse matrix example Construct and solve a sparse linear system
  62. 62. Sparse matrices Sparse matrices are mostly zeros. They can be symmetric or asymmetric. Sparsity patterns vary: block sparse, band matrices, ... They can be huge! Only non-zeros are stored.
  63. 63. Sparse matrices in SciPy SciPy supports seven sparse storage schemes ... and sparse solvers in Fortran.
  64. 64. Sparse matrix creation To construct a 1000x1000 lil_matrix and add values:
      >>> from scipy.sparse import lil_matrix
      >>> from numpy.random import rand
      >>> from scipy.sparse.linalg import spsolve
      >>> A = lil_matrix((1000, 1000))
      >>> A[0, :100] = rand(100)
      >>> A[1, 100:200] = A[0, :100]
      >>> A.setdiag(rand(1000))
  65. 65. Solving sparse matrix systems Now convert the matrix to CSR format and solve Ax=b:
      >>> A = A.tocsr()
      >>> b = rand(1000)
      >>> x = spsolve(A, b)
      # Convert it to a dense matrix and solve, and check that the result is the same:
      >>> from numpy.linalg import solve, norm
      >>> x_ = solve(A.todense(), b)
      # Compute norm of the error:
      >>> err = norm(x - x_)
      >>> err < 1e-10
      True
  66. 66. Matplotlib Great plotting package in Python Matlab-like syntax Great rendering: anti-aliasing etc. Many ‘backends’: Cairo, GTK, Cocoa, PDF Flexible output: to EPS, PS, PDF, TIFF, PNG, ...
  67. 67. Matplotlib: worked examples Search the web for 'Matplotlib gallery'
  68. 68. Example: NumPy vectorization 1. Use a Monte Carlo algorithm to estimate π: (a) Generate uniform random variates (x, y) over [0, 1]. (b) Estimate π from the proportion p that land in the unit circle. 2. Time two ways of doing this: (a) Using for loops (b) Using array operations (vectorized)
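One possible solution to this exercise is sketched below; it is not the presenter's own code. The function names, the sample size and the seeds are assumptions chosen for illustration. Points are drawn in the unit square, so the proportion p that falls inside the quarter circle estimates π/4.

```python
import random
import time

import numpy as np


def estimate_pi_loop(n, seed=0):
    """Estimate pi with a plain Python for loop."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def estimate_pi_vectorized(n, seed=0):
    """Estimate pi with NumPy array operations (no explicit loop)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n


n = 200_000
for estimator in (estimate_pi_loop, estimate_pi_vectorized):
    start = time.perf_counter()
    estimate = estimator(n)
    elapsed = time.perf_counter() - start
    print(f"{estimator.__name__}: pi ~ {estimate:.4f} in {elapsed:.4f} s")
```

The exact timings depend on the machine; the vectorized version is typically much faster because the per-sample work runs in compiled code rather than in the interpreter.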
  69. 69. 3. Scaling
  70. 70. HPC High-performance computing
  71. 71. Aspects to HPC Supercomputers Distributed clusters / grids Parallel programming Scripting Caches, shared memory Job control Code porting Specialized hardware
  72. 72. Python for HPC Advantages: Portability; Easy scripting, glue; Maintainability; Profiling to identify hotspots; Vectorization with NumPy. Disadvantages: Global interpreter lock; Less control than C; Native loops are slow.
  73. 73. Large data sets Useful Python language features: Generators, iterators Useful packages: Great HDF5 support from PyTables!
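Generators help with large data sets because values are produced one at a time instead of being loaded into memory together. The sketch below is illustrative only: `read_values` and `running_mean` are hypothetical helpers, and an in-memory `io.StringIO` stands in for a large file on disk.

```python
import io


def read_values(lines):
    """Yield one float per non-empty line, without building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values):
    """Consume an iterator once and return the mean of its values."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else float("nan")


# A file object is itself an iterator over lines, so the same pipeline
# works unchanged on open("data.txt") for a file too big to fit in memory.
fake_file = io.StringIO("1.5\n2.5\n\n4.0\n")
mean = running_mean(read_values(fake_file))
print(mean)
```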
  74. 74. Hierarchical data Databases without the relational baggage
  75. 75. Great interface for HDF5 data Efficient support for massive data sets
  76. 76. Applications of PyTables aeronautics telecommunications drug discovery data mining financial analysis statistical analysis climate prediction etc.
  77. 77. Breaking news: June 2011 PyTables Pro is now being open sourced. Indexed searches for speed Merging with PyTables Working project name: NewPyTables
  78. 78. PyTables performance OPSI indexing engine speed: Querying 10 billion rows can take hundredths of a second! Target use-case: mostly read-only or append-only data
  79. 79. Principles for efficient code
  80. 80. Important principles 1. "Premature optimization is the root of all evil" Don't write cryptic code just to make it more efficient! 2. 1-5% of the code takes up the vast majority of the computing time! ... and it might not be the 1-5% that you think!
  81. 81. Checklist for efficient code From most to least important: 1. Check: Do you really need to make it more efficient? 2. Check: Are you using the right algorithms and data structures? 3. Check: Are you reusing pre-written libraries wherever possible? 4. Check: Which parts of the code are expensive? Measure, don't guess!
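Item 4 of the checklist ("Measure, don't guess!") can be put into practice with the standard-library profiler. This is a minimal sketch: the two functions and the `workload` wrapper are made-up examples, not code from the presentation.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same sum via the closed form (n-1) n (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    return slow_sum_of_squares(200_000), fast_sum_of_squares(200_000)


# Run the workload under the profiler and collect per-function timings.
profiler = cProfile.Profile()
result = profiler.runcall(workload)

# Print the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the report, the loop-based function should account for nearly all of the time, which is the kind of hotspot worth optimizing first.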
  82. 82. Relative efficiency gains Exponential-order and polynomial-order speedups are possible by choosing the right algorithm for a task. These require the right data structures! These dwarf 10-25x linear-order speedups from: using lower-level languages using different language constructs.
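A small illustration of a polynomial-order gain from the right data structure: membership tests against a list scan every element, while tests against a set use hashing. The sizes and names below are arbitrary choices for this sketch.

```python
import timeit

n = 2_000
haystack_list = list(range(n))
haystack_set = set(haystack_list)
needles = list(range(0, 2 * n, 2))  # even numbers; half of them are present


def count_hits(needles, haystack):
    """Count how many needles occur in the haystack."""
    return sum(1 for needle in needles if needle in haystack)


# Same algorithm, different container: O(n) per lookup versus O(1) on average.
list_time = timeit.timeit(lambda: count_hits(needles, haystack_list), number=5)
set_time = timeit.timeit(lambda: count_hits(needles, haystack_set), number=5)

print(f"list: {list_time:.4f} s, set: {set_time:.4f} s")
```

The gap widens as `n` grows, since the list version does work proportional to n squared overall and the set version roughly proportional to n. A constant-factor speedup from a lower-level language would not change that scaling.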
  83. 83. 4. About Python Charmers
  84. 84. The largest Python training provider in South-East Asia Delighted customers include:
  85. 85. Most popular course topics: Python for Programmers (3 days); Python for Scientists and Engineers (4 days); Python for Geoscientists (4 days); Python for Bioinformaticians (4 days). New courses: Python for Financial Engineers (4 days); Python for IT Security Professionals (3 days)
  86. 86. Python Charmers: Topics of expertise Python: beginners, advanced Scientific data processing with Python Software engineering with Python Large-scale problems: HPC, huge data sets, grids Statistics and Monte Carlo problems
  87. 87. Python Charmers: Topics of expertise (2) Spatial data analysis / GIS General scripting, job control, glue GUIs with PyQt Integrating with other languages: R, C, C++, Fortran, ... Web development in Django
  88. 88. How to get in touch See PythonCharmers.com or email us at: info@pythoncharmers.com
