2015 bioinformatics python_introduction_wim_vancriekinge_vfinal


  1. 1. FBW 29-09-2015 Wim Van Criekinge
  2. 2. Bioinformatics.be
  3. 3. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  4. 4. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  5. 5. What is Python ? • Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. • Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. • Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed. • When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python's Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.
  6. 6. Programming Language • Formal notation for specifying computations – Syntax (usually specified by a context-free grammar) – Semantics for each syntactic construct – Practical implementation on a real or virtual machine • Compilation vs. interpretation • Efficiency vs. portability • Assembly Languages – Invented by machine designers in the early 1950s – Reusable macros and subroutines
  7. 7. FORTRAN • Procedural, imperative language – Still used in scientific computation • Developed at IBM in the 1950s by John Backus (1924-2007) – Backus’s 1977 Turing award lecture made the case for functional programming – On FORTRAN: “We did not know what we wanted and how to do it. It just sort of grew. The first struggle was over what the language would look like. Then how to parse expressions – it was a big problem…” • BNF: Backus-Naur form for defining context-free grammars
  8. 8. LISP • Invented by John McCarthy (1927-2011, Turing Award: 1971) • Formal notation for lambda-calculus • Pioneered many PL concepts – Automated memory management (garbage collection) – Dynamic typing – No distinction between code and data • Still in use: ACL2, Scheme, … • “Anyone could learn Lisp in one day, except that if they already knew FORTRAN, it would take three days” - Marvin Minsky
  9. 9. PASCAL • Designed by Niklaus Wirth – 1984 Turing Award • Revised type system of Algol – Good data structure concepts • Records, variants, subranges – More restrictive than Algol 60/68 • Procedure parameters cannot have procedure parameters • Popular teaching language • Simple one-pass compiler
  10. 10. C • Bell Labs 1972 (Dennis Ritchie) • Development closely related to UNIX – 1983 Turing Award to Thompson and Ritchie • Compiles to native code • 1973-1980: new features; compiler ported – unsigned, long, union, enums • 1978: K&R C book published • 1989: ANSI C standardization – Function prototypes as in C++ • 1999: ISO 9899:1999 also known as “C99” – Inline functions, C++-like decls, bools, variable arrays • Concurrent C, Objective C, C*, C++, C# • “Portable assembly language” – Early C++, Modula-3, Eiffel source-translated to C
  11. 11. JAVA • Sun 1991-1995 (James Gosling) – Originally called Oak, intended for set top boxes • Mixture of C and Modula-3 – Unlike C++ • No templates (generics), no multiple inheritance, no operator overloading – Like Modula-3 (developed at DEC SRC) • Explicit interfaces, single inheritance, exception handling, built-in threading model, references & automatic garbage collection (no explicit pointers!) • “Generics” added later
  12. 12. Other Important Languages • Algol-like – Modula, Oberon, Ada • Functional – ISWIM, FP, SASL, Miranda, Haskell, LCF, ML, Caml, Ocaml, Scheme, Common LISP • Object-oriented – Smalltalk, Objective-C, Eiffel, Modula-3, Self, C#, CLOS • Logic programming – Prolog, Gödel, LDL, ACL2, Isabelle, HOL
  13. 13. … and more • Data processing and databases – Cobol, SQL, 4GLs, XQuery • Systems programming – PL/I, PL/M, BLISS • Specialized applications – APL, Forth, Icon, Logo, SNOBOL4, GPSS, Visual Basic • Concurrent, parallel, distributed – Concurrent Pascal, Concurrent C, C*, SR, Occam, Erlang, Obliq
  14. 14. … and more • Programming tool “mini-languages” – awk, make, lex, yacc, autoconf … • Command shells, scripting and “web” languages – sh, csh, tcsh, ksh, zsh, bash … – Perl, JavaScript, PHP, Python, Rexx, Ruby, Tcl, AppleScript, VBScript … • Web application frameworks and technologies – ASP.NET, AJAX, Flash, Silverlight … • Note: HTML/XML are markup languages, not programming languages, but they often embed executable scripts like Active Server Pages (ASPs) & Java Server Pages (JSPs)
  15. 15. What is scripting ? • Wikipedia has an informative and detailed explanation, “A scripting language, script language or extension language is a programming language that allows control of one or more software applications. "Scripts" are distinct from the core code of the application, as they are usually written in a different language and are often created or at least modified by the end-user.[1] Scripts are often interpreted from source code or bytecode, whereas the applications they control are traditionally compiled to native machine code. Scripting languages are nearly always embedded in the applications they control.[2] • The name "script" is derived from the written script of the performing arts, in which dialogue is set down to be spoken by human actors. Early script languages were often called batch languages or job control languages. Such early scripting languages were created to shorten the traditional edit-compile-link-run process”.
  16. 16. What’s Driving Their Evolution? • Constant search for better ways to build software tools for solving computational problems – Many PLs are general purpose tools – Others are targeted at specific kinds of problems • For example, massively parallel computations or graphics • Useful ideas evolve into language designs – Algol → Simula → Smalltalk → C with Classes → C++ • Often design is driven by expediency – Scripting languages: Perl, Tcl, Python, PHP, etc. • “PHP is a minor evil perpetrated by incompetent amateurs, whereas Perl is a great and insidious evil, perpetrated by skilled but perverted professionals.” - Jon Ribbens
  17. 17. What Do They Have in Common? • Lexical structure and analysis – Tokens: keywords, operators, symbols, variables – Regular expressions and finite automata • Syntactic structure and analysis – Parsing, context-free grammars • Pragmatic issues – Scoping, block structure, local variables – Procedures, parameter passing, iteration, recursion – Type checking, data structures • Semantics – What do programs mean and are they correct
  18. 18. Visual history of programming languages http://cdn.oreillystatic.com/news/graphics/prog_lang_poster.pdf
  19. 19. The most valuable programming skills to have on a resume
  20. 20. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  21. 21. Python • Programming languages are overrated – If you are going into bioinformatics you will probably learn/need several – If you know one you know 90% of a second • Choice does matter but it matters far less than people think it does • Why Python? – Lets you start writing useful programs asap – Built-in libraries, plus add-on packages such as Biopython – Free, runs on most platforms, widely used in science • Versus Perl? – Incredibly similar – Consistent syntax, indentation
  22. 22. http://www.python.org
  23. 23. Should I use Python 2 or Python 3 for my development activity? • Short version: Python 2.x is legacy, Python 3.x is the present and future of the language • Python 3.0 was released in 2008. The final 2.x version 2.7 release came out in mid-2010, with a statement of extended support for this end-of-life release. The 2.x branch will see no new major releases after that. 3.x is under active development and has already seen over five years of stable releases, including version 3.3 in 2012 and 3.4 in 2014. This means that all recent standard library improvements, for example, are only available by default in Python 3.x. • Guido van Rossum (the original creator of the Python language) decided to clean up Python 2.x properly, with less regard for backwards compatibility than is the case for new releases in the 2.x range. The most drastic improvement is the better Unicode support (with all text strings being Unicode by default) as well as saner bytes/Unicode separation. • Besides, several aspects of the core language (such as print and exec being statements, integers using floor division) have been adjusted to be easier for newcomers to learn and to be more consistent with the rest of the language, and old cruft has been removed (for example, all classes are now new-style, "range()" returns a memory efficient iterable, not a list as in 2.x). • The What's New in Python 3.0 document provides a good overview of the major language changes and likely sources of incompatibility with existing Python 2.x code. Nick Coghlan (one of the CPython core developers) has also created a relatively extensive FAQ regarding the transition. • However, the broader Python ecosystem has amassed a significant amount of quality software over the years. The downside of breaking backwards compatibility in 3.x is that some of that software (especially in-house software in companies) still doesn't work on 3.x yet.
  24. 24. How to install ? • On Windows you’ll need administrator rights :-( • Portable Python distribution ? Takes 500 MB and >2 hours :-(
  25. 25. Version 2.7 and 3.4 on http://athena.ugent.be
  26. 26. Interactive “Shell” • Great for learning the language • Great for experimenting with the library • Great for testing your own modules • Two variations: IDLE (GUI), python (command line) • Type statements or expressions at the prompt (a Python 2 session is shown):
        >>> print "Hello, world"
        Hello, world
        >>> x = 12**2
        >>> x/2
        72
        >>> # this is a comment
  27. 27. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  28. 28. IDE: Integrated Development Environment • You can type scripts using Notepad(++) • Better: PyCharm – available for free on most OS but you need to be administrator to install :-( • We will use Eclipse in combination with PyDev
  29. 29. What is Eclipse? • Eclipse started as a proprietary IBM product (IBM VisualAge for Smalltalk/Java) – Embracing the open source model, IBM opened the product up • Open Source – It is a general purpose open platform that facilitates and encourages the development of third party plug-ins • Best known as an Integrated Development Environment (IDE) – Provides tools for coding, building, running and debugging applications • Originally designed for Java, now supports many other languages – Good support for C, C++ – Python, PHP, Ruby, etc…
  30. 30. Prerequisites for Running Eclipse • Eclipse is written in Java and will thus need an installed JRE or JDK in which to execute – JDK recommended
  31. 31. Selecting a Workspace • In Eclipse, all of your code will live under a workspace • A workspace is nothing more than a location where we will store our source code and where Eclipse will write out our preferences • Eclipse allows you to have multiple workspaces – each tailored in its own way • Choose a location where you want to store your files, then click OK
  32. 32. Eclipse IDE Components • Menubars: full drop-down menus plus quick access to common functions • Editor Pane: this is where we edit our source code • Perspective Switcher: we can switch between various perspectives here • Outline Pane: this contains a hierarchical view of a source file • Package Explorer Pane: this is where our projects/files are listed • Miscellaneous Pane: various components can appear in this pane – typically this contains a console and a list of compiler problems • Task List Pane: this contains a list of “tasks” to complete
  33. 33. PyDev: Python plug-in for Eclipse • Syntax highlighting • Debugger • Code completion • An extensive preference menu that can be used to edit the plug-in’s attributes and options.
  34. 34. Installation • The plug-in can be installed through Software Updates.
  35. 35. Setting Up In Eclipse, go to: Window, Preferences, PyDev, Interpreter-Python, and click New. Select the python.exe file in the Python directory, click OK and OK in the Preferences window again. Wait for the creating procedure to finish.
  36. 36. Create Python Project and File • Project: go to File, New, Project, select PyDev, Python Project, click Next, write the name, choose the Python version, and click Finish. • File: click on File, New, choose File, click on the Python project folder, write a file name ending in .py, and click Finish.
  37. 37. Running Python To run Python code click on Run, Run As, and select Python Run.
  38. 38. Let’s try “Hello World!” from athena.ugent.be
  39. 39. Where is the workspace ?
  40. 40. Make PyDev Project
  41. 41. Which Python interpreter is used … check Preferences or run version.py
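A minimal sketch of what a script like version.py could contain. The slides do not show the file's contents, so the function name `interpreter_summary` and the output format are assumptions made for illustration. It only uses the standard `sys` and `platform` modules and runs on Python 3 (and on 2.7).

```python
import platform
import sys


def interpreter_summary():
    """Return a one-line description of the interpreter running this script.

    Hypothetical helper: the name and format are not taken from the slides.
    """
    # platform.python_version() gives e.g. "3.4.3"; sys.executable is the
    # path of the interpreter binary that Eclipse/PyDev actually launched.
    return "Python {} at {}".format(platform.python_version(), sys.executable)


print(interpreter_summary())
```

Running this from Eclipse shows whether the project picked up the interpreter configured under Preferences.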
  42. 42. Create new file …
  43. 43. … Hello_world.py
  44. 44. Run Hello_world.py
  45. 45. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello World PIthon
  46. 46. git is an open source, distributed version control system designed for speed and efficiency
  47. 47. Git: A distributed version control system • Version control (or revision control, or source control) is all about managing multiple versions of documents, programs, web sites, etc. – Almost all “real” projects use some kind of version control – Essential for team projects, but also very useful for individual projects • Some well-known version control systems are CVS, Subversion, Mercurial, and Git – CVS and Subversion use a “central” repository; users “check out” files, work on them, and “check them in” – Mercurial and Git treat all repositories as equal • Distributed systems like Mercurial and Git are newer and are gradually replacing centralized systems like CVS and Subversion
  48. 48. Why version control? • For working by yourself: – Gives you a “time machine” for going back to earlier versions – Gives you great support for different versions (standalone, web app, etc.) of the same basic project • For working with others: – Greatly simplifies concurrent work, merging changes • For getting an internship or job: – Any company with a clue uses some kind of version control – Companies without a clue are bad places to work
  49. 49. Why Git? • Git has many advantages over earlier systems such as CVS and Subversion – More efficient, better workflow, etc. – See the literature for an extensive list of reasons – Of course, there are always those who disagree • It works from within Eclipse, also when started from Athena :-)
  50. 50. No network needed: (almost) everything is local • Performing a diff • Viewing file history • Committing changes • Merging branches • Obtaining any other revision of a file • Switching branches
  51. 51. GitHub: Hosted GIT • Largest open source git hosting site • Public and private options • User-centric rather than project-centric • http://github.ugent.be (use your UGent login and password) – Accept invitation from Bioinformatics-I-2015 – URI: https://github.ugent.be/Bioinformatics-I-2015/Python.git
  55. 55. Typical workflow • Person A: (1) set up project & repo, (2) push code onto GitHub, (3) edit/commit, (4) edit/commit, (5) pull/push • Person B: clone code from GitHub, edit/commit/push, edit…, edit… commit, pull/push • This is just the flow; specific commands are on the following slides. • It’s also possible to create your project first on GitHub, then clone (i.e., no git init)
  60. 60. GitHub: Hosted GIT URI (Uniform Resource Identifier): https://github.ugent.be/Bioinformatics-I-2015/Python.git
  66. 66. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello_World.py PI-thon.py
  67. 67. Hello_world.py
  68. 68. Overview What is Python ? Why Python 4 Bioinformatics ? How to Python IDE: Eclipse & PyDev / Athena Code Sharing: Git(hub) Examples Hello_World.py PI-thon.py
  69. 69. Variables • No need to declare • Need to assign (initialize) – use of uninitialized variable raises exception • Not typed:
        if friendly: greeting = "hello world"
        else: greeting = 12**2
        print greeting
      • Everything is a "variable" – even functions, classes, modules
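The points on the Variables slide can be sketched in Python 3 as follows. This is an illustration only: the function name `make_greeting` and the name `undefined_name` are made up for the example, and `print` is written as a function because the sketch targets Python 3 rather than the Python 2 syntax used on the slide.

```python
def make_greeting(friendly):
    """Return a str or an int depending on the flag (names are not typed)."""
    if friendly:
        greeting = "hello world"
    else:
        greeting = 12 ** 2
    return greeting


print(make_greeting(True))
print(make_greeting(False))

# Using a name that was never assigned raises NameError.
try:
    print(undefined_name)
except NameError as error:
    print("NameError:", error)

# Functions are objects too, so they can be bound to another name.
shout = make_greeting
print(shout(True))
```

The same name `greeting` holds a string in one branch and an integer in the other, which is what "not typed" means here: the type belongs to the value, not to the name.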
  70. 70. Numbers (Python 2 syntax) • The usual suspects – 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x), 0<x<=5 • C-style shifting & masking – 1<<16, x&0xff, x|1, ~x, x^y • Integer division truncates :-( – 1/2 -> 0 # 1./2. -> 0.5, float(1)/2 -> 0.5 – Fixed in Python 3, where 1/2 -> 0.5 and 1//2 -> 0 • Long (arbitrary precision), complex – 2L**100 -> 1267650600228229401496703205376L – In Python 2.2 and beyond, 2**100 does the same thing – 1j**2 -> (-1+0j)
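For comparison, here is a short sketch of the same number operations under Python 3. The variable names are chosen for the example; the octal literal is written `0o377` because the Python 2 form `0377` is a syntax error in Python 3, and there is no `L` suffix because all integers have arbitrary precision.

```python
# Division: "/" is true division, "//" is floor division (Python 3).
true_quotient = 1 / 2        # 0.5
floor_quotient = 1 // 2      # 0

# Literals in other bases.
hex_value = 0xFF             # 255
octal_value = 0o377          # 255 (Python 3 spelling of the slide's 0377)

# C-style shifting and masking.
shifted = 1 << 16            # 65536
masked = 0x1234 & 0xFF       # 0x34 == 52

# Arbitrary-precision integers and complex numbers.
big_power = 2 ** 100         # 1267650600228229401496703205376
imaginary_squared = 1j ** 2  # (-1+0j)

# Chained comparison, as in 0 < x <= 5 on the slide.
x = 3
in_range = 0 < x <= 5        # True

print(true_quotient, floor_quotient, hex_value, octal_value)
print(shifted, masked, big_power, imaginary_squared, in_range)
```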
  71. 71. Control Structures
        if condition:
            statements
        [elif condition:
            statements] ...
        else:
            statements
        while condition:
            statements
        for var in sequence:
            statements
        break
        continue
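The templates on the Control Structures slide can be seen at work in a small Python 3 sketch. The functions `classify` and `countdown` and the sample data are invented for illustration and do not come from the slides.

```python
def classify(numbers):
    """Label each number, skipping negatives and stopping at the first zero."""
    labels = []
    for n in numbers:            # for var in sequence
        if n < 0:
            continue             # skip to the next item
        if n == 0:
            break                # leave the loop entirely
        if n % 2 == 0:
            labels.append("even")
        elif n % 3 == 0:
            labels.append("odd multiple of 3")
        else:
            labels.append("other")
    return labels


def countdown(start):
    """Collect start, start-1, ..., 1 using a while loop."""
    values = []
    while start > 0:             # while condition
        values.append(start)
        start -= 1
    return values


print(classify([4, -1, 9, 7, 0, 8]))  # ['even', 'odd multiple of 3', 'other']
print(countdown(3))                   # [3, 2, 1]
```

The trailing 8 in the sample list is never labelled because `break` ends the loop when the 0 is reached.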
  72. 72. Example Function
        def gcd(a, b):
            "greatest common divisor"
            while a != 0:
                a, b = b%a, a    # parallel assignment
            return b
        >>> gcd.__doc__
        'greatest common divisor'
        >>> gcd(12, 20)
        4
  73. 73. How to generate random numbers • The standard random module implements a random number generator: import random; print(random.random()) • This prints a random floating point number in the range [0, 1) (that is, between 0 and 1, including 0.0 but always smaller than 1.0). • There are also many other specialized generators in this module, such as: – randrange(a, b) chooses an integer in the range [a, b) – uniform(a, b) chooses a floating point number in the range [a, b) – normalvariate(mean, sdev) samples the normal (Gaussian) distribution • Some higher-level functions operate on sequences directly, such as: – choice(S) chooses a random element from a given sequence (the sequence must have a known length) – shuffle(L) shuffles a list in-place, i.e. permutes it randomly • There’s also a Random class you can instantiate to create multiple independent random number generators.
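The functions listed on the random-numbers slide can be tried out with a short Python 3 sketch. The seed value 42, the variable names and the sample sequences ("ACGT", the codon list) are assumptions chosen for the example. A seeded `random.Random` instance is used so that repeated runs give the same values.

```python
import random

# An independent generator with a fixed seed, so runs are reproducible.
rng = random.Random(42)

u = rng.random()                 # float in [0, 1)
k = rng.randrange(1, 7)          # integer in [1, 7), i.e. a die roll
x = rng.uniform(-1.0, 1.0)       # float between -1.0 and 1.0
g = rng.normalvariate(0.0, 1.0)  # sample from a standard normal distribution

base = rng.choice("ACGT")        # one random element of a sequence

codons = ["ATG", "GCC", "TAA"]
rng.shuffle(codons)              # permutes the list in place, returns None

print(u, k, x, g, base, codons)
```

The module-level functions such as `random.random()` work the same way but share one hidden generator; a separate `Random` object keeps an experiment's random stream isolated.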
  74. 74. First program: PI-thon.py • How good are the random numbers ? • If they are good, you should be able to “measure” PI
  75. 75. Measure Pi with two random numbers … many of them … (figure: unit square with side 1 and axes x and y)
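One way the PI-thon idea could be written is sketched below. The slides do not show the program itself, so this is an assumption about the approach, not the course's own code: draw many points (x, y) uniformly in the unit square, count the fraction that falls inside the quarter circle x² + y² ≤ 1, and multiply by 4, since the quarter circle has area π/4. The function name `estimate_pi` and its parameters are hypothetical.

```python
import random


def estimate_pi(samples, seed=None):
    """Monte Carlo estimate of pi from `samples` random points.

    Hypothetical helper for illustration; `seed` makes runs reproducible.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # (points inside) / (all points) approximates pi/4.
    return 4.0 * inside / samples


print(estimate_pi(100000, seed=1))
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of samples, so a hundred times more points buys about one extra correct digit. If the random numbers were badly distributed, the estimate would settle on a value visibly different from π.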
  76. 76. Python Videos • http://python.org/ – documentation, tutorials, beginners guide, core distribution, ... • Books include: – Learning Python by Mark Lutz – Python Essential Reference by David Beazley – Python Cookbook, ed. by Martelli, Ravenscroft and Ascher