SlideShare uma empresa Scribd logo
1 de 30
Baixar para ler offline
WHY PYTHON IS BETTER
FOR DATA SCIENCE
ÍCARO MEDEIROS
São Paulo Big Data Meetup

São Paulo - SP, 25/11/2015
DATA SCIENTISTS SHOULD DO…
http://berkeleysciencereview.com/article/first-rule-data-science/
WHY PYTHON?
▸ General purpose

▸ Smooth learning curve

▸ REPL (IPython!)

▸ Programmer productivity

▸ Popular and mature

▸ Glue language (high level API, low level C/Fortran bindings)

▸ Science ecosystem (growing!)
PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS
http://githut.info/
PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS
pypl.github.io/PYPL.html
AVOID THE TWO LANGUAGE PROBLEM
PYTHON CAN BE USED IN WHOLE DATA SCIENCE WORKFLOW
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015?slide=22
AUTHOR A MULTISTAGE PROCESSING PIPELINE IN
PYTHON, DESIGN A HYPOTHESIS TEST, PERFORM A
REGRESSION ANALYSIS OVER DATA SAMPLES WITH R,
DESIGN AND IMPLEMENT AN ALGORITHM FOR SOME
DATA-INTENSIVE PRODUCT OR SERVICE IN HADOOP,
OR COMMUNICATE THE RESULTS OF OUR ANALYSES
Jeff Hammerbacher
ONE DAY AT FACEBOOK’S DATA SCIENCE TEAM, A MEMBER COULD…
http://berkeleysciencereview.com/scientific-collaborations-uc-berkeley-data-driven-cover/
OPTIONS FOR PROCESSING PIPELINE
Airflow
https://github.com/airbnb/airflow
https://github.com/spotify/luigi
AIRFLOW EXAMPLE
https://github.com/airbnb/airflow
REGRESSION ANALYSIS IN PYTHON: EASY
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/ols.html
PYTHON <3 BIG DATA
map reduce in python
pure python HDFS client
fast and general engine for large-scale
data processing
mrjob
http://spark.apache.org
https://github.com/spotify/snakebite
https://pythonhosted.org/mrjob
…
OH, BUT SCALA/JAVA IS FASTER. PYTHON IS 2 *FASTER: [WRITING, RUNNING]
DataFrame operations are optimized and compiled into JVM bytecode
https://databricks.com/blog/2015/04/24/recent-performance-improvements-in-apache-spark-sql-python-
dataframes-and-more.html
RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK'
RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK'
SO CONCISE
COMMUNICATE RESULTS WITH IPYTHON / JUPYTER
Language agnostic :)
COMMUNICATE RESULTS WITH IPYTHON / JUPYTER
DEMO
TIME
MATPLOTLIB / SEABORN / PLOT.LY / BOKEH: SUCH VISUALIZATION!!
PYTHON FITS ALL!
PYTHON FITS ALL!
PYTHON FOR
SCIENCE IS
GROWING
SCIENCE IS GETTING MORE AND MORE IMPORTANT FOR PYTHON COMMUNITY
# module imports imports/numpy
1 sys 2437939 5.85
2 os 2009086 4.82
3 re 1303009 3.12
4 numpy 416981 1.00
5 warnings 371345 0.89
6 subprocess 344934 0.83
7 django 282097 0.68
8 math 281987 0.68
11 matplotlib 146913 0.35
13 pylab 77817 0.19
14 scipy 69092 0.17
22 pandas 18928 0.05
24 theano 5482 0.051
6/25 MOST POPULAR LIBRARIES ARE FOR DATA SCIENCE
https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
SCIENCE IS IMPORTANT FOR PYTHON: MATRIX MULTIPLICATION
https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
import numpy as np
from numpy.linalg import inv, solve
# Using dot function:
S = np.dot((np.dot(H, beta) - r).T,
np.dot(inv(np.dot(np.dot(H, V), H.T)),
np.dot(H, beta) - r))
# With the @ operator
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
S = ( H β − r ) T ( H V H T ) − 1 ( H β − r )
PEP 0465: PROPOSED FEB/14. SINCE PY 3.5 (SEP/15)
2013: 7 INTERNATIONAL CONFERENCES ON NUMERICAL PYTHON
AT PYCON 2014, ~20% OF THE TUTORIALS INVOLVED THE USE OF MATRICES
SCIENCE STACK IS GETTING BETTER EACH DAY
https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8
SCIENCE STACK IS ALWAYS EVOLVING…
https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=29
CONDA: AUTOMATING ENVIRONMENTS
https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015?slide=60
THE STACK IS STILL GETTING NEW MEMBERS…
http://www.tensorflow.org/
TAKEAWAY MESSAGE
TRY PYTHON. IT WILL BE
A ONE WAY TRIP!
slides
icaromedeiros.com.br
slideshare.net/icaromedeiros
@icaromedeiros

Mais conteúdo relacionado

Mais procurados

Why Python?
Why Python?Why Python?
Why Python?Adam Pah
 
1901200100000 presentation short term mini project on python
1901200100000 presentation short term mini project on python1901200100000 presentation short term mini project on python
1901200100000 presentation short term mini project on pythonSANTOSHJAISWAL52
 
Python for Science and Engineering: a presentation to A*STAR and the Singapor...
Python for Science and Engineering: a presentation to A*STAR and the Singapor...Python for Science and Engineering: a presentation to A*STAR and the Singapor...
Python for Science and Engineering: a presentation to A*STAR and the Singapor...pythoncharmers
 
Python Tutorial | Python Tutorial for Beginners | Python Training | Edureka
Python Tutorial | Python Tutorial for Beginners | Python Training | EdurekaPython Tutorial | Python Tutorial for Beginners | Python Training | Edureka
Python Tutorial | Python Tutorial for Beginners | Python Training | EdurekaEdureka!
 
Python in Data Science Work
Python in Data Science WorkPython in Data Science Work
Python in Data Science WorkRick. Bahague
 
Python: the Project, the Language and the Style
Python: the Project, the Language and the StylePython: the Project, the Language and the Style
Python: the Project, the Language and the StyleJuan-Manuel Gimeno
 
Python Programming Language
Python Programming LanguagePython Programming Language
Python Programming LanguageLaxman Puri
 
Chapter 8 getting started with python
Chapter 8 getting started with pythonChapter 8 getting started with python
Chapter 8 getting started with pythonPraveen M Jigajinni
 
Which programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIWhich programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIMaggie Petrova
 
Python quick guide1
Python quick guide1Python quick guide1
Python quick guide1Kanchilug
 
Machine learning libraries with python
Machine learning libraries with pythonMachine learning libraries with python
Machine learning libraries with pythonVishalBisht9217
 
Python presentation by Monu Sharma
Python presentation by Monu SharmaPython presentation by Monu Sharma
Python presentation by Monu SharmaMayank Sharma
 
Python Introduction | JNTUA | R19 | UNIT 1
Python Introduction | JNTUA | R19 | UNIT 1 Python Introduction | JNTUA | R19 | UNIT 1
Python Introduction | JNTUA | R19 | UNIT 1 FabMinds
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonYi-Fan Chu
 

Mais procurados (20)

Introduction of python
Introduction of pythonIntroduction of python
Introduction of python
 
Why Python?
Why Python?Why Python?
Why Python?
 
Python final ppt
Python final pptPython final ppt
Python final ppt
 
Python
Python Python
Python
 
1901200100000 presentation short term mini project on python
1901200100000 presentation short term mini project on python1901200100000 presentation short term mini project on python
1901200100000 presentation short term mini project on python
 
Python for Science and Engineering: a presentation to A*STAR and the Singapor...
Python for Science and Engineering: a presentation to A*STAR and the Singapor...Python for Science and Engineering: a presentation to A*STAR and the Singapor...
Python for Science and Engineering: a presentation to A*STAR and the Singapor...
 
Python Tutorial | Python Tutorial for Beginners | Python Training | Edureka
Python Tutorial | Python Tutorial for Beginners | Python Training | EdurekaPython Tutorial | Python Tutorial for Beginners | Python Training | Edureka
Python Tutorial | Python Tutorial for Beginners | Python Training | Edureka
 
Getting Started with Python
Getting Started with PythonGetting Started with Python
Getting Started with Python
 
Python in Data Science Work
Python in Data Science WorkPython in Data Science Work
Python in Data Science Work
 
Python: the Project, the Language and the Style
Python: the Project, the Language and the StylePython: the Project, the Language and the Style
Python: the Project, the Language and the Style
 
Python Programming Language
Python Programming LanguagePython Programming Language
Python Programming Language
 
Basics of python
Basics of pythonBasics of python
Basics of python
 
Chapter 8 getting started with python
Chapter 8 getting started with pythonChapter 8 getting started with python
Chapter 8 getting started with python
 
Which programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XIIWhich programming language to learn R or Python - MeasureCamp XII
Which programming language to learn R or Python - MeasureCamp XII
 
Python quick guide1
Python quick guide1Python quick guide1
Python quick guide1
 
Machine learning libraries with python
Machine learning libraries with pythonMachine learning libraries with python
Machine learning libraries with python
 
Python presentation by Monu Sharma
Python presentation by Monu SharmaPython presentation by Monu Sharma
Python presentation by Monu Sharma
 
Python Introduction | JNTUA | R19 | UNIT 1
Python Introduction | JNTUA | R19 | UNIT 1 Python Introduction | JNTUA | R19 | UNIT 1
Python Introduction | JNTUA | R19 | UNIT 1
 
Python training
Python trainingPython training
Python training
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 

Semelhante a Why Python is better for Data Science

Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in pythonUmmeSalmaM1
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th juneEdureka!
 
Python PPT
Python PPTPython PPT
Python PPTEdureka!
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.Marcel Caraciolo
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Exploring and Using the Python Ecosystem
Exploring and Using the Python EcosystemExploring and Using the Python Ecosystem
Exploring and Using the Python EcosystemAdam Cook
 
Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Kir Chou
 
Micropython for the iot
Micropython for the iotMicropython for the iot
Micropython for the iotJacques Supcik
 
Pi, Python, and Paintball??? Innovating with Affordable Tech!
Pi, Python, and Paintball??? Innovating with Affordable Tech!Pi, Python, and Paintball??? Innovating with Affordable Tech!
Pi, Python, and Paintball??? Innovating with Affordable Tech!Barry Tarlton
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageEdureka!
 
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...Edureka!
 
Python 101 For The Net Developer
Python 101 For The Net DeveloperPython 101 For The Net Developer
Python 101 For The Net DeveloperSarah Dutkiewicz
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...Kamila Stępniowska
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitectureSkillspeed
 
Python – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguagePython – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguageIRJET Journal
 
Python in the Atmospheric sciences
Python in the Atmospheric sciencesPython in the Atmospheric sciences
Python in the Atmospheric sciencesScott Collis
 

Semelhante a Why Python is better for Data Science (20)

Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
 
Python PPT
Python PPTPython PPT
Python PPT
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
what is python ?
what is python ? what is python ?
what is python ?
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Exploring and Using the Python Ecosystem
Exploring and Using the Python EcosystemExploring and Using the Python Ecosystem
Exploring and Using the Python Ecosystem
 
Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!
 
Micropython for the iot
Micropython for the iotMicropython for the iot
Micropython for the iot
 
Pi, Python, and Paintball??? Innovating with Affordable Tech!
Pi, Python, and Paintball??? Innovating with Affordable Tech!Pi, Python, and Paintball??? Innovating with Affordable Tech!
Pi, Python, and Paintball??? Innovating with Affordable Tech!
 
Why Python Should Be Your First Programming Language
Why Python Should Be Your First Programming LanguageWhy Python Should Be Your First Programming Language
Why Python Should Be Your First Programming Language
 
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
Old Dogs and New Tricks
Old Dogs and New TricksOld Dogs and New Tricks
Old Dogs and New Tricks
 
Mag pi18 Citation "PhotoReportage"
Mag pi18 Citation "PhotoReportage"Mag pi18 Citation "PhotoReportage"
Mag pi18 Citation "PhotoReportage"
 
Python 101 For The Net Developer
Python 101 For The Net DeveloperPython 101 For The Net Developer
Python 101 For The Net Developer
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
 
Python – The Fastest Growing Programming Language
Python – The Fastest Growing Programming LanguagePython – The Fastest Growing Programming Language
Python – The Fastest Growing Programming Language
 
Python in the Atmospheric sciences
Python in the Atmospheric sciencesPython in the Atmospheric sciences
Python in the Atmospheric sciences
 

Mais de Ícaro Medeiros

Data Science and Culture
Data Science and CultureData Science and Culture
Data Science and CultureÍcaro Medeiros
 
Statistics: the grammar of Data Science
Statistics: the grammar of Data ScienceStatistics: the grammar of Data Science
Statistics: the grammar of Data ScienceÍcaro Medeiros
 
Linked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comLinked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comÍcaro Medeiros
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...Ícaro Medeiros
 
Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Ícaro Medeiros
 
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Ícaro Medeiros
 
Engenharia de ontologias
Engenharia de ontologiasEngenharia de ontologias
Engenharia de ontologiasÍcaro Medeiros
 
Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Ícaro Medeiros
 
R2R Framework: Ontology Mapping
R2R Framework: Ontology MappingR2R Framework: Ontology Mapping
R2R Framework: Ontology MappingÍcaro Medeiros
 
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...Ícaro Medeiros
 
Tag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeTag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeÍcaro Medeiros
 
Expressões regulares no Linux
Expressões regulares no LinuxExpressões regulares no Linux
Expressões regulares no LinuxÍcaro Medeiros
 

Mais de Ícaro Medeiros (15)

Data Science and Culture
Data Science and CultureData Science and Culture
Data Science and Culture
 
Statistics: the grammar of Data Science
Statistics: the grammar of Data ScienceStatistics: the grammar of Data Science
Statistics: the grammar of Data Science
 
Linked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.comLinked Data, Big Data, and User Science at Globo.com
Linked Data, Big Data, and User Science at Globo.com
 
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs  - Front in Bahia...
Linked Data in Use: Schema.org, JSON-LD and hypermedia APIs - Front in Bahia...
 
Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)Web Semântica na Globo.com (Novas Mídias UFRJ)
Web Semântica na Globo.com (Novas Mídias UFRJ)
 
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
Linked data at globo.com - Web of Linked Entities (WoLE 2013) - WWW 2013
 
Engenharia de ontologias
Engenharia de ontologiasEngenharia de ontologias
Engenharia de ontologias
 
Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012
 
Ontology matching
Ontology matchingOntology matching
Ontology matching
 
R2R Framework: Ontology Mapping
R2R Framework: Ontology MappingR2R Framework: Ontology Mapping
R2R Framework: Ontology Mapping
 
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
SameAs Networks and Beyond: Analyzing Deployment Status and Implications of o...
 
Tag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of KnowledgeTag Suggestion using Multiple Sources of Knowledge
Tag Suggestion using Multiple Sources of Knowledge
 
Expressões regulares no Linux
Expressões regulares no LinuxExpressões regulares no Linux
Expressões regulares no Linux
 
Ontology Learning
Ontology LearningOntology Learning
Ontology Learning
 
Tag Suggestion
Tag SuggestionTag Suggestion
Tag Suggestion
 

Último

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 

Último (20)

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Why Python is better for Data Science

  • 1. WHY PYTHON IS BETTER FOR DATA SCIENCE ÍCARO MEDEIROS São Paulo Big Data Meetup São Paulo - SP, 25/11/2015
  • 2. DATA SCIENTISTS SHOULD DO… http://berkeleysciencereview.com/article/first-rule-data-science/
  • 3. WHY PYTHON? ▸ General purpose ▸ Smooth learning curve ▸ REPL (IPython!) ▸ Programmer productivity ▸ Popular and mature ▸ Glue language (high level API, low level C/Fortran bindings) ▸ Science ecosystem (growing!)
  • 4. PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS http://githut.info/
  • 5. PYTHON IS POPULAR: IT MEANS WIDESPREAD KNOWLEDGE AND MANY TOOLS pypl.github.io/PYPL.html
  • 6. AVOID THE TWO LANGUAGE PROBLEM
  • 7. PYTHON CAN BE USED IN WHOLE DATA SCIENCE WORKFLOW https://speakerdeck.com/chdoig/the-state-of-python-for-data-science-pyss-2015?slide=22
  • 8. AUTHOR A MULTISTAGE PROCESSING PIPELINE IN PYTHON, DESIGN A HYPOTHESIS TEST, PERFORM A REGRESSION ANALYSIS OVER DATA SAMPLES WITH R, DESIGN AND IMPLEMENT AN ALGORITHM FOR SOME DATA-INTENSIVE PRODUCT OR SERVICE IN HADOOP, OR COMMUNICATE THE RESULTS OF OUR ANALYSES Jeff Hammerbacher ONE DAY AT FACEBOOK’S DATA SCIENCE TEAM, A MEMBER COULD… http://berkeleysciencereview.com/scientific-collaborations-uc-berkeley-data-driven-cover/
  • 9. OPTIONS FOR PROCESSING PIPELINE Airflow https://github.com/airbnb/airflow https://github.com/spotify/luigi
  • 11. REGRESSION ANALYSIS IN PYTHON: EASY http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/ols.html
  • 12.
  • 13. PYTHON <3 BIG DATA map reduce in python pure python HDFS client fast and general engine for large-scale data processing mrjob http://spark.apache.org https://github.com/spotify/snakebite https://pythonhosted.org/mrjob …
  • 14. OH, BUT SCALA/JAVA IS FASTER. PYTHON IS 2 *FASTER: [WRITING, RUNNING] DataFrame operations are optimized and compiled into JVM bytecode https://databricks.com/blog/2015/04/24/recent-performance-improvements-in-apache-spark-sql-python- dataframes-and-more.html
  • 15. RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK'
  • 16. RDD AVERAGE: EXAMPLE FROM ‘LEARNING SPARK' SO CONCISE
  • 17. COMMUNICATE RESULTS WITH IPYTHON / JUPYTER Language agnostic :)
  • 18. COMMUNICATE RESULTS WITH IPYTHON / JUPYTER DEMO TIME
  • 19. MATPLOTLIB / SEABORN / PLOT.LY / BOKEH: SUCH VISUALIZATION!!
  • 23. SCIENCE IS GETTING MORE AND MORE IMPORTANT FOR PYTHON COMMUNITY # module imports imports/numpy 1 sys 2437939 5.85 2 os 2009086 4.82 3 re 1303009 3.12 4 numpy 416981 1.00 5 warnings 371345 0.89 6 subprocess 344934 0.83 7 django 282097 0.68 8 math 281987 0.68 11 matplotlib 146913 0.35 13 pylab 77817 0.19 14 scipy 69092 0.17 22 pandas 18928 0.05 24 theano 5482 0.051 6/25 MOST POPULAR LIBRARIES ARE FOR DATA SCIENCE https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement
  • 24. SCIENCE IS IMPORTANT FOR PYTHON: MATRIX MULTIPLICATION https://www.python.org/dev/peps/pep-0465/#but-isn-t-matrix-multiplication-a-pretty-niche-requirement import numpy as np from numpy.linalg import inv, solve # Using dot function: S = np.dot((np.dot(H, beta) - r).T, np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r)) # With the @ operator S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r) S = ( H β − r ) T ( H V H T ) − 1 ( H β − r ) PEP 0465: PROPOSED FEB/14. SINCE PY 3.5 (SEP/15) 2013: 7 INTERNATIONAL CONFERENCES ON NUMERICAL PYTHON AT PYCON 2014, ~20% OF THE TUTORIALS INVOLVED THE USE OF MATRICES
  • 25. SCIENCE STACK IS GETTING BETTER EACH DAY https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8
  • 26. SCIENCE STACK IS ALWAYS EVOLVING… https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=29
  • 28. THE STACK IS STILL GETTING NEW MEMBERS… http://www.tensorflow.org/
  • 29. TAKEAWAY MESSAGE TRY PYTHON. IT WILL BE A ONE WAY TRIP!