NAVIGATING THE PYTHON ECOSYSTEM
FOR DATA SCIENCE
Ananth Krishnamoorthy, Ph.D.
Outline Slides for Talk at PyCon2017
Summary
• In their day-to-day jobs, data science teams and data scientists face challenges in
many overlapping yet distinct areas, such as Reporting, Data Processing &
Storage, Scientific Computing, ML Modelling, and Application Development. To
succeed, data science teams, especially small ones, need a deep appreciation of
how their success depends on each of these areas.
• The Python ecosystem for data science has a number of tools and libraries for various
aspects of data science, including Machine Learning, Cluster Computing, and
Scientific Computing.
• The idea of this talk is to understand what the Python data science ecosystem
offers (so that you don't reinvent it) and what some common gaps are (so that you
don't go blue in the face looking for answers).
• In this talk, we describe how different tools and libraries fit into the machine learning
model development and deployment workflow, and how they work (and don't work)
together. It is intended as a landscape survey of the Python data science
ecosystem, along with a mention of some common gaps that practitioners may
notice as they put together a stack and/or an application for their company.
The most important trait of the Analytics 3.0 era is that not only online firms, but virtually any type of firm
in any industry, can participate in the data economy. Banks, industrial manufacturers, health care
providers, retailers—any company in any industry that is willing to exploit the possibilities—can all
develop data-based offerings for customers, as well as support internal decisions with big data.
Analytics 1.0, Analytics 2.0, Analytics 3.0
Data
• Enterprise Data
• Structured transactional data
• Bring in web and social data
• Complex, large, semistructured data sources
• GPS, Mobile Device, Clickstream, Sensor data
• Unstructured, real time, streaming
Tools
• Spreadsheets
• BI, OLAP
• ETL
• On-premise servers
• Visualization
• NoSQL
• Hadoop
• Machine Learning, Artificial Intelligence
• On-Demand Everything
• Analytical Apps
• Integrated, Embedded models
Activity
• Majority of analytical activity was descriptive analytics, or reporting
• Creating analytical models was a time-consuming “batch” process
• Visual analytics dominates predictive and prescriptive techniques
• Develop products, not PowerPoints or reports
• Analytics integral to running the business, strategic asset
• Rapid and agile insight delivery
• Analytical tools available at point of decision
Source: THE RISE OF ANALYTICS 3.0, By Thomas H. Davenport, IIA, 2013
Evolving Role of Data Science Teams
Machine Learning vs Real World Data Science
Machine Learning
Deployment
Application Development
Big Data Processing
Data Storage
ETL
Challenges faced by Data Science Teams
• Requires many more competencies than can be reasonably expected
from one person
• Challenges are greater for smaller teams and smaller companies, e.g.
startups
• Challenges create dependencies on other teams e.g. Development
• Dependencies slow down execution and benefits realization
Plethora of Choices
Reporting
Data Processing & Storage
Scientific Computing
ML Modelling
Application Development
SQL
NoSQL
Graphdb
OLAP
ETL
Cluster Computing
Stream Processing
SQL
Charting
Statistics
Cloud
Front End
Microservices
Back End
ML
Deep Learning
Dim. Reduction
Signal Processing
Optimization
Time Series Analysis
Simulation
MapReduce
Data Science Workflow
ETL, Process, Model, Store, Deploy
DATA SCIENTIST SKILLS
Infrastructure and Provisioning ???
Python Ecosystem
ETL, Process, Model, Store, Deploy
Odo Blaze Pandas
Dask
Spark
Sklearn_Pandas
Scikit-learn
Keras
Spark MLlib
Bokeh
Jupyter
Review of Key Tools
(50% of talk time spent here, more slides to be added)
• Jupyter
• Pandas
• Scikit-Learn
• Keras / TensorFlow / Theano
• Matplotlib/Bokeh
• Blaze
• Odo
• Dask
• pySpark
We shall see some code snippets here, to
illustrate a few ideas
The idea is to know enough to pick the right
components for the job at hand
Use Case 1: Small Data
This use case illustrates Small Data,
i.e. desktop / in-memory processing
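As a rough illustration of the small-data case, the sketch below loads a table fully into memory with Pandas and aggregates it. The dataset, the column names (region, units, price) and the revenue calculation are hypothetical and are not taken from the talk; they only stand in for whatever fits comfortably in RAM on a desktop.

```python
import io

import pandas as pd

# Hypothetical sales records (an assumption for illustration only).
# In practice this would typically be pd.read_csv("<your file>.csv").
csv_text = """region,units,price
north,10,2.5
south,4,3.0
north,6,2.5
east,8,1.0
"""

# The whole table is read into memory at once: fine for small data.
df = pd.read_csv(io.StringIO(csv_text))

# Derive a column and aggregate, all in-memory.
df["revenue"] = df["units"] * df["price"]
revenue_by_region = df.groupby("region")["revenue"].sum()

print(revenue_by_region)
```

The point is that no special machinery is needed at this scale: a single DataFrame holds everything, and each step runs eagerly.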
Use Case 2: ‘Medium’ Data
This use case illustrates ‘Medium’ Data,
with out-of-core processing
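One possible way to picture out-of-core processing is to stream a file in fixed-size chunks and combine partial results, so that the full dataset never has to sit in memory. The sketch below uses plain Pandas chunked reading as a stand-in; the slides list Dask for this role, and this is not a claim about how the talk demonstrates it. The data, column names and chunk size are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data (an assumption): ten rows alternating between two regions.
rows = [f"{'north' if i % 2 == 0 else 'south'},{i}" for i in range(10)]
csv_text = "region,units\n" + "\n".join(rows) + "\n"

# Running totals that are updated one chunk at a time.
totals = {}

# chunksize makes read_csv return an iterator of DataFrames instead of
# loading everything at once; 3 rows per chunk is deliberately tiny here.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    partial = chunk.groupby("region")["units"].sum()
    for region, units in partial.items():
        totals[region] = totals.get(region, 0) + int(units)

print(totals)
```

The same split, partially aggregate, then combine pattern is what out-of-core libraries automate; doing it by hand mainly shows where the bookkeeping burden lies.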
Use Case 3: Big Data
This use case illustrates Big Data,
i.e. cluster computing
What Works
• Sklearn’s Consistent API, wide variety of ML algorithms
• Sklearn Pipelines
• Scikit-Keras Integration
• Pandas for Data Analysis
• ….
• ….
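To make the points about Sklearn's consistent API and Pipelines concrete, here is a minimal sketch that chains a preprocessing step and a classifier into one estimator. The choice of the bundled iris dataset, StandardScaler, LogisticRegression, the step names and the split parameters are all assumptions made for illustration; the slides do not specify an example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, used only so the sketch is self-contained.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Every step exposes the same fit/transform or fit/predict interface,
# so the whole chain behaves like a single estimator.
model = Pipeline(
    [
        ("scale", StandardScaler()),                 # preprocessing step
        ("clf", LogisticRegression(max_iter=200)),   # final estimator
    ]
)

model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the pipeline is itself an estimator, swapping the classifier or adding a step changes one line, which is a large part of why the consistent API is listed under what works.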
Gaps – A Data Scientist’s Perspective
• Uniform API Across Activities
• Separation of Data, Processing, and Instructions
• Single Data Structure Paradigm
• Support for in-memory, out-of-core, and distributed computing in same
paradigm e.g. SFrame
• ETL
• Push heavy lifting to backend systems
• Monitoring workflows
• UI development
• Bokeh
• Deployment
• Application
• Web Services

Proposed Talk Outline for Pycon2017


Editor's Notes

  1. Slide needs improvement 