Python for Big Data Analytics
View Mastering Python course details at http://www.edureka.co/python 
For Queries: 
Post on Twitter @edurekaIN: #askEdureka 
Post on Facebook /edurekaIN 
For more details please contact us: 
US : 1800 275 9730 (toll free) 
INDIA : +91 88808 62004 
Email Us : sales@edureka.co
Objectives 
At the end of this module, you will be able to 
• Understand Python
• Understand a web scraping example using Python
• Understand PyDoop: a Python API for Hadoop
• Implement a word count example in PyDoop
• Integrate data science with Python
• Implement zombie invasion modeling using Python
Slide 2 www.edureka.co/python
Why Python? 
• Python is a great language for beginner programmers because it is easy to learn and easy to maintain.
• One of Python's biggest strengths is that the bulk of its library is portable. It also supports GUI programming and can be used to create applications that are portable across Mac, Windows, and the Unix X Window System.
• With libraries like PyDoop and SciPy, it's a dream come true for big data analytics.
Growing Interest in Python 
Demo: Web Scraping using Python 
• This example demonstrates how to scrape basic financial data from an IMDb webpage
• We shall use Beautiful Soup, an open-source Python library for parsing HTML, to extract data from webpages
• Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing
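The extraction step of a scraper can be sketched as follows. This is a minimal sketch, not the demo's code: it uses the standard library's `html.parser` as a stand-in so that it runs without extra packages (Beautiful Soup offers a higher-level API for the same kind of lookup), and the markup in `SAMPLE_HTML`, the class `CellCollector`, and the function `scrape_financials` are hypothetical names invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real scraper would first download the page over HTTP.
SAMPLE_HTML = """
<table>
  <tr><td class="label">Budget</td><td class="value">$1,000,000</td></tr>
  <tr><td class="label">Gross</td><td class="value">$2,500,000</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text outside a <td> (for example whitespace between rows) is ignored.
        if self.in_cell:
            self.cells[-1] += data


def scrape_financials(html):
    """Return {label: amount in dollars} from alternating label/value cells."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    pairs = zip(cells[0::2], cells[1::2])
    return {
        label: int(value.replace("$", "").replace(",", ""))
        for label, value in pairs
    }


financials = scrape_financials(SAMPLE_HTML)
print(financials)
```

Once the values are plain integers in a dictionary, they can be passed on to whatever analysis follows the scrape.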
Demo: Collecting Tweets using Python 
• This example demonstrates how to extract historical tweets for a particular brand like “nike” or “apple”
• We shall make a REST API call to Twitter to extract tweets
• This data can be further used to perform sentiment analysis for a particular brand on Twitter
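The request and response handling for such a call might look like the sketch below. It rests on assumptions: the endpoint and the `statuses`/`text` response fields follow the v1.1 search call of the Twitter REST API as it existed when this deck was written, authentication is left out, and no request is actually sent. `SAMPLE_RESPONSE` is a made-up reply, and `build_search_url` and `extract_texts` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the v1.1 search call of the Twitter REST API.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(brand, count=100):
    """Build the GET URL for a brand search (authentication not shown)."""
    return SEARCH_URL + "?" + urlencode({"q": brand, "count": count})


def extract_texts(response_body):
    """Pull the tweet text out of a JSON search response."""
    payload = json.loads(response_body)
    return [status["text"] for status in payload.get("statuses", [])]


# Made-up response body standing in for a real API reply.
SAMPLE_RESPONSE = json.dumps(
    {
        "statuses": [
            {"text": "Love my new nike shoes"},
            {"text": "nike store was closed today"},
        ]
    }
)

url = build_search_url("nike", count=2)
texts = extract_texts(SAMPLE_RESPONSE)
print(url)
print(texts)
```

The resulting list of tweet texts is the input a sentiment analysis step would work on.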
Big Data 
• Lots of data (terabytes or petabytes)
• Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
• The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
Unstructured Data is Exploding
[Figure: complex, unstructured data versus relational data]
• 2500 exabytes of new information in 2012, with the internet as the primary driver
• The digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year
Big Data Scenarios : Hospital Care 
Hospitals are analyzing medical data and patient records to predict which patients are likely to seek readmission within a few months of discharge. The hospital can then intervene in hopes of preventing another costly hospital stay.
A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
Big Data Scenarios : Amazon.com 
Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts.
Amazon also uses Big Data to monitor, track and secure the 1.5 billion items in its retail store that are lying around its 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3.
S3 can write, read and delete objects of up to 5 TB of data each. The catalogue stored in S3 receives more than 50 million updates a week, and every 30 minutes all data received is crunched and reported back to the different warehouses and the website.
Big Data Scenarios: Netflix
Netflix uses 1 petabyte to store the videos for streaming.
BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013.
The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require 2000 years to play.
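The MP3 playback figure can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: one megabyte of audio per minute, a 365-day year, and both decimal (10^15 bytes) and binary (2^50 bytes) readings of "petabyte". The helper name `playback_years` is made up for illustration. Both readings land close to the 2000 years quoted on the slide.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def playback_years(total_bytes, bytes_per_minute):
    """Years of continuous playback for a given amount of audio data."""
    minutes = total_bytes / bytes_per_minute
    return minutes / MINUTES_PER_YEAR


# Decimal units: 1 PB = 10**15 bytes, 1 MB = 10**6 bytes.
decimal_years = playback_years(10**15, 10**6)

# Binary units: 1 PiB = 2**50 bytes, 1 MiB = 2**20 bytes.
binary_years = playback_years(2**50, 2**20)

print(f"decimal units: about {decimal_years:.0f} years")
print(f"binary units:  about {binary_years:.0f} years")
```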
IBM’s Definition 
• IBM’s Definition – Big Data Characteristics: Volume, Velocity, Variety
• Data types pictured: web logs, audio, images, videos, sensor data
http://www-01.ibm.com/software/data/bigdata/
Hadoop for Big Data 
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model
• It is an open-source data management framework with scale-out storage and distributed processing
Hadoop and MapReduce 
Hadoop is a system for large scale data processing 
It has two main components: 
• HDFS – Hadoop Distributed File System (Storage)
» Distributed across “nodes”
» Natively redundant
» NameNode tracks locations
» Self-healing, high-bandwidth clustered storage
• MapReduce (Processing)
» Splits a task across processors “near” the data & assembles results
» Job Tracker manages the Task Trackers
PyDoop – Hadoop with Python 
With the PyDoop package, Python can be used to write Hadoop MapReduce programs and applications that access the HDFS API
• The PyDoop package provides a Python API for Hadoop MapReduce and HDFS
• PyDoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython
• One of the biggest advantages of PyDoop is its HDFS API. This allows you to connect to an HDFS installation, read and write files, and get information on files, directories and global file system properties
• The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming effort. Advanced MapReduce concepts such as ‘Counters’ and ‘Record Readers’ can be implemented in Python using PyDoop
Demo: Word Count using Hadoop Streaming API 
• The example shows a simple word count application written in Python
• We shall use the Hadoop Streaming API to run MapReduce code written in Python
• A word count application can be used to index text documents/files for a given “search query”
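A word count in the Hadoop Streaming style can be sketched as below. This is a minimal sketch rather than the demo's code: it assumes the usual streaming convention of tab-separated key/value records, and it simulates the sort that Hadoop performs between the map and reduce phases inside one process. The function names `mapper`, `reducer`, and `run_local` are illustrative.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share the same word."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


counts = run_local(["big data big python", "python data big"])
for record in counts:
    print(record)
```

Under Hadoop Streaming the two functions would live in separate scripts that read standard input and print records, with the framework doing the sort between them.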
Python and Data Science 
The day-to-day tasks of a data scientist involve many interrelated but different activities, such as accessing and manipulating data, computing statistics, creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, integrating models into production systems, etc.
• Python is an excellent choice for a data scientist's day-to-day activities, as it provides libraries to do all these things
• Python has a diverse range of open source libraries for just about everything that a data scientist does in day-to-day work
• Python and most of its libraries are both open source and free
SciPy.org 
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
• NumPy: base N-dimensional array package
• IPython: enhanced interactive console
• SciPy library: fundamental library for scientific computing
• SymPy: symbolic mathematics
• Matplotlib: comprehensive 2D plotting
• pandas: data structures and analysis
Demo: Zombie Invasion Model 
In this lighthearted example, a system of ordinary differential equations (ODEs) is used to model a "zombie invasion", using the equations specified by Philip Munz.
The system is given as: 
dS/dt = P - B*S*Z - d*S 
dZ/dt = B*S*Z + G*R - A*S*Z 
dR/dt = d*S + A*S*Z - G*R 
Where: 
S: the number of susceptible victims 
Z: the number of zombies 
R: the number of people "killed"
P: the population birth rate 
d: the chance of a natural death 
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie) 
G: the chance a dead person is resurrected into a zombie 
A: the chance a zombie is totally destroyed 
The program gives three scenarios to show how the zombie apocalypse varies with different initial conditions.
This involves solving a system of first-order ODEs given by dy/dt = f(y, t), where y = [S, Z, R].
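The system can be integrated numerically along the following lines. This is a sketch under stated assumptions, not the demo program: a hand-written fixed-step fourth-order Runge-Kutta integrator stands in for a library solver such as `scipy.integrate.odeint` so that the example needs only the standard library, and the parameter values and initial conditions are illustrative choices, not the three scenarios of the original. The function names `zombie_rhs` and `rk4_solve` are made up.

```python
def zombie_rhs(y, t, P, d, B, G, A):
    """Right-hand side f(y, t) of the system, with y = [S, Z, R]."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return [dS, dZ, dR]


def rk4_solve(f, y0, t_end, steps, *params):
    """Integrate dy/dt = f(y, t) from t = 0 to t_end with classic RK4."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    history = [(t, y)]
    for _ in range(steps):
        k1 = f(y, t, *params)
        k2 = f([yi + 0.5 * h * k for yi, k in zip(y, k1)], t + 0.5 * h, *params)
        k3 = f([yi + 0.5 * h * k for yi, k in zip(y, k2)], t + 0.5 * h, *params)
        k4 = f([yi + h * k for yi, k in zip(y, k3)], t + h, *params)
        y = [
            yi + h / 6.0 * (a + 2 * b + 2 * c + e)
            for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
        ]
        t += h
        history.append((t, y))
    return history


# Illustrative parameter values (assumptions, not taken from the slides).
P, d, B, G, A = 0.0, 0.0001, 0.0095, 0.0001, 0.0001
y0 = [500.0, 1.0, 0.0]  # assumed start: 500 susceptible, 1 zombie, 0 dead

history = rk4_solve(zombie_rhs, y0, 5.0, 5000, P, d, B, G, A)
t_final, (S, Z, R) = history[-1]
print(f"t = {t_final:.1f}: S = {S:.2f}, Z = {Z:.2f}, R = {R:.2f}")
```

With P = 0 the three equations sum to zero, so S + Z + R stays constant over time, which gives a quick sanity check on any solver used here. Changing `y0` is how different initial conditions would be explored.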
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Último (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Analysis

  • 1. Python For BIG DATA ANALYTICS View Mastering Python course details at http://www.edureka.co/python For Queries: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN For more details please contact us: US : 1800 275 9730 (toll free) INDIA : +91 88808 62004 Email Us : sales@edureka.co
  • 2. Objectives
      At the end of this module, you will be able to:
      - Understand Python
      - Understand a web scraping example using Python
      - Understand PyDoop: Python API for Hadoop
      - Implement a Word Count example in Pydoop
      - Integrate Data Science with Python
      - Implement Zombie Invasion modeling using Python
  • 3. Why Python?
      - Python is a great language for beginner programmers because it is easy to learn and easy to maintain.
      - Python’s biggest strength is that the bulk of its library is portable. It also supports GUI programming and can be used to create applications that are portable across Mac, Windows, and Unix X Window systems.
      - With libraries like PyDoop and SciPy, it’s a dream come true for Big Data analytics.
  • 4. Growing Interest in Python
  • 5. Demo: Web Scraping using Python
      - This example demonstrates how to scrape basic financial data from an IMDb webpage.
      - We shall use Beautiful Soup, an open-source web scraping library for Python, to crawl and extract data from webpages.
      - Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing.
  • 6. Demo: Collecting Tweets using Python
      - This example demonstrates how to extract historical tweets for a particular brand like “nike” or “apple”.
      - We shall make a REST API call to Twitter to extract tweets.
      - This data can then be used to perform sentiment analysis for a particular brand on Twitter.
  • 7. Big Data
      - Lots of data (terabytes or petabytes).
      - Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
      - The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
      [Figure: Big Data word cloud]
  • 8. Unstructured Data is Exploding
      - 2,500 exabytes of new information in 2012, with the internet as the primary driver.
      - The digital universe grew by 62% last year to 800,000 petabytes and will grow to 1.2 zettabytes this year.
      [Figure: relational data versus complex, unstructured data]
  • 9. Big Data Scenarios: Hospital Care
      - Hospitals are analyzing medical data and patient records to predict which patients are likely to seek readmission within a few months of discharge. The hospital can then intervene in the hope of preventing another costly hospital stay.
      - A medical diagnostics company analyzed millions of lines of data to develop the first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease.
  • 10. Big Data Scenarios: Amazon.com
      - Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts.
      - Amazon also uses Big Data to monitor, track, and secure the 1.5 billion items in its retail store, which are held in its 200 fulfilment centres around the world.
      - Amazon stores the product catalogue data in S3. S3 can write, read, and delete objects of up to 5 TB each.
      - The catalogue stored in S3 receives more than 50 million updates a week, and every 30 minutes all data received is crunched and reported back to the different warehouses and the website.
      Image source: http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
  • 11. Big Data Scenarios: Netflix
      - Netflix uses 1 petabyte to store the videos for streaming.
      - BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013.
      - The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
      - One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require 2,000 years to play.
      Image source: http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
  • 12. IBM’s Definition: Big Data Characteristics
      - Characteristics: Volume, Velocity, Variety
      - Example data sources: web logs, audio, images, videos, sensor data
      Source: http://www-01.ibm.com/software/data/bigdata/
  • 13. Hadoop for Big Data
      - Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
      - It is an open-source data management framework with scale-out storage and distributed processing.
  • 14. Hadoop and MapReduce
      Hadoop is a system for large-scale data processing. It has two main components:
      - HDFS – Hadoop Distributed File System (Storage)
          » Distributed across “nodes”
          » Natively redundant
          » NameNode tracks locations
      - MapReduce (Processing)
          » Splits a task across processors “near” the data and assembles results
          » Self-healing, high-bandwidth clustered storage
          » Job Tracker manages the Task Trackers
      [Figure: key-value MapReduce]
  • 15. PyDoop – Hadoop with Python
      With the PyDoop package, Python can be used to write Hadoop MapReduce programs and applications that access the HDFS API.
      - The PyDoop package provides a Python API for Hadoop MapReduce and HDFS.
      - PyDoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython.
      - One of the biggest advantages of PyDoop is its HDFS API. This allows you to connect to an HDFS installation, read and write files, and get information on files, directories, and global file system properties.
      - The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming effort. Advanced MapReduce concepts such as ‘Counters’ and ‘Record Readers’ can be implemented in Python using PyDoop.
  • 16. Demo: Word Count using Hadoop Streaming API
      - The example shows a simple word count application written in Python.
      - We shall use the Hadoop Streaming API to run MapReduce code written in Python.
      - The Word Count application can be used to index text documents/files for a given “search query”.
  • 17. Python and Data Science
      The day-to-day work of a data scientist involves many interrelated but different activities, such as accessing and manipulating data, computing statistics, creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, and integrating models into production systems.
      - Python is an excellent choice for a data scientist’s day-to-day activities because it provides libraries for all of these tasks.
      - Python has a diverse range of open-source libraries for just about everything that a data scientist does in day-to-day work.
      - Python and most of its libraries are both open source and free.
  • 18. SciPy.org
      SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
      - NumPy: base N-dimensional array package
      - SciPy library: fundamental library for scientific computing
      - Matplotlib: comprehensive 2D plotting
      - IPython: enhanced interactive console
      - SymPy: symbolic mathematics
      - pandas: data structures and analysis
  • 19. Demo: Zombie Invasion Model
      This is a lighthearted example: a system of ODEs (ordinary differential equations) can be used to model a "zombie invasion", using the equations specified by Philip Munz. The system is given as:
          dS/dt = P - B*S*Z - d*S
          dZ/dt = B*S*Z + G*R - A*S*Z
          dR/dt = d*S + A*S*Z - G*R
      Where:
          S: the number of susceptible victims
          Z: the number of zombies
          R: the number of people "killed"
          P: the population birth rate
          d: the chance of a natural death
          B: the chance the "zombie disease" is transmitted (a living person becomes a zombie)
          G: the chance a dead person is resurrected into a zombie
          A: the chance a zombie is totally destroyed
      Three scenarios are given in the program to show how the zombie apocalypse varies with different initial conditions. This involves solving a system of first-order ODEs given by:
          dy/dt = f(y, t)
      where y = [S, Z, R].
  • 20. Questions
      Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions. www.edureka.co/python
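
The zombie invasion system from slide 19 can be sketched in a few lines of Python. This is only an illustration, not the demo program from the webinar. The parameter values and initial conditions (P, d, B, G, A, and 500 susceptible people with no zombies) are assumptions chosen for the example, and the function names zombie_rhs, rk4_step, and simulate are hypothetical. To keep the sketch dependency-free it uses a hand-written fixed-step Runge-Kutta integrator; in practice an ODE solver from the SciPy library would be the more usual choice. The right-hand side does not depend on t explicitly, so f takes only y.

```python
"""Zombie invasion model (slide 19), integrated with a hand-written RK4 step."""


def zombie_rhs(y, P, d, B, G, A):
    """Right-hand side f(y) of the system on slide 19, with y = (S, Z, R)."""
    S, Z, R = y
    dS = P - B * S * Z - d * S
    dZ = B * S * Z + G * R - A * S * Z
    dR = d * S + A * S * Z - G * R
    return (dS, dZ, dR)


def rk4_step(f, y, dt):
    """Advance y by one classical fourth-order Runge-Kutta step of size dt."""
    def shifted(k, scale):
        # y + scale * k, component by component
        return tuple(yi + scale * ki for yi, ki in zip(y, k))

    k1 = f(y)
    k2 = f(shifted(k1, 0.5 * dt))
    k3 = f(shifted(k2, 0.5 * dt))
    k4 = f(shifted(k3, dt))
    return tuple(
        yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
        for yi, a, b, c, e in zip(y, k1, k2, k3, k4)
    )


def simulate(y0, params, t_end=5.0, steps=500):
    """Return the list of (S, Z, R) states at steps + 1 equally spaced times."""
    dt = t_end / steps

    def f(y):
        return zombie_rhs(y, **params)

    trajectory = [tuple(y0)]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], dt))
    return trajectory


if __name__ == "__main__":
    # Assumed illustrative values, not taken from the slides.
    params = dict(P=0.0, d=0.0001, B=0.0095, G=0.0001, A=0.0001)
    states = simulate((500.0, 0.0, 0.0), params)
    S, Z, R = states[-1]
    print(f"after 5 time units: S={S:.1f}, Z={Z:.1f}, R={R:.1f}")
```

With P = 0 the three equations sum to zero, so S + Z + R stays constant over time; checking that total is a quick sanity test for any implementation of the model. Different scenarios can be explored by changing the initial state or the birth rate P passed to simulate.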