SlideShare uma empresa Scribd logo
1 de 20
Python for Big Data Analytics
www.edureka.in/python
View Complete Course at : www.edureka.in/python
*
Post your Questions on Twitter on @edurekaIN: #askEdureka
Objectives of this Session
• Un
• Why Python?
• Web Scrapping example using Python
• Pydoop : Python API for Hadoop
• Word Count example in Pydoop
• Data Science with Python
• Zombie Invasion modeling using Python
For Queries during the session and class recording:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
www.edureka.in/python
Why Python?
 Python is a great language for the beginner programmers since it is easy-to-
learn and easy-to-maintain
 Python’s biggest strength is that bulk of it’s library is portable. It also
supports GUI Programming and can be used to create Applications portable on
Mac, Windows and Unix X-Windows system
 With libraries like PyDoop and SciPy, it’s a dream come true for Big Data
Analytics
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Growing Interest in Python
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Demo: Web Scraping using Python
 This example demonstrates how to scrape basic financial data from
https://www.google.com/finance website from a given list of Companies.
 We shall use open source web scraping framework for Python called
Beautiful Soup to crawl and extract data from webpages.
 Scraping is used for a wide range of purposes, from data mining to
monitoring and automated testing.
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Demo: Collecting Tweets using Python
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
 This example demonstrates how to extract historical tweets for a particular
brand like “nike” or “apple”
 We shall make a REST API call to twitter to extract tweets.
 This data can be further used to perform sentiment analysis for a particular
brand on Twitter.
Big Data
 Lots of Data (Terabytes or Petabytes)
 Big data is the term for a collection of data sets
so large and complex that it becomes difficult to
process using on-hand database management
tools or traditional data processing applications.
 The challenges include capture, curation,
storage, search, sharing, transfer, analysis, and
visualization.
cloud
tools
statistics
No SQL
compression
storage
support
database
analize
information
terabytes
processing
mobile
Big Data
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Un-Structured Data is Exploding
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Big Data Scenarios : Hospital Care
Hospitals are analyzing medical data and patient
records to predict those patients that are likely to seek
readmission within a few months of discharge. The
hospital can then intervene in hopes of preventing
another costly hospital stay.
Medical diagnostics company analyzes millions of lines
of data to develop first non-intrusive test for
predicting coronary artery disease. To do so,
researchers at the company analyzed over 100 million
gene samples to ultimately identify the 23 primary
predictive genes for coronary artery disease
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png
Amazon has an unrivalled bank of data on online consumer
purchasing behaviour that it can mine from its 152
million customer accounts.
Amazon also uses Big Data to monitor, track and secure its 1.5
billion items in its retail store that are laying around it 200
fulfilment centres around the world. Amazon stores the
product catalogue data in S3.
S3 can write, read and delete objects up to 5 TB of data each.
The catalogue stored in S3 receives more than 50 million
updates a week and every 30 minutes all data received is
crunched and reported back to the different warehouses and
the website.
Big Data Scenarios : Amazon.com
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png
Netflix uses 1 petabyte to store the videos for streaming.
BitTorrent Sync has transferred over 30 petabytes of data
since its pre-alpha release in January 2013.
The 2009 movie Avatar is reported to have taken over 1
petabyte of local storage at Weta Digital for the rendering
of the 3D CGI effects.
One petabyte of average MP3-encoded songs (for mobile,
roughly one megabyte per minute), would require 2000
years to play.
Big Data Scenarios: NetFlix
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
 IBM’s Definition – Big Data Characteristics
http://www-01.ibm.com/software/data/bigdata/
Web
logs
Images
Videos
Audios
Sensor
Data
Volume Velocity Variety
IBM’s Definition
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Hadoop for Big Data
 Apache Hadoop is a framework that allows for the distributed processing of large data sets across
clusters of commodity computers using a simple programming model.
 It is an Open-source Data Management with scale-out storage & distributed processing.
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Hadoop and MapReduce
Hadoop is a system for large scale data processing.
It has two main components:
 HDFS – Hadoop Distributed File System (Storage)
 Distributed across “nodes”
 Natively redundant
 NameNode tracks locations.
 MapReduce (Processing)
 Splits a task across processors
 “near” the data & assembles results
 Self-Healing, High Bandwidth
 Clustered storage
 Job Tracker manages the Task Trackers
Map-Reduce
Key Value
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
PyDoop – Hadoop with Python
 PyDoop package provides a Python API for Hadoop MapReduce and
HDFS
 PyDoop has several advantages over Hadoop’s built-in solutions for
Python programming, i.e., Hadoop Streaming and Jython.
 One of the biggest advantage of PyDoop is it’s HDFS API. This
allows you to connect to an HDFS installation, read and write files, and
get information on files, directories and global file system properties.
 The MapReduce API of PyDoop allows you to solve many complex
problems with minimal programming efforts. Advance MapReduce
concepts such as ‘Counters’ and ‘Record Readers’ can be
implemented in Python using PyDoop.
Python can be used to write Hadoop MapReduce programs and
applications to access HDFS API for Hadoop with PyDoop
package.
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Demo: Word Count using Hadoop Streaming API
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
 The example shows the simple word count application written in Python.
 We shall use Hadoop Streaming APIs to run mapreduce code written in Python.
 Word Count application can be used to index text documents/files for a given “search query”.
Python and Data Science
 Python is an excellent choice for Data Scientist to do his
day-to-day activities as it provides libraries to do all these
things.
 Python has a diverse range of open source libraries for
just about everything that a Data Scientist does in his
day-to-day work.
 Python and most of its libraries are both open source
and free.
The day-to-day tasks of a data scientist involves many
interrelated but different activities such as accessing and
manipulating data, computing statistics and , creating visual
reports on that data, building predictive and explanatory
models, evaluating these models on additional data, integrating
models into production systems, etc.
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
SciPy.org
SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics,
science, and engineering.
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
Demo: Zombie Invasion Model
Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
This is a lighthearted example, a system of ODEs(Ordinary differential equations) can be used to model a "zombie
invasion", using the equations specified by Philip Munz.
The system is given as:
dS/dt = P - B*S*Z - d*S
dZ/dt = B*S*Z + G*R - A*S*Z
dR/dt = d*S + A*S*Z - G*R
There are three scenarios given in the program to show how Zombie Apocalypse vary with different initial
conditions.
This involves solving a system of first order ODEs given by: dy/dt = f(y, t) Where y = [S, Z, R].
Where:
S: the number of susceptible victims
Z: the number of zombies
R: the number of people "killed”
P: the population birth rate
d: the chance of a natural death
B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie)
G: the chance a dead person is resurrected into a zombie
A: the chance a zombie is totally destroyed
Questions?
www.edureka.in/python
Complete Course curriculum at : www.edureka.in/python
Post your Questions on Twitter on @edurekaIN: #askEdureka

Mais conteúdo relacionado

Mais procurados

Basics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonBasics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonSujith Kumar
 
Best Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | EdurekaBest Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | EdurekaEdureka!
 
Python in real world.
Python in real world.Python in real world.
Python in real world.Alph@.M
 
Introduction to Python IDLE | IDLE Tutorial | Edureka
Introduction to Python IDLE | IDLE Tutorial | EdurekaIntroduction to Python IDLE | IDLE Tutorial | Edureka
Introduction to Python IDLE | IDLE Tutorial | EdurekaEdureka!
 
Linear regression
Linear regressionLinear regression
Linear regressionMartinHogg9
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPyDevashish Kumar
 
Inline Functions and Default arguments
Inline Functions and Default argumentsInline Functions and Default arguments
Inline Functions and Default argumentsNikhil Pandit
 
Python course syllabus
Python course syllabusPython course syllabus
Python course syllabusSugantha T
 
Python Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaPython Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaEdureka!
 
Introduction to AI & ML
Introduction to AI & MLIntroduction to AI & ML
Introduction to AI & MLMandy Sidana
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in pythonTMARAGATHAM
 
FDP on AI and ML by R. Rajkumar
FDP on AI and ML by  R. RajkumarFDP on AI and ML by  R. Rajkumar
FDP on AI and ML by R. RajkumarRajkumar R
 

Mais procurados (20)

Basics of Object Oriented Programming in Python
Basics of Object Oriented Programming in PythonBasics of Object Oriented Programming in Python
Basics of Object Oriented Programming in Python
 
Python - object oriented
Python - object orientedPython - object oriented
Python - object oriented
 
Best Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | EdurekaBest Python Libraries For Data Science & Machine Learning | Edureka
Best Python Libraries For Data Science & Machine Learning | Edureka
 
Python in real world.
Python in real world.Python in real world.
Python in real world.
 
Introduction to Python IDLE | IDLE Tutorial | Edureka
Introduction to Python IDLE | IDLE Tutorial | EdurekaIntroduction to Python IDLE | IDLE Tutorial | Edureka
Introduction to Python IDLE | IDLE Tutorial | Edureka
 
Python
PythonPython
Python
 
NUMPY
NUMPY NUMPY
NUMPY
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Java Servlets
Java ServletsJava Servlets
Java Servlets
 
Data Analysis in Python-NumPy
Data Analysis in Python-NumPyData Analysis in Python-NumPy
Data Analysis in Python-NumPy
 
Advanced Python : Decorators
Advanced Python : DecoratorsAdvanced Python : Decorators
Advanced Python : Decorators
 
Inline Functions and Default arguments
Inline Functions and Default argumentsInline Functions and Default arguments
Inline Functions and Default arguments
 
Python libraries
Python librariesPython libraries
Python libraries
 
Chapter 03 python libraries
Chapter 03 python librariesChapter 03 python libraries
Chapter 03 python libraries
 
Python course syllabus
Python course syllabusPython course syllabus
Python course syllabus
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Python Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaPython Anaconda Tutorial | Edureka
Python Anaconda Tutorial | Edureka
 
Introduction to AI & ML
Introduction to AI & MLIntroduction to AI & ML
Introduction to AI & ML
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
 
FDP on AI and ML by R. Rajkumar
FDP on AI and ML by  R. RajkumarFDP on AI and ML by  R. Rajkumar
FDP on AI and ML by R. Rajkumar
 

Semelhante a Python for Big Data Analytics

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...Edureka!
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th juneEdureka!
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data AnalyticsEdureka!
 
What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?SofiaCarter4
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonInsuk (Chris) Cho
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big DataEdureka!
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Eli White
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in pythonUmmeSalmaM1
 
python programming.pptx
python programming.pptxpython programming.pptx
python programming.pptxKaviya452563
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Python for Data Science: Trends and Tools
Python for Data Science: Trends and ToolsPython for Data Science: Trends and Tools
Python for Data Science: Trends and ToolsSaschayaPranev
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitectureSkillspeed
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooJason Dai
 
Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?hemayadav41
 
6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your InterviewerPrachiVarshney7
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformSanjay Padhi, Ph.D
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.Marcel Caraciolo
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...Kamila Stępniowska
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & AnalysisScott Sanders
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Tomasz Bednarz
 

Semelhante a Python for Big Data Analytics (20)

Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...Webinar:  Mastering Python - An Excellent tool for Web Scraping and Data Anal...
Webinar: Mastering Python - An Excellent tool for Web Scraping and Data Anal...
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
 
Python for Big Data Analytics
Python for Big Data AnalyticsPython for Big Data Analytics
Python for Big Data Analytics
 
What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?What Is The Future of Data Science With Python?
What Is The Future of Data Science With Python?
 
Samsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of PythonSamsung SDS OpeniT - The possibility of Python
Samsung SDS OpeniT - The possibility of Python
 
Power of Python with Big Data
Power of Python with Big DataPower of Python with Big Data
Power of Python with Big Data
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
python programming.pptx
python programming.pptxpython programming.pptx
python programming.pptx
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Python for Data Science: Trends and Tools
Python for Data Science: Trends and ToolsPython for Data Science: Trends and Tools
Python for Data Science: Trends and Tools
 
Python and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python ArchitecturePython and BIG Data analytics | Python Fundamentals | Python Architecture
Python and BIG Data analytics | Python Fundamentals | Python Architecture
 
End-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics ZooEnd-to-End Big Data AI with Analytics Zoo
End-to-End Big Data AI with Analytics Zoo
 
Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?Python for Data Engineering: Why Do Data Engineers Use Python?
Python for Data Engineering: Why Do Data Engineers Use Python?
 
6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer6 Open Source Data Science Projects To Impress Your Interviewer
6 Open Source Data Science Projects To Impress Your Interviewer
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
How You Can Use Open Source Materials to Learn Python & Data Science - EuroPy...
 
Social Media Data Collection & Analysis
Social Media Data Collection & AnalysisSocial Media Data Collection & Analysis
Social Media Data Collection & Analysis
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 

Mais de Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaEdureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaEdureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaEdureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaEdureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaEdureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaEdureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaEdureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaEdureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaEdureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaEdureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | EdurekaEdureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEdureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEdureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaEdureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaEdureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaEdureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaEdureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaEdureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | EdurekaEdureka!
 

Mais de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Último

REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 

Último (20)

REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 

Python for Big Data Analytics

  • 1. Python for Big Data Analytics www.edureka.in/python View Complete Course at : www.edureka.in/python * Post your Questions on Twitter on @edurekaIN: #askEdureka
  • 2. Objectives of this Session • Un • Why Python? • Web Scrapping example using Python • Pydoop : Python API for Hadoop • Word Count example in Pydoop • Data Science with Python • Zombie Invasion modeling using Python For Queries during the session and class recording: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN www.edureka.in/python
  • 3. Why Python?  Python is a great language for the beginner programmers since it is easy-to- learn and easy-to-maintain  Python’s biggest strength is that bulk of it’s library is portable. It also supports GUI Programming and can be used to create Applications portable on Mac, Windows and Unix X-Windows system  With libraries like PyDoop and SciPy, it’s a dream come true for Big Data Analytics Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 4. Growing Interest in Python Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 5. Demo: Web Scraping using Python  This example demonstrates how to scrape basic financial data from https://www.google.com/finance website from a given list of Companies.  We shall use open source web scraping framework for Python called Beautiful Soup to crawl and extract data from webpages.  Scraping is used for a wide range of purposes, from data mining to monitoring and automated testing. Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 6. Demo: Collecting Tweets using Python Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python  This example demonstrates how to extract historical tweets for a particular brand like “nike” or “apple”  We shall make a REST API call to twitter to extract tweets.  This data can be further used to perform sentiment analysis for a particular brand on Twitter.
  • 7. Big Data  Lots of Data (Terabytes or Petabytes)  Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.  The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. cloud tools statistics No SQL compression storage support database analize information terabytes processing mobile Big Data Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 8. Un-Structured Data is Exploding Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 9. Big Data Scenarios : Hospital Care Hospitals are analyzing medical data and patient records to predict those patients that are likely to seek readmission within a few months of discharge. The hospital can then intervene in hopes of preventing another costly hospital stay. Medical diagnostics company analyzes millions of lines of data to develop first non-intrusive test for predicting coronary artery disease. To do so, researchers at the company analyzed over 100 million gene samples to ultimately identify the 23 primary predictive genes for coronary artery disease Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 10. http://wp.streetwise.co/wp-content/uploads/2012/08/Amazon-Recommendations.png Amazon has an unrivalled bank of data on online consumer purchasing behaviour that it can mine from its 152 million customer accounts. Amazon also uses Big Data to monitor, track and secure its 1.5 billion items in its retail store that are laying around it 200 fulfilment centres around the world. Amazon stores the product catalogue data in S3. S3 can write, read and delete objects up to 5 TB of data each. The catalogue stored in S3 receives more than 50 million updates a week and every 30 minutes all data received is crunched and reported back to the different warehouses and the website. Big Data Scenarios : Amazon.com Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 11. http://smhttp.23575.nexcesscdn.net/80ABE1/sbmedia/blog/wp-content/uploads/2013/03/netflix-in-asia.png Netflix uses 1 petabyte to store the videos for streaming. BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013. The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects. One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute), would require 2000 years to play. Big Data Scenarios: NetFlix Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 12.  IBM’s Definition – Big Data Characteristics http://www-01.ibm.com/software/data/bigdata/ Web logs Images Videos Audios Sensor Data Volume Velocity Variety IBM’s Definition Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 13. Hadoop for Big Data  Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.  It is an Open-source Data Management with scale-out storage & distributed processing. Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 14. Hadoop and MapReduce Hadoop is a system for large scale data processing. It has two main components:  HDFS – Hadoop Distributed File System (Storage)  Distributed across “nodes”  Natively redundant  NameNode tracks locations.  MapReduce (Processing)  Splits a task across processors  “near” the data & assembles results  Self-Healing, High Bandwidth  Clustered storage  Job Tracker manages the Task Trackers Map-Reduce Key Value Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 15. PyDoop – Hadoop with Python  PyDoop package provides a Python API for Hadoop MapReduce and HDFS  PyDoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython.  One of the biggest advantage of PyDoop is it’s HDFS API. This allows you to connect to an HDFS installation, read and write files, and get information on files, directories and global file system properties.  The MapReduce API of PyDoop allows you to solve many complex problems with minimal programming efforts. Advance MapReduce concepts such as ‘Counters’ and ‘Record Readers’ can be implemented in Python using PyDoop. Python can be used to write Hadoop MapReduce programs and applications to access HDFS API for Hadoop with PyDoop package. Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 16. Demo: Word Count using Hadoop Streaming API Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python  The example shows the simple word count application written in Python.  We shall use Hadoop Streaming APIs to run mapreduce code written in Python.  Word Count application can be used to index text documents/files for a given “search query”.
  • 17. Python and Data Science  Python is an excellent choice for Data Scientist to do his day-to-day activities as it provides libraries to do all these things.  Python has a diverse range of open source libraries for just about everything that a Data Scientist does in his day-to-day work.  Python and most of its libraries are both open source and free. The day-to-day tasks of a data scientist involves many interrelated but different activities such as accessing and manipulating data, computing statistics and , creating visual reports on that data, building predictive and explanatory models, evaluating these models on additional data, integrating models into production systems, etc. Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 18. SciPy.org SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python
  • 19. Demo: Zombie Invasion Model Post your Questions on Twitter on @edurekaIN: #askEdureka www.edureka.in/python This is a lighthearted example, a system of ODEs(Ordinary differential equations) can be used to model a "zombie invasion", using the equations specified by Philip Munz. The system is given as: dS/dt = P - B*S*Z - d*S dZ/dt = B*S*Z + G*R - A*S*Z dR/dt = d*S + A*S*Z - G*R There are three scenarios given in the program to show how Zombie Apocalypse vary with different initial conditions. This involves solving a system of first order ODEs given by: dy/dt = f(y, t) Where y = [S, Z, R]. Where: S: the number of susceptible victims Z: the number of zombies R: the number of people "killed” P: the population birth rate d: the chance of a natural death B: the chance the "zombie disease" is transmitted (an alive person becomes a zombie) G: the chance a dead person is resurrected into a zombie A: the chance a zombie is totally destroyed
  • 20. Questions? www.edureka.in/python Complete Course curriculum at : www.edureka.in/python Post your Questions on Twitter on @edurekaIN: #askEdureka