SlideShare a Scribd company logo
1 of 58
Download to read offline
Machine Learning, Deep
Learning, Big Data
Hands-On
by Dony Riyanto
Prepared and Presented to Panin Asset Management
January 2019
Hands-on Agenda
• Machine Learning Re-Visited
• Python Example of Machine Learning
• Introduction to Deep Learning
• Immentation of 'Big Data' (Hadoop Ecosystem)
• Hadoop File System
• Hadoop Map Reduce
• Case Study
• More advance implementation of Big Data
The Learning Problem
The essence of ML:
1. We have data
2. Patterns exist in data
3. We can't do math formula
(don't know the formula yet)
Examples:
 Movie Rating
 Credit Approval
 Hand Written Recognition
Domain Areas
 Computer Vision
 Natural Language Processing
 Business Intelligence
Components of Learning
Example in Banking: Credit Card Approval
Input : x (customer application)
Output : y (good/bad customer)
Unknown Target Function f :XY
Dataset {x, y} (customers record database)
Hypothesis Set: H : X Y
Final Hypothesis g
Learning Model = Hypothesis Set + Learning Algorithm
Machine Learning Model
Spatial Data
(Text, Image)
{x, y}
Sequence or
Time Series
Data
{x, t}
Classifier
Class Score
Regression
Cont. Values
Main Paradigms
Automatic discovery of patterns in data through computer algorithms and the use of
those patterns to take actions such as classifying or clustering the data into
categories.
Supervised Learning: Learning by labeled example
E.g. An email spam detector
We have (input, correct output), and we can predict (new input, predicted output)
Amazingly effective if you have lots of data
Unsupervised Learning: Discovering Patterns
E.g. Data clustering
Instead of (input, correct output), we get (input, ?)
Difficult in practices but useful if we lack labeled data
Reinforcement Learning: Feedback & Error
E.g. Learning to play chess
Instead of (input, correct output), we get (input, only some output, grade of this output)
Works well in some domains, becoming more important
What/why is Python
Python is an interpreted, high-level programming language, general-
purpose programming language.
Its high-level built in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development,
as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes
readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and
code reuse. The Python interpreter and the extensive standard library are
available in source or binary form without charge for all major platforms, and
can be freely distributed.
Machine Learning with Python
• We need Python 2.7.x or 3.7.x
• Libraries, ex.:
• numpy (fundamental package for scientific computing with Python)
• matplotlib (plotting library for the Python programming language and its
numerical mathematics extension NumPy)
• pandas (software library written for the Python programming language for
data manipulation and analysis)
• seaborn (Python data visualization library based on matplotlib)
• sklearn (Scikit-learn is a machine learning library for the Python programming
language)
• IDE, ex: pycharm
• Alternatives, install Anaconda (distribution of the Python programming
languages for scientific computing (data science, machine learning applications,
large-scale data processing, predictive analytics, etc.)
Machine Learning with Python
• Python 3 installation
• Introduction to pip (python package installer)
• Install PyCharm
• or install Anaconda
Lesson 1
*data preprocessing
Lesson 1 (with Anaconda)
Lesson 2
*Class labeling with preprocessing
Lesson 3
*Load CSV and data observation
Lesson 4
Introduction to Deep Learning
• Deep learning has produced good results for a few applications
such as computer vision, language translation, image captioning,
audio transcription, molecular biology, speech recognition,
natural language processing, self-driving cars, brain tumour
detection, real-time speech translation, music composition,
automatic game playing and so on.
• Deep learning is the next big leap after machine learning with a
more advanced implementation. Currently, it is heading towards
becoming an industry standard bringing a strong promise of
being a game changer when dealing with raw unstructured data.
Introduction to Deep Learning
• Deep learning is currently one of the best solution providers fora wide
range of real-world problems. Developers are building AI programs that,
instead of using previously given rules, learn from examples to solve
complicated tasks. With deep learning being used by many data scientists,
deeper neural networks are delivering results that are ever more accurate.
• The idea is to develop deep neural networks by increasing the number of
training layers for each network; machine learns more about the data until
it is as accurate as possible. Developers can use deep learning techniques
to implement complex machine learning tasks, and train AI networks to
have high levels of perceptual recognition.
Introduction to Deep Learning
• Deep learning finds its popularity in Computer vision. Here one
of the tasks achieved is image classification where given input
images are classified as cat, dog, etc. or as a class or label that
best describe the image. We as humans learn how to do this
task very early in our lives and have these skills of quickly
recognizing patterns, generalizing from prior knowledge, and
adapting to different image environments.
Deep Learning Performance
Deep Learning with TensorFlow
• Googles TensorFlow is a python library. This library is a great choice for
building commercial grade deep learning applications.
• TensorFlow grew out of another library DistBelief V2 that was a part of
Google Brain Project. This library aims to extend the portability of
machine learning so that research models could be applied to
commercial-grade applications.
• Much like the Theano library, TensorFlow is based on computational
graphs where a node represents persistent data or math operation and
edges represent the flow of data between nodes, which is a
multidimensional array or tensor; hence the name TensorFlow
Deep Learning Implementation with
Tensorflow and Python
• Preparation (Python + libraries)
• Installing Tensorflow
• Running Several Tensorflow built-in example, ex.:
• Regression
• Image Classification
Introduction to Hadoop
Hadoop
Hadoop is:
• - scalable.
• - a “Framework”.
• - not a drop in replacement for RDBMS.
• - great for pipelining massive amounts of data to achieve the
end result.
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
Hadoop
• example of file/text search
Hadoop
• Planning
• Installation step
• Using HDFS
• Using Map Reduce
Hadoop Map Reduce
• MapReduce is a processing technique and a program model for distributed computing
based on java. The MapReduce algorithm contains two important tasks, namely Map
and Reduce. Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). Secondly, reduce task,
which takes the output from a map as an input and combines those data tuples into a
smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task
is always performed after the map job.
VS
Hadoop Map Reduce
• The MapReduce framework operates on <key, value> pairs, that is, the framework views the input to the job
as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably
of different types.
• The key and the value classes should be in serialized manner by the framework and hence, need to
implement the Writable interface. Additionally, the key classes have to implement the Writable-Comparable
interface to facilitate sorting by the framework. Input and Output types of a MapReduce job − (Input) <k1, v1>
→ map → <k2, v2> → reduce → <k3, v3>(Output).
Hadoop Map Reduce
• Words Count (without map-reduce)
Hadoop Map Reduce
• Words Count (mapper)
Hadoop Map Reduce
• Words Count (reducer)
Hadoop Map Reduce
• Run on HadoopMR
input file from local or HDFS
mapper application (see prev. slide)
reducer application (see prev. slide)
*mapper and recuder apps can be written in Python , R, Java, Scala, etc
Hadoop Map Reduce
Hadoop Map Reduce
• Map Reduce is not magic. It's a method
• Map Reduce is not always about big data (ex: find pi value)
• Map Reduce is not silver bullet. (e.g: batch vs streaming data)
• Map Reduce is usually solved:
• Batch processing flow
• Unstructured/Semi-structured data
Bigger Image of Hadoop (Hadoop Ecosystem)
Data Stream
Why Stream Processing?
• Processing unbounded data sets, or "stream processing", is a new way
of looking at what has always been done as batch in the past. Whilst
intra-day ETL and frequent batch executions have brought latencies
down, they are still independent executions with optional bespoke code
in place to handle intra-batch accumulations. With a platform such as
Spark Streaming we have a framework that natively supports
processing both within-batch and across-batch (windowing).
• By taking a stream processing approach we can benefit in several ways.
The most obvious is reducing latency between an event occurring and
taking an action driven by it, whether automatic or via analytics
presented to a human. Other benefits include a more smoothed out
resource consumption profile.
Introducing Spark
• Better speed compared to HadoopMR
• Minimized disk read-write (on memory processing)
• Comes with Spark Streaming (later, Hadoop also create
Hadoop Stream)
• Still in Hadoop Ecosystem
Data Stream with Spark Streaming
Simple Spark Streaming Implementation Example
near realtime dashboard
data stream processing and analytics
(bigger/reliable capabilities)
multiple channel/type of data
Different programming style.
Spark libraries included in app
returned data of processing/analytics
Infinite run
Spark Streaming Implementation
• Review some spark streaming example
• Review some Spark Streaming architecture
Example of Bukalapak
• Save all data from 2014 'til now
• >1.5PB data including:
• Product images
• Products data
• Messaging
Buka Lapak 'Big Data' Implementation
Example: Application Health Monitoring
Example: Recomender Engine
source:
Example: Recomender Engine
Example: Gojek Data Visualization
Example Gojek Problem
Example Gojek Problem
Example Gojek Problem
Example Gojek Problem

More Related Content

What's hot

Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingDatabricks
 
Object Detection with Tensorflow
Object Detection with TensorflowObject Detection with Tensorflow
Object Detection with TensorflowElifTech
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDatabricks
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its ApplicationsDr Ganesh Iyer
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionDarian Frajberg
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...Simplilearn
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learningSushant Shrivastava
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye viewRoelof Pieters
 
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPT
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPTAI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPT
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPTCprime
 
System design for recommendations and search
System design for recommendations and searchSystem design for recommendations and search
System design for recommendations and searchEugene Yan Ziyou
 
Graph neural networks overview
Graph neural networks overviewGraph neural networks overview
Graph neural networks overviewRodion Kiryukhin
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 

What's hot (20)

Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
Object Detection with Tensorflow
Object Detection with TensorflowObject Detection with Tensorflow
Object Detection with Tensorflow
 
Deep learning
Deep learning Deep learning
Deep learning
 
Deep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreathDeep Learning for Recommender Systems with Nick pentreath
Deep Learning for Recommender Systems with Nick pentreath
 
Machine Learning for dummies!
Machine Learning for dummies!Machine Learning for dummies!
Machine Learning for dummies!
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Introduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolutionIntroduction to the Artificial Intelligence and Computer Vision revolution
Introduction to the Artificial Intelligence and Computer Vision revolution
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
Machine Learning With Python | Machine Learning Algorithms | Machine Learning...
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
 
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPT
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPTAI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPT
AI for Everyone: Demystifying Large Language Models (LLMs) Like ChatGPT
 
Introduction of Faster R-CNN
Introduction of Faster R-CNNIntroduction of Faster R-CNN
Introduction of Faster R-CNN
 
System design for recommendations and search
System design for recommendations and searchSystem design for recommendations and search
System design for recommendations and search
 
Graph neural networks overview
Graph neural networks overviewGraph neural networks overview
Graph neural networks overview
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 

Similar to Big Data Analytics (ML, DL, AI) hands-on

Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportAhmad El Tawil
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersKumari Surabhi
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Adam Gibson
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)Amazon Web Services
 
How to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfHow to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfCareervira
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is DistributedAlluxio, Inc.
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformShivaji Dutta
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysisPramod Toraskar
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 

Similar to Big Data Analytics (ML, DL, AI) hands-on (20)

Map reduce advantages over parallel databases report
Map reduce advantages over parallel databases reportMap reduce advantages over parallel databases report
Map reduce advantages over parallel databases report
 
Toolboxes for data scientists
Toolboxes for data scientistsToolboxes for data scientists
Toolboxes for data scientists
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
hadoop
hadoophadoop
hadoop
 
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop ClustersA performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
A performance analysis of OpenStack Cloud vs Real System on Hadoop Clusters
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
How to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdfHow to Become a Big Data Professional.pdf
How to Become a Big Data Professional.pdf
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Python ml
Python mlPython ml
Python ml
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 

More from Dony Riyanto

KNIME For Enterprise Data Analytics.pdf
KNIME For Enterprise Data Analytics.pdfKNIME For Enterprise Data Analytics.pdf
KNIME For Enterprise Data Analytics.pdfDony Riyanto
 
Implementasi Teknologi Industri 4.0 pada TNI AD
Implementasi Teknologi Industri 4.0 pada TNI ADImplementasi Teknologi Industri 4.0 pada TNI AD
Implementasi Teknologi Industri 4.0 pada TNI ADDony Riyanto
 
Blockchain untuk Big Data
Blockchain untuk Big DataBlockchain untuk Big Data
Blockchain untuk Big DataDony Riyanto
 
Mengenal ROS2 Galactic
Mengenal ROS2 GalacticMengenal ROS2 Galactic
Mengenal ROS2 GalacticDony Riyanto
 
Membuat Desain Roket Amatir dan Menjalankan Simulasi
Membuat Desain Roket Amatir dan Menjalankan SimulasiMembuat Desain Roket Amatir dan Menjalankan Simulasi
Membuat Desain Roket Amatir dan Menjalankan SimulasiDony Riyanto
 
Creating UDP Broadcast App Using Python Socket on WIndows & Linux
Creating UDP Broadcast App Using Python Socket on WIndows & LinuxCreating UDP Broadcast App Using Python Socket on WIndows & Linux
Creating UDP Broadcast App Using Python Socket on WIndows & LinuxDony Riyanto
 
Desain ground control & Sistem Pendukung untuk Male UAV/UCAV
Desain ground control & Sistem Pendukung untuk Male UAV/UCAVDesain ground control & Sistem Pendukung untuk Male UAV/UCAV
Desain ground control & Sistem Pendukung untuk Male UAV/UCAVDony Riyanto
 
Application Performance, Test and Monitoring
Application Performance, Test and MonitoringApplication Performance, Test and Monitoring
Application Performance, Test and MonitoringDony Riyanto
 
Cloud Service Design for Computer Vision, Image & Video Processing+Analytics
Cloud Service Design for Computer Vision, Image & Video Processing+AnalyticsCloud Service Design for Computer Vision, Image & Video Processing+Analytics
Cloud Service Design for Computer Vision, Image & Video Processing+AnalyticsDony Riyanto
 
RealNetworks - SAFR Platform Whitepaper
RealNetworks - SAFR Platform WhitepaperRealNetworks - SAFR Platform Whitepaper
RealNetworks - SAFR Platform WhitepaperDony Riyanto
 
Dl6960 Demo Software User's Guide v1.4
Dl6960 Demo Software User's Guide v1.4Dl6960 Demo Software User's Guide v1.4
Dl6960 Demo Software User's Guide v1.4Dony Riyanto
 
Review of Existing Response System & Technology.
Review of Existing Response System & Technology.Review of Existing Response System & Technology.
Review of Existing Response System & Technology.Dony Riyanto
 
Beberapa Studi Kasus Fintech Micro Payment
Beberapa Studi Kasus Fintech Micro PaymentBeberapa Studi Kasus Fintech Micro Payment
Beberapa Studi Kasus Fintech Micro PaymentDony Riyanto
 
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANG
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANGRencana Pengembangan REST API dan Microservice pada MONEVRISBANG
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANGDony Riyanto
 
Implementasi Full Textsearch pada Database
Implementasi Full Textsearch pada DatabaseImplementasi Full Textsearch pada Database
Implementasi Full Textsearch pada DatabaseDony Riyanto
 
Beberapa strategi implementasi open api untuk legacy system existing app
Beberapa strategi implementasi open api untuk legacy system existing appBeberapa strategi implementasi open api untuk legacy system existing app
Beberapa strategi implementasi open api untuk legacy system existing appDony Riyanto
 
Pengenalan Big Data untuk Pemula
Pengenalan Big Data untuk PemulaPengenalan Big Data untuk Pemula
Pengenalan Big Data untuk PemulaDony Riyanto
 
Introduction to BACnet: Building Automation & Control Network
Introduction to BACnet: Building Automation & Control NetworkIntroduction to BACnet: Building Automation & Control Network
Introduction to BACnet: Building Automation & Control NetworkDony Riyanto
 
Enterprise Microservices
Enterprise MicroservicesEnterprise Microservices
Enterprise MicroservicesDony Riyanto
 
Edge Exploration of QR Code Technology Implementation
Edge Exploration of QR Code Technology ImplementationEdge Exploration of QR Code Technology Implementation
Edge Exploration of QR Code Technology ImplementationDony Riyanto
 

More from Dony Riyanto (20)

KNIME For Enterprise Data Analytics.pdf
KNIME For Enterprise Data Analytics.pdfKNIME For Enterprise Data Analytics.pdf
KNIME For Enterprise Data Analytics.pdf
 
Implementasi Teknologi Industri 4.0 pada TNI AD
Implementasi Teknologi Industri 4.0 pada TNI ADImplementasi Teknologi Industri 4.0 pada TNI AD
Implementasi Teknologi Industri 4.0 pada TNI AD
 
Blockchain untuk Big Data
Blockchain untuk Big DataBlockchain untuk Big Data
Blockchain untuk Big Data
 
Mengenal ROS2 Galactic
Mengenal ROS2 GalacticMengenal ROS2 Galactic
Mengenal ROS2 Galactic
 
Membuat Desain Roket Amatir dan Menjalankan Simulasi
Membuat Desain Roket Amatir dan Menjalankan SimulasiMembuat Desain Roket Amatir dan Menjalankan Simulasi
Membuat Desain Roket Amatir dan Menjalankan Simulasi
 
Creating UDP Broadcast App Using Python Socket on WIndows & Linux
Creating UDP Broadcast App Using Python Socket on WIndows & LinuxCreating UDP Broadcast App Using Python Socket on WIndows & Linux
Creating UDP Broadcast App Using Python Socket on WIndows & Linux
 
Desain ground control & Sistem Pendukung untuk Male UAV/UCAV
Desain ground control & Sistem Pendukung untuk Male UAV/UCAVDesain ground control & Sistem Pendukung untuk Male UAV/UCAV
Desain ground control & Sistem Pendukung untuk Male UAV/UCAV
 
Application Performance, Test and Monitoring
Application Performance, Test and MonitoringApplication Performance, Test and Monitoring
Application Performance, Test and Monitoring
 
Cloud Service Design for Computer Vision, Image & Video Processing+Analytics
Cloud Service Design for Computer Vision, Image & Video Processing+AnalyticsCloud Service Design for Computer Vision, Image & Video Processing+Analytics
Cloud Service Design for Computer Vision, Image & Video Processing+Analytics
 
RealNetworks - SAFR Platform Whitepaper
RealNetworks - SAFR Platform WhitepaperRealNetworks - SAFR Platform Whitepaper
RealNetworks - SAFR Platform Whitepaper
 
Dl6960 Demo Software User's Guide v1.4
Dl6960 Demo Software User's Guide v1.4Dl6960 Demo Software User's Guide v1.4
Dl6960 Demo Software User's Guide v1.4
 
Review of Existing Response System & Technology.
Review of Existing Response System & Technology.Review of Existing Response System & Technology.
Review of Existing Response System & Technology.
 
Beberapa Studi Kasus Fintech Micro Payment
Beberapa Studi Kasus Fintech Micro PaymentBeberapa Studi Kasus Fintech Micro Payment
Beberapa Studi Kasus Fintech Micro Payment
 
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANG
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANGRencana Pengembangan REST API dan Microservice pada MONEVRISBANG
Rencana Pengembangan REST API dan Microservice pada MONEVRISBANG
 
Implementasi Full Textsearch pada Database
Implementasi Full Textsearch pada DatabaseImplementasi Full Textsearch pada Database
Implementasi Full Textsearch pada Database
 
Beberapa strategi implementasi open api untuk legacy system existing app
Beberapa strategi implementasi open api untuk legacy system existing appBeberapa strategi implementasi open api untuk legacy system existing app
Beberapa strategi implementasi open api untuk legacy system existing app
 
Pengenalan Big Data untuk Pemula
Pengenalan Big Data untuk PemulaPengenalan Big Data untuk Pemula
Pengenalan Big Data untuk Pemula
 
Introduction to BACnet: Building Automation & Control Network
Introduction to BACnet: Building Automation & Control NetworkIntroduction to BACnet: Building Automation & Control Network
Introduction to BACnet: Building Automation & Control Network
 
Enterprise Microservices
Enterprise MicroservicesEnterprise Microservices
Enterprise Microservices
 
Edge Exploration of QR Code Technology Implementation
Edge Exploration of QR Code Technology ImplementationEdge Exploration of QR Code Technology Implementation
Edge Exploration of QR Code Technology Implementation
 

Recently uploaded

Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 

Recently uploaded (20)

DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 

Big Data Analytics (ML, DL, AI) hands-on

  • 1. Machine Learning, Deep Learning, Big Data Hands-On by Dony Riyanto Prepared and Presented to Panin Asset Management January 2019
  • 2. Hands-on Agenda • Machine Learning Re-Visited • Python Example of Machine Learning • Introduction to Deep Learning • Immentation of 'Big Data' (Hadoop Ecosystem) • Hadoop File System • Hadoop Map Reduce • Case Study • More advance implementation of Big Data
  • 3. The Learning Problem The essence of ML: 1. We have data 2. Patterns exist in data 3. We can't do math formula (don't know the formula yet) Examples:  Movie Rating  Credit Approval  Hand Written Recognition Domain Areas  Computer Vision  Natural Language Processing  Business Intelligence
  • 4. Components of Learning Example in Banking: Credit Card Approval Input : x (customer application) Output : y (good/bad customer) Unknown Target Function f :XY Dataset {x, y} (customers record database) Hypothesis Set: H : X Y Final Hypothesis g Learning Model = Hypothesis Set + Learning Algorithm
  • 5. Machine Learning Model Spatial Data (Text, Image) {x, y} Sequence or Time Series Data {x, t} Classifier Class Score Regression Cont. Values
  • 6. Main Paradigms Automatic discovery of patterns in data through computer algorithms and the use of those patterns to take actions such as classifying or clustering the data into categories. Supervised Learning: Learning by labeled example E.g. An email spam detector We have (input, correct output), and we can predict (new input, predicted output) Amazingly effective if you have lots of data Unsupervised Learning: Discovering Patterns E.g. Data clustering Instead of (input, correct output), we get (input, ?) Difficult in practices but useful if we lack labeled data Reinforcement Learning: Feedback & Error E.g. Learning to play chess Instead of (input, correct output), we get (input, only some output, grade of this output) Works well in some domains, becoming more important
  • 7. What/why is Python Python is an interpreted, high-level programming language, general- purpose programming language. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
  • 8. Machine Learning with Python • We need Python 2.7.x or 3.7.x • Libraries, ex.: • numpy (fundamental package for scientific computing with Python) • matplotlib (plotting library for the Python programming language and its numerical mathematics extension NumPy) • pandas (software library written for the Python programming language for data manipulation and analysis) • seaborn (Python data visualization library based on matplotlib) • sklearn (Scikit-learn is a machine learning library for the Python programming language) • IDE, ex: pycharm • Alternatives, install Anaconda (distribution of the Python programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.)
  • 9. Machine Learning with Python • Python 3 installation • Introduction to pip (python package installer) • Install PyCharm • or install Anaconda
  • 11. Lesson 1 (with Anaconda)
  • 12. Lesson 2 *Class labeling with preprocessing
  • 13. Lesson 3 *Load CSV and data observation
  • 15. Introduction to Deep Learning • Deep learning has produced good results for a few applications such as computer vision, language translation, image captioning, audio transcription, molecular biology, speech recognition, natural language processing, self-driving cars, brain tumour detection, real-time speech translation, music composition, automatic game playing and so on. • Deep learning is the next big leap after machine learning with a more advanced implementation. Currently, it is heading towards becoming an industry standard bringing a strong promise of being a game changer when dealing with raw unstructured data.
  • 16. Introduction to Deep Learning • Deep learning is currently one of the best solution providers fora wide range of real-world problems. Developers are building AI programs that, instead of using previously given rules, learn from examples to solve complicated tasks. With deep learning being used by many data scientists, deeper neural networks are delivering results that are ever more accurate. • The idea is to develop deep neural networks by increasing the number of training layers for each network; machine learns more about the data until it is as accurate as possible. Developers can use deep learning techniques to implement complex machine learning tasks, and train AI networks to have high levels of perceptual recognition.
  • 17. Introduction to Deep Learning • Deep learning finds its popularity in Computer vision. Here one of the tasks achieved is image classification where given input images are classified as cat, dog, etc. or as a class or label that best describe the image. We as humans learn how to do this task very early in our lives and have these skills of quickly recognizing patterns, generalizing from prior knowledge, and adapting to different image environments.
  • 19. Deep Learning with TensorFlow • Googles TensorFlow is a python library. This library is a great choice for building commercial grade deep learning applications. • TensorFlow grew out of another library DistBelief V2 that was a part of Google Brain Project. This library aims to extend the portability of machine learning so that research models could be applied to commercial-grade applications. • Much like the Theano library, TensorFlow is based on computational graphs where a node represents persistent data or math operation and edges represent the flow of data between nodes, which is a multidimensional array or tensor; hence the name TensorFlow
  • 20. Deep Learning Implementation with Tensorflow and Python • Preparation (Python + libraries) • Installing Tensorflow • Running Several Tensorflow built-in example, ex.: • Regression • Image Classification
  • 22. Hadoop Hadoop is: • - scalable. • - a “Framework”. • - not a drop in replacement for RDBMS. • - great for pipelining massive amounts of data to achieve the end result.
  • 32. Hadoop • example of file/text search
  • 33. Hadoop • Planning • Installation step • Using HDFS • Using Map Reduce
  • 34. Hadoop Map Reduce • MapReduce is a processing technique and a program model for distributed computing based on java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). Secondly, reduce task, which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce task is always performed after the map job. VS
  • 35. Hadoop Map Reduce • The MapReduce framework operates on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types. • The key and the value classes should be in serialized manner by the framework and hence, need to implement the Writable interface. Additionally, the key classes have to implement the Writable-Comparable interface to facilitate sorting by the framework. Input and Output types of a MapReduce job − (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3>(Output).
  • 36. Hadoop Map Reduce • Words Count (without map-reduce)
  • 37. Hadoop Map Reduce • Words Count (mapper)
  • 38. Hadoop Map Reduce • Words Count (reducer)
  • 39. Hadoop Map Reduce • Run on HadoopMR input file from local or HDFS mapper application (see prev. slide) reducer application (see prev. slide) *mapper and recuder apps can be written in Python , R, Java, Scala, etc
  • 41. Hadoop Map Reduce • Map Reduce is not magic. It's a method • Map Reduce is not always about big data (ex: find pi value) • Map Reduce is not silver bullet. (e.g: batch vs streaming data) • Map Reduce is usually solved: • Batch processing flow • Unstructured/Semi-structured data
  • 42. Bigger Image of Hadoop (Hadoop Ecosystem)
  • 43. Data Stream Why Stream Processing? • Processing unbounded data sets, or "stream processing", is a new way of looking at what has always been done as batch in the past. Whilst intra-day ETL and frequent batch executions have brought latencies down, they are still independent executions with optional bespoke code in place to handle intra-batch accumulations. With a platform such as Spark Streaming we have a framework that natively supports processing both within-batch and across-batch (windowing). • By taking a stream processing approach we can benefit in several ways. The most obvious is reducing latency between an event occurring and taking an action driven by it, whether automatic or via analytics presented to a human. Other benefits include a more smoothed out resource consumption profile.
  • 44. Introducing Spark • Better speed compared to HadoopMR • Minimized disk read-write (on memory processing) • Comes with Spark Streaming (later, Hadoop also create Hadoop Stream) • Still in Hadoop Ecosystem
  • 45. Data Stream with Spark Streaming
  • 46. Simple Spark Streaming Implementation Example near realtime dashboard data stream processing and analytics (bigger/reliable capabilities) multiple channel/type of data
  • 47. Different programming style. Spark libraries included in app returned data of processing/analytics Infinite run
  • 48. Spark Streaming Implementation • Review some spark streaming example • Review some Spark Streaming architecture
  • 49. Example of Bukalapak • Save all data from 2014 'til now • >1.5PB data including: • Product images • Products data • Messaging
  • 50. Buka Lapak 'Big Data' Implementation
  • 54. Example: Gojek Data Visualization