SlideShare uma empresa Scribd logo
1 de 162
challenges, learnings and opportunities
presented by imron zuhri, adit, and samudra
KUDO codefest 14 May 2016
machine learning
can a machine think?
in 1996, Garry Kasparov was not afraid of a computer, and he won
the next year, he played against a new and improved Deep Blue and lost
this is the move that was so surprising, so un-machine-like,
that he was sure the IBM team had cheated
Rd5
Rd1
a random move, a computer bug
to kasparov, a sign of superior intelligence
Rd5
Rd1
big data analytics, is the culmination
of the machine way of thinking
we can now immensely
extend our memory and computational power
to helped us doing that
what is machine learning
some definitions
 a (hypnotized) user’s perspective
a scientific (witchcraft) field that:
researches fundamental principles from data (potions) and
develops magical algorithms (spells to cast)
 (pascal vincent, 2015)
 field of study that gives computers the ability to learn without
being explicitly programmed
 arthur samuel (1959)
 formal definitions (tom mitchell, 1998):
“A machine is said to be learning IF
it improves with:
 each experience E
 on specific tasks T
 with specific performance P
CURRENT VIEW OF ML FOUNDING DISCIPLINES
10
three niches for machine learning
data mining: using historical data to improve
decisions
 medical records  medical knowledge
software applications that are difficult to program
by hand
 autonomous driving
 image classification
user modeling
 automatic recommender systems
source: rong jin, 2013
(some) open problems in machine learning
 one-shot learning
 unsupervised learning
 reinforced learning
 artificial general intelligence
“most of human and animal learning
is unsupervised learning. If
intelligence was a cake, unsupervised
learning would be the cake,
supervised learning would be the
icing on the cake, and reinforcement
learning would be the cherry on the
cake. We know how to make the icing
and the cherry, but we don't know
how to make the cake.”
yan lecun
challenges in machine learning
 data-related:
 abundant yet scattered data
 unstructured, noisy data
 offline-stored data (duh!)
 resource-related:
 data storage
 space constraints
 computing power
 training time
 inve$$$tments
• initial investments
• running costs
challenges in machine learning
 methodical issues:
 result consistency
(i.e. accuracy)
 overfitting
 algorithm computational efficiency
 miscellaneous:
 architectural differences/
 portability issues
 popularity of non-open standard, vendor-
locked compute libraries/apis
(rawr!)
recent breakthroughs in machine learning
deepmind atari q learner (2014)
plays 5 kinds of atari 2600 games
states: pixels in atari
actions: left/right move
reward: score
algorithm used:
feedforward “q-learning”
conv-net
for unsupervised map of reward
recent breakthroughs in machine learning
the translator (2015)
real-time translations of speech
from/into 7 different languages
able to run from even from
resource-constrained embedded
hardware (i.e. smartphones)
uses same engine that was used in
microsoft cortana (creepy!)
Reinforcement Learning: DeepMind AlphaGo
 google deepmind alphago (2016)
 99.8% winning rate
vs other algorithm
 first program to defeat
human go champion
 algorithm used:
 deep neural network
 monte carlo search tree
 supervised learning from expert games
 reinforcement learning vs other alphago instances
supervised learning: random forest
deldago et. al. (2014) used 179 classifiers with 121 data sets in uci data,
result:
 top 5 are random forest classifier
 for kaggle competition, try gbm : xgboost.
supervised: deep learning
don’t be fooled, dl research improve
part by part, either new kind of layer,
new activation function, new non-
convex optimization solver, or deeper
neural net.
from rodrigo benenson
deep learning accuracies ranking
supervised: deep learning
summary:
 relu works better than sigmoid function for activation.
 maxout works better when applied to dropconnect for
activation function.
 dropout layer works to fight overfitting.
 adagrad and adadelta works better if you don’t want to
tune optimization hyperparameter.
 deeper layer works: highway layer and residual layer.
unsupervised: t-sne
t-stochastic neighbor embedding
maaten and hinton (2008):
mnist data set visualization
 works best for data-viz
 can be used for clustering too
(if you’d bother to tweak the algo)
Given 100 and 1000 label of data, and the other unlabeled (~50.000)
Try to predict 10.000 future data.
● It works! with small label data.
● Now we don’t have to tell some interns or PhD student to label some
data. :)
A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko. (2015)
semi-supervised learning: ladder neural networks
collaborative filtering: restricted boltzmann machine
rbm for collaborative learning (hinton, 2008):
 it has been used in netflix and spotify algo.
 it works better than svd!
 correlation(svd, rbm) : -1 < c < 1
• can be assembled with svd
 to improve the prediction.
some advices for applied machine learning research
(this competition)
 preprocessing: scaling & imputation
 cross-validation: choose best algos
 hyperparameter optimization
 ensembling n-models: dark knowledge
raschka(2014):
scaling improve prediction!
gelman(2006)
do prediction for n/a data, then
predict the data with noise
less biased!
data preprocessing: scaling & imputation
cross-validation: how to choose best algo?
 cross-validation is a must!
 (tibshirani et.al 2014)
 don’t overlap your cross-
validation data partition!
 (zhang, data robot)
hyperparameter optimization
if you want to search best hyperparamaters:
do random search.
random search is better than grid search
(bengio, 2012)
ensembling n-models: dark knowledge
If two model give same accuracy, but low
correlation of prediction output, then we can
improve prediction accuracy by averaging
model prediction.
(Hinton, 2015)
the landscape of opportunities
Popular Big Data Industry
Financial Services Telco Web/Media Retail Healthcare Government
• Fraud detection
• Compliance
reporting
• Portfolio analysis
• Customer
statements
• Wire transfer alerts
• Customer
acquisition,
retention, and
profitability
• Subscriber data
management
• Fraud analysis
• Social analysis
• Response times
• Traffic analysis
• Product
affinity/bundling
• Sentiment Analysis
• Content
monetization
• Advertising
optimization
• Optimization of user
experience/ click
stream analysis
• Network
optimization to
support service
levels
• Store operation
analysis
• Customer loyalty
programs
• Collaborative
planning and
forecasting
• Loss prevention
• Supply chain
optimization
• Drug development
and launch cost
reduction
• Regulatory
compliance
• Product quality
• Return on
promotional
investment
• Lowered risk of new
product success
• Security/anti-terror
• Recovery Act public
disclosure
• Budgetary control
and management
• Educational
reporting
• Asset control and
assessment
Environment
monitoring
*cisco 2013-2014
currently the biggest prescriptive analytics engine:
contextual advertising
http://www.flashtalking.com/us/targeted-ads/
another one:
marketplace and services recommendation engine
challenges of implementation
and
what we do with machine learning
do you follow waze instruction during the first one week?
 would you buy a self-driving car that couldn’t drive
itself in 99 percent of the country?
 or that knew nearly nothing about parking,
 couldn’t be taken out in snow or heavy rain,
 and would drive straight over a gaping pothole?
if your answer is yes, then check out the google self-driving car, model year
2014
but
can we trust them enough?
the BIGGEST CHALLENGES in indonesia
DATA SETS
the current analytics technology
human still doing
most of the process
the current challenges of big data analytics?
heterogeneous
data sources,
systems and
formats
time consuming
and complex
data preparation
process
almost
impossible task
of integrating
various kind of
data
it requires
experts to
analyze big and
complex data
most of the user
interactions are
not intuitive
“Before performing analytics, data scientists must first
format and prepare the raw data for analytics, often with
more than 80% of the effort.”, said Intel Corp. Research
what it would be like,
if we can simplify the whole process?
?
?
hence our vision
we believe human should not be bogged down by tedious matters.
by reimagining analytics we envisioned the creation of intelligent
machines,
that will free human to focus on solving the world’s toughest
problems.
intelligent machines that can helped us collect the massive amount of data
automatically reads and connects to
any kind of data, including automatic
machine to machine connections
structured
data
printed
invoices
social media
conversation
intelligent machines that can helped us collect the massive amount of data
automatically reads and connects to
any kind of data, including automatic
machine to machine connections
structured
data
printed
invoices
social media
conversation
then helped us separate the signals from the noise
automatic data quality assessments,
data cleansing and data filtering
regi
mita
gundam
x-men
then helped us separate the signals from the noise
automatic data quality assessments,
data cleansing and data filtering
regi
mita
gundam
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
batman
tom
mediatrac
complete the information and connect them all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
regi
mita
gundam
batman
tom
mediatrac
and finally helped us making sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
gundam
batman
sith
and finally helped us making sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
regi
mita
gundam
batman
tom
mediatrac
gundam
batman
sith
through a highly intuitive and natural user interface
natural language interface
voice and gesture recognition
ada berapa banyak restoran yg jual soto sepanjang jalan senopati?
digital
telco
legal
retail
healthcare
agriculture
multi format
structured
unstructured
unclean
missing data
unstandardized
unconnected
difficult to analyze
cleaned and standardized
enriched and validated
connected at granular level
analytics ready
data
automatic
data collection
automatic
data preparation
automatic
data integration
teritory management
CONFIDENTIAL for internal use only
all of our silo data will have a totally elevated value,
once you connect them all in a meaningful way
are all of our current data connected yet?
Almost…
google is a humongous library index, with a smart
library card search that redirects you to the original
documents
facebook is a giant personal scrapbook of all your
acquaintances that are currently linked by manual
tagging and friends list
source:techglimpse
youtube and instagram are a huge repository of
current knowledge, lifestyle and trends that are still
largely unconnected
now imagine this!
when we can have intelligent machines that can
connect everything, in a meaningful way…
we can start asking questions, on things we never
thought possible to be asked before
can map songs across social
graphs.
Spotify
can give us situational data — where
someone is listening to a song,
when, how and even (to an extent) why.
Shazam
can help us track the growth of a song
using search and streams.
YouTube
are becoming hotbeds for music discovery.
Instagram & Vine
If we can connect all their data together?
or if you have a radio station, what sort of playlist that will appeal to
your target audience, if we know, that a sizeable percentage of them
have a hummer?
we can even predict specific combination of words, notes and
beats that will increase the chance of putting the song in
billboard top 40 this upcoming season.
here are some sample of hidden insights
that we can discover from our own large repository of data,
using our intelligent data integration and data discovery tools
when we integrate historical media articles with geodemographic and point of
interest database we can create a model that can predict high probability of fire
incidence down to street level
productivy optimization
lessons learned including how to scale your ML
scalability problems - outline
 large scale machine learning
 mahout - scalable ml on hadoop
 jubatus – distributed online real-time ml
 vowpal wabbit – fast learning at yahoo/ms
 trident ml and storm pattern: ml on storm, yarn
 upcoming --- samoa: ml on s4, storm
 issues in scalable distributed ml
 load balancing
 auto scaling
 job scheduling
 workflow management
 data and model parallelism
 parameter server framework
 peer-to-peer framework
scalability problems - outline
 distributed deep learning
 yahoolda: scalable parallel framework in latent variable models
 distbelief – distributed deep learning on cluster
 h2o – distributed deep learning on spark
 adam at msr – distributed deep learning
 dl4j – open source for deep learning on hadoop and spark
 petuum – distributed machine learning
 singa – distributed deep learning
 tensorflow: google large scale distributed dl
 mxnet: heterogeneous distributed deep learning
 caffee on spark: yahoo
 distributed learning and optimization
 proximal splitting/auxiliary coordinates;
 bundle (sub-gradient);
 shotgun: parallelized cdm (coordinate descent method)
 asynchronous sgd;
 hogwild/dogwild;
what’s next?
emerging analytics technology for automatic
analytics on large dimensional data
online deep learning
topological data analysis
fuzzy-rough set based data exploration system
granular computing
kernel set and spatiotemporal analysis
applied differential geometry
non axiomatic reasoning system
intelligent rule and knowledge extraction/discovery
multi agent based modeling
weak signal detection and analysis
bayesian networks analysis
genetic programming
self organizing neural networks
and also more humanlike user
interaction and data visualization
technology
eye tracking
glass-free auto stereoscopy
touch sensitive hologram
natural language user interface
tangible user interface
wearable gestural interface
brain-computer interface
sensor network user interface
In the meantime
principles for the development of a complete mind:
study the science of art. study the art of science.
develop your senses — especially learn how to see.
realize that everything connects to everything else.
Leonardo DaVinci

Mais conteúdo relacionado

Mais procurados

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning Tasks
Hima Patel
 

Mais procurados (20)

Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Machine learning
Machine learningMachine learning
Machine learning
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Machine learning
Machine learningMachine learning
Machine learning
 
Unsupervised learning
Unsupervised learningUnsupervised learning
Unsupervised learning
 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
 
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep YadavMachine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
Machine learning by Dr. Vivek Vijay and Dr. Sandeep Yadav
 
Introduction to Statistical Machine Learning
Introduction to Statistical Machine LearningIntroduction to Statistical Machine Learning
Introduction to Statistical Machine Learning
 
3.3 hierarchical methods
3.3 hierarchical methods3.3 hierarchical methods
3.3 hierarchical methods
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Supervised Machine Learning With Types And Techniques
Supervised Machine Learning With Types And TechniquesSupervised Machine Learning With Types And Techniques
Supervised Machine Learning With Types And Techniques
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Data Quality for Machine Learning Tasks
Data Quality for Machine Learning TasksData Quality for Machine Learning Tasks
Data Quality for Machine Learning Tasks
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
 

Semelhante a Machine Learning - Challenges, Learnings & Opportunities

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
butest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
butest
 

Semelhante a Machine Learning - Challenges, Learnings & Opportunities (20)

Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
An Elementary Introduction to Artificial Intelligence, Data Science and Machi...
 
GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
 
Gary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you ThinkGary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you Think
 
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
 
Big data analytics 1
Big data analytics 1Big data analytics 1
Big data analytics 1
 
Intro to AI.pptx
Intro to AI.pptxIntro to AI.pptx
Intro to AI.pptx
 
What is Artificial Intelligence - Beginners
What is Artificial Intelligence - BeginnersWhat is Artificial Intelligence - Beginners
What is Artificial Intelligence - Beginners
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
SSE 2017 10-09
SSE 2017 10-09SSE 2017 10-09
SSE 2017 10-09
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 
AI INTRODUCTION.pptx,INFORMATION TECHNOLOGY
AI INTRODUCTION.pptx,INFORMATION TECHNOLOGYAI INTRODUCTION.pptx,INFORMATION TECHNOLOGY
AI INTRODUCTION.pptx,INFORMATION TECHNOLOGY
 
Fontys Eric van Tol
Fontys Eric van TolFontys Eric van Tol
Fontys Eric van Tol
 

Mais de CodePolitan

Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjaja
CodePolitan
 

Mais de CodePolitan (19)

Pre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium MemberPre-Order #2 CodePolitan Premium Member
Pre-Order #2 CodePolitan Premium Member
 
Materi devcussion 1.0
Materi devcussion 1.0Materi devcussion 1.0
Materi devcussion 1.0
 
Slides alexander-makarov
Slides alexander-makarovSlides alexander-makarov
Slides alexander-makarov
 
Slides galvin-widjaja
Slides galvin-widjajaSlides galvin-widjaja
Slides galvin-widjaja
 
Dev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potentialDev summit.io 2017 unlock your potential
Dev summit.io 2017 unlock your potential
 
Slides imanzah-hidayat
Slides imanzah-hidayatSlides imanzah-hidayat
Slides imanzah-hidayat
 
Ids johanes alexander
Ids   johanes alexanderIds   johanes alexander
Ids johanes alexander
 
Vison final
Vison   finalVison   final
Vison final
 
Tride
TrideTride
Tride
 
React ftw
React ftwReact ftw
React ftw
 
2017 10 28 angular in war - rev3
2017 10 28   angular in war - rev32017 10 28   angular in war - rev3
2017 10 28 angular in war - rev3
 
Rapid Android Development for Hackathon
Rapid Android Development for HackathonRapid Android Development for Hackathon
Rapid Android Development for Hackathon
 
Memaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.jsMemaksimalkan Non-Blocking IO pada Node.js
Memaksimalkan Non-Blocking IO pada Node.js
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless Architecture
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Combining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User ProfilingCombining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User Profiling
 
Get in Touch with Internet of Things
Get in Touch with Internet of ThingsGet in Touch with Internet of Things
Get in Touch with Internet of Things
 
IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?IoT Devices, Which One is Right for You to Learn?
IoT Devices, Which One is Right for You to Learn?
 
CodePolitan Media Partner SOP
CodePolitan Media Partner SOPCodePolitan Media Partner SOP
CodePolitan Media Partner SOP
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Último (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 

Machine Learning - Challenges, Learnings & Opportunities

  • 1. challenges, learnings and opportunities presented by imron zuhri, adit, and samudra KUDO codefest 14 May 2016 machine learning
  • 2. can a machine think?
  • 3. in 1996, Garry Kasparov was not afraid of a computer, and he won the next year, he played against a new and improved Deep Blue and lost
  • 4. this is the move that was so surprising, so un-machine-like, that he was sure the IBM team had cheated Rd5 Rd1
  • 5. a random move, a computer bug to kasparov, a sign of superior intelligence Rd5 Rd1
  • 6. big data analytics, is the culmination of the machine way of thinking we can now immensely extend our memory and computational power to helped us doing that
  • 7. what is machine learning
  • 8. some definitions  a (hypnotized) user’s perspective a scientific (witchcraft) field that: researches fundamental principles from data (potions) and develops magical algorithms (spells to cast)  (pascal vincent, 2015)  field of study that gives computers the ability to learn without being explicitly programmed  arthur samuel (1959)  formal definitions (tom mitchell, 1998): “A machine is said to be learning IF it improves with:  each experience E  on specific tasks T  with specific performance P
  • 9. CURRENT VIEW OF ML FOUNDING DISCIPLINES
  • 10. 10 three niches for machine learning data mining: using historical data to improve decisions  medical records  medical knowledge software applications that are difficult to program by hand  autonomous driving  image classification user modeling  automatic recommender systems source: rong jin, 2013
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. (some) open problems in machine learning  one-shot learning  unsupervised learning  reinforced learning  artificial general intelligence “most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake.” yan lecun
  • 50. challenges in machine learning  data-related:  abundant yet scattered data  unstructured, noisy data  offline-stored data (duh!)  resource-related:  data storage  space constraints  computing power  training time  inve$$$tments • initial investments • running costs
  • 51. challenges in machine learning  methodical issues:  result consistency (i.e. accuracy)  overfitting  algorithm computational efficiency  miscellaneous:  architectural differences/  portability issues  popularity of non-open standard, vendor- locked compute libraries/apis (rawr!)
  • 52. recent breakthroughs in machine learning deepmind atari q learner (2014) plays 5 kinds of atari 2600 games states: pixels in atari actions: left/right move reward: score algorithm used: feedforward “q-learning” conv-net for unsupervised map of reward
  • 53. recent breakthroughs in machine learning the translator (2015) real-time translations of speech from/into 7 different languages able to run from even from resource-constrained embedded hardware (i.e. smartphones) uses same engine that was used in microsoft cortana (creepy!)
  • 54. Reinforcement Learning: DeepMind AlphaGo  google deepmind alphago (2016)  99.8% winning rate vs other algorithm  first program to defeat human go champion  algorithm used:  deep neural network  monte carlo search tree  supervised learning from expert games  reinforcement learning vs other alphago instances
  • 55. supervised learning: random forest deldago et. al. (2014) used 179 classifiers with 121 data sets in uci data, result:  top 5 are random forest classifier  for kaggle competition, try gbm : xgboost.
  • 56. supervised: deep learning don’t be fooled, dl research improve part by part, either new kind of layer, new activation function, new non- convex optimization solver, or deeper neural net. from rodrigo benenson deep learning accuracies ranking
  • 57. supervised: deep learning summary:  relu works better than sigmoid function for activation.  maxout works better when applied to dropconnect for activation function.  dropout layer works to fight overfitting.  adagrad and adadelta works better if you don’t want to tune optimization hyperparameter.  deeper layer works: highway layer and residual layer.
  • 58. unsupervised: t-sne t-stochastic neighbor embedding maaten and hinton (2008): mnist data set visualization  works best for data-viz  can be used for clustering too (if you’d bother to tweak the algo)
  • 59. Given 100 and 1000 label of data, and the other unlabeled (~50.000) Try to predict 10.000 future data. ● It works! with small label data. ● Now we don’t have to tell some interns or PhD student to label some data. :) A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko. (2015) semi-supervised learning: ladder neural networks
  • 60. collaborative filtering: restricted boltzmann machine rbm for collaborative learning (hinton, 2008):  it has been used in netflix and spotify algo.  it works better than svd!  correlation(svd, rbm) : -1 < c < 1 • can be assembled with svd  to improve the prediction.
  • 61. some advices for applied machine learning research (this competition)  preprocessing: scaling & imputation  cross-validation: choose best algos  hyperparameter optimization  ensembling n-models: dark knowledge
  • 62. raschka(2014): scaling improve prediction! gelman(2006) do prediction for n/a data, then predict the data with noise less biased! data preprocessing: scaling & imputation
  • 63. cross-validation: how to choose best algo?  cross-validation is a must!  (tibshirani et.al 2014)  don’t overlap your cross- validation data partition!  (zhang, data robot)
  • 64. hyperparameter optimization if you want to search best hyperparamaters: do random search. random search is better than grid search (bengio, 2012)
  • 65. ensembling n-models: dark knowledge If two model give same accuracy, but low correlation of prediction output, then we can improve prediction accuracy by averaging model prediction. (Hinton, 2015)
  • 66. the landscape of opportunities
  • 67.
  • 68. Popular Big Data Industry Financial Services Telco Web/Media Retail Healthcare Government • Fraud detection • Compliance reporting • Portfolio analysis • Customer statements • Wire transfer alerts • Customer acquisition, retention, and profitability • Subscriber data management • Fraud analysis • Social analysis • Response times • Traffic analysis • Product affinity/bundling • Sentiment Analysis • Content monetization • Advertising optimization • Optimization of user experience/ click stream analysis • Network optimization to support service levels • Store operation analysis • Customer loyalty programs • Collaborative planning and forecasting • Loss prevention • Supply chain optimization • Drug development and launch cost reduction • Regulatory compliance • Product quality • Return on promotional investment • Lowered risk of new product success • Security/anti-terror • Recovery Act public disclosure • Budgetary control and management • Educational reporting • Asset control and assessment Environment monitoring *cisco 2013-2014
  • 69.
  • 70.
  • 71. currently the biggest prescriptive analytics engine: contextual advertising http://www.flashtalking.com/us/targeted-ads/
  • 72. another one: marketplace and services recommendation engine
  • 73. challenges of implementation and what we do with machine learning
  • 74. do you follow waze instruction during the first one week?
  • 75.  would you buy a self-driving car that couldn’t drive itself in 99 percent of the country?  or that knew nearly nothing about parking,  couldn’t be taken out in snow or heavy rain,  and would drive straight over a gaping pothole? if your answer is yes, then check out the google self-driving car, model year 2014
  • 76. but
  • 77. can we trust them enough?
  • 78. the BIGGEST CHALLENGES in indonesia
  • 80. the current analytics technology human still doing most of the process
  • 81. the current challenges of big data analytics? heterogeneous data sources, systems and formats time consuming and complex data preparation process almost impossible task of integrating various kind of data it requires experts to analyze big and complex data most of the user interactions are not intuitive “Before performing analytics, data scientists must first format and prepare the raw data for analytics, often with more than 80% of the effort.”, said Intel Corp. Research
  • 82. what it would be like, if we can simplify the whole process? ? ?
  • 83. hence our vision we believe human should not be bogged down by tedious matters. by reimagining analytics we envisioned the creation of intelligent machines, that will free human to focus on solving the world’s toughest problems.
  • 84. intelligent machines that can helped us collect the massive amount of data automatically reads and connects to any kind of data, including automatic machine to machine connections structured data printed invoices social media conversation
  • 85. intelligent machines that can helped us collect the massive amount of data automatically reads and connects to any kind of data, including automatic machine to machine connections structured data printed invoices social media conversation
  • 86. then helped us separate the signals from the noise automatic data quality assessments, data cleansing and data filtering regi mita gundam x-men
  • 87. then helped us separate the signals from the noise automatic data quality assessments, data cleansing and data filtering regi mita gundam
  • 88. complete the information and connect them all in a meaningful way automatic data transformation, entity extraction, contextual profiling regi mita gundam
  • 89. complete the information and connect them all in a meaningful way automatic data transformation, entity extraction, contextual profiling regi mita gundam batman tom mediatrac
  • 90. complete the information and connect them all in a meaningful way automatic data transformation, entity extraction, contextual profiling regi mita gundam batman tom mediatrac
  • 91. and finally helped us making sense of the massively connected data contextual search and recommendation intelligent data discovery gundam batman sith
  • 92. and finally helped us making sense of the massively connected data contextual search and recommendation intelligent data discovery regi mita gundam batman tom mediatrac gundam batman sith
  • 93. through a highly intuitive and natural user interface natural language interface voice and gesture recognition ada berapa banyak restoran yg jual soto sepanjang jalan senopati?
  • 95.
  • 96. multi format structured unstructured unclean missing data unstandardized unconnected difficult to analyze cleaned and standardized enriched and validated connected at granular level analytics ready data automatic data collection automatic data preparation automatic data integration
  • 97.
  • 98.
  • 100. all of our silo data will have a totally elevated value, once you connect them all in a meaningful way
  • 101. are all of our current data connected yet?
  • 103. google is a humongous library index, with a smart library card search that redirects you to the original documents
  • 104. facebook is a giant personal scrapbook of all your acquaintances that are currently linked by manual tagging and friends list source:techglimpse
  • 105. youtube and instagram are a huge repository of current knowledge, lifestyle and trends that are still largely unconnected
  • 107. when we can have intelligent machines that can connect everything, in a meaningful way… we can start asking questions, on things we never thought possible to be asked before
  • 108. can map songs across social graphs. Spotify can give us situational data — where someone is listening to a song, when, how and even (to an extent) why. Shazam can help us track the growth of a song using search and streams. YouTube are becoming hotbeds for music discovery. Instagram & Vine If we can connect all their data together?
  • 109. or if you have a radio station, what sort of playlist that will appeal to your target audience, if we know, that a sizeable percentage of them have a hummer?
  • 110. we can even predict specific combination of words, notes and beats that will increase the chance of putting the song in billboard top 40 this upcoming season.
  • 111. here are some sample of hidden insights that we can discover from our own large repository of data, using our intelligent data integration and data discovery tools
  • 112. when we integrate historical media articles with geodemographic and point of interest database we can create a model that can predict high probability of fire incidence down to street level
  • 113.
  • 115.
  • 116.
  • 117.
  • 118.
  • 119.
  • 120. lessons learned including how to scale your ML
  • 121.
  • 122.
  • 123.
  • 124.
  • 125.
  • 126.
  • 127.
  • 128.
  • 129.
  • 130.
  • 131.
  • 132.
  • 133.
  • 134.
  • 135.
  • 136.
  • 137.
  • 138.
  • 139.
  • 140.
  • 141.
  • 142.
  • 143.
  • 144.
  • 145.
  • 146.
  • 147.
  • 148.
  • 149.
  • 150.
  • 151.
  • 152.
  • 153. scalability problems - outline  large scale machine learning  mahout - scalable ml on hadoop  jubatus – distributed online real-time ml  vowpal wabbit – fast learning at yahoo/ms  trident ml and storm pattern: ml on storm, yarn  upcoming --- samoa: ml on s4, storm  issues in scalable distributed ml  load balancing  auto scaling  job scheduling  workflow management  data and model parallelism  parameter server framework  peer-to-peer framework
  • 154. scalability problems - outline  distributed deep learning  yahoolda: scalable parallel framework in latent variable models  distbelief – distributed deep learning on cluster  h2o – distributed deep learning on spark  adam at msr – distributed deep learning  dl4j – open source for deep learning on hadoop and spark  petuum – distributed machine learning  singa – distributed deep learning  tensorflow: google large scale distributed dl  mxnet: heterogeneous distributed deep learning  caffee on spark: yahoo  distributed learning and optimization  proximal splitting/auxiliary coordinates;  bundle (sub-gradient);  shotgun: parallelized cdm (coordinate descent method)  asynchronous sgd;  hogwild/dogwild;
  • 156.
  • 157.
  • 158.
  • 159. emerging analytics technology for automatic analytics on large dimensional data online deep learning topological data analysis fuzzy-rough set based data exploration system granular computing kernel set and spatiotemporal analysis applied differential geometry non axiomatic reasoning system intelligent rule and knowledge extraction/discovery multi agent based modeling weak signal detection and analysis bayesian networks analysis genetic programming self organizing neural networks
  • 160. and also more humanlike user interaction and data visualization technology eye tracking glass-free auto stereoscopy touch sensitive hologram natural language user interface tangible user interface wearable gestural interface brain-computer interface sensor network user interface
  • 162. principles for the development of a complete mind: study the science of art. study the art of science. develop your senses — especially learn how to see. realize that everything connects to everything else. Leonardo DaVinci