Sidi chang demo
Sidi Chang
Insight Data Science
1.
Sidi Chang
Insight Data Science, Data Engineering Fellow
Jul 2016
JustBid
2.
Sealed/blind second-price auction
[Diagram labels: Item, Bidder]
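The slide names the auction format without spelling out the rule. As a minimal sketch, assuming the standard sealed-bid second-price (Vickrey) rule in which the highest bidder wins and pays the second-highest bid, the outcome can be computed as below. The function name, the bidder names, and the tie-breaking choice are hypothetical and are not taken from the JustBid code.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    Assumption: ties are broken in favour of the bid that was submitted
    first (dict insertion order); the slides do not say how ties are handled.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Sort bidders from highest to lowest bid; sorted() is stable,
    # so earlier bidders win ties.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the runner-up's bid
    return winner, price


if __name__ == "__main__":
    # Hypothetical sealed bids for a single item.
    print(second_price_auction({"alice": 120.0, "bob": 95.0, "carol": 110.0}))
```

Here the highest bid (120.0) wins, but the price paid is the second-highest bid (110.0).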
3.
• Demo
4.
Data Pipeline
Simulated Data
5.
Data
• 10K bidders
• Nearly 15 million bids
6.
Recommendation: Jaccard Similarity
Jaccard Similarity: D_i = user_i, C_i = items(user_i)
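The formula itself did not survive in the transcript. The usual definition of Jaccard similarity between two users is J(D_i, D_j) = |C_i ∩ C_j| / |C_i ∪ C_j|. The sketch below computes it exactly on small item sets taken from the MinHash example table in this deck. The function name and the convention for two empty sets are assumptions.

```python
from typing import Hashable, Set


def jaccard(c_i: Set[Hashable], c_j: Set[Hashable]) -> float:
    """Exact Jaccard similarity |C_i & C_j| / |C_i | C_j| of two item sets."""
    union = c_i | c_j
    if not union:
        # Assumption: two users with no items are given similarity 0.0.
        return 0.0
    return len(c_i & c_j) / len(union)


# Item sets per user, read off the MinHash example table (items 1..5).
items = {
    "user_1": {1, 4},
    "user_2": {3},
    "user_3": {2, 4, 5},
    "user_4": {1, 3, 4},
}

if __name__ == "__main__":
    print(jaccard(items["user_1"], items["user_4"]))  # 2 shared / 3 total
    print(jaccard(items["user_1"], items["user_3"]))  # 1 shared / 4 total
```

Computing this for every pair of users needs one comparison per pair, which is what becomes expensive as the number of users grows.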
7.
Recommendation
For N = 10 million, it takes more than a year (AWS m4.large cluster)…
Then we will need to use the MinHash algorithm, which can be easily distributed…
Obtain an unbiased estimate using Chernoff bounds and Markov's inequality: the expected error is [formula not preserved in the transcript]
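The slide's own error expression is not recoverable, so the following is only a standard point of reference and not necessarily the author's formula. It assumes k hash functions (k is a symbol introduced here) that behave like independent random permutations. Each signature position then agrees for two users with probability equal to their Jaccard similarity J, and the fraction of agreeing positions is an unbiased estimator:

```latex
\hat{J} = \frac{1}{k}\sum_{t=1}^{k}
  \mathbf{1}\!\left[\min_{x \in C_i} h_t(x) = \min_{x \in C_j} h_t(x)\right],
\qquad
\mathbb{E}\big[\hat{J}\big] = J,
\qquad
\operatorname{Var}\big(\hat{J}\big) = \frac{J(1-J)}{k} \le \frac{1}{4k}.
```

Under these assumptions the standard error is at most 1/(2√k), so halving the error takes about four times as many hash functions.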
8.
MinHash Example
Item  Row  User 1  User 2  User 3  User 4  x+1 mod 5  3x+1 mod 5
1     0    1       0       0       1       1          1
2     1    0       0       1       0       2          4
3     2    0       1       0       1       3          2
4     3    1       0       1       1       4          0
5     4    0       0       1       0       0          3

        U1  U2  U3  U4
Hash 1
Hash 2
11.
MinHash Example
(Same item/user table and hash columns as on slide 8; the first entry is now filled in.)

        U1  U2  U3  U4
Hash 1  1
Hash 2
12.
MinHash Example
(Same item/user table and hash columns as on slide 8; all entries are now filled in.)

        U1  U2  U3  U4
Hash 1  1   3   0   1
Hash 2  0   2   0   0
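The worked example can be reproduced with a short single-machine sketch: for each user, take the minimum of each hash function over the rows where that user has a 1. The row sets and the two hash functions come from the slide's table. The function names are hypothetical, and this is not the distributed implementation the deck refers to.

```python
from typing import Callable, Dict, List, Set

# Rows (0..4) in which each user has a 1, read off the example table.
rows_by_user: Dict[str, Set[int]] = {
    "U1": {0, 3},
    "U2": {2},
    "U3": {1, 3, 4},
    "U4": {0, 2, 3},
}

# The two hash functions from the table header.
hash_funcs: List[Callable[[int], int]] = [
    lambda x: (x + 1) % 5,
    lambda x: (3 * x + 1) % 5,
]


def minhash_signature(rows: Set[int], funcs: List[Callable[[int], int]]) -> List[int]:
    """One entry per hash function: the minimum hash over the user's rows."""
    return [min(h(r) for r in rows) for h in funcs]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions on which two users agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


signatures = {u: minhash_signature(rows, hash_funcs) for u, rows in rows_by_user.items()}

if __name__ == "__main__":
    for user, sig in signatures.items():
        print(user, sig)
    print(estimated_similarity(signatures["U1"], signatures["U4"]))
```

The signatures match the final table: U1 = [1, 0], U2 = [3, 2], U3 = [0, 0], U4 = [1, 0]. With only two hash functions the estimate is coarse: U1 and U4 get an estimated similarity of 1.0, while their exact Jaccard similarity is 2/3.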
13.
Performance
14.
Challenges
• MinHash algorithm implemented in a distributed system
• Jaccard similarity tested in a distributed system
• Use the right data structures for faster computation
• Use both Scala and Python
15.
About me
• MS in CS and Operations Research