SlideShare uma empresa Scribd logo
1 de 18
Big-Data Innovation
Hadoop , Real - time, and Machine Learning
A n d y F e n g
Ya h o o
Hadoop Clusters at Yahoo
How does Yahoo use its Big Data Tech?
3Yahoo Confidential & Proprietary
4
Personalized Content
Content augmentation
User profiling
Recommendation
Mail Smart Views
5 Yahoo Confidential & Proprietary
Flickr: flickr.com/cameraroll
Weather
7
 Beauty
› Computational
assessed
 Relevant
› Location
› Time
› Cloudy
› Shower
› …
Weather App Yahoo Weather App
Search
Query intention
Page ranking
Ads matching
Search2Vec: Cosine similarity b/w ads and queries
9 Yahoo Confidential & Proprietary
Big-Data Impact: Search2Vec
http://bit.ly/28SLdjU
10
Bucket Tests Query
Coverage
Auction Depth Revenue per Search
Simple model vs.
Baseline
+1.14% +2.13% +7.07%
Advanced model
vs. Small model
+2.44% +2.39% +9.39%
(= +17.12% vs. Baseline)
Open Source
11Yahoo Confidential & Proprietary
CaffeOnSpark: Distributed Deep Learning
github.com/yahoo/caffeonspark
Powerful
DL Platform
Fully
Distributed
High-level
API
Incremental
Learning
Existing
Clusters
12
Interactive Analytics
Data Sketches Algorithms Library
datasketches.github.io
Sub-second User Facing Analytics
druid.io
13
Apache Storm: Real-time Processing
https://storm.apache.org
MT & RA
Scheduler
Dist. Cache
API
8 x
Throughput
Improved
Debuggability
1 github.com/yahoo/streaming-benchmarks
Pacemaker
Server
Streaming
Benchmark 1
14
Apache Omid: Transactions for NoSQL DB
http://omid.incubator.apache.org/
• Multi-row/multi-table transactions
• Snapshot isolation
• Lock-free
15
ACID
Transactions
Yahoo Hadoop Stack
16Yahoo Confidential & Proprietary
17
18
Thanks!
yahoohadoop.tumblr.com
bigdata@yahoo-inc.com

Mais conteúdo relacionado

Destaque

Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityBig Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityData Con LA
 
Apache Mesos and the new Open Source Architecture of the Modern Datacenter
Apache Mesos and the new Open Source Architecture of the Modern DatacenterApache Mesos and the new Open Source Architecture of the Modern Datacenter
Apache Mesos and the new Open Source Architecture of the Modern DatacenterData Con LA
 
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Data Con LA
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Data Con LA
 

Destaque (15)

Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate UniversityBig Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
Big Data Day LA 2016 Keynote - Tom Horan/ Claremont Graduate University
 
Apache Mesos and the new Open Source Architecture of the Modern Datacenter
Apache Mesos and the new Open Source Architecture of the Modern DatacenterApache Mesos and the new Open Source Architecture of the Modern Datacenter
Apache Mesos and the new Open Source Architecture of the Modern Datacenter
 
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
 
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
 
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
Big Data Day LA 2016/ Use Case Driven track - Hydrator: Open Source, Code-Fre...
 
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
Big Data Day LA 2016/ Data Science Track - Enabling Cross-Screen Advertising ...
 
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...
Big Data Day LA 2016/ Big Data Track - Real Time Analytics with Druid - Guill...
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa...
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
 
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
 

Semelhante a Big Data Day LA 2016 Keynote - Andy Feng/ Yahoo

SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! Sumeet Singh
 
Agencies Developer Products
Agencies Developer ProductsAgencies Developer Products
Agencies Developer ProductsJeff Eddings
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data CenterAbe Usher
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Open Analytics
 
Future of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchFuture of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchYury Lifshits
 
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...Yahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
Facebook Open Graph - The Semantic Wallet
Facebook Open Graph - The Semantic WalletFacebook Open Graph - The Semantic Wallet
Facebook Open Graph - The Semantic WalletJonathan Laba
 
Data driven mobile UX - UX insight 2017, uxinsight.nl
Data driven mobile UX -  UX insight 2017, uxinsight.nlData driven mobile UX -  UX insight 2017, uxinsight.nl
Data driven mobile UX - UX insight 2017, uxinsight.nlJorden Lentze
 
The Future of Vertical Search Engines
The Future of Vertical Search EnginesThe Future of Vertical Search Engines
The Future of Vertical Search EnginesTed Drake
 
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008Data 2.0 - Harnessing New Data Visualization Tools CIL 2008
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008Darlene Fichter
 
Why Progressive Web Apps will transform your website
Why Progressive Web Apps will transform your websiteWhy Progressive Web Apps will transform your website
Why Progressive Web Apps will transform your websiteJason Grigsby
 
Advanced Information Gathering AKA Google Hacking
Advanced Information Gathering AKA Google HackingAdvanced Information Gathering AKA Google Hacking
Advanced Information Gathering AKA Google HackingGareth Davies
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15Edureka!
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Edureka!
 
WordCamp London 2019 - Content monetisation platforms with WordPress
WordCamp London 2019 - Content monetisation platforms with WordPressWordCamp London 2019 - Content monetisation platforms with WordPress
WordCamp London 2019 - Content monetisation platforms with WordPressAngry Creative (UK)
 
Structured Data & Schema.org - SMX Milan 2014
Structured Data & Schema.org - SMX Milan 2014Structured Data & Schema.org - SMX Milan 2014
Structured Data & Schema.org - SMX Milan 2014Bastian Grimm
 

Semelhante a Big Data Day LA 2016 Keynote - Andy Feng/ Yahoo (20)

SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
 
Agencies Developer Products
Agencies Developer ProductsAgencies Developer Products
Agencies Developer Products
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data Center
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)
 
Future of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchFuture of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! Research
 
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...
Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Co...
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
Facebook Open Graph - The Semantic Wallet
Facebook Open Graph - The Semantic WalletFacebook Open Graph - The Semantic Wallet
Facebook Open Graph - The Semantic Wallet
 
Data driven mobile UX - UX insight 2017, uxinsight.nl
Data driven mobile UX -  UX insight 2017, uxinsight.nlData driven mobile UX -  UX insight 2017, uxinsight.nl
Data driven mobile UX - UX insight 2017, uxinsight.nl
 
The Future of Vertical Search Engines
The Future of Vertical Search EnginesThe Future of Vertical Search Engines
The Future of Vertical Search Engines
 
Search V Next Final
Search V Next FinalSearch V Next Final
Search V Next Final
 
Not Your Mom's SEO
Not Your Mom's SEONot Your Mom's SEO
Not Your Mom's SEO
 
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008Data 2.0 - Harnessing New Data Visualization Tools CIL 2008
Data 2.0 - Harnessing New Data Visualization Tools CIL 2008
 
Why Progressive Web Apps will transform your website
Why Progressive Web Apps will transform your websiteWhy Progressive Web Apps will transform your website
Why Progressive Web Apps will transform your website
 
Advanced Information Gathering AKA Google Hacking
Advanced Information Gathering AKA Google HackingAdvanced Information Gathering AKA Google Hacking
Advanced Information Gathering AKA Google Hacking
 
Hadoop Webinar 28July15
Hadoop Webinar 28July15Hadoop Webinar 28July15
Hadoop Webinar 28July15
 
Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?Is It A Right Time For Me To Learn Hadoop. Find out ?
Is It A Right Time For Me To Learn Hadoop. Find out ?
 
WordCamp London 2019 - Content monetisation platforms with WordPress
WordCamp London 2019 - Content monetisation platforms with WordPressWordCamp London 2019 - Content monetisation platforms with WordPress
WordCamp London 2019 - Content monetisation platforms with WordPress
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
Structured Data & Schema.org - SMX Milan 2014
Structured Data & Schema.org - SMX Milan 2014Structured Data & Schema.org - SMX Milan 2014
Structured Data & Schema.org - SMX Milan 2014
 

Mais de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Mais de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Big Data Day LA 2016 Keynote - Andy Feng/ Yahoo

Notas do Editor

  1. http://storm.apache.org/2016/04/12/storm100-released.html Big-Data Technology Innovation: Hadoop, Real-time, and Machine Learning Andy Feng VP Architecture, Yahoo! Yahoo started developing big-data technology with Hadoop MapReduce and File System in 2006, and made it an Apache open source project in 2009. Since then, big data has become a major component of the global tech industry, and Yahoo is leading the way. In the past three years, Yahoo has been a leading contributor to Apache Storm for event processing, Apache HBase for distributed NoSQL stores, Apache Spark for faster processing, and Druid for sub-second analytics. We have created new open source projects such as Apache Omid for transactional support of NoSQL stores, Yahoo Data Sketches for approximate analytics, and Yahoo CaffeOnSpark for distributed deep learning. In this talk, we walk through Yahoo use cases (search, advertising, personalization, and Flickr) where our big-data technologies are best exemplified. We explain how Yahoo leverages these technologies to perform real-time processing and advanced machine learning against 600 petabytes of data, and describe the system architecture of our heterogeneous clusters of 40,000 servers for supporting a variety of workloads. We provide an overview of open source technologies (Apache Storm, Apache HBase, Apache Omid, and Yahoo CaffeOnSpark) and our in-house technology for large-scale machine learning.  We discuss how academic researchers and industry technologists can help advance big-data technologies further. Bio Dr. Andy Feng is a VP of Architecture at Yahoo leading the architecture and design of big data and machine learning initiatives. He’s architected major platforms for personalization, ad serving, NoSQL, and cloud infrastructure. Prior to Yahoo, he was a Chief Architect at Netscape/AOL, and Principal Scientist at Xerox.
  2. (1min – T 2min) Yahoo has a long history of involvement with Hadoop and Big Data. From our initial work to open source Hadoop in Apache, to our effort stabilizing YARN at Scale, to recent efforts working with the Apache Community to drive Scale and utilization in Hadoop, the Hadoop Stack, and Storm. I plan to overview several areas which were all talks this year at Hadoop Summit: Overcommit of Hardware, Migration to Tez to improve Pig/Hive job utilization, and finally Storm’s Resource Aware Scheduler to improve Storm hardware utilization with the hope that you will be intrigued enough to attend the talk or go back and watch it after the summit TODOS: Work in Reference last Sumeet keynote? Work in reference the YARN talk from 2-3 years ago and breaking Scale benchmark Maybe Mention Community effort (find way to mention Hortonworks on Tez, Twitter? others?
  3. We released the magic view as part of the Flickr 4.0 release last April, and this is the most visible user-facing feature that exposes our image recognition capabilities. Our users can switch from the traditional timeline view of their photo to an experience where their photos are arranged according to 70 categories. For example, you can see here that landscape photos are sub-categorized into different types such as mountain, rock, or shore. This is a great feature for serendipitous photo discovery. Most of us have thousands of photos that we don’t get to see very often but are emotionally very attached to, and these types of groupings help us re-discover photos.
  4. Let’s start with WHY machine learning. Search is one of the key applications for Yahoo. For a user’s search phrase, we construct a result page with organic contents together with ads. To generate result page, we rank contents basedtheir relevance to query terms, match ads against query, and predict the probability of ad click. Several machine learning algorithms are applied in this process: decision tree, logistic regression and neural network.
  5. (2 min – T 7 min) And, best yet, CaffeOnSpark was open sourced last month with Apache 2.0 license
  6. (1 min – T 11 min) Certain class of problems in big data analytics don’t scale well due to queries taking too much time or resources, such as count distinct, most frequent, quantiles etc. And that’s where Sketches algorithms come in where “good enough” approximate answers work great for interactivity (and real-time stream data) We have used Sketches successfully for several use cases such as audience analytics and Flurry analytics for our Mobile Developer Suite Sketches integrates really well with Druid for sub-second OLAP where we have many lots of contributions recently such as dimension joins, reliable pull-based real-time ingestion, and schema introspection Sketches is now available in open source, and integrates well with Pig and Hive from the Hadoop ecosystem
  7. (2 min – T 10 min) In the absence of one, we established a real-world streaming benchmark, code is available on Git. I am excited to tell you that most of these multi-tenancy, scale, and security changes are available in the community releases or are on their way to be released
  8. (1 min – T 12 min) HBase is another cornerstone technology that we rely on extensively and there are applications on HBase that need to bundle multiple read and write operations into a single unit of work, and that’s’ exactly where Omid comes in With Omid, applications can execute transactions with ACID properties without worrying about performance and fault tolerance Omid executes millions of transactions per day for our incremental content management platform for nextgen search and personalization products And, I am pleased to say that the same technology is now available as a new Apache incubator project Applications that need to bundle multiple read and write operations on HBase into logically indivisible units of work can use Omid to execute transactions with ACID (Atomicity, Consistency, Isolation, Durability) properties, just as they would use transactions in the relational database world. Omid extends the HBase key-value access APl with transaction semantics. It can be exercised either directly, or via higher level data management API’s. For example, Apache Phoenix (SQL-on-top-of-HBase) might use Omid as its transaction management component. Development. Omid is backward-compatible with HBase APIs, making it developer friendly. Minimal extensions are introduced to enable transactions.   Semantics. Omid implements a popular, well-tracted Snapshot Isolation (SI) consistency paradigm that is supported by major SQL and NoSQL technologies (for example, Percolator). Scalability. Omid  provides a highly scalable, lock-free implementation of SI. To the best of our knowledge, it is the only open source NoSQL platform that can scale beyond 100K transactions per second. Reliability.  Omid has a high-availability (HA) mode, in which the core service operates as primary-backup process pair with automatic failover. The HA support has zero overhead on the mainstream operation.   Simplicity. Omid leverages the HBase infrastructure for managing its own metadata. It entails no additional services apart from those used by HBase. Track Record. Omid is already in use by very-large-scale production systems at Yahoo.