Hadoop & the future of Cloud Computing Todd Papaioannou VP, Cloud Architecture By SearchNetMedia
what’s happening  More publicly available human-generated content More interactions being tracked (e.g. clickstream data) More business processes are being digitized More history being kept = The Data Exhaust! Flickr : sub_lime79 BigData is here!
CUTTING THROUGH THE NOISE access audience blogs communication computer internet mass media people networking technology Location Social Relationships Science Understanding User Interests Flickr : Lomo-Cam
turning data into insights machine learning time series logic regression content clustering algorithms Ad inventory modeling user interest prediction Flickr : NASA Goddard Photo and Video factorization models
making it relevant Flickr : ogimogi
hadoop: lightning-fast Technology science + big data + insight = personal relevance = VALUE Flickr : DDFic
BEHIND every click
hadoop Flickr : Got Sarah
THE PLATFORM EFFECT: THE HADOOP ECOSYSTEM and other Early Adopters: Scale and productize Hadoop. Orgs with Internet Scale Problems: Add tools / frameworks, enhance Hadoop. Enhance Hadoop Ecosystem. Service Providers: Grow ecosystem - Training, support, enhancements. Apache Hadoop Virtuous Circle! Adoption -> Investment. Mainstream / Enterprise adoption: Fund further development, enhancements.
HADOOP IS GOING MAINSTREAM [chart, 2007 to 2010; source: The Datagraph Blog]
hadoop at yahoo! “Where Science meets Data” PRODUCTS: Data Analytics, Content Optimization, Content Enrichment, Yahoo! Mail Anti-Spam, Advertising Products, Ad Optimization, Ad Selection, Big Data Processing & ETL. DIMENSIONAL DATA CONTENT DATA PIPELINES. HADOOP CLUSTERS: Tens of thousands of servers. APPLIED SCIENCE: User Interest Prediction, Ad inventory prediction, Machine learning - search ranking, Machine learning - ad targeting, Machine learning - spam filtering
from project to core platform [chart: Petabytes and Thousands of Servers, 2006 to 2010] Today: 38K Servers, 170 PB Storage, 1M+ Monthly Jobs
yahoo!’s Vision: open source cloud. Open Source Benefits: » Avoid technological dead ends » Leverage community contributions » Workforce already trained. Ongoing contributions. Yahoo!’s adoption of open source. Future contributions. Cloud serving. Storage
What does The Future hold? By Elsie
More BIG By BionicTeaching
Data in the cloud By Fadilfb
Private Clouds By Zachstern
hybrid clouds By Calop
Automation
cloud fabrics

Apache Hadoop India Summit 2011 Keynote talk "Hadoop & the Future of Cloud Computing" by Todd Papaioannou

Editor's Notes

  1. The web is changing. It’s always evolving and changing. This evolution is about people-powered experiences and transient, unstructured data. My 16-year-old writes. He deletes. He retweets. In fact, a ton of the data on the web today is transient data. It exists for a moment and then it's gone. It's comments on Facebook, emails, content alerts, messenger updates, blogs, Twitter feeds. In fact, only 5% of the information created in the world today is “structured”.
  2. Yahoo!'s role has always been to cut through the noise and help people find what they want. We do that in many ways – primarily with deep science and insights, all relying on Hadoop. From curating people’s relationships to get more meaning out of them, to understanding their interests and their location, to adding a complex layer of science on top of all that – Hadoop’s right at the core of making all of that possible.
  3. Turning data into insights isn't trivial. It's heavy lifting. It’s analysis and refinement of raw, unstructured information. It's also deep, best-in-class technology and science, and applying and improving this science is one of the things we do best at Yahoo! – using a variety of techniques as you see listed here.
  4. Yahoo! has made investments in Hadoop that have enabled us to add much more relevance to our data, enrich it, extract insights, and deliver relevant, personalized content and experiences to our consumers. These same investments help deliver the right audiences to our advertisers. As a result of delivering that highly relevant experience to 600 million users around the world, Yahoo!’s one of the most trusted brands on the Internet.
  5. Hadoop delivers huge value to Yahoo! by enabling the important stuff we do with all of our big data. Without it, we simply couldn’t deliver the engaging consumer experiences and advertiser value the way we do today. With Hadoop, we get the disruptive ability to rapidly innovate by customizing, personalizing and fusing people’s individual worlds with the Web at large, in a way no other company can today.
  6. With 600 million people visiting Yahoo!, 11 billion times a month, generating 98 billion page views, Yahoo! is a leader in many categories, and people trust us to give them a great experience and show them what’s most interesting and relevant to them. Behind every click, we’re using Hadoop to optimize what you see on Yahoo.com. We serve about 3 million different versions of the Today Module every 24 hours. Hadoop allows us to analyze story clicks by applying machine learning so we can figure out what you like and give you more of it. Every click a person makes on our homepage – that’s around half a billion clicks per day – results in multiple personalized rankings being computed, each completing in less than 1/100th of a second. Within ~7 minutes of a user clicking on a story, our entire ranking model is updated. Our Content Optimization Engine creates a real-time feedback loop for our editors. They can serve up popular stories and pull out unpopular stories, based on what the algorithm is telling them in real time. Our modeling techniques help us deeply understand the content and eliminate the guesswork, so we can actually predict a story’s relevance and popularity with our audience.
  7. Because of technologies like Hadoop and the rest of our Cloud platform, we’re learning and building faster and faster. It’s all about speed, innovation and real, substantial value to our business. At Yahoo, we’ve been using Hadoop across the company for the last five years, and I’ve shown you just a few examples. Based on our testing and experience, we believe Hadoop is now ready for mainstream enterprise use. We’ve deliberately chosen to invest in open source as the foundation of our cloud. Yahoo! is running the largest implementation of Hadoop in the world today.
  8. An overview of the Hadoop Ecosystem. Yahoo! employees, including Doug Cutting, initiated Apache Hadoop in 2005. Since then, the ecosystem has expanded.
  9. Hadoop is at the center of our data ecosystem: every click, page view, search. Foundation of our ad management & targeting systems. Content enrichment (geo location, category). Customize content for users. Where Science Meets Data: machine learning (algorithm development, spam detection, ad targeting, predicting user interest and ad inventory); research on ad effectiveness. Provides scale for Big Data. Daily: 120TB, 3+PB. Total: 70+PB of data, and growing. Web data growing at a CAGR of 60%; by 2013, 667 exabytes (Cisco).
  10. Started developing Hadoop 5 years ago, with a prototype of a 20-node cluster. Dedicated team developing Hadoop ever since, focused on supporting Yahoo! needs, contributing Hadoop to Apache and helping build the community. Started as research projects. Progressed to applied science efforts supporting search and advertising products. Then production systems (Ad Targeting, Content Optimization). Now Hadoop usage has spread to all parts of our business. Hadoop is our Big Data infrastructure: it provides agility with Big Data. In a recent study, 50% of enterprises said they are strongly considering Hadoop adoption, with agility cited as the number one reason.
  11. People ask why we contribute to open source. Open source helps us avoid technological dead ends. We benefit from leveraging community contributions. It allows us to hire a workforce already trained in our technology. Open sourcing our Cloud components starts with: Hadoop; Pig; Yahoo! Distribution of Hadoop (adding others); Yahoo! Traffic Server; ZooKeeper. In addition to benefiting from external Hadoop contributions: Hive, Apache Web Server, Xen.
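
The traffic figures quoted in note 6 are easier to compare as average rates. The sketch below is only back-of-the-envelope arithmetic on the speaker's rounded numbers, not a measurement. The helper names are hypothetical, and it assumes traffic is spread evenly over the day, which real traffic is not.

```python
# Back-of-the-envelope rates from the figures quoted in note 6.
# The constants are the speaker's rounded numbers, not measurements.

SECONDS_PER_DAY = 24 * 60 * 60

CLICKS_PER_DAY = 500_000_000            # "around half a billion clicks per day"
VISITS_PER_MONTH = 11_000_000_000       # "11 billion times a month"
PAGE_VIEWS_PER_MONTH = 98_000_000_000   # "98 billion page views"
MODULE_VERSIONS_PER_DAY = 3_000_000     # "about 3 million different versions ... every 24 hours"


def per_second(events_per_day: float) -> float:
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def page_views_per_visit(page_views: float, visits: float) -> float:
    """Average number of page views generated by one visit."""
    return page_views / visits


if __name__ == "__main__":
    print(f"homepage clicks per second: {per_second(CLICKS_PER_DAY):,.0f}")
    print(f"Today Module versions per second: {per_second(MODULE_VERSIONS_PER_DAY):,.1f}")
    ratio = page_views_per_visit(PAGE_VIEWS_PER_MONTH, VISITS_PER_MONTH)
    print(f"page views per visit: {ratio:.1f}")
```

Each of those clicks triggers several ranking computations, so by the note's own figures the ranking workload is a multiple of the average click rate.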