Welcome to today’s webinar: How AOL Accelerates Ad Targeting Decisions with Hadoop and Membase Server. Audio/Telephone: +1 (805) 309-0021. Access Code: 670-793-134. Audio PIN: shown after joining the webinar. Host: John Kreisa, VP of Marketing, Cloudera
Housekeeping: Ask questions at any time using the Questions panel. Problems? Use the Chat panel. A recording will be available.
About the Webinar: How AOL Accelerates Ad Targeting Decisions with Hadoop and Membase Server
Speakers
Matt Aslett, Senior Analyst, Enterprise Software: Matt covers data management software for The 451 Group's Information Management practice, including relational and non-relational databases, data warehousing and data caching. He is also an expert in open source software and contributes regularly to reports produced through the 451 Commercial Adoption of Open Source (CAOS) Research Service, as well as to the 451 CAOS Theory blog.
Pero Subasic, Chief Architect, AOL: Pero works on research and development in new technologies and contextual advertising at Aol Advertising in Palo Alto. Over the past four years he has been the Chief Architect of the R&D distributed infrastructure, which today comprises more than 1000 nodes in multiple data centers. He has also led large-scale contextual analysis and segmentation projects and a variety of machine learning efforts at Aol, Yahoo and Cadence Design Systems, and has published patents and research papers in these areas.
NoSQL and Hadoop: Open source innovation and adoption drivers. Matthew Aslett, senior analyst, The 451 Group
Analyzing the business of enterprise IT innovation. Unique analysis of the hosting, managed service, third-party datacenter and Internet infrastructure sectors. The 451 Group. The Uptime Institute is the leading independent think tank and research body serving the global datacenter industry.
Coverage areas
Commercial Adoption of Open Source (CAOS): adoption by enterprise, adoption by vendors
Information Management: database, data warehousing, data caching
Senior analyst, enterprise software
With The 451 Group since 2007
www.about.me/mattaslett
www.twitter.com/maslett
Open source database landscape 2008
30.4% using open source databases
Main usage areas:
45% In-house-developed apps
41% Single-function apps
38% Development
36% Web apps
Relevant reports: Warehouse Optimization: Ten considerations for choosing/building a data warehouse. Published September 2009. The role of open source and emergence of Hadoop. sales@the451group.com
Open source database landscape 2009. Analytic: Infobright, InfiniDB, MonetDB, LucidDB. Hadoop.
Relevant reports: Data Warehousing 2009-2013: Market Sizing, Landscape and Future. Published August 2010. The potential impact of Hadoop. sales@the451group.com
Open source database landscape 2011. Hadoop: Hadoop, Pig, Hive, ZooKeeper, Mahout, Avro. Analytic: Infobright (InfiniDB), MonetDB, LucidDB. NoSQL: Cassandra, CouchDB, MongoDB, HBase, Membase, Riak.
Relevant reports: “Database alternatives”: Assessing the drivers behind the development and adoption of NoSQL and scalable SQL databases, as well as Hadoop. Role of open source in driving innovation. Planned for April 2011.
SPRAINED RELATIONAL DATABASES Photo credit: Foxtongue on Flickr http://www.flickr.com/photos/foxtongue/4844016087/
SPRAINed relational databases
SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” (Wikipedia)
Six key drivers for NoSQL/Hadoop adoption:
Scalability
Performance – performance does not necessarily mean scalability
Relaxed consistency – where scalability is a given
Agility – flexible, schema-free data models and agile development
Intricacy – complex relationships and data types
Necessity
Scalability [Diagram built up over seven slides: tiers of users, application, database and hardware. The users, application and hardware tiers multiply while a single database tier remains between them. Captions on the final builds: “DATA – large volumes, structured and unstructured, real-time demands” and “BIG DATA – Volume, Variety and Velocity”.]
Scalability: operational database and analytic database
Operational database: big audience; real-time, transactional data management
Analytic database: big data; large-scale data analysis
Requirements
Interactive application (big audience; real-time, transactional data management): Membase
 real-time
 low, predictable latency
 working set often < total data set
Data analysis (big data; large-scale data analysis): Cloudera’s Distribution for Apache Hadoop
 batch processing
 analytics-optimized
 data locality model

Scalability [Diagram: the big audience is served by a row of Membase nodes; big data is handled by a row of Cloudera’s Distribution for Apache Hadoop nodes.]
real-time data collection, analysis
shared data platform
aggregation of mixed data sources
structured and un/semi-structured data
transform and load (Cloudera’s Distribution for Apache Hadoop; big data)
Target markets (big audience: Membase; big data: Cloudera’s Distribution for Apache Hadoop)
Enterprise applications: event monitoring, sensor data, compliance and regulatory reporting, intelligence analysis, fraud detection
Web applications: social games, SaaS, e-commerce systems, clickstream analysis, ad and offer targeting systems
Necessity
BigTable and MapReduce – Google
Dynamo – Amazon
Hadoop, Pig, HBase – Yahoo
Cassandra, Hive – Facebook
Voldemort – LinkedIn
FlockDB – Twitter
Hypertable – Zvents
Neo4J – Windh Technologies
Memcached – Danga Interactive
MongoDB – Doubleclick
Membase – Zynga
Accelerating Ad Targeting Decisions with Hadoop and Membase. Pero Subasic, Aol. pero.subasic@teamaol.com
Overview
Online advertising overview
Large-scale Analytics at Aol
Current Architecture and Data Flows
Hadoop+Membase current use cases
RT Architecture Proposal
RT Use Case: RT Contextual Segmentation of Users
Conclusion
CPC = Cost Per Click, e.g. $2 per click
Use Cases Today
data set enrichment: given a field in a data set stored on HDFS, enrich by adding related fields; media -> campaign -> advertiser chain
blackboard for inter-process/job communication: contextual segmentation pipelines; predictive modeling can load per-campaign models to be used for large-scale scoring
larger map-side joins (where the Hadoop DistributedCache and the in-memory process/task cache are insufficient)
aggregations with a large number of item lookups, e.g. user-level contextual profiles aggregated from visited-URL contextual profiles stored in memcache
Flume integration for data flow reliability and recovery
segment generation is currently carried out through Hadoop pipelines and uploaded into server-side Membase for targeting
but: a strong tendency to move closer to ad serving motivates thinking about new architectures to reduce segment generation time
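The data set enrichment use case (media -> campaign -> advertiser) can be sketched as a chain of key-value lookups. This is a minimal illustration only: plain Python dicts stand in for the shared key-value store, and the key formats, field names and the `enrich` helper are hypothetical, not taken from the talk or from any Membase client API.

```python
# Hypothetical stand-ins: plain dicts play the role of a shared key-value store.
MEDIA_TO_CAMPAIGN = {"media:1001": "campaign:77", "media:1002": "campaign:78"}
CAMPAIGN_TO_ADVERTISER = {"campaign:77": "advertiser:5", "campaign:78": "advertiser:9"}


def enrich(record, media_to_campaign, campaign_to_advertiser):
    """Return a copy of record with campaign and advertiser fields added.

    Missing links in the chain are recorded as None rather than raising,
    so one unknown media ID does not fail a whole batch.
    """
    enriched = dict(record)
    campaign = media_to_campaign.get(record.get("media_id"))
    enriched["campaign_id"] = campaign
    enriched["advertiser_id"] = campaign_to_advertiser.get(campaign) if campaign else None
    return enriched


def enrich_all(records, media_to_campaign, campaign_to_advertiser):
    """Enrich every record; in a Hadoop job this would run inside each map task."""
    return [enrich(r, media_to_campaign, campaign_to_advertiser) for r in records]


impressions = [
    {"user": "u1", "media_id": "media:1001"},
    {"user": "u2", "media_id": "media:9999"},  # unknown media ID
]
enriched = enrich_all(impressions, MEDIA_TO_CAMPAIGN, CAMPAIGN_TO_ADVERTISER)
for row in enriched:
    print(row)
```

Keeping the lookup tables in an external store rather than in each task's memory is what the "larger map-side joins" item above points at: the tables can exceed what a single task can cache.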
RT Framework: Capture, Compute and Forward
[Diagram: data feeds arrive through Flume ingestion. CAPTURE: Membase (back-end). COMPUTE: compute cluster. FORWARD: Membase (front-end) and ad-serving logic. Big Data Loop: Hadoop.]
Features Ahead
bucket lifecycle management (creation, sizing, deposition)
asynchronous stream with all mutation operators
iteration through key space without knowing the keys in advance (making key space Ord?)
regex or range-based key iteration for finer-grain key space control
bucket drain to HDFS
event-based synchronization between instances (TAP)
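To make the key-iteration wishes concrete, here is a toy sketch of what range-based and regex-based iteration over an ordered key space could look like. It is an assumption-laden illustration: `OrderedKeySpace` and its methods are hypothetical names, the store is an in-memory Python structure, and nothing here reflects an existing or planned Membase interface.

```python
import bisect
import re


class OrderedKeySpace:
    """Toy key-value store that keeps its keys sorted (hypothetical)."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value

    def set(self, key, value):
        # Insert new keys in sorted position; overwrites leave the order alone.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

    def match(self, pattern):
        """Yield (key, value) for keys fully matching a regex, in key order."""
        compiled = re.compile(pattern)
        for key in self._keys:
            if compiled.fullmatch(key):
                yield key, self._values[key]


store = OrderedKeySpace()
for k, v in [("user:0002", "b"), ("seg:auto", "x"), ("user:0001", "a"), ("user:0010", "c")]:
    store.set(k, v)

# All keys with the "user:" prefix: ';' is the character right after ':'.
user_rows = list(store.range("user:", "user;"))
low_users = list(store.match(r"user:000\d"))
```

The range scan touches only the matching slice of the sorted keys, while the regex scan walks the whole key space, which is one reason an ordered key space is attractive for finer-grain iteration.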
RT Contextual Segmentation
[Diagram labels: Data Feeds; Flume Ingestion; User-ContentID Mapper; Membase Active Event Frame; UC Map; ContentID-Segment Map; User-Segment Mapper; US Short-term Map; US Long-term Map; Membase + ad-serving logic; Hadoop; Event-based updates; Daily Map updates.]
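One way to read the maps named on this slide is sketched below: each incoming event updates a user-to-content (UC) map and, through a ContentID-Segment map, a short-term user-to-segment (US) map, which is then combined with a long-term map at lookup time. This is an interpretation under stated assumptions, not the presented system: the class, the counting rule, the `min_count` threshold and all sample data are hypothetical, and in-memory dicts stand in for Membase.

```python
from collections import defaultdict

# ContentID-Segment map: assumed here to be refreshed by a daily batch job.
CONTENT_SEGMENTS = {
    "c1": {"autos"},
    "c2": {"autos", "finance"},
    "c3": {"travel"},
}


class ShortTermSegmenter:
    """Event-driven update of the UC map and the short-term US map."""

    def __init__(self, content_segments):
        self.content_segments = content_segments
        self.uc_map = defaultdict(list)    # user -> content IDs seen
        self.us_short = defaultdict(dict)  # user -> {segment: event count}

    def on_event(self, user, content_id):
        # Record the visit, then credit every segment attached to the content.
        self.uc_map[user].append(content_id)
        for seg in self.content_segments.get(content_id, ()):
            counts = self.us_short[user]
            counts[seg] = counts.get(seg, 0) + 1

    def segments_for(self, user, long_term=None, min_count=1):
        """Long-term segments plus short-term ones seen at least min_count times."""
        short = {s for s, n in self.us_short.get(user, {}).items() if n >= min_count}
        return short | set((long_term or {}).get(user, ()))


segmenter = ShortTermSegmenter(CONTENT_SEGMENTS)
for user, cid in [("u1", "c1"), ("u1", "c2"), ("u2", "c3"), ("u1", "c9")]:
    segmenter.on_event(user, cid)

US_LONG_TERM = {"u1": {"sports"}}  # hypothetical output of a daily Hadoop job
u1_all = segmenter.segments_for("u1", US_LONG_TERM)
u1_strong = segmenter.segments_for("u1", min_count=2)
u2_all = segmenter.segments_for("u2", US_LONG_TERM)
```

Content with no known segments (`c9` above) still lands in the UC map, so a later refresh of the ContentID-Segment map could reprocess it.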
Closing Remarks: Exciting times. The need is real and recognized. Technological capability is within reach. Q/A. Contact: pero.subasic@teamaol.com
Frank Weigel, Director of Product Management, Couchbase (formerly Membase)