In the world of web ad and offer targeting, speed and scalability are just about everything. In this webinar, you will learn how the changing application landscape is driving the evolution of data management technology. You will also learn how AOL Advertising leverages Hadoop and Membase NoSQL database technology to rapidly process operational user data to achieve sub-millisecond performance.
AOL’s ad serving platforms see billions of impressions daily, so any incremental improvement in processing time helps meet the read/write goal of five milliseconds. AOL uses Cloudera’s Distribution for Apache Hadoop to create user profiles, then serves them from Membase.
Speakers: Matt Aslett, Senior Analyst, The 451 Group;
Pero Subasic, Chief Architect, AOL
Sponsors: Cloudera and Membase.
Webinar: How AOL Accelerates Targeting Decisions with Hadoop and Membase Server
1. Welcome to today’s webinar: How AOL Accelerates Ad Targeting Decisions with Hadoop and Membase Server. Audio/Telephone: +1 (805) 309-0021. Access Code: 670-793-134. Audio PIN: shown after joining the webinar. Host: John Kreisa, VP of Marketing, Cloudera
2. Housekeeping: Ask questions at any time using the Questions panel. Problems? Use the Chat panel. Recording will be available.
3. About the Webinar How AOL Accelerates Ad Targeting Decisions with Hadoop and Membase Server
4. Speakers: Matt Aslett, Senior Analyst, Enterprise Software. Matt covers data management software for The 451 Group's Information Management practice, including relational and non-relational databases, data warehousing and data caching. Matt is also an expert in open source software and contributes regularly to reports produced through the 451 Commercial Adoption of Open Source (CAOS) Research Service, as well as to the 451 CAOS Theory blog. Pero Subasic, Chief Architect, AOL. Pero works on research and development in new technologies and contextual advertising at Aol Advertising in Palo Alto. Over the past 4 years he has been the Chief Architect of the R&D distributed infrastructure, which today comprises more than 1000 nodes in multiple data centers. He has also led large-scale contextual analysis and segmentation projects and a variety of machine learning efforts at Aol, Yahoo and Cadence Design Systems, and has published patents and research papers in these areas.
5. NoSQL and Hadoop: Open source innovation and adoption drivers Matthew Aslett, senior analyst The 451 Group
6. Analyzing the business of Enterprise IT Innovation Unique Analysis of the Hosting, Managed Service, Third-Party Datacenter and Internet Infrastructure sectors The 451 Group The Uptime Institute is the leading independent think tank and research body serving the global datacenter industry.
19. Relevant reports Warehouse Optimization Ten considerations for choosing/building a data warehouse Published September 2009 The role of open source and emergence of Hadoop sales@the451group.com
21. Relevant reports Data Warehousing 2009-2013 Market Sizing, Landscape and Future Published August 2010 The potential impact of Hadoop sales@the451group.com
24. Relevant reports “Database alternatives” Assessing the drivers behind the development and adoption of NoSQL and scalable SQL databases, as well as Hadoop Planned for April 2011 Role of open source in driving innovation COMING APRIL 2011
27. “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia
28. SPRAINed relational databases SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia Six key drivers for NoSQL/Hadoop adoption Scalability Performance Relaxed consistency Agility Intricacy Necessity
29. SPRAINed relational databases SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia Six key drivers for NoSQL/Hadoop adoption Scalability Performance – performance does not necessarily mean scalability Relaxed consistency Agility Intricacy Necessity
30. SPRAINed relational databases SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia Six key drivers for NoSQL/Hadoop adoption Scalability Performance – performance does not necessarily mean scalability Relaxed consistency – where scalability is a given Agility Intricacy Necessity
31. SPRAINed relational databases SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia Six key drivers for NoSQL/Hadoop adoption Scalability Performance – performance does not necessarily mean scalability Relaxed consistency – where scalability is a given Agility – flexible, schema-free data models and agile development Intricacy Necessity
32. SPRAINed relational databases SPRAIN: “An injury to ligaments… caused by being stretched beyond normal capacity” Wikipedia Six key drivers for NoSQL/Hadoop adoption Scalability Performance – performance does not necessarily mean scalability Relaxed consistency – where scalability is a given Agility – flexible, schema-free data models and agile development Intricacy – complex relationships and data types Necessity
55. data locality model; Cloudera’s Distribution for Apache Hadoop; big data
56. Scalability: big audience; Membase (multiple nodes); Cloudera’s Distribution for Apache Hadoop (multiple nodes); big data
63. transform and load; Cloudera’s Distribution for Apache Hadoop; big data
64. Target markets big audience Enterprise applications event monitoring sensor data compliance and regulatory reporting intelligence analysis fraud detection Web applications social games SaaS e-commerce systems clickstream analysis ad and offer targeting systems Membase Cloudera’s Distribution for Apache Hadoop big data
70. Accelerating Ad Targeting Decisions with Hadoop and Membase. Pero Subasic, Aol. pero.subasic@teamaol.com
71. Overview Online advertising overview Large-scale Analytics at Aol Current Architecture and Data Flows Hadoop+Membase current use cases RT Architecture Proposal RT Use Case: RT Contextual Segmentation of Users Conclusion
78. Use Cases Today: data set enrichment: given a field in a data set stored on HDFS, enrich by adding related fields (media -> campaign -> advertiser chain); blackboard for inter-process/job communication: contextual segmentation pipelines, and predictive modeling can load per-campaign models to be used for large-scale scoring; larger map-side joins (where Hadoop DistributedCache and in-memory process/task cache are insufficient); aggregations with a large number of item lookups, e.g. user-level contextual profiles aggregated from visited-URL contextual profiles stored in memcache; Flume integration for data flow reliability and recovery; segment generation is currently carried out through Hadoop pipelines and uploaded into server-side Membase for targeting, but a strong tendency to move closer to ad serving motivates thinking about new architectures to reduce segment generation time
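The data set enrichment and map-side join use cases on this slide can be illustrated with a minimal sketch. It assumes a plain Python dict as a stand-in for a Membase bucket; the key layout ("media:<id>", "campaign:<id>"), the field names, and the functions `enrich` and `map_records` are all hypothetical and do not come from the slides.

```python
# Hypothetical stand-in for a Membase bucket: a plain dict keyed by "type:id".
# A real deployment would go through a key-value client instead (assumption).
kv_store = {
    "media:m1": {"campaign_id": "c7"},
    "campaign:c7": {"advertiser_id": "a3"},
}


def enrich(record, store):
    """Follow the media -> campaign -> advertiser chain and add related fields."""
    enriched = dict(record)  # copy so the input record is not mutated
    media = store.get("media:" + record["media_id"])
    if media is None:
        return enriched  # lookup miss: leave the record unchanged
    enriched["campaign_id"] = media["campaign_id"]
    campaign = store.get("campaign:" + media["campaign_id"])
    if campaign is None:
        return enriched
    enriched["advertiser_id"] = campaign["advertiser_id"]
    return enriched


def map_records(records, store):
    """Map-side join: every record is enriched independently, with no reduce step."""
    return [enrich(r, store) for r in records]


rows = map_records(
    [{"media_id": "m1", "user": "u1"}, {"media_id": "m9", "user": "u2"}],
    kv_store,
)
```

Because each record is resolved through key lookups rather than a shuffle, the join table does not need to fit in a single task's memory, which is the situation the slide describes for joins that outgrow the distributed cache.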
79. RT Framework: Capture, Compute and Forward Flume Ingestion Data Feeds CAPTURE COMPUTE FORWARD Big Data Loop Membase (back-end) Compute Cluster Membase (front-end) and ad-serving logic Hadoop
80. Features Ahead: bucket lifecycle management (creation, sizing, deposition); asynchronous stream with all mutation operators; iteration through key space without knowing the keys in advance (making key space Ord?); regex or range-based key iteration for finer-grain key space control; bucket drain to HDFS; event-based synchronization between instances (TAP)
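To make the requested range-based and regex-based key iteration concrete, here is a small sketch of the intended semantics. It runs over an in-memory dict, not over Membase, and does not represent any Membase API; `iter_range`, `iter_regex`, and the sample keys are hypothetical names chosen for illustration.

```python
import bisect
import re


def iter_range(store, start, stop):
    """Yield (key, value) pairs for keys in [start, stop), in sorted key order."""
    keys = sorted(store)  # an ordered key space is what makes range scans possible
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    for key in keys[lo:hi]:
        yield key, store[key]


def iter_regex(store, pattern):
    """Yield (key, value) pairs whose whole key matches the regular expression."""
    compiled = re.compile(pattern)
    for key in sorted(store):
        if compiled.fullmatch(key):
            yield key, store[key]


# Hypothetical bucket contents, for illustration only.
bucket = {"user:001": 1, "user:002": 2, "user:010": 3, "url:abc": 4}
in_range = list(iter_range(bucket, "user:001", "user:010"))
matched = list(iter_regex(bucket, r"user:0\d\d"))
```

The range scan only touches the slice of keys between the two bounds, while the regex scan has to test every key, which is one reason an ordered key space is attractive for finer-grain control.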
81. RT Contextual Segmentation Flume Ingestion Data Feeds User-ContentIDMapper Membase Active Event Frame Membase + ad-serving logic User-Segment Mapper UC Map US Short-term Map ContentID-Segment Map Event-based updates Daily Map updates Hadoop US Long-term Map
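The data flow in this diagram can be sketched roughly as follows, assuming in-memory dicts in place of the Membase maps. The URL-to-content lookup, the segment names, and the function names are hypothetical; the slide only names the components (User-ContentID Mapper, ContentID-Segment Map, User-Segment Mapper, US Short-term Map).

```python
from collections import defaultdict

# ContentID-Segment map; per the slide this would be refreshed daily from Hadoop.
content_segments = {
    "content:42": ["autos"],
    "content:77": ["travel", "autos"],
}

# Hypothetical lookup from a visited URL to a content ID (an assumption).
url_to_content = {
    "http://example.com/cars": "content:42",
    "http://example.com/trips": "content:77",
}


def map_user_content(events, url_map):
    """User-ContentID mapper: build the UC map from (user_id, url) events."""
    uc_map = defaultdict(list)
    for user_id, url in events:
        content_id = url_map.get(url)
        if content_id is not None:  # skip events for unknown content
            uc_map[user_id].append(content_id)
    return uc_map


def map_user_segments(uc_map, cs_map):
    """User-Segment mapper: count segment hits per user (the US short-term map)."""
    us_map = {}
    for user_id, content_ids in uc_map.items():
        counts = defaultdict(int)
        for content_id in content_ids:
            for segment in cs_map.get(content_id, []):
                counts[segment] += 1
        us_map[user_id] = dict(counts)
    return us_map


events = [
    ("u1", "http://example.com/cars"),
    ("u1", "http://example.com/trips"),
    ("u2", "http://example.com/unknown"),
]
uc_map = map_user_content(events, url_to_content)
us_short_term = map_user_segments(uc_map, content_segments)
```

In this sketch the short-term map is rebuilt from a batch of events; an event-based variant would update a single user's counts as each event arrives, which is closer to the "event-based updates" path in the diagram.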
83. Closing Remarks: Exciting times. Need is real and recognized. Technological capability is within reach. Q/A. Contact: pero.subasic@teamaol.com