O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

581 visualizações

Publicada em

Large-scale testing of new data products or enhancements to existing products in a research and development environment can be a technical challenge for data scientists. In some cases, tools available to data scientists lack production-level capacity, whereas other tools do not provide the algorithms needed to run the methodology. At Nielsen, the Databricks platform provided a solution to both of these challenges. This breakout session will cover a specific Nielsen business case where two methodology enhancements were developed and tested at large-scale using the Databricks platform. Development and large-scale testing of these enhancements would not have been possible using standard database tools.

Publicada em: Dados e análise
  • Seja o primeiro a comentar

  • Seja a primeira pessoa a gostar disto

How Nielsen Utilized Databricks for Large-Scale Research and Development with Matt VanLandeghem

  1. 1. Matt VanLandeghem, Nielsen How Nielsen Utilized Databricks for Large-scale Research and Development #EUent4
  2. 2. About Nielsen • Founded 1923 • Buy & Watch – Buy: Market Research – Watch: Audience Measurement • Not just TV! • Also Radio and Digital, including PC, Mobile, Connected Devices, Digital Audio, Digital TV • Digital Ad spending now meeting/exceeding TV Ad spending 2#EUent4
  3. 3. What is Nielsen Digital Ad Ratings (DAR)? • Measurement of computer, mobile, and over-the-top device audience – Comparable to TV ratings – Who is behind the screen? • Advertising campaigns – Primary focus is age/gender demographic breaks – On-Target Delivery (%) is a key metric • Global product – 25 countries 3#EUent4
  4. 4. How does DAR work? 4#EUent4 4.3 2.4 2.0 0 1 2 3 4 5 Third-party Demographics Report to ClientAd Impression “Big Data”Mobile, computer, over-the- top Overnight daily reporting of: Unique audience, Ad impressions, On-Target % Nielsen Bias-Correction Adjustment Focus of today’s presentation…
  5. 5. Nielsen Adjustments • “Big Data” is not perfect – Needs bias correction – Where the value of Nielsen’s high-quality panels really shines – Nielsen’s panels provide a “truth set” that can be used to develop models that adjust big data • 3 sources of bias – Misrepresentation – Misattribution – Non-coverage • Nielsen adjustments are an active area of Research and Development 5#EUent4
  6. 6. Nielsen Adjustments • Metered home PC behavior – Representative sample of U.S. homes – “Medium” data • Production impression data – Big data • What is the best way to create Nielsen adjustments AND test them in a production environment? • Foundation for Nielsen’s Databricks Use Case 6#EUent4
  7. 7. Nielsen Business Case • Recently created new DAR adjustment methodologies – Small-scale testing showed the new methodologies are an enhancement over current methodologies • Business requirement: test new methodologies on a large # of campaigns – Need to understand client impact – Large-scale testing could identify corner or edge cases where new methodologies could break down and cause a data quality issue – Small scale testing: ~20 campaigns – Large scale testing: ~4000 campaigns 7#EUent4
  8. 8. Databricks • Cluster management • Provide a friendly interface to Spark for our Data Scientists – Multiple programming languages – Create adjustment factors • Uses an algorithm not available in SQL – Link to production databases – Apply adjustment factors to production-level data – Analyze data with new adjustment factors applied 8#EUent4
  9. 9. Nielsen Business Case 9#EUent4 Aggregated panel data Netezza Cloud -Combine small and large data -Run all analyses in one place using PySpark/Spark SQL Data Lake Oracle Aggregated production data
  10. 10. 10#EUent4
  11. 11. 11#EUent4
  12. 12. Nielsen Business Case • Performance gains: – What would have taken 36 hours with standalone Python only took 1.5 hours in Spark/Databricks – Edge-cases identified • Advantages of one methodology over another also identified – Short turn-around if any revisions to methodology 12#EUent4
  13. 13. Nielsen Business Case • Other benefits – Reduced time from idea to deployment – Enhanced support/investigation once deployed • Client data inquiries and issues addressed quicker – Collaboration • Application Development teams • International data science teams • These new methodologies being tested in other products – Enhanced skillsets of data scientists 13#EUent4
  14. 14. Summary • At the end of the day, the Databricks/Spark technology allowed us to solve this business use case • The reduced R&D timeline plus extensive testing will allow enhanced methodologies to be available to our clients sooner 14#EUent4
  15. 15. Copyright © 2017 The Nielsen Company. Confidential and proprietary. Special thanks: Mala Sivarajan, Anil Singh