Analyzing Large-Scale User Data with Hadoop and HBase

Aaron Kimball – CTO
WibiData, Inc.
We can now collect more
data than at any time in
history.
Yesterday’s engineering challenge:
Fitting the problem into the
hardware.
Today’s constrained
resource is understanding.
How do we best apply data
…to better serve our users?
The best products are user-centric
• Intuitive UI
• Continuously learning
  – Guided search
  – Smarter recommendations
• More effective service
What are we building toward?
Requirements
1. Understand the user population
2. Respond to users in real time
3. Support graceful data evolution
Large-scale data science is hard
• What does a user look like?
  – What data is available about the user?
  – Which features are important?
  – Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely fashion?
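As one small illustration of the "which features are correlated?" question, the sketch below computes a Pearson correlation between two per-user features. The feature names (`hours_watched`, `recordings_made`, `support_calls`) and all values are made up for the example; nothing here comes from the talk's actual data or tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user features, one entry per user.
hours_watched = [1.0, 2.0, 3.0, 4.0]
recordings_made = [2.0, 4.0, 6.0, 8.0]
support_calls = [4.0, 3.0, 2.0, 1.0]

r_positive = pearson(hours_watched, recordings_made)
r_negative = pearson(hours_watched, support_calls)
```

At real scale the sums would be accumulated in a distributed job rather than in memory, but the arithmetic is the same.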
Tools of the trade
• Store all data about a user
  in one place
• Support real-time get/put,
  as well as MapReduce
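A minimal sketch of the idea on this slide: one row per user, written and read in real time, and also scannable in bulk. `UserTable` is a hypothetical in-memory stand-in, not the HBase client API, and the column names are invented for illustration.

```python
from collections import defaultdict


class UserTable:
    """Toy in-memory stand-in for a wide-row store keyed by user ID.

    Hypothetical illustration only; this is not the HBase client API.
    """

    def __init__(self):
        # row key (user ID) -> column name -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, value, timestamp):
        """Real-time write: append one timestamped cell to the user's row."""
        self._rows[user_id][column].append((timestamp, value))

    def get(self, user_id, column):
        """Real-time read: return the most recent value, or None if absent."""
        if user_id not in self._rows or column not in self._rows[user_id]:
            return None
        cells = self._rows[user_id][column]
        return max(cells, key=lambda cell: cell[0])[1]

    def scan(self):
        """Batch read: yield (user_id, row) pairs in key order."""
        for user_id in sorted(self._rows):
            row = self._rows[user_id]
            yield user_id, {col: list(cells) for col, cells in row.items()}


table = UserTable()
table.put("user-1", "info:email", "a@example.com", timestamp=1)
table.put("user-1", "history:viewed", "show-42", timestamp=2)
table.put("user-1", "history:viewed", "show-7", timestamp=3)
table.put("user-2", "history:viewed", "show-42", timestamp=1)

latest = table.get("user-1", "history:viewed")
all_users = [user_id for user_id, _ in table.scan()]
```

Keeping everything about a user under one row key means a single lookup serves the real-time path, while a full scan feeds batch analysis.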
Tools of the trade
• Use complex data types to
  model complex data
• Support extended data
  models over time
• Retain support for legacy
  systems using older models
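One way to picture graceful data evolution is reader-side schema resolution, the approach taken by serialization systems such as Avro: each reader projects stored records onto the schema it knows. The sketch below is a plain-Python toy under that assumption; the schemas, field names, and `read_with_schema` helper are hypothetical and not taken from the talk.

```python
# Hypothetical reader schemas: field name -> default value.
PROFILE_V1 = {"user_id": None, "email": ""}
PROFILE_V2 = {"user_id": None, "email": "", "favorite_genres": []}


def read_with_schema(record, reader_schema):
    """Project a stored record onto the reader's schema.

    Fields the reader expects but the record lacks take the schema default;
    fields the reader does not know about are dropped.
    """
    resolved = {}
    for field, default in reader_schema.items():
        value = record.get(field, default)
        # Copy lists so resolved records never share one mutable default.
        resolved[field] = list(value) if isinstance(value, list) else value
    return resolved


# Written by an older system that only knows the v1 model.
old_record = {"user_id": "user-1", "email": "a@example.com"}
# Written by a newer system using the extended v2 model.
new_record = {"user_id": "user-2", "email": "", "favorite_genres": ["drama"]}

upgraded = read_with_schema(old_record, PROFILE_V2)    # new reader, old data
downgraded = read_with_schema(new_record, PROFILE_V1)  # legacy reader, new data
```

Old data stays readable by new code, and legacy readers keep working against records that carry fields they have never seen.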
Tools of the trade
• Abstract computational
  model away from MapReduce
• Support computation over all
  users… or one user at a time
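A rough sketch of what such an abstraction might look like: the analyst writes one per-user function, and the framework decides whether to run it for a single user on demand or across every user in batch. All names here (`top_genre`, `run_for_one_user`, `run_for_all_users`, the `viewed_genres` field) are hypothetical, and a plain dict stands in for the user table.

```python
from collections import Counter


def top_genre(row):
    """Per-user computation: most frequently viewed genre, or None if no history."""
    counts = Counter(row.get("viewed_genres", []))
    if not counts:
        return None
    # Break ties alphabetically so the result is deterministic.
    return min(counts, key=lambda genre: (-counts[genre], genre))


def run_for_one_user(table, user_id, compute):
    """Real-time path: apply the computation to a single user's row."""
    return compute(table[user_id])


def run_for_all_users(table, compute):
    """Batch path: apply the same computation to every row."""
    return {user_id: compute(row) for user_id, row in table.items()}


# Hypothetical user rows keyed by user ID.
users = {
    "user-1": {"viewed_genres": ["drama", "news", "drama"]},
    "user-2": {"viewed_genres": ["sports"]},
    "user-3": {"viewed_genres": []},
}

single = run_for_one_user(users, "user-1", top_genre)
batch = run_for_all_users(users, top_genre)
```

Because `top_genre` only ever sees one row, the same function could in principle back both a batch job and a per-request lookup without the author writing any MapReduce code.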
For set-top boxes
• Libraries
• Device and user analysis
• Viewing/recording history
• Personalized offers and recommendations
• Analysis for product roadmap
• Tech support portal
• Improved reports for advertisers
The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content, ads
• More effective customer service
Conclusions
• Applications are becoming increasingly
  user-centric
• Data drives this capability, but harnessing it
  requires a new distributed architecture
• The biggest challenge is allowing data
  scientists to effectively leverage the data
www.wibidata.com / @wibidata
Aaron Kimball – aaron@wibidata.com
