Leveraging Hadoop in Wikimart
             Roman Zykov
          Head of analytics
          http://wikimart.ru




London, Big Data World Europe, 20th September 2012
Key problem


To be or not to be…

Hadoop

Introduction
Key tasks for Wikimart

What
• BI tasks
• Web analytics (in-house solution)
• Recommendations on site
• Data services for marketing

Who
• Core analytics team
• Analytics members in other departments
• IT site operations
Problem

Too time consuming or too
expensive?
• Data volume
• # of data services
Map Reduce



[Diagram: DATA, processed Standalone vs. with Map Reduce]
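
For readers new to the model, the Map Reduce idea can be sketched in a few lines of plain Python. This is only an in-process illustration, not Hadoop code and not the Wikimart implementation; the record layout (user, product) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def view_mapper(record):
    # Hypothetical record layout: (user_id, product_id) for one page view.
    _user_id, product_id = record
    yield product_id, 1


def count_reducer(_key, values):
    return sum(values)


def run_job(records):
    return reduce_phase(shuffle(map_phase(records, view_mapper)), count_reducer)


views = [("u1", "p1"), ("u2", "p1"), ("u1", "p2")]
view_counts = run_job(views)
```

On a real cluster the map and reduce steps run in parallel on many nodes; here they run sequentially, which is enough to show the data flow.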
Our idea

New platform for “Big Data” tasks only

• Start research on Map Reduce software
• First patient - recommendation engine

Difficulties
- no planned budget ->   Hadoop is free
- no experts        ->   learn it
- no hardware       ->   virtual cluster
Requirements for Hadoop


• Easily scalable
• Easy deployment
• Easy integration
• Less low-level Java coding
• SQL-like queries
Data flow




DWH
         Data feeds
Accomplishments

Recommendations
• Collaborative filtering (item-to-item on browsing history, Pig)
• Similar products (item attributes, Pig)
• Most popular items (browsing history + orders, HiveQL)
• Internal and external search recommendations (HiveQL)
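
As an illustration of the item-to-item collaborative filtering mentioned above, here is a minimal Python sketch. The slides state that the production version runs in Pig on browsing history; the code below is not that implementation. It assumes binary "user viewed item" data and cosine similarity over co-occurrence counts, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(histories):
    """Cosine similarity between items from binary browsing histories.

    histories maps a user id to the list of items that user viewed.
    """
    item_counts = defaultdict(int)   # number of users who viewed each item
    pair_counts = defaultdict(int)   # number of users who viewed both items
    for items in histories.values():
        unique = sorted(set(items))
        for item in unique:
            item_counts[item] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        score = together / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(sims, item, k=3):
    """Return the k items most similar to `item`; ties are broken by name."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Made-up browsing histories, for illustration only.
histories = {
    "u1": ["phone", "case", "charger"],
    "u2": ["phone", "case"],
    "u3": ["phone", "laptop"],
}
sims = item_similarities(histories)
top_for_phone = recommend(sims, "phone", k=2)
```

The pair-counting step is the part that grows quickly with catalogue size, which is one way intermediate data can become large even when the input is modest.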

Some statistics after 1 year
• >10% of revenue
• 3 months to launch
• Tens of gigabytes processed daily in 2 hours
• 1 crash only (cluster lost power)

Decision: invest in a hardware cluster
End user

Internal high-level languages
• HiveQL
• Pig

Reporting
• Pre-aggregated data for OLAP
• RDBMS - front end
• OLAP and Reporting software should
  support HiveQL
Data Integration

• Sqoop
  • Parallel data exchange with RDBMS
    (MS SQL, MySQL, Oracle, Teradata…)
  • Incremental updates
  • HDFS, Hive, HBase

• Talend Open Studio
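
The "incremental updates" idea can be sketched as a watermark: remember the highest key already imported and fetch only rows above it. The Python below is not Sqoop and does not call it; it only mimics the principle behind Sqoop's incremental import options (`--incremental`, `--check-column`, `--last-value`) against an in-memory SQLite table. The `orders` table and its columns are assumptions made for the example.

```python
import sqlite3


def fetch_new_rows(conn, last_value):
    """Return rows with id greater than last_value, plus the new watermark."""
    cursor = conn.execute(
        "SELECT id, product FROM orders WHERE id > ? ORDER BY id",
        (last_value,),
    )
    rows = cursor.fetchall()
    new_last_value = rows[-1][0] if rows else last_value
    return rows, new_last_value


# Hypothetical source table standing in for the production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "phone"), (2, "case")])

# First run: nothing imported yet, so the watermark starts at 0.
first_batch, watermark = fetch_new_rows(conn, 0)

# A new order arrives; the second run picks up only that row.
conn.execute("INSERT INTO orders VALUES (3, 'charger')")
second_batch, watermark = fetch_new_rows(conn, watermark)
```

This append-only scheme only sees new rows; picking up changes to existing rows would need a last-modified timestamp column instead of an increasing id.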
Hadoop vs RDBMS

• Will never replace RDBMS:
   • Latency
   • Weak capabilities of HiveQL vs SQL
• Only some tasks with offline processing:
   • Machine learning
   • Queries on big tables
   • …
• Real time: NoSQL
Hadoop myth


      Terabytes?
      Petabytes?

      Big tasks!
Conclusion

• Hadoop is not rocket science
• Intermediate data can be Big Data

Starter kit
• Hadoop management system
• Virtual hardware (cloud, virtual servers, etc.)
• Offline data tasks
• Pig or HiveQL
• Sqoop: import data from existing data sources
Thank you!!!

     rzykov@gmail.com
linkedin.com/in/romanzykov

Leveraging Hadoop to mine customer insights in a developing market
