Case Study of BigData use with MapR M7 in the Enterprise Datacenter
  Zeljko Dodlek
  Sales Director DACH
  zdodlek@maprtech.com
  +49 (0) 151 120 555 07
©MapR Technologies - Confidential   1
Agenda

        Ancestry Case Study
        MapR Overview
        Q&A

Ancestry Use Case (page 1)

    What does Ancestry do?
Ancestry.com is an online family history service that uses machine
learning and several other statistical techniques to provide services
such as ancestry information and DNA sequencing to its users.


    Business Challenges
10 billion records in a 4 PB data store
40,000 record collections (date of birth/death, census, military
status, ...)
2+ million subscribers
10+ million registered users
DNA matching added to their offering
Ancestry Use Case (page 2)

    Why MapR?
HA requirements for the NameNode & TaskTracker
Easy way to ingest data into the cluster
Safe way to run different jobs on the same cluster
Unified file & table platform


    Configuration
3 separate clusters
* DNA Matching
* Machine Learning
* Data Mining

MapRTech Overview
            Enterprise-grade Hadoop distribution
            Innovations in the areas of the data platform, MapReduce
             and HBase
            Enabling customers to depend on our Hadoop distribution
              –    No single points of failure
              –    Guaranteed SLAs
              –    Easy to install, run and expand
            Professional Services: installation, consulting and training
            24x7 support

MapR Distribution

MapR’s value addition

Distribution made for the enterprise

Expanding Hadoop Use Cases

* Hadoop APIs for Hadoop applications
* NFS for file-based applications
* ODBC and JDBC for SQL-based applications
* Real-time applications
* Mission-critical and SLA-dependent applications

Blue = MapR Innovations

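The NFS point above can be illustrated with a minimal sketch. It assumes the cluster is exposed as an ordinary mounted directory; the environment variable `CLUSTER_NFS_ROOT` and the helper `write_and_read_back` are hypothetical names invented for this example, not part of any MapR tooling. The idea is only that a file-based application uses plain file I/O and needs no Hadoop-specific client.

```python
import os
import tempfile
from pathlib import Path


def write_and_read_back(cluster_root: Path, relative_name: str, text: str) -> str:
    """Write text under cluster_root with plain file I/O, then read it back."""
    target = cluster_root / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical: point CLUSTER_NFS_ROOT at the directory where the cluster
    # is NFS-mounted. Without it, a temporary local directory stands in so the
    # sketch still runs.
    root = Path(os.environ.get("CLUSTER_NFS_ROOT", tempfile.mkdtemp()))
    print(write_and_read_back(root, "demo/hello.txt", "hello from a file-based app\n"))
```

Nothing in the function knows whether the path is local or remote, which is the property the slide attributes to NFS access.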
No NameNode Architecture

[Diagram: Other Distributions (HDFS Federation) show a NAS appliance, NameNodes holding A-F, and DataNodes below them; MapR shows A-F distributed across all nodes.]

Other Distributions (HDFS Federation)     MapR
Multiple single points of failure         HA w/ automatic failover and re-replication
Limited to 50M files per NameNode         Up to 1T files (> 5000x advantage)
Performance bottleneck                    Higher performance
Commercial NAS required                   100% commodity hardware
Metadata must fit in memory               Metadata is persisted to disk

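The file-count figures can be checked with a few lines of arithmetic. This is only a sketch of the ratios implied by the numbers quoted in this deck (50M files per NameNode and 1T files on the slide, 200M files as the high-end maximum in the editor's notes); the constant and function names are made up for the example.

```python
MAPR_FILES = 10**12                 # "Up to 1T files" (slide)
PER_NAMENODE_FILES = 50 * 10**6     # "Limited to 50M files per NameNode" (slide)
HIGH_END_FILES = 200 * 10**6        # "200M at the maximum" (editor's notes)


def advantage(mapr_files: int, other_files: int) -> int:
    """Return how many times more files one limit allows than the other."""
    return mapr_files // other_files


print(advantage(MAPR_FILES, HIGH_END_FILES))      # 5000
print(advantage(MAPR_FILES, PER_NAMENODE_FILES))  # 20000
```

The 5000x figure on the slide appears to correspond to the 200M maximum from the notes; measured against the 50M per-NameNode limit, the ratio would be 20000x.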
Simplifying HBase Architecture

Other Distributions    MapR M5    MapR M7
HBase
JVM
DFS                    HBase
JVM                    JVM
ext3                   MapR       Unified
Disks                  Disks      Disks

Selected MapR Customers

* Intrusion detection & prevention
* Forensic analysis

* Recommendation engine
* Family tree connections

* Global threat analytics
* Virus analysis

Major Credit Card Company
* Recommendation engine
* Fraud detection and prevention

* Log analysis
* HBase

* Clickstream analysis
* Quality profiling/field failure analysis

* Fraud detection
* Channel analytics

* Advertising exchange analysis and optimization

* Customer sentiment
* Network analytics

* Customer revenue analytics
* ETL offload

* Customer targeting
* Social media analysis

* Monitors and measures behavior of online shoppers

Thank You

Editor's Notes

  1. MapR’s innovations have also expanded the use cases that are possible with Hadoop. In addition to supporting the full Hadoop API set, MapR provides NFS support, so any file-based application can access the cluster with no changes or rewrites required. MapR provides ODBC support, so any database application or SQL-based tool can access and manipulate data in a MapR cluster. MapR also supports real-time streaming access, which greatly expands the applications that are possible with Hadoop beyond batch processing. Finally, the full HA, DR and data protection capabilities of MapR allow mission-critical apps to be deployed safely and allow administrators to meet stringent SLA targets.
  2. The NameNode in Hadoop today is a single point of failure, a scalability limitation, and a performance bottleneck. With MapR there is no dedicated NameNode; the NameNode function is distributed across the cluster. This provides major advantages in terms of HA, data loss avoidance, scalability and performance. With other distributions you have a bottleneck regardless of the number of nodes in the cluster. The maximum number of files you can support with other distributions is 200M, and that only with an extremely high-end server. 50% of the Hadoop processing at Facebook goes into packing and unpacking files to try to work around this limitation. MapR scales uniformly.
  3. (Ed. note: this slide is a great whiteboard slide to summarize M7.) The stack on the left represents the HBase architecture found in all other distributions. HBase is deployed on a JVM and stores its data in the HDFS layer, which runs on a JVM that in turn stores its data in the Linux file system (ext3), which writes the data to disk. This stack results in a lot of administrative tasks, performance issues, and reliability issues. A lot of the infrastructure within HBase is an attempt to make up for the deficiencies in HDFS: you basically have a database solution that needs to deal with random I/O running on top of a write-once file system. The middle stack shows how MapR simplified the lower part of the stack with our M5 edition, which replaced HDFS and the dependency on the Linux file system with a random read/write storage layer. However, HBase is still a separate infrastructure running on top of the storage layer within M5. The region servers are separate, and users still experience downtime and delays when recovering from node failures and snapshots. With M7 on the far right, MapR has now unified tables and files into a single data platform. We’ve eliminated the separate HBase infrastructure, and the environment is much simpler to manage without the various redundant components. We’ve provided a uniform data management layer across files and tables, and a consistent data protection layer. Recovery from node failures takes seconds, there is 100% data locality, and HBase can read directly from snapshots. Files and tables are in the same namespace, volumes, and directories.
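
Note 2 mentions packing and unpacking files as a workaround for a per-NameNode file limit. The sketch below illustrates the general idea only: many small records are bundled into one archive so the file system tracks a single file instead of many. It uses Python's standard `tarfile` module as a stand-in and does not use Hadoop's own archive or container formats; the function names `pack` and `unpack` are hypothetical.

```python
import io
import tarfile


def pack(files: dict[str, bytes]) -> bytes:
    """Bundle many small named payloads into a single in-memory tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack(blob: bytes) -> dict[str, bytes]:
    """Recover the individual payloads from an archive produced by pack()."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


if __name__ == "__main__":
    small_files = {f"rec-{i}.txt": f"record {i}".encode() for i in range(1000)}
    blob = pack(small_files)
    # One stored object now stands in for 1000 small files.
    print(len(small_files), "files packed into one archive of", len(blob), "bytes")
```

Every read of a single record now pays for an unpack step, which is the overhead the note attributes to this kind of workaround.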