Integrating Hadoop into the Enterprise
Jonathan Seidman
Hadoop Summit 2012
June 14th, 2012
Who I Am

    • Solutions Architect, Partner Engineering
      Team.
    • Co-founder of Chicago Hadoop User
      Group and co-founder/organizer of
      Chicago Big Data.
    • jseidman@cloudera.com
    • @jseidman
    • cloudera.com/careers

                     ©2012 Cloudera, Inc. All Rights Reserved.
What I’ll Be Talking About
    • Some Background.
    • Common uses of Hadoop in an enterprise data
      infrastructure.
    • Hadoop Integration – the big picture.
    • Deeper dive:
      – Data import/export: Moving data between Hadoop
        and existing data stores.
      – ETL tools.
      – Business intelligence (BI) and analytic tools.
    • Example architectures and data flows.
    • Conclusions


My Life Before Cloudera…




Hadoop at Orbitz
 [Chart: two series, Queries and Searches, plotted as percentages (0% to 100%)
 against x-axis values 1 to 20. Labeled data points: 71.67%, 31.87%, 34.30%, 2.78%.]




But Hadoop Was An Isolated System



 [Diagram labels: Developers; Business Users; Analysts; Normal Humans]




Hadoop + the Data Warehouse…




…Enabled New Analyses




In our opinion, integration with existing IT systems
and software is critical, as we know enterprises will
not be replacing these technologies anytime soon.

    For Hadoop platforms this means integration with
    existing databases, data warehouses, and
    business-analytics and business-visualization
    tools. *




    * A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012


What Can We Do?
 • ETL
     – Scalable ETL – allows companies to meet SLAs
       (inexpensively).
     – Agile – facilitates rapid modifications.
 • Moving analysis off of existing systems.
 • Sandbox for exploratory analytics.
 • Using Hadoop as an active archive.
 • Joining transactional data from a DB with
   interaction data.
 • Common theme: freeing up existing systems for
   tasks they’re better suited for.


 [Diagram: the big picture of Hadoop integration. Labeled components:
 BI/Analytics Tools, Enterprise Data Warehouse, Relational Databases, Flume,
 Data Import/Export, ETL Tools, Appliances, NoSQL.]


Data Import/Export



 [Diagram labels: Enterprise Data Warehouse; Relational Databases]




Sqoop Overview

 • Apache project designed to ease import
   and export of data between Hadoop and
   relational databases.
 • Provides functionality to do bulk imports
   and exports of data with HDFS, Hive and
   HBase.
 • Java based. Leverages MapReduce to
   transfer data in parallel.
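
 To make the parallel transfer concrete, here is a minimal sketch of how a numeric key range might be divided among map tasks so that each task pulls its own slice of a table. This is an illustration only, not Sqoop's actual splitting code; the names split_ranges and where_clauses and the column name id are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous chunks, one per mapper."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be less than lo")
    total = hi - lo + 1
    # Never create more chunks than there are keys.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)
    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Build one range predicate per mapper (hypothetical helper)."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]


if __name__ == "__main__":
    for clause in where_clauses("id", split_ranges(1, 100, 4)):
        print(clause)
```

 Each predicate would be handed to a separate map task, so the slices can be read concurrently without overlapping.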


Sqoop Overview

 • Uses a “connector” abstraction.
 • Two types of connectors
     – Standard connectors are JDBC based.
     – Direct connectors use native database
       interfaces to improve performance.
 • Direct connectors are available for many
   open-source and commercial databases –
   MySQL, PostgreSQL, Oracle, SQL
   Server, Teradata, etc.

Sqoop Import Flow

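 1. The client runs an import through Sqoop.
 2. Sqoop collects metadata from the database.
 3. Sqoop generates code and executes a MapReduce job.
 4. The map tasks pull the data in parallel.
 5. The map tasks write the data to Hadoop.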




Sqoop Limitations

 Sqoop has some limitations, including:
 • Poor support for security.
       $ sqoop import --username scott --password tiger …
     – Sqoop can read command line options from
       an option file, but this still has holes.
 • Error-prone syntax.
 • Tight coupling to JDBC model – not a
   good fit for non-RDBMS systems.
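
 As a rough sketch of the option-file workaround mentioned above, the following writes the credentials to a file readable only by its owner and passes that file to Sqoop through its --options-file argument, which expects one token per line. The helper names and the table name orders are hypothetical. This only keeps the password out of the process list and shell history; it is still stored in plain text on disk, which is one of the holes noted above.

```python
import os


def write_options_file(path, options):
    """Write one option token per line to a file only the owner can read."""
    # O_EXCL makes this fail rather than overwrite an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        for token in options:
            f.write(token + "\n")


def build_command(options_file, extra_args):
    """Build the argument list; credentials stay in the options file."""
    return ["sqoop", "import", "--options-file", options_file] + list(extra_args)


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "import.opts")
    write_options_file(path, ["--username", "scott", "--password", "tiger"])
    # "orders" is a made-up table name used purely for illustration.
    print(build_command(path, ["--table", "orders"]))
```

 The resulting list could be handed to subprocess.run on a machine where Sqoop is installed; the sketch itself only builds the arguments.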


Fortunately…

 Sqoop 2 (incubating) will address many of
 these limitations:
 •   Adds a web-based GUI.
 •   Centralized configuration.
 •   More flexible model.
 •   Improved security model.



Informatica PowerExchange

 • Not just RDBMS integration – provides
   consistent, native integration between
   Hadoop and a range of data
   sources, databases, legacy
   systems, standard file formats, CRM…
 • Integrated with PowerCenter for pre/post-
   processing of data, administration, and
   metadata management.


PowerExchange – Data Import

 • Access Data (PowerExchange), in batch, CDC, or real-time mode, from:
     – Web server
     – Databases, data warehouse
     – Message queues, email, social media
     – ERP, CRM
     – Mainframe
 • Pre-Process (PowerCenter), e.g. filter, join, cleanse.
 • Ingest Data into HDFS and Hive.




PowerExchange – Data Export

 • Extract Data from HDFS.
 • Post-Process (PowerCenter), e.g. transform to target schema.
 • Deliver Data (PowerExchange), in batch or real-time mode, to:
     – Web server
     – Databases, data warehouse
     – ERP, CRM
     – Mainframe




Informatica PowerExchange
 1. Create Ingest or Extract Mapping
 2. Create Hadoop Connection
 3. Configure Workflow
 4. Configure Hive Properties




There’s Always the Low-Tech Way…

 [Diagram: Hadoop Processing → Hive → Local Disk → GPLoad → Greenplum (three Greenplum boxes)]



ETL Tools




ETL – The Wikipedia Definition

 • Extract, transform and load (ETL) is a
   process in database usage and especially
   in data warehousing that involves:
     – Extracting data from outside sources
     – Transforming it to fit operational needs
     – Loading it into the end target (DB or data
       warehouse)

           http://en.wikipedia.org/wiki/Extract,_transform,_load
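
 The three steps can be sketched in a few lines. This is a toy illustration under stated assumptions: the sample records, the events table, and the function names are all made up, and an in-memory SQLite database stands in for the end target.

```python
import sqlite3

# Hypothetical raw input standing in for an outside source.
RAW_LINES = [
    "2012-06-14,search,chicago",
    "2012-06-14,booking,Chicago ",
    "bad line",
    "2012-06-15,search,NEW YORK",
]


def extract(lines):
    """Extract: yield raw records from the source, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield line


def transform(records):
    """Transform: parse each record, drop malformed ones, normalize the fields."""
    for rec in records:
        parts = [p.strip() for p in rec.split(",")]
        if len(parts) != 3:
            continue  # malformed record
        day, event, city = parts
        yield (day, event.lower(), city.title())


def load(rows, conn):
    """Load: insert the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (day TEXT, event TEXT, city TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(lines, conn):
    load(transform(extract(lines)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_LINES, conn)
    print(conn.execute("SELECT * FROM events").fetchall())
```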



ETL Tools

 • Very common use case for Hadoop.
 • Most ETL in Hadoop is still done through
   plain old MapReduce.
 • Companies want to leverage their existing
   developer skills – many enterprises have
   armies of SQL and ETL developers.
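
 For readers who have not written one, the shape of a MapReduce-style ETL job can be simulated locally. The sketch below is an assumption-laden illustration and not Hadoop code: it counts search events per key using a map function, an in-memory grouping step standing in for the shuffle, and a reduce function. The record layout and function names are hypothetical.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit (location, 1) for each well-formed search event."""
    parts = line.split(",")
    if len(parts) == 3 and parts[1] == "search":
        yield parts[2], 1


def reduce_fn(key, values):
    """Reduce: sum the counts emitted for one key."""
    return key, sum(values)


def run_local(lines):
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    sample = ["d1,search,ORD", "d1,search,LGA", "d1,booking,ORD", "d2,search,ORD"]
    print(run_local(sample))
```

 The equivalent in SQL is a single GROUP BY query, which is part of why organizations with many SQL developers look for tools that hide this layer.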




Informatica HParser

 • Not exactly ETL – provides data
   transformation and parsing optimized for
   parallel processing on Hadoop.
 • Supports deeply hierarchical data and
   complex data formats.
 • Transformations are defined in a Windows
   UI and then deployed to a Hadoop Cluster
   for execution.


HParser – How does it work?
     hadoop … dt-hadoop.jar … My_Parser /input/*/input*.txt

 [Diagram: the command above, shown alongside HDFS]

 1. Develop a DT transformation.
 2. Deploy the transformation to Hadoop.
 3. Run DT on Hadoop to produce tabular data.
 4. Analyze the data with Hive / Pig / MapReduce / other…



Pentaho

 • Existing BI tools extended to support
   Hadoop.
 • Not just ETL – also provides data
   import/export, job
   orchestration, reporting, and analysis
   functionality.
 • Supports integration with HDFS, Hive and
   HBase.
 • Community and Enterprise Editions
   offered.
Pentaho
 • Primary component is Pentaho Data Integration (PDI), also known as Kettle.
 • PDI provides a graphical drag-and-drop environment for defining ETL jobs,
   which interface with Java MapReduce to execute in-cluster transformations.

Other ETL Solutions

 • Talend
     – Also following an open-source model.
     – Extending their existing data integration tools
       to Hadoop.
 • Pervasive RushAnalyzer
     – Software to build and run big data ETL, data
       transformation, mining and visualization on
       Hadoop.


Business Intelligence/Analytics Tools




BI – The Forrester Research Definition

 “Business Intelligence is a set of
 methodologies, processes, architectures, and
 technologies that transform raw data into
 meaningful and useful information used to
 enable more effective strategic, tactical, and
 operational insights and decision-making.” *


 * http://en.wikipedia.org/wiki/Business_intelligence


Business Intelligence/Analytics Tools




 [Diagram labels: Relational Databases; Data Warehouses; …]




Cloudera ODBC Driver
 • Most of these tools use the ODBC standard.
 • Since Hive is an SQL-like system, it’s a good fit for ODBC.
 • An ODBC driver for Hive is available, but has licensing issues.
 • Because of this, Cloudera developed its own drivers, available for free
   download.

 [Diagram: ODBC Driver → HiveQL → Hive Server → Hive]
Hive ODBC Limitations

 • Hive does not have full SQL support.
 • Hive Server does not currently support
   multiple concurrent users.
 • Poor support for security.
 • Dependent on Hive – data must be loaded
   in Hive to be available.
 • The Thrift API in the Hive Server doesn’t
   support common ODBC calls.

Hive ODBC Limitations

The Hive community is working on Hive Server 2 to
address some of these limitations:
 • Improved support for multiple users.
 • Improved support for ODBC and JDBC
   drivers.
 • And better support for security is coming.




MicroStrategy




Tableau




Other BI Connectors

 • Microsoft ODBC Driver
     – Part of the Hadoop on Windows solution.
     – Provides connectivity for MS BI tools such as
       Excel, PowerPivot, etc.
 • MapR ODBC driver
     – Support for standard ODBC based tools.




Analytic Tools


     – RHadoop project.

     – Integration of SAS analytics with Hadoop.

     – Integration of SAP HANA with Hadoop

     – Toad for Cloud


Hadoop Specific Tools – Karmasphere




Hadoop Specific Tools – Datameer




Example Integration




 [Diagram: Event Logs → HParser → Hive → PowerCenter/PowerExchange → Data Warehouse]




 https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf



Example – Migration of ETL


 [Diagram, two data flows:
   First flow:  Logs → Raw Tables → ETL (SQL) → Target Tables, all inside the Data Warehouse.
   Second flow: Logs → Flume → HDFS → ETL (MapReduce) → Sqoop → Target Tables in the Data Warehouse.]



What’s Missing?

 • Better tools for ETL without coding.
 • Better tools for data governance, data
   quality, etc.
     – Ensuring that data in Hadoop complies with
       policies, rules, etc.
 • Integration with commercial enterprise
   schedulers/workflow engines.
     – Although open-source workflow schedulers
       exist (e.g. Oozie).


Conclusions
 • Hadoop integration is still in the early stages.
     – Expect to see new/better tools coming from both vendors
       and the open-source community.
 • Despite the relative immaturity of this space, there’s
   already a dizzying array of solutions available.
     – Choose solutions based on existing skills and tools already
       in use by your organization.
 • If you are using current BI tools integrated with Hive, keep in
   mind that enhancements for multi-user support, security, etc.
   are on the way.
 • And it bears repeating: always use the right tool for the
   job.
     – Hadoop won’t replace your data warehouses and
       databases, but will complement them.


Thank You!
Questions?

 http://www.cloudera.com/partners/spotlight/

 +1 (888) 789-1488
 sales@cloudera.com
 cloudera.com
 twitter.com/cloudera
 facebook.com/cloudera





Mais conteúdo relacionado

Mais procurados

Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemCapgemini
 
A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle AnalyticsRich Clayton
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Denodo
 
Intro to Big Data Analytics and the Hybrid Cloud
Intro to Big Data Analytics and the Hybrid CloudIntro to Big Data Analytics and the Hybrid Cloud
Intro to Big Data Analytics and the Hybrid CloudIan Balina
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise DataWorks Summit
 
Informatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationInformatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationKim Loughead
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Denodo
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Jeffrey T. Pollock
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsDataWorks Summit
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Publicis Sapient Engineering
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Hitachi data systems and tsys success story
Hitachi data systems and tsys success storyHitachi data systems and tsys success story
Hitachi data systems and tsys success storyHitachi Vantara
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHortonworks
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Top 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsTop 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsHortonworks
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for AnalyticsKatharine Bierce
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonJeffrey T. Pollock
 
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...Vasu S
 

Mais procurados (20)

Informatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake EcosystemInformatica Becomes Part of the Business Data Lake Ecosystem
Informatica Becomes Part of the Business Data Lake Ecosystem
 
A New Day for Oracle Analytics
A New Day for Oracle AnalyticsA New Day for Oracle Analytics
A New Day for Oracle Analytics
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
 
Intro to Big Data Analytics and the Hybrid Cloud
Intro to Big Data Analytics and the Hybrid CloudIntro to Big Data Analytics and the Hybrid Cloud
Intro to Big Data Analytics and the Hybrid Cloud
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
SAP EIM
SAP EIM SAP EIM
SAP EIM
 
Informatica Solution for SWIFT Integration
Informatica Solution for SWIFT IntegrationInformatica Solution for SWIFT Integration
Informatica Solution for SWIFT Integration
 
Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes Logical Data Warehouse and Data Lakes
Logical Data Warehouse and Data Lakes
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
 
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
Ask Bigger Questions with Cloudera and Apache Hadoop - Big Data Day Paris 2013
 
Using Hadoop for Cognitive Analytics
Using Hadoop for Cognitive AnalyticsUsing Hadoop for Cognitive Analytics
Using Hadoop for Cognitive Analytics
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Hitachi data systems and tsys success story
Hitachi data systems and tsys success storyHitachi data systems and tsys success story
Hitachi data systems and tsys success story
 
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and HortonworksHow to Become an Analytics Ready Insurer - with Informatica and Hortonworks
How to Become an Analytics Ready Insurer - with Informatica and Hortonworks
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Top 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data AnalyticsTop 5 Strategies for Retail Data Analytics
Top 5 Strategies for Retail Data Analytics
 
Data Mashups for Analytics
Data Mashups for AnalyticsData Mashups for Analytics
Data Mashups for Analytics
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
 
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
Case Study - Spotad: Rebuilding And Optimizing Real-Time Mobile Adverting Bid...
 

Destaque

Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
The future of real time information
The future of real time informationThe future of real time information
The future of real time informationthaiscarbonell1512
 
Gremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageGremlin: A Graph-Based Programming Language
Gremlin: A Graph-Based Programming LanguageMarko Rodriguez
 
How to extract valueable information from real time data feeds
How to extract valueable information from real time data feedsHow to extract valueable information from real time data feeds
How to extract valueable information from real time data feedsGene Leybzon
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Big Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesBig Data: Improving capacity utilization of transport companies
Big Data: Improving capacity utilization of transport companiesData Science Society
 
Real-time data integration to the cloud
Real-time data integration to the cloudReal-time data integration to the cloud
Real-time data integration to the cloudSankar Nagarajan
 
Real-time information analysis: social networks and open data
Real-time information analysis: social networks and open dataReal-time information analysis: social networks and open data
Real-time information analysis: social networks and open dataData Science Society
 
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014
Hue: Big Data Web applications for Interactive Hadoop at Big Data Spain 2014gethue
 
Data science challenges in flight search
Data science challenges in flight searchData science challenges in flight search
Data science challenges in flight searchData Science Society
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemIvo Vachkov
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
The Graph Traversal Programming Pattern
The Graph Traversal Programming PatternThe Graph Traversal Programming Pattern
The Graph Traversal Programming PatternMarko Rodriguez
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
Enabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTEnabling Real-Time Analytics for IoT
Enabling Real-Time Analytics for IoTSingleStore
 
Top industry use cases for streaming analytics
Top industry use cases for streaming analyticsTop industry use cases for streaming analytics
Top industry use cases for streaming analyticsIBM Analytics
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 

Integrating Hadoop Into the Enterprise

  • 1. Integrating Hadoop into the Enterprise
      Jonathan Seidman
      Hadoop Summit 2012
      June 14th, 2012
  • 2. Who I Am
      • Solutions Architect, Partner Engineering Team.
      • Co-founder of Chicago Hadoop User Group and co-founder/organizer of Chicago Big Data.
      • jseidman@cloudera.com
      • @jseidman
      • cloudera.com/careers
      ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. What I’ll Be Talking About
      • Some background.
      • Common uses of Hadoop in an enterprise data infrastructure.
      • Hadoop integration – the big picture.
      • Deeper dive:
        – Data import/export: moving data between Hadoop and existing data stores.
        – ETL tools.
        – Business intelligence (BI) and analytic tools.
      • Example architectures and data flows.
      • Conclusions.
  • 4. My Life Before Cloudera…
  • 5. Hadoop at Orbitz
      [Chart: two series, Queries and Searches, plotted as percentages (0% to 100%) over positions 1 to 20. Labeled values: 71.67%, 34.30%, 31.87%, 2.78%.]
  • 6. But Hadoop Was An Isolated System Developers Business Analysts Normal Users Humans
  • 7. Hadoop + the Data Warehouse…
  • 8. …Enabled New Analyses
  • 9. “In our opinion, integration with existing IT systems and software is critical, as we know enterprises will not be replacing these technologies anytime soon. For Hadoop platforms this means integration with existing databases, data warehouses, and business-analytics and business-visualization tools.” Source: A near-term outlook for big data, Jo Maitland, GigaOM Pro, March 2012
  • 10. What Can We Do? • ETL – Scalable ETL – allows companies to meet SLAs (inexpensively). – Agile – facilitates rapid modifications. • Moving analysis off of existing systems. • Sandbox for exploratory analytics. • Using Hadoop as an active archive. • Joining transactional data from a DB with interaction data. • Common theme: freeing up existing systems for tasks they’re better suited for.
  • 11. BI/Analytics Tools Enterprise Data Warehouse Relational Databases Flume Data Import/Export ETL Tools Appliances NoSQL
  • 12. Data Import/Export Enterprise Data Warehouse Relational Databases
  • 13. Sqoop Overview • Apache project designed to ease import and export of data between Hadoop and relational databases. • Provides functionality to do bulk imports and exports of data with HDFS, Hive and HBase. • Java based. Leverages MapReduce to transfer data in parallel.
  • 14. Sqoop Overview • Uses a “connector” abstraction. • Two types of connectors – Standard connectors are JDBC based. – Direct connectors use native database interfaces to improve performance. • Direct connectors are available for many open-source and commercial databases – MySQL, PostgreSQL, Oracle, SQL Server, Teradata, etc.
  • 15. Sqoop Import Flow: Client runs import → Sqoop collects metadata → Sqoop generates code and executes MR job → MapReduce map tasks pull data → write to Hadoop.
  • 16. Sqoop Limitations Sqoop has some limitations, including: • Poor support for security. $ sqoop import --username scott --password tiger… – Sqoop can read command line options from an option file, but this still has holes. • Error-prone syntax. • Tight coupling to JDBC model – not a good fit for non-RDBMS systems.
  • 17. Fortunately… Sqoop 2 (incubating) will address many of these limitations: • Adds a web-based GUI. • Centralized configuration. • More flexible model. • Improved security model.
  • 18. Informatica PowerExchange • Not just RDBMS integration – provides consistent, native integration between Hadoop and a range of data sources, databases, legacy systems, standard file formats, CRM… • Integrated with PowerCenter for pre/post-processing of data, administration, and metadata management.
  • 19. Power Exchange – Data Import: Access Data → Pre-Process → Ingest Data. Sources: Web server; Databases, Data Warehouse; Message Queues, Email, Social Media; ERP, CRM; Mainframe. PowerExchange (Batch, CDC, Real-time) → PowerCenter (e.g. Filter, Join, Cleanse) → HDFS, Hive.
  • 20. Power Exchange – Data Export: Extract Data → Post-Process → Deliver Data. HDFS → PowerCenter (e.g. Transform to target schema) → PowerExchange (Batch, Real-time) → Targets: Web server; Databases, Data Warehouse; ERP, CRM; Mainframe.
  • 21. Informatica PowerExchange 1. Create Ingest or Extract Mapping 2. Create Hadoop Connection 3. Configure Workflow 4. Configure Hive Properties
  • 22. There’s Always the Low-Tech Way… GreenPlum GPLoad Hadoop GreenPlum Processing Hive Local Disk GreenPlum
  • 23. BI/Analytics Tools Enterprise Data Warehouse Relational Databases Flume Data Import/Export ETL Tools Appliances NoSQL
  • 24. ETL Tools
  • 25. ETL Tools
  • 26. ETL – The Wikipedia Definition • Extract, transform and load (ETL) is a process in database usage and especially in data warehousing that involves: – Extracting data from outside sources – Transforming it to fit operational needs – Loading it into the end target (DB or data warehouse) http://en.wikipedia.org/wiki/Extract,_transform,_load
  • 27. ETL Tools • Very common use case for Hadoop. • Most ETL in Hadoop is still done through plain old MapReduce. • Companies want to leverage their existing developer skills – many enterprises have armies of SQL and ETL developers.
  • 28. Informatica HParser • Not exactly ETL – provides data transformation and parsing optimized for parallel processing on Hadoop. • Supports deeply hierarchical data and complex data formats. • Transformations are defined in a Windows UI and then deployed to a Hadoop Cluster for execution.
  • 29. HParser – How does it work? hadoop … dt-hadoop.jar … My_Parser /input/*/input*.txt HDFS 1. Develop a DT transformation 2. Deploy the transformation to Hadoop 3. Run DT on Hadoop to produce tabular data 4. Analyze the data with HIVE / PIG / MapReduce / Other…
  • 30. Pentaho • Existing BI tools extended to support Hadoop. • Not just ETL – also provides data import/export, job orchestration, reporting, and analysis functionality. • Supports integration with HDFS, Hive and HBase. • Community and Enterprise Editions offered.
  • 31. Pentaho • Primary component is Pentaho Data Integration (PDI), also known as Kettle. • PDI provides a graphical drag-and-drop environment for defining ETL jobs, which interface with Java MapReduce to execute in-cluster transformations.
  • 32. Other ETL Solutions • Talend – Also following an open-source model. – Extending their existing data integration tools to data integration. • Pervasive RushAnalyzer – Software to build and run big data ETL, data transformation, mining and visualization on Hadoop.
  • 33. BI/Analytics Tools Enterprise Data Warehouse Relational Databases Flume Data Import/Export ETL Tools Appliances NoSQL
  • 34. Business Intelligence/Analytics Tools
  • 35. BI – The Forrester Research Definition “Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.” Source: http://en.wikipedia.org/wiki/Business_intelligence
  • 36. Business Intelligence/Analytics Tools Relational Data … Databases Warehouses
  • 37. Cloudera ODBC Driver • Most of these tools use the ODBC standard. • Since Hive is an SQL-like system it’s a good fit for ODBC. • An ODBC driver for Hive is available, but has licensing issues. • Because of this, Cloudera developed its own drivers, available for free download. [Diagram: ODBC driver → HiveQL → Hive Server → Hive]
  • 38. Hive ODBC Limitations • Hive does not have full SQL support. • Multi-user is currently not supported by Hive Server. • Poor support for security. • Dependent on Hive – data must be loaded in Hive to be available. • The Thrift API in the Hive Server doesn’t support common ODBC calls.
  • 39. Hive ODBC Limitations The Hive community is working on Hive Server 2 to address some of these limitations: • Improved support for multiple users. • Improved support for ODBC and JDBC drivers. • And better support for security is coming.
  • 40. MicroStrategy
  • 41. Tableau
  • 42. Other BI Connectors • Microsoft ODBC Driver – Part of the Hadoop on Windows solution. – Provides connectivity for MS BI tools such as Excel, PowerPivot, etc. • MapR ODBC driver – Support for standard ODBC based tools.
  • 43. Analytic Tools – RHadoop project. – Integration of SAS analytics with Hadoop. – Integration of SAP HANA with Hadoop – Toad for Cloud
  • 44. Hadoop Specific Tools – Karmasphere
  • 45. Hadoop Specific Tools – Datameer
  • 46. Example Integration [Diagram components: Event Logs, HParser, Hive, PowerCenter/PowerExchange, Data Warehouse] https://community.informatica.com/mpresources/Communities/IW2012/Docs/bos_65.pdf
  • 47. Example – Migration of ETL. Original flow: Logs → Raw Tables → ETL (SQL) → Target Tables (all in the Data Warehouse). Migrated flow: Logs → Flume → HDFS → ETL (MapReduce) → Sqoop → Target Tables (Data Warehouse).
  • 48. What’s Missing? • Better tools for ETL without coding. • Better tools for data governance, data quality, etc. – Ensuring that data in Hadoop complies with policies, rules, etc. • Integration with commercial enterprise schedulers/workflow engines. – Although open-source workflow schedulers exist (e.g. Oozie).
  • 49. Conclusions • Hadoop integration is still in the early stages. – Expect to see new/better tools coming from both vendors and the open-source community. • Despite the relative immaturity of this space, there’s already a dizzying array of solutions available. – Choose solutions based on existing skills and tools already in use by your organization. • If using current BI tools integrated with Hive keep in mind that enhancements for multi-user, security, etc. are on the way. • And it bears repeating: always use the right tool for the job. – Hadoop won’t replace your data warehouses and databases, but will complement them.
  • 50. Thank You! Questions? http://www.cloudera.com/partners/spotlight/ +1 (888) 789-1488 cloudera.com twitter.com/cloudera sales@cloudera.com facebook.com/cloudera
  • 51. Lunch! Lunch takes place in the Community Showcase (Hall 2) Sessions will resume at 1:30pm
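
Slide 16 points out that passing `--password` on the Sqoop command line is a security weakness. As a minimal sketch, and not something shown in the talk, the Python helper below assembles a Sqoop 1 import command that uses the `-P` flag (prompt for the password on the console) instead. The helper name `build_sqoop_import`, the JDBC URL, the table, and the target directory are all hypothetical values chosen for illustration.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, target_dir,
                       split_by=None, num_mappers=4):
    """Build the argument list for a Sqoop 1 import that prompts for the password."""
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt on the console rather than passing --password in clear text
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if split_by is not None:
        # Optional: name the split column explicitly instead of letting Sqoop guess.
        argv += ["--split-by", split_by]
    return argv


# Hypothetical connection details, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="scott",
    table="orders",
    target_dir="/data/raw/orders",
    split_by="order_id",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list means it could be handed to `subprocess.run` without a shell, which avoids quoting mistakes; the joined string is printed only for inspection.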

Editor's Notes

  1. Common theme: moving time, space, or processor intensive processing to Hadoop.
  2. Flume provides ingestion of streaming data (e.g. logs) into Hadoop.
  3. Client executes Sqoop job. Sqoop interrogates DB for column names, types, etc. Based on extracted metadata, Sqoop creates source code for table class, and then kicks off MR job. This table class can be used for processing on extracted records. Sqoop by default will guess at a column for splitting data for distribution across the cluster. This can also be specified by client.
  4. Pentaho also has integration with NoSQL DBs (Mongo, Cassandra, etc.)
  5. Most of these tools integrate to existing data stores using the ODBC standard.
  6. MSTR and Tableau are tested and certified now with the Cloudera driver, but other standard ODBC based tools should also work, and more integrations will be supported soon.
  7. Also, Cloudera has implemented a solution for multi-user, which will also soon support authentication.
  8. In memory model supports low-latency queries.
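
Note 3 describes Sqoop choosing a column on which to split the data so that map tasks can pull rows in parallel. The sketch below illustrates that idea under simplifying assumptions: an integer split column with known minimum and maximum values, divided into contiguous ranges of near-equal size. It is an illustration of the general technique, not Sqoop's actual implementation, and the names `split_ranges` and `where_clauses` are hypothetical.

```python
def split_ranges(lo, hi, num_splits):
    """Divide the inclusive integer range [lo, hi] into at most num_splits contiguous ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        return []
    total = hi - lo + 1
    n = min(num_splits, total)       # never create empty ranges
    base, extra = divmod(total, n)   # the first `extra` ranges get one more value
    ranges = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, lo, hi, num_splits):
    """Render one WHERE predicate per range, one for each parallel task."""
    return [f"{column} >= {s} AND {column} <= {e}"
            for s, e in split_ranges(lo, hi, num_splits)]


for clause in where_clauses("order_id", 1, 10, 4):
    print(clause)
```

Each predicate would bound the query issued by one map task. A skewed split column gives uneven ranges in practice, which is one reason a client might specify the column itself.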