©2012 Sixth Sense Advisors, Inc. All Rights Reserved   1




INTEGRATING BIG DATA
Dataversity Webinar
Feb 7 2012




State of Data Today




A Growing Trend
 Expectations for BI are changing without anyone telling us

  Requirement      Expectations                     Reality
  Speed            Speed of the Internet            Speed = Infra + Arch + Design
  Accessibility    Accessibility of a smartphone    BI tool licenses & security
  Usability        iPad - mobility                  Web-enabled BI tool
  Availability     Google Search                    Data & report metadata
  Delivery         Speed of questions               Methodology & signoff
  Data             Access to everything             Structured data
  Scalability      Cloud (Amazon)                   Existing infrastructure
  Cost             Cell phone or free Wi-Fi         Millions



The Wisdom of Crowds


Data Deluge = Business Insights



BIG Data
Structured             Current                       New

                       ERP
                       CRM
                       SCM

                       Content Management Systems

                       Email
                       Call Center

                       Documents
                       Contracts

Unstructured




What’s so Big about Big Data

            Velocity
            Volume
            Variety
           Complexity
           Ambiguity


So you are about to start the Big Data Project

   Tools                                                               Output




                     Data


instructions




The Normal Way Results In ...




Image Source: Web




Why Big Data can Fail on the RDBMS?

•  Current Data Management Platform (RDBMS + ETL + BI)
•  New demands
   •  New data types
   •  New volume
   •  New analytics
   •  New workload
   •  New metadata
•  Results
   •  Poor performance
   •  Failed programs
•  Scalability; Sharding; ACID




BIG Data
•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high volume data
   •  Perform complex operations within reasonable response time
•  Infrastructure Requirements
   •  Scalable platform
   •  Database independence
   •  Fault tolerant architectures
   •  Low cost of acquisition and storage
   •  Supported by standard toolsets




Hadoop


Design Goals
•  System Shall Manage and Heal Itself
•  Performance Shall Scale Linearly
•  Compute Shall Move to Data
•  Simple Core, Modular and Extensible


Hadoop Differentiators

Schema-on-Write: RDBMS
•  Schema must be created before data is loaded.
•  An explicit load operation has to take place which transforms the data to the internal structure of the database.
•  New columns must be added explicitly before data for such columns can be loaded into the database.
•  Read is fast.
•  Standards/Governance.

Schema-on-Read: Hadoop
•  Data is simply copied to the file store; no special transformation is needed.
•  A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns.
•  New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it.
•  Load is fast.
•  Evolving Schemas/Agility.
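
The schema-on-read idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `RAW_STORE` list stands in for the file store, and `load` and `read` are hypothetical names, not Hadoop or Hive APIs. The `read` function plays the role of a SerDe by parsing raw JSON lines at read time and projecting the requested columns.

```python
import json

# Stand-in for the file store (assumption): raw lines are kept exactly as they arrive.
RAW_STORE = []

def load(line: str) -> None:
    """Schema-on-read 'load': append the raw line, no transformation."""
    RAW_STORE.append(line)

def read(columns):
    """Hypothetical SerDe: parse each raw line at read time and project columns.

    Fields missing from a record come back as None.
    """
    for line in RAW_STORE:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)

load('{"user": "a", "clicks": 3}')
load('{"user": "b", "clicks": 5, "country": "US"}')  # a new field starts flowing

# The original reader only knows about two columns.
v1 = list(read(["user", "clicks"]))

# Updating the reader exposes "country" retroactively, without reloading any data.
v2 = list(read(["user", "clicks", "country"]))
```

The load step never fails on an unknown field, which is why load is fast; the parsing cost is paid on every read instead.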




Hadoop Known Limitations
•  Write-once model
•  A namespace with an extremely large number of files exceeds
   Namenode’s capacity to maintain
•  Cannot be mounted by an existing OS
   •  Getting data in and out is tedious
   •  Virtual File System can solve problem
•  HDFS does not implement / support
   •  User quotas
   •  Access permissions
   •  Hard or soft links
   •  Data balancing schemes
•  No periodic checkpoints
•  Namenode is single point of failure
   •  Automatic restart and failover to another machine not yet supported
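
The namespace limit above can be made concrete with a rough estimate. The sketch below assumes a commonly quoted rule of thumb of about 150 bytes of Namenode heap per namespace object (file or block); that figure is an assumption for illustration, not a measurement from this deck, and `namenode_heap_bytes` is a hypothetical helper.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate the heap taken by file and block objects in the namespace."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT

# Ten million small files, each fitting in a single block.
estimate = namenode_heap_bytes(10_000_000)
print(f"{estimate / 1e9:.1f} GB of heap")
```

Because the cost depends on the number of objects rather than on bytes stored, many small files are more expensive for the Namenode than a few large ones holding the same data.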

   Hadoop Tips
•  Hadoop is useful
   •  When you must process lots of unstructured data
   •  When running batch jobs is acceptable
   •  When you have access to lots of cheap hardware
•  Hadoop is not useful
   •  For intense calculations with little or no data
   •  When your data is not self-contained
   •  When you need interactive results
•  Implementation
   •  Think big, start small
   •  Build on agile cycles
   •  Focus on the data, as you will always develop schema on write.
•  Available Optimizations
   •  Input to Maps
   •  Map only jobs
   •  Combiner
   •  Compression
   •  Speculation
   •  Fault Tolerance
   •  Buffer Size
   •  Parallelism (threads)
   •  Partitioner
   •  Reporter
   •  DistributedCache
   •  Task child environment settings




 Hadoop Tips
•  Troubleshooting
   •  Are your partitions uniform?
   •  Can you combine records at the map side?
   •  Are maps reading off a DFS block worth of data?
   •  Are you running a single reduce wave (unless the data size per reducer is too big)?
   •  Have you tried compressing intermediate data & final data?
   •  Are there buffer size issues?
   •  Do you see unexplained “long tails”?
   •  Are your CPU cores busy?
   •  Is at least one system resource being loaded?
•  Performance Tuning
   •  Increase the memory/buffer allocated to the tasks
   •  Increase the number of tasks that can be run in parallel
   •  Increase the number of threads that serve the map outputs
   •  Disable unnecessary logging
   •  Turn on speculation
   •  Run reducers in one wave as they tend to get expensive
   •  Tune the usage of DistributedCache, it can increase efficiency
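
The question about combining records at the map side can be illustrated with a plain-Python word count. This is a sketch of the combiner idea only, not Hadoop API code; `map_phase`, `combine` and `reduce_phase` are hypothetical names, and the two tiny input splits are made up for the example.

```python
from collections import Counter, defaultdict

def map_phase(split):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Map-side combiner: pre-aggregate counts before they are shuffled."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_phase(all_pairs):
    """Sum the counts per key after the shuffle."""
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

# Two illustrative input splits (assumption).
splits = [["big data big"], ["data big"]]

without_combiner = [p for s in splits for p in map_phase(s)]
with_combiner = [p for s in splits for p in combine(map_phase(s))]

print(len(without_combiner), len(with_combiner))
print(reduce_phase(with_combiner))
```

The final counts are the same either way; the combiner only reduces how many intermediate records cross the network, which matters more as splits grow and keys repeat.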




NoSQL
•  Stands for Not Only SQL
•  Based on CAP Theorem
•  Usually do not require a fixed table schema nor do they
   use the concept of joins
•  All NoSQL offerings relax one or more of the ACID
   properties
•  NoSQL databases come in a variety of flavors
   •  XML (myXMLDB, Tamino, Sedna)
   •  Wide Column (Cassandra, HBase, Bigtable)
   •  Key/Value (Redis, Memcached with BerkeleyDB)
   •  Graph (Neo4j, InfoGrid)
   •  Document store (CouchDB, MongoDB)




 NoSQL Footprint

[Figure: NoSQL families plotted by Size (vertical axis) against Complexity (horizontal axis)]
•  Key Value: Amazon Dynamo, Voldemort
•  Big Table: Google Bigtable, HBase, Cassandra
•  Doc Database: Lotus Notes
•  Graph: Graph Theory




    NoSQL
•  Access and Query
    •  RESTful interfaces (HTTP as an access API)
    •  Query languages other than SQL
        •  SPARQL - Query language for the Semantic Web
        •  Gremlin - the graph traversal language
        •  Sones Graph Query Language
    •  Data Manipulation / Query API
        •  The Google BigTable DataStore API
        •  The Neo4j Traversal API
    •  Serialization Formats
        •  JSON
        •  Thrift
        •  ProtoBuffers
        •  RDF
•  Best Practices
    •  Design for data collection
    •  Plan the data store
    •  Organize by type and semantics
    •  Partition for performance
        •  Access and Query is run time dependent
    •  Horizontal scaling
    •  Memory Caching
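
One common way to partition for performance and scale horizontally is to hash each key to a node. The sketch below shows simple modulo hash partitioning, which is an illustrative choice and not a scheme named in the slides; `partition_for`, the `user-N` keys and the four-node cluster are all assumptions.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (Python's hash() is salted per run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Hypothetical keys spread over a hypothetical four-node cluster.
keys = [f"user-{i}" for i in range(1000)]
load = Counter(partition_for(k, 4) for k in keys)

for node in sorted(load):
    print(node, load[node])
```

Checking the per-node counts is a quick test of whether partitions are uniform. Note that plain modulo partitioning remaps most keys when the node count changes, which is why some stores use consistent hashing instead.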




     Textual ETL Engine
Forest Rim Technology – Textual ETL Engine (TETLE) – is an integration tool for turning text into a structure of
data that can be analyzed by standard analytical tools


•  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
•  Easy to implement and easy to realize ROI

•  Advantages
   •  Simple to use
   •  No MR or Coding required for text analysis and mining
   •  Extensible by Taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration
•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside Windows platform
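
The general idea of keyword rules turning free text into analyzable rows can be sketched as follows. This is not the Textual ETL Engine's rule syntax or interface, which the slides do not show; the `RULES` table, the categories and the `extract` function are hypothetical, and regular expressions stand in for whatever matching the product performs.

```python
import re

# Hypothetical rules: category -> keyword pattern (not the product's rule format).
RULES = {
    "complaint": re.compile(r"\b(refund|broken|late)\b", re.IGNORECASE),
    "praise": re.compile(r"\b(great|thanks|love)\b", re.IGNORECASE),
}

def extract(doc_id: str, text: str):
    """Return (doc_id, category, matched_keyword) rows for every rule hit."""
    rows = []
    for category, pattern in RULES.items():
        for match in pattern.finditer(text):
            rows.append((doc_id, category, match.group(0).lower()))
    return rows

rows = extract("email-1", "Thanks for the quick reply, but the item arrived broken.")
for row in rows:
    print(row)
```

Each output row is a key-value style record that a standard analytical tool or database can load, which is the kind of structure the slide describes.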




Integration
•  All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
   •  Oracle Exalytics / Big Data Appliance
   •  Teradata Aster Appliance
   •  EMC Greenplum Appliance
   •  IBM BigInsights
   •  Microsoft Windows Azure Integration
•  There are multiple providers of Hadoop distributions
   •  Cloudera
   •  Hortonworks
   •  Zettaset
•  Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms

Conceptual Solution Architecture
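[Figure: conceptual solution architecture]
•  Sources: OLTP; BIG Data (Content, Email, Docs)
•  Integration: ETL / ELT / CDC; Textual ETL and / or MR / Ruby / Java (Hadoop)
•  Stores: Data Warehouse, Big Data DW, Taxonomy, Data Marts
•  Shared services: Metadata, MDM
•  Consumption: Reporting, Analytics, Search, OLAP, Text Mining, Content Analytics, Knowledge Analytics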




Integration Tips
•  The key to the castle in integrating Big Data is metadata
•  Whatever the tool, technology and technique, if you do not
   know your metadata, your integration will fail
•  Semantic technologies and architectures will be the way to
   process and integrate the Big Data, much akin to Web 2.0
   models
•  Data quality for Big Data is a very questionable goal. To get
   some semblance of quality, taxonomies and ontologies can be
   of help
•  Third-party data providers also supply keywords, trending tags
   and scores; these can provide a lot of integration support
•  Writing business rules for Big Data can be very cumbersome
   and not all programs can be written in MapReduce


Which Tool


Application                Hadoop    NoSQL    Textual ETL
Machine Learning             x         x
Sentiments                   x         x           x
Text Processing              x         x           x
Image Processing             x         x
Video Analytics              x         x
Log Parsing                  x         x           x
Collaborative Filtering      x         x           x
Context Search                                     x
Email & Content                                    x

Success Stories
 •  Machine learning & Recommendation Engines – Amazon,
      Orbitz
 •    CRM - Consumer Analytics, Metrics, Social Network
      Analytics, Churn, Sentiment, Influencer, Proximity
 •    Finance – Fraud, Compliance
 •    Telco – CDR, Fraud
 •    Healthcare – Provider / Patient analytics, fraud, proactive
      care
 •    Life sciences – clinical analytics, physician outreach
 •    Pharma – Pharmacovigilance, clinical trials
 •    Insurance – fraud, geo-spatial
 •    Manufacturing – warranty analytics, supplier quality
      metrics




Data Science

Data Analytics                 Art & Science                          APPLIED SCIENCE

 Content                                                       User Interest Prediction
 Customer                                                         inventory prediction
 Product                                                              Machine learning
 Behaviors                                                              Pattern Mining
 Optimization                                                   Advanced Regression
 Big Data Processing & ETL                                                    Analysis



Business Intelligence
                                                                        Advanced Analytics

Challenges
 •  Resources Availability
 •  MR is hard to implement
 •  Speech to text
     •  Conversation context is often missing
     •  Quality of recording
     •  Accent issues
 •  Visual data tagging
     •  Images
     •  Text embedded within images
 •  Metadata is not available
 •  Data is not trusted
 •  Content management platform capabilities
 •  Ontologies Ambiguity
 •  Taxonomy Integration




Contact
•  Krish Krishnan
   rkrish1124@yahoo.com
       Twitter: @datagenius

Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
Hadoop World 2011: How Hadoop Revolutionized Business Intelligence and Advanc...
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Simplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the BusinessSimplifying Big Data Analytics for the Business
Simplifying Big Data Analytics for the Business
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data SlidesUtrecht NL-HUG/Data Science-NL - Agile Data Slides
Utrecht NL-HUG/Data Science-NL - Agile Data Slides
 
Paris HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on HadoopParis HUG - Agile Analytics Applications on Hadoop
Paris HUG - Agile Analytics Applications on Hadoop
 

Mais de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

Mais de DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Integrating Big Data Technologies

  • 1. INTEGRATING BIG DATA. Dataversity Webinar, Feb 7 2012. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved
  • 2. State of Data Today
  • 3. A Growing Trend
      Expectations for BI are changing without anyone telling us.

      Requirement      Expectations                     Reality
      Speed            Speed of the Internet            Speed = Infra + Arch + Design
      Accessibility    Accessibility of a Smartphone    BI Tool licenses & security
      Usability        iPad - Mobility                  Web Enabled BI Tool
      Availability     Google Search                    Data & Report Metadata
      Delivery         Speed of questions               Methodology & Signoff
      Data             Access to everything             Structured Data
      Scalability      Cloud (Amazon)                   Existing Infrastructure
      Cost             Cell phone or Free WIFI          Millions
  • 4. The Wisdom of Crowds
  • 5. Data Deluge = Business Insights
  • 6. BIG Data
      Diagram: data sources arranged along an axis from Structured to UnStructured, under the columns Current and New.
      •  ERP, CRM, SCM
      •  Content Management Systems
      •  Email, Call Center
      •  Documents, Contracts
  • 7. What’s so Big about Big Data
      •  Velocity
      •  Volume
      •  Variety
      •  Complexity
      •  Ambiguity
  • 8. So you are about to start the Big Data Project
      Diagram labels: Tools, Output, Data, Instructions
  • 9. The Normal Way Results In ... (Image Source: Web)
  • 10. Why Big Data can Fail on the RDBMS?
      Inputs:
      •  New data types
      •  New volume
      •  New analytics
      •  New workload
      •  New metadata
      Current Data Management Platform (RDBMS + ETL + BI):
      •  Poor performance
      •  Failed programs
      Issues: Scalability; Sharding; ACID
  • 11. BIG Data
      Workload Demands
      •  Process dynamic data content
      •  Process unstructured data
      •  Systems that can scale up and scale out with high volume data
      •  Perform complex operations within reasonable response time
      Infrastructure Requirements
      •  Scalable platform
      •  Database independence
      •  Fault tolerant architectures
      •  Low cost of acquisition and store
      •  Supported by standard toolsets
  • 12. Hadoop Design Goals
      •  System Shall Manage and Heal Itself
      •  Performance Shall Scale Linearly
      •  Compute Shall Move to Data
      •  Simple Core, Modular and Extensible
  • 13. Hadoop Differentiators
      Schema-on-Write: RDBMS
      •  Schema must be created before data is loaded.
      •  An explicit load operation has to take place which transforms the data to the internal structure of the database.
      •  New columns must be added explicitly before data for such columns can be loaded into the database.
      •  Read is fast.
      •  Standards/Governance.
      Schema-on-Read: Hadoop
      •  Data is simply copied to the file store; no special transformation is needed.
      •  A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns.
      •  New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it.
      •  Load is fast.
      •  Evolving Schemas/Agility.
  • 14. Hadoop Known Limitations
      •  Write-once model
      •  A namespace with an extremely large number of files exceeds the Namenode’s capacity to maintain
      •  Cannot be mounted by an existing OS
          •  Getting data in and out is tedious
          •  A Virtual File System can solve the problem
      •  HDFS does not implement / support
          •  User quotas
          •  Access permissions
          •  Hard or soft links
          •  Data balancing schemes
      •  No periodic checkpoints
      •  Namenode is a single point of failure
          •  Automatic restart and failover to another machine not yet supported
  • 15. Hadoop Tips
      Hadoop is useful
      •  When you must process lots of unstructured data
      •  When running batch jobs is acceptable
      •  When you have access to lots of cheap hardware
      Hadoop is not useful
      •  For intense calculations with little or no data
      •  When your data is not self-contained
      •  When you need interactive results
      Implementation
      •  Think big, start small
      •  Build on agile cycles
      •  Focus on the data, as you will always develop schema on write.
      Available Optimizations
      •  Input to Maps
      •  Map only jobs
      •  Combiner
      •  Compression
      •  Speculation
      •  Fault Tolerance
      •  Buffer Size
      •  Parallelism (threads)
      •  Partitioner
      •  Reporter
      •  DistributedCache
      •  Task child environment settings
  • 16. Hadoop Tips
      Troubleshooting
      •  Are your partitions uniform?
      •  Can you combine records at the map side?
      •  Are maps reading off a DFS block worth of data?
      •  Are you running a single reduce wave (unless the data size per reducer is too big)?
      •  Have you tried compressing intermediate data & final data?
      •  Are there buffer size issues?
      •  Do you see unexplained “long tails”?
      •  Are your CPU cores busy?
      •  Is at least one system resource being loaded?
      Performance Tuning
      •  Increase the memory/buffer allocated to the tasks
      •  Increase the number of tasks that can be run in parallel
      •  Increase the number of threads that serve the map outputs
      •  Disable unnecessary logging
      •  Turn on speculation
      •  Run reducers in one wave as they tend to get expensive
      •  Tune the usage of DistributedCache; it can increase efficiency
  • 17. NoSQL
      •  Stands for Not Only SQL
      •  Based on CAP Theorem
      •  NoSQL databases usually do not require a fixed table schema, nor do they use the concept of joins
      •  All NoSQL offerings relax one or more of the ACID properties
      •  NoSQL databases come in a variety of flavors
          •  XML (myXMLDB, Tamino, Sedna)
          •  Wide Column (Cassandra, HBase, Bigtable)
          •  Key/Value (Redis, Memcached with BerkeleyDB)
          •  Graph (Neo4j, InfoGrid)
          •  Document store (CouchDB, MongoDB)
  • 18. NoSQL Footprint
      Diagram: NoSQL families plotted by Size against Complexity.
      •  Key-Value: Amazon Dynamo, Voldemort
      •  Big Table: Google Bigtable, HBase, Cassandra
      •  Doc Database: Lotus Notes
      •  Graph: Graph Theory
  • 19. NoSQL
      Access and Query
      •  RESTful interfaces (HTTP as an access API)
      •  Query languages other than SQL
          •  SPARQL: query language for the Semantic Web
          •  Gremlin: the graph traversal language
          •  Sones Graph Query Language
      •  Data Manipulation / Query API
          •  The Google BigTable DataStore API
          •  The Neo4j Traversal API
      •  Serialization Formats
          •  JSON
          •  Thrift
          •  ProtoBuffers
          •  RDF
      Best Practices
      •  Design for data collection
      •  Plan the data store
      •  Organize by type and semantics
      •  Partition for performance
      •  Access and Query is run time dependent
      •  Horizontal scaling
      •  Memory Caching
  • 20. Textual ETL Engine
      Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.
      •  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
      •  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
      •  Easy to implement and easy to realize ROI
      Advantages
      •  Simple to use
      •  No MR or coding required for text analysis and mining
      •  Extensible by Taxonomy integration
      •  Works on standard and new databases
      •  Produces a highly columnar key-value store, ready for metadata integration
      Disadvantages
      •  Not integrated with Hadoop as a rules interface
      •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
      •  Current GA does not handle distributed processing outside Windows platform
  • 21. Integration
      •  All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
          •  Oracle Exalytics / Big Data Appliance
          •  Teradata Aster Appliance
          •  EMC Greenplum Appliance
          •  IBM BigInsights
          •  Microsoft Windows Azure Integration
      •  There are multiple providers of Hadoop distributions
          •  Cloudera
          •  Hortonworks
          •  Zettaset
      •  Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms
  • 22. Conceptual Solution Architecture
      Diagram labels: Metadata; MDM; OLTP; ETL, ELT, CDC; Data Warehouse; DataMarts; OLAP; Reporting; Analytics; Search; Text Mining; Content Analytics; Knowledge Analytics; Big Data Content (Email, Docs); Textual ETL; Taxonomy; and / or MR / Ruby / Java (Hadoop); BIG Data DW
  • 23. Integration Tips
      •  The key to the castle in integrating Big Data is metadata
      •  Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
      •  Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
      •  Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can be of help
      •  Third-party data providers also supply keywords, trending tags and scores; these can provide a lot of integration support
      •  Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
  • 24. Which Tool
      Columns: Application, Hadoop, NoSQL, Textual ETL
      •  Machine Learning: x x
      •  Sentiments: x x x
      •  Text Processing: x x x
      •  Image Processing: x x
      •  Video Analytics: x x
      •  Log Parsing: x x x
      •  Collaborative Filtering: x x x
      •  Context Search: x
      •  Email & Content: x
  • 25. Success Stories
      •  Machine learning & Recommendation Engines: Amazon, Orbitz
      •  CRM: Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
      •  Finance: Fraud, Compliance
      •  Telco: CDR, Fraud
      •  Healthcare: Provider / Patient analytics, fraud, proactive care
      •  Lifesciences: clinical analytics, physician outreach
      •  Pharma: Pharmacovigilance, clinical trials
      •  Insurance: fraud, geo-spatial
      •  Manufacturing: warranty analytics, supplier quality metrics
  • 26. Data Science
      Diagram labels: Data Analytics; Art & Science; Applied Science; Content; User Interest Prediction; Customer; inventory prediction; Product; Machine learning; Behaviors; Pattern Mining; Optimization; Advanced Regression; Big Data Processing & ETL; Analysis; Business Intelligence; Advanced Analytics
  • 27. Challenges
      •  Resources Availability
      •  MR is hard to implement
      •  Speech to text
          •  Conversation context is often missing
          •  Quality of recording
          •  Accent issues
      •  Visual data tagging
          •  Images
          •  Text embedded within images
      •  Metadata is not available
      •  Data is not trusted
      •  Content management platform capabilities
      •  Ontologies Ambiguity
      •  Taxonomy Integration
  • 28. Contact
      •  Krish Krishnan
      •  rkrish1124@yahoo.com
      •  Twitter: @datagenius
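
Slide 13 contrasts schema-on-write with schema-on-read. The sketch below illustrates only the read-side idea, in plain Python rather than Hadoop: raw records are stored untouched, and a small parsing function (standing in loosely for a SerDe) extracts columns at read time. The record layout, the `RAW_LINES` sample, and the `read_with_schema` helper are assumptions made up for this illustration; they are not part of the deck or of any Hadoop API.

```python
import json

# Hypothetical raw event log: lines are kept exactly as they arrived.
# Under schema-on-read, "loading" is just a copy with no transformation.
RAW_LINES = [
    '{"user": "a", "page": "/home"}',
    '{"user": "b", "page": "/cart", "referrer": "search"}',
    '{"user": "c", "page": "/home", "referrer": "email"}',
]

def read_with_schema(lines, columns):
    """Apply a schema at read time, loosely analogous to a SerDe.

    Every raw line is parsed on each read and only the requested
    columns are extracted; a field missing from a record becomes None.
    """
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(col) for col in columns)

# Version 1 of the reader only knows two columns.
v1 = list(read_with_schema(RAW_LINES, ["user", "page"]))

# Version 2 adds "referrer". Records stored earlier expose the new
# column retroactively, without being reloaded.
v2 = list(read_with_schema(RAW_LINES, ["user", "page", "referrer"]))

print(v1)
print(v2)
```

The trade-off the slide names is visible here: storing data costs nothing beyond the copy, but every read pays the parsing cost, which is why the slide lists "Load is fast" for Hadoop and "Read is fast" for the RDBMS.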
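
Slides 15 and 16 list the Combiner among the available optimizations and ask "Can you combine records at the map side?". The following is a minimal simulation of that idea in plain Python, assuming a word-count job over two made-up input splits; the function names and the `SPLITS` data are hypothetical, and no Hadoop API is used. It counts how many key/value pairs would cross the shuffle with and without map-side combining.

```python
from collections import Counter, defaultdict

def map_words(split):
    """Map phase: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate counts on the map side, per split."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_counts(all_pairs):
    """Reduce phase: sum the counts for each word across all splits."""
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

# Two hypothetical input splits, each a list of text lines.
SPLITS = [
    ["big data big data", "hadoop data"],
    ["data data hadoop"],
]

# Pairs sent to the reducers without and with a combiner.
without = [pair for split in SPLITS for pair in map_words(split)]
with_combiner = [pair for split in SPLITS for pair in combine(map_words(split))]

shuffled_without = len(without)
shuffled_with = len(with_combiner)
result = reduce_counts(with_combiner)

print(shuffled_without, shuffled_with, result)
```

Both paths produce the same final counts; the combiner only reduces the number of intermediate records. This works here because addition is associative and commutative, which is the usual condition for reusing reduce logic as a combiner.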
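
Slide 16 opens its troubleshooting list with "Are your partitions uniform?". One way to reason about that question is to hash the keys into partitions and compare the largest partition to the average. The sketch below does this in plain Python with a CRC32-based partitioner; the partitioner, the `skew_ratio` measure, and the sample keys are all assumptions chosen for illustration and do not reproduce Hadoop's own partitioning code.

```python
import zlib
from collections import Counter

def partition_for(key, num_partitions):
    """Assign a key to a partition using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

def skew_ratio(sizes):
    """Largest partition divided by the mean partition size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean

NUM_PARTITIONS = 4

# 100 distinct keys, then the same keys plus one "hot" key repeated 400 times.
uniform_keys = [f"user{i}" for i in range(100)]
skewed_keys = uniform_keys + ["hot"] * 400

uniform_sizes = partition_sizes(uniform_keys, NUM_PARTITIONS)
skewed_sizes = partition_sizes(skewed_keys, NUM_PARTITIONS)

print(uniform_sizes, skew_ratio(uniform_sizes))
print(skewed_sizes, skew_ratio(skewed_sizes))
```

Because every record with the same key goes to the same partition, the hot key forces one partition to hold at least 400 of the 500 records, so one reducer would finish long after the others. That is one plausible source of the unexplained "long tails" the same slide mentions.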