EMC Isilon Big Data Storage and Hadoop Analytics

Jemish Patel




© Copyright 2012 EMC Corporation. All rights reserved.            1
Today’s Agenda
 • The Big Data Opportunity
 • Big Data Analytics with Hadoop
 • Technology Challenges of Hadoop
 • EMC’s Hadoop Solutions for the Enterprise
 • Q+A




The Big Data Opportunity




“Big Data Is Less About Size, And More About Freedom” (TechCrunch)

“Findings: ‘Big Data’ Is More Extreme Than Volume” (Gartner)

“Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” (IDC)

“Total data: ‘bigger’ than big data” (451 Group)

THE ERA OF BIG DATA IS HERE

BIG DATA IS TRANSFORMING BUSINESS


Big Data in Action
• Healthcare
       – Leverage historical data to discover better treatments
• Financial Services
       – Data-driven banking stress tests & risk analysis
• Utilities
       – Machine learning to predict service outages & prevent energy theft



Hadoop & Big Data




The Promise of Big Data Analytics
 • Leverage data assets to identify key trends and new business opportunities
 • Analyze new sources of information to gain competitive advantages
 • Take an agile approach to analytics that can adapt at the speed of business
 • Scale your storage and analysis platform to handle Big Data’s volume, velocity and variety




The Emergence of Hadoop
• Created 5-6 years ago by Doug Cutting, a former Yahoo! engineer
• Software platform designed to analyze massive amounts of unstructured data
• Two core components:
       – Hadoop Distributed File System (HDFS) (storage)
       – MapReduce (compute)
• Now a top-level Apache project backed by a large open source development community



MapReduce
• "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.

• "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, which is the answer to the problem it was originally trying to solve.
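
The two steps can be illustrated with a word count. This is a minimal single-process sketch of the master/worker description on this slide, not the Hadoop API; the function names (`map_step`, `reduce_step`, `run_job`) and the way the input is divided are assumptions made for illustration.

```python
from collections import defaultdict

def map_step(chunk):
    """Worker: process one sub-problem and emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_step(pairs):
    """Master: combine the answers from all sub-problems into one result."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run_job(text, num_workers=3):
    lines = text.splitlines()
    # Divide the input into smaller sub-problems, one per (simulated) worker.
    chunks = [" ".join(lines[i::num_workers]) for i in range(num_workers)]
    intermediate = []
    for chunk in chunks:  # on a real cluster these would run in parallel
        intermediate.extend(map_step(chunk))
    return reduce_step(intermediate)

if __name__ == "__main__":
    print(run_job("big data\nbig storage\nhadoop data"))
```

Because each chunk is mapped independently, adding workers only changes how the input is divided; the reduce step produces the same totals.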




MapReduce




Services for MapReduce
• JobTracker – A master node that manages job submissions, scheduling and reprocessing in case of job failures. Jobs consist of a mapper, a reducer and a list of inputs.
• TaskTracker – Each slave node in the cluster runs a TaskTracker process. The JobTracker instructs the TaskTrackers to run and monitor a task. A task consists of a map or a reduce over a piece of data.
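
The division of labor between the two services can be sketched as a small simulation. The classes below borrow the names JobTracker and TaskTracker from the slide, but they are toy stand-ins, not Hadoop's implementation: the round-robin assignment, the `fail_on` parameter used to simulate a node failure, and running the reduce inside `submit` are all simplifying assumptions.

```python
class TaskTracker:
    """Runs tasks on one (simulated) slave node."""

    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)  # inputs this node fails on (demo only)

    def run(self, func, data):
        if data in self.fail_on:
            raise RuntimeError(f"{self.name} failed on {data!r}")
        return func(data)


class JobTracker:
    """Accepts a job (mapper, reducer, inputs) and schedules its tasks."""

    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, mapper, reducer, inputs):
        results = []
        for i, item in enumerate(inputs):
            # Start with a round-robin choice; on failure, reprocess the
            # task on the next node until one succeeds.
            for attempt in range(len(self.trackers)):
                tracker = self.trackers[(i + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(mapper, item))
                    break
                except RuntimeError:
                    continue
            else:
                raise RuntimeError(f"task for {item!r} failed on every node")
        # Simplification: the reduce runs here rather than as a task on a node.
        return reducer(results)
```

For example, with `node1` set to fail on the input `"a"`, submitting `len` as the mapper and `sum` as the reducer over `["a", "b", "ccc"]` still completes, because the failed task is rerun on `node2`.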




HDFS – Hadoop Distributed Filesystem
• HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
• HDFS has a permissions model for files and directories that is much like POSIX.
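
As a reminder of what a POSIX-style model looks like, the sketch below checks access using owner, group and other permission bits. It illustrates the general POSIX scheme only; it is not HDFS code, it omits details such as superuser handling, and the function name and the user and group names are hypothetical.

```python
import stat

def can_access(mode, owner, group, user, user_groups, want):
    """POSIX-style check: choose the owner, group or other bits, then test one flag."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        shift = 6      # owner bits
    elif group in user_groups:
        shift = 3      # group bits
    else:
        shift = 0      # other bits
    return bool((mode >> shift) & bit)

if __name__ == "__main__":
    mode = 0o640  # owner: read/write, group: read, other: none
    print(stat.filemode(stat.S_IFREG | mode))
    print(can_access(mode, "alice", "stats", "bob", ["stats"], "r"))
    print(can_access(mode, "alice", "stats", "bob", ["stats"], "w"))
```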




Services for HDFS
• Namenode – Manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log.
• Datanode – The workhorses of the filesystem. Datanodes store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.
• Secondary Namenode – Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine.
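
The merge performed by the secondary namenode can be pictured as replaying the edit log on top of the last namespace image. The sketch below assumes a made-up record format (tuples such as `("create", path, blocks)`) and an in-memory dictionary for the image; the real on-disk formats are different.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"blocks": list(args[0])}
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")

def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return the new image and an empty log."""
    new_image = {path: dict(meta) for path, meta in image.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

if __name__ == "__main__":
    image = {"/a": {"blocks": [1]}}
    log = [("create", "/b", [2, 3]), ("rename", "/a", "/c"), ("delete", "/b")]
    print(checkpoint(image, log))
```

After a checkpoint the log starts empty again, which is what keeps it from growing without bound.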



Hadoop Eco-System Components
 • Pig – A high-level data-flow language and execution framework for parallel computation
 • Mahout – A scalable machine learning and data mining library
 • Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying (SQL)
 • HBase – A scalable, distributed database that supports structured data storage for large tables
 • R (RHIPE) – Combines Hadoop with the R analytics language

 Ecosystem:  R (RHIPE) | Pig | Mahout | Hive | HBase
 Core:       MapReduce – Compute Layer (Job Scheduling / Execution)
             HDFS – Storage Layer (Hadoop Distributed Filesystem)

Why Hadoop is Important
 • Pragmatic approach to analytics on a very large scale
        – Opens up new ways of gaining insights and identifying opportunities for businesses
 • Designed to address the rise of unstructured data
        – Enterprise data to grow by 650% over the next 5 years
        – More than 80% of this growth will be unstructured data




Evolution of the Hadoop Market




[Figure: technology adoption curve with segments Innovators/Early Adopters, Early Majority, Late Majority and Laggards; Hadoop Early Adopters and Hadoop Early Majority are marked on the curve.]



Evolution of the Hadoop Market
    HADOOP PROFILE (TO DATE)                   HADOOP PROFILE (EMERGING)
    Hadoop Early Adopters                      Hadoop Early Majority

    Pioneers and academics                     IT Manager & CIO
    Application Architect                      Data Scientist
    Visionary                                  Line-of-business

    Open source / community driven             Commercial distribution
    Build-your-own server, application &       Turnkey solution
      storage infrastructure
    Commodity components                       End-to-End Data protection

    Web 2.0                                    Fortune 1000
    Universities                               Financial Services
    Life Sciences                              Retail

Technology Challenges of Hadoop




Hadoop Architecture
   1. Data is ingested into the Hadoop File System (HDFS)
   2. Computation occurs inside Hadoop (MapReduce)
   3. Results are exported from HDFS for use
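
The three steps above can be sketched as the commands a user might issue, built here as argument lists. `hadoop fs -put`, `hadoop jar` and `hadoop fs -get` are standard Hadoop commands; every path, the jar and the class name are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

def ingest_cmd(local_path, hdfs_dir):
    """Step 1: copy a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]

def compute_cmd(job_jar, main_class, input_dir, output_dir):
    """Step 2: run a MapReduce job packaged as a jar.

    The arguments after the class name are defined by the job itself;
    input and output directories are a common convention, assumed here.
    """
    return ["hadoop", "jar", job_jar, main_class, input_dir, output_dir]

def export_cmd(hdfs_path, local_path):
    """Step 3: copy results out of HDFS to local disk."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]

def workflow():
    # All names below are placeholders for illustration.
    return [
        ingest_cmd("weblogs.txt", "/data/in"),
        compute_cmd("analytics.jar", "com.example.LogStats", "/data/in", "/data/out"),
        export_cmd("/data/out/part-00000", "results.txt"),
    ]

if __name__ == "__main__":
    for argv in workflow():
        print(shlex.join(argv))  # print only; nothing is executed
```

Steps 1 and 3 are explicit copies into and out of HDFS.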




[Diagram: six Hadoop Data Nodes and a Hadoop Name Node connected over Ethernet]




Writing Data into Hadoop




Reading Data from HDFS




Technology Challenges of Hadoop
   [Diagram: Hadoop DAS Environment, with Name node]

   1. Dedicated Storage Infrastructure
          – One-off for Hadoop only
   2. Single Point of Failure
          – Namenode
   3. Lacking Enterprise Data Protection
          – No Snapshots, replication, backup
   4. Poor Storage Efficiency
          – 3X mirroring
   5. Fixed Scalability
          – Rigid compute to storage ratio
   6. Manual Import/Export
          – No protocol support




EMC Addresses the Hadoop Challenge
      Hadoop challenge                              EMC answer
   1. Dedicated Storage Infrastructure              Scale-Out Storage Platform
          – One-off for Hadoop only                     – Multiple applications & workflows
   2. Single Point of Failure                       No Single Point of Failure
          – Namenode                                    – Distributed Namenode
   3. Lacking Enterprise Data Protection            End-to-End Data Protection
          – No Snapshots, replication, backup           – SnapshotIQ, SyncIQ, NDMP Backup
   4. Poor Storage Efficiency                       Industry-Leading Storage Efficiency
          – 3X mirroring                                – >80% Storage Utilization
   5. Fixed Scalability                             Independent Scalability
          – Rigid compute to storage ratio              – Add compute & storage separately
   6. Manual Import/Export                          Multi-Protocol
          – No protocol support                         – Industry standard protocols
                                                        – NFS, CIFS, FTP, HTTP, HDFS



The EMC Isilon Advantage for Hadoop
   1. Scale-Out Storage Platform
          – Multiple applications & workflows
   2. No Single Point of Failure
          – Distributed Namenode
   3. End-to-End Data Protection
          – SnapshotIQ, SyncIQ, NDMP Backup
   4. Industry-Leading Storage Efficiency
          – >80% Storage Utilization
   5. Independent Scalability
          – Add compute & storage separately
   6. Multi-Protocol
          – Industry standard protocols
          – NFS, CIFS, FTP, HTTP, HDFS
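
To put the storage efficiency figures side by side, the sketch below computes the raw capacity needed to hold a given amount of data under 3X mirroring and at 80% utilization. The 100 TB data set is an arbitrary example, and the slide's ">80%" is taken at exactly 80% as a conservative assumption.

```python
def raw_tb_with_mirroring(usable_tb, copies=3):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies

def raw_tb_at_utilization(usable_tb, utilization=0.80):
    """Raw capacity needed when `utilization` of raw space holds user data."""
    return usable_tb / utilization

if __name__ == "__main__":
    data_tb = 100  # hypothetical data set size
    print("3X mirroring:", raw_tb_with_mirroring(data_tb), "TB raw")
    print("80% utilization:", raw_tb_at_utilization(data_tb), "TB raw")
```

Under these assumptions the mirrored layout needs 300 TB of raw disk for 100 TB of data, against roughly 125 TB at 80% utilization.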




Writing into Hadoop with Isilon




• Isilon becomes the namenode as well as the datanode.
• Provides scalability and protection of the data.
• The Hadoop cluster no longer has a single point of failure and no longer writes multiple 64MB-128MB chunks of data to datanodes.
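
For comparison, the sketch below shows what writing "multiple 64MB-128MB chunks" amounts to in a conventional HDFS deployment: a file is split into fixed-size blocks and each block is stored several times. The replication factor of 3 follows the "3X mirroring" figure cited earlier in the deck; the function name and the 1 GB example file are assumptions.

```python
MB = 1024 * 1024

def hdfs_block_layout(file_size_bytes, block_size_mb=64, replication=3):
    """Count the blocks and block replicas written for one file."""
    block_size = block_size_mb * MB
    # Integer ceiling division: a partial final block still counts as a block.
    num_blocks = (file_size_bytes + block_size - 1) // block_size
    return {
        "blocks": num_blocks,
        "block_replicas": num_blocks * replication,
        "bytes_written": file_size_bytes * replication,
    }

if __name__ == "__main__":
    one_gb = 1024 * MB
    print(hdfs_block_layout(one_gb, block_size_mb=64))
    print(hdfs_block_layout(one_gb, block_size_mb=128))
```

With 64 MB blocks a 1 GB file becomes 16 blocks and 48 block replicas; with 128 MB blocks it becomes 8 blocks and 24 replicas.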


Reading Hadoop Data with Isilon




• Data is read off the Isilon cluster back to the compute nodes.
• The datanodes are now just compute nodes and are independent of the data in the Hadoop cluster.
        – Benefit: the Hadoop hardware can be upgraded without migrating the data.



Industry’s First and Only Scale-Out Storage
Solution with Native Hadoop Integration


Accelerating the Benefits of Hadoop for the Enterprise
Reducing Risk
End-to-End Data Protection
Organizational Knowledge/Experience
EMC’s Enterprise Hadoop Solution
EMC Greenplum HD and EMC Isilon Scale-Out Storage

 • Apache Hadoop certified by Greenplum
 • Simple platform management and control
 • Parallel analytics access with Greenplum Database

[Diagram: Compute and Storage layers]




Greenplum: Not Just About Technology
• Data Science teams will become the driving force for success with big data analytics
• Greenplum is committed to the future of data science
       – University data science program collaboration with Stanford and UC Berkeley
       – Community investment including the Greenplum Analytic Workbench, Community edition software, and Data Science Summits
• Greenplum built its own Data Science practice
       – Leading PhDs with analytic tools expertise



Hadoop in Action




Customer Case Study
   Purdue University




Background
Leading Big Ten university renowned worldwide for its research and academic excellence.



Customer Case Study
   Purdue University




Challenge
• Large Hadoop environment for researchers in Statistics Department
• No central storage infrastructure, leading to many different, disparate islands of data without consistent protection or performance
• Small IT staff managing large amounts of data and hundreds of data-intensive users



Customer Case Study
   Purdue University


Solution
• Deployed Isilon with HDFS, which plugged seamlessly into their Hadoop environment
• Created a single, shared storage resource for data computing and analytics
• Delivered a highly reliable and flexible storage infrastructure that protected data from loss or corruption
• Eliminated need to migrate data between storage silos, delivering immediate accessibility and significantly higher performance




Customer Case Study
   Purdue University




“We tested EMC Isilon with Hadoop in our statistics department, which must often analyze huge data sets. EMC Isilon's multi-protocol capabilities provided fast and reliable delivery of data to our statisticians, demonstrating the potential to increase the time spent on actually doing the science, while reducing management costs.”

Alex Younts, Purdue University



Customer Case Study
   Global Shipping & Transportation Co.




Background
Leading global shipping and transportation company.



Customer Case Study
   Global Shipping & Transportation Co.



Challenge
• Large amounts of data in different formats from various business units. Focused on an e-commerce self-service site with semi-structured (XML) and unstructured log data.
• Looking to optimize their current ways of analyzing this data regardless of format.
• They wanted to understand which devices were accessing their self-service site, in order to measure usage patterns and enhance the user experience on their e-commerce site.




Customer Case Study
   Global Shipping & Transportation Co.


Solution
• Using Isilon with HDFS as the central storage for their Hadoop environment, they eliminated any ETL steps, as data could simply be copied over standard protocols.
• Created a single, shared storage resource for analytics queries across their entire data set, whether the data is structured, semi-structured or unstructured.
• Delivered a highly reliable and flexible storage infrastructure that enabled mechanisms such as backup and archive to be part of their analytics workflow.




Questions?




Thank You!




Provide Feedback & Win!

 • 125 attendees will receive $100 iTunes gift cards. To enter the raffle, simply complete:
        – 5 session surveys
        – The conference survey
 • Download the EMC World Conference App to learn more: emcworld.com/app



Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for High Impact Business Insight

Mais conteúdo relacionado

Mais procurados

HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataLviv Startup Club
 
Optimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDNOptimizing Lustre and GPFS with DDN
Optimizing Lustre and GPFS with DDNinside-BigData.com
 
Making Sense of Big data with Hadoop
Making Sense of Big data with HadoopMaking Sense of Big data with Hadoop
Making Sense of Big data with HadoopGwen (Chen) Shapira
 
NetApp - 10martie2011
NetApp - 10martie2011NetApp - 10martie2011
NetApp - 10martie2011Agora Group
 
Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Your Self-Driving Car - How Did it Get So Smart?
Your Self-Driving Car - How Did it Get So Smart?Hortonworks
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Performance Issues on Hadoop Clusters
Performance Issues on Hadoop ClustersPerformance Issues on Hadoop Clusters
Performance Issues on Hadoop ClustersXiao Qin
 
IoT Story: From Edge to HDP
IoT Story: From Edge to HDPIoT Story: From Edge to HDP
IoT Story: From Edge to HDPDataWorks Summit
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for High Impact Business Insight

  • 1. EMC Isilon Big Data Storage and Hadoop Analytics Jemish Patel © Copyright 2012 EMC Corporation. All rights reserved. 1
  • 2. Today’s Agenda • The Big Data Opportunity • Big Data Analytics with Hadoop • Technology Challenges of Hadoop • EMC’s Hadoop Solutions for the Enterprise • Q+A
  • 3. The Big Data Opportunity
  • 4. “Big Data Is Less About Size, And More About Freedom” (TechCrunch) • “Findings: ‘Big Data’ Is More Extreme Than Volume” (Gartner) • “Big Data! It’s Real, It’s Real-time, and It’s Already Changing Your World” (IDC) • “Total data: ‘bigger’ than big data” (451 Group)
  • 5. THE ERA OF BIG DATA IS HERE
  • 6. BIG DATA IS TRANSFORMING BUSINESS
  • 7. Big Data in Action • Healthcare – Leverage historical data to discover better treatments • Financial Services – Data-driven banking stress tests and risk analysis • Utilities – Machine learning to predict service outages and prevent energy theft
  • 8. Hadoop & Big Data
  • 9. The Promise of Big Data Analytics • Leverage data assets to identify key trends and new business opportunities • Analyze new sources of information to gain competitive advantages • Take an agile approach to analytics that can adapt at the speed of business • Scale your storage and analysis platform to handle Big Data’s volume, velocity and variety
  • 10. The Emergence of Hadoop • Created 5-6 years ago by former Yahoo! engineer Doug Cutting • Software platform designed to analyze massive amounts of unstructured data • Two core components: Hadoop Distributed File System (HDFS) for storage and MapReduce for compute • Now a top-level Apache project backed by a large open source development community
  • 11. MapReduce • "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node. • "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
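The map and reduce steps described on this slide can be sketched in a few lines of plain Python. This is a single-process illustration rather than Hadoop code: the word-count task and the function names `map_step`, `shuffle`, `reduce_step` and `run_job` are assumptions chosen for the example, and each input string stands in for the sub-problem handed to one worker node.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group intermediate values by key so each key is reduced exactly once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(chunks):
    """Run map over every chunk, then reduce the grouped intermediate pairs."""
    intermediate = []
    for chunk in chunks:  # in a real cluster these calls would run on worker nodes
        intermediate.extend(map_step(chunk))
    return dict(reduce_step(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(run_job(["big data is big", "data is here"]))
```

Because every chunk is mapped independently and every key is reduced independently, both phases could in principle be spread across machines, which is the property the slide is pointing at.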
  • 12. MapReduce
  • 13. Services for MapReduce • JobTracker – A master node that manages job submissions, scheduling, and reprocessing in case of job failures. Jobs consist of a mapper, a reducer and a list of inputs. • TaskTracker – Each slave node in the cluster runs a TaskTracker process. The JobTracker instructs the TaskTrackers to run and monitor a task. A task consists of a map or a reduce over a piece of data.
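A toy simulation may help picture the division of labour between the two services. The classes below are hypothetical stand-ins written for this sketch, not Hadoop's actual classes or API: the `fail_on` parameter, the round-robin assignment and the `max_attempts` limit are all assumptions used to show a master rescheduling a failed task on another worker.

```python
class TaskTracker:
    """Hypothetical worker: runs one task (a map or a reduce over a piece of data)."""

    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)  # task ids this worker fails, to simulate faults

    def run(self, task_id, func, data):
        if task_id in self.fail_on:
            raise RuntimeError(f"{self.name} failed task {task_id}")
        return func(data)


class JobTracker:
    """Hypothetical master: hands tasks to workers and reschedules failed ones."""

    def __init__(self, trackers, max_attempts=3):
        self.trackers = trackers
        self.max_attempts = max_attempts

    def run_tasks(self, func, pieces):
        results = []
        for task_id, data in enumerate(pieces):
            for attempt in range(self.max_attempts):
                # Round-robin choice; each retry moves on to the next worker.
                tracker = self.trackers[(task_id + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(task_id, func, data))
                    break
                except RuntimeError:
                    continue
            else:
                raise RuntimeError(f"task {task_id} failed {self.max_attempts} times")
        return results


if __name__ == "__main__":
    workers = [TaskTracker("tt-0", fail_on={0}), TaskTracker("tt-1")]
    print(JobTracker(workers).run_tasks(len, ["aa", "bbb", "c"]))
```

In the usage lines, task 0 fails on `tt-0` and is rerun on `tt-1`, so the job still completes.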
  • 14. HDFS – Hadoop Distributed Filesystem • HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware. • HDFS has a permissions model for files and directories that is much like POSIX.
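To make the POSIX comparison concrete, here is a minimal sketch of a POSIX-style owner/group/other permission check. It is a generic illustration and not HDFS code: the function name `may_access` and its parameters are assumptions, and details such as superuser handling are deliberately left out.

```python
def may_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` is granted `want` ('r', 'w' or 'x') by `mode`.

    `mode` is an octal permission value such as 0o640. Exactly one class of
    bits applies: owner bits if the user owns the file, otherwise group bits
    if the user belongs to the file's group, otherwise the "other" bits.
    """
    bit = {"r": 2, "w": 1, "x": 0}[want]
    if user == owner:
        shift = 6  # owner bits
    elif group in user_groups:
        shift = 3  # group bits
    else:
        shift = 0  # other bits
    return bool((mode >> (shift + bit)) & 1)


if __name__ == "__main__":
    # 0o640: owner may read and write, group may read, others get nothing.
    print(may_access(0o640, "alice", "analysts", "alice", {"analysts"}, "w"))
    print(may_access(0o640, "alice", "analysts", "bob", {"analysts"}, "w"))
```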
  • 15. Services for HDFS • Namenode – Manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log. • Datanode – The workhorses of the filesystem. Datanodes store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing. • Secondary Namenode – Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine.
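The relationship between the namespace image and the edit log can be sketched as a snapshot plus a list of changes, with the periodic merge replaying the changes into a new snapshot. The data layout below (a dict of paths, edits as tuples) and the names `apply_edit` and `checkpoint` are assumptions made for illustration; the real on-disk formats are different.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return (new_image, empty_log)."""
    new_image = {path: {"blocks": list(meta["blocks"])} for path, meta in image.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []  # the log can start empty again after the merge


if __name__ == "__main__":
    image = {"/a": {"blocks": ["blk_1"]}}
    log = [("create", "/b"), ("add_block", "/b", "blk_2"), ("delete", "/a")]
    print(checkpoint(image, log))
```

Replaying a short log into a fresh image is cheap; letting the log grow without bound would make every restart replay the whole history, which is the problem the periodic merge avoids.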
  • 16. Hadoop Eco-System Components • Pig – A high-level data-flow language and execution framework for parallel computation • Mahout – A scalable machine learning and data mining library • Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying (SQL) • HBase – A scalable, distributed database that supports structured data storage for large tables • R (RHIPE) – Combines Hadoop with the R analytics language • Diagram: Ecosystem layer (Pig, Mahout, Hive, HBase, R (RHIPE)) on top of the Core layer: MapReduce – Compute Layer (Job Scheduling / Execution) and HDFS – Storage Layer (Hadoop Distributed Filesystem)
  • 17. Why Hadoop is Important • Pragmatic approach to analytics on a very large scale – Opens up new ways of gaining insights and identifying opportunities for businesses • Designed to address the rise of unstructured data – Enterprise data to grow by 650% over the next 5 years – More than 80% of this growth will be unstructured data
  • 18. Evolution of the Hadoop Market • Adoption curve: Innovators/Early Adopters, Early Majority, Late Majority, Laggards • Marked on the curve: Hadoop Early Adopters, Hadoop Early Majority
  • 19. Evolution of the Hadoop Market • HADOOP PROFILE (TO DATE): Pioneers and academics; Application Architect; Visionary; Open source / community driven; Build-your-own server, application & storage infrastructure; Commodity components; Web 2.0; Universities; Life Sciences • Marked on the curve: Hadoop Early Adopters, Hadoop Early Majority
  • 20. Evolution of the Hadoop Market • HADOOP PROFILE (TO DATE): Pioneers and academics; Application Architect; Visionary; Open source / community driven; Build-your-own server, application & storage infrastructure; Commodity components; Web 2.0; Universities; Life Sciences • HADOOP PROFILE (EMERGING): IT Manager & CIO; Data Scientist; Line-of-business; Commercial distribution; Turnkey solution; End-to-End Data protection; Fortune 1000; Financial Services; Retail • Marked on the curve: Hadoop Early Adopters, Hadoop Early Majority
  • 21. Technology Challenges of Hadoop
  • 22. Hadoop Architecture 1. Data is ingested into the Hadoop File System (HDFS) 2. Computation occurs inside Hadoop (MapReduce) 3. Results are exported from HDFS for use • Diagram: one Hadoop Name Node and six Hadoop Data Nodes connected over Ethernet
  • 23. Writing Data into Hadoop
  • 24. Reading Data from HDFS
  • 25. Technology Challenges of Hadoop 1. Dedicated Storage Infrastructure – One-off for Hadoop only 2. Single Point of Failure – Namenode 3. Lacking Enterprise Data Protection – No snapshots, replication, backup 4. Poor Storage Efficiency – 3X mirroring 5. Fixed Scalability – Rigid compute-to-storage ratio 6. Manual Import/Export – No protocol support • Diagram: Hadoop DAS Environment with name node
  • 26. Technology Challenges of Hadoop 1. Dedicated Storage Infrastructure – One-off for Hadoop only 2. Single Point of Failure – Namenode 3. Lacking Enterprise Data Protection – No snapshots, replication, backup 4. Poor Storage Efficiency – 3X mirroring 5. Fixed Scalability – Rigid compute-to-storage ratio 6. Manual Import/Export – No protocol support • Diagram: Hadoop DAS Environment with namenode; data nodes labeled 1x, 2x, and 3x
  • 27. EMC Addresses the Hadoop Challenge 1. Dedicated Storage Infrastructure (One-off for Hadoop only) → Scale-Out Storage Platform (Multiple applications & workflows) 2. Single Point of Failure (Namenode) → No Single Point of Failure (Distributed Namenode) 3. Lacking Enterprise Data Protection (No snapshots, replication, backup) → End-to-End Data Protection (SnapshotIQ, SyncIQ, NDMP Backup) 4. Poor Storage Efficiency (3X mirroring) → Industry-Leading Storage Efficiency (>80% Storage Utilization) 5. Fixed Scalability (Rigid compute-to-storage ratio) → Independent Scalability (Add compute & storage separately) 6. Manual Import/Export (No protocol support) → Multi-Protocol (Industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS)
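The storage-efficiency comparison on this slide comes down to simple arithmetic, sketched below. The 100 TB data set is a hypothetical figure, and 0.80 is used as a conservative stand-in for the slide's ">80%" utilization claim; the function names are assumptions for illustration.

```python
def raw_tb_with_mirroring(data_tb, copies=3):
    """Raw capacity consumed when every block is stored `copies` times."""
    return data_tb * copies


def raw_tb_at_utilization(data_tb, utilization=0.80):
    """Raw capacity consumed when `utilization` of raw space holds user data."""
    return data_tb / utilization


if __name__ == "__main__":
    data_tb = 100  # hypothetical data set size in TB
    mirrored = raw_tb_with_mirroring(data_tb)
    scale_out = raw_tb_at_utilization(data_tb)
    print(f"3X mirroring: {mirrored} TB raw ({data_tb / mirrored:.0%} efficient)")
    print(f"80% utilization: {scale_out:.0f} TB raw")
```

Under these assumptions, 3X mirroring needs 300 TB of raw disk for 100 TB of data (about 33% efficient), while a platform at 80% utilization needs roughly 125 TB.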
  • 28. The EMC Isilon Advantage for Hadoop 1. Scale-Out Storage Platform – Multiple applications & workflows 2. No Single Point of Failure – Distributed Namenode 3. End-to-End Data Protection – SnapshotIQ, SyncIQ, NDMP Backup 4. Industry-Leading Storage Efficiency – >80% Storage Utilization 5. Independent Scalability – Add compute & storage separately 6. Multi-Protocol – Industry standard protocols: NFS, CIFS, FTP, HTTP, HDFS
  • 29. Writing into Hadoop with Isilon • Isilon becomes the namenode as well as the data node. • Isilon provides scalability and protection of the data. • The Hadoop cluster no longer has a single point of failure and no longer writes multiple copies of each 64MB-128MB chunk of data to datanodes.
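For contrast with the Isilon write path, the conventional HDFS behavior that this slide refers to can be estimated with a few lines of arithmetic. The function name and the 1 GB example file are assumptions for illustration; the 64 MB block size and three copies follow the figures quoted in this deck.

```python
import math


def block_writes(file_mb, block_mb=64, replicas=3):
    """Return (number of blocks, total block copies written to datanodes)."""
    blocks = math.ceil(file_mb / block_mb)  # the last block may be partial
    return blocks, blocks * replicas


if __name__ == "__main__":
    blocks, copies = block_writes(1024)  # hypothetical 1 GB file
    print(f"{blocks} blocks, {copies} block copies written")
```

With these inputs a 1 GB file splits into 16 blocks, and 3X mirroring turns that into 48 block writes spread across the datanodes.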
  • 30. Reading Hadoop Data with Isilon • Data is read off the cluster back to the compute nodes. • The datanodes are now just compute nodes and are independent of the data in the Hadoop cluster. – The benefit is that the Hadoop hardware can be upgraded without migrating the data.
  • 31. Industry’s First and Only Scale-Out Storage Solution with Native Hadoop Integration • Accelerating the Benefits of Hadoop for the Enterprise • Reducing Risk • End-to-End Data Protection • Organizational Knowledge/Experience
  • 32. EMC’s Enterprise Hadoop Solution: EMC Greenplum HD and EMC Isilon Scale-Out Storage • Apache Hadoop certified by Greenplum • Simple platform management and control • Parallel analytics access with Greenplum Database • Diagram labels: Compute, Storage
  • 33. Greenplum: Not Just About Technology • Data Science teams will become the driving force for success with big data analytics • Greenplum is committed to the future of data science – University data science program collaboration with Stanford and UC Berkeley – Community investment including the Greenplum Analytic Workbench, Community edition software, and Data Science Summits • Greenplum built its own Data Science practice – Leading PhDs with analytic tools expertise
  • 34. Hadoop in Action
  • 35. Customer Case Study: Purdue University • Leading Big Ten university renowned worldwide for its research and academic excellence. • Sections: Background, Challenge, Solution
  • 36. Customer Case Study: Purdue University • Large Hadoop environment for researchers in the Statistics Department • No central storage infrastructure, leading to many different, disparate islands of data without consistent protection or performance • Small IT staff managing large amounts of data and hundreds of data-intensive users
  • 37. Customer Case Study: Purdue University • Deployed Isilon with HDFS, which plugged seamlessly into their Hadoop environment • Created a single, shared storage resource for data computing and analytics • Delivered a highly reliable and flexible storage infrastructure that protected data from loss or corruption • Eliminated the need to migrate data between storage silos, delivering immediate accessibility and significantly higher performance
  • 38. Customer Case Study: Purdue University • “We tested EMC Isilon with Hadoop in our statistics department, which must often analyze huge data sets. EMC Isilon's multi-protocol capabilities provided fast and reliable delivery of data to our statisticians, demonstrating the potential to increase the time spent on actually doing the science, while reducing management costs.” – Alex Younts, Purdue University
  • 39. Customer Case Study: Global Shipping & Transportation Co. • Leading global shipping and transportation company. • Sections: Background, Challenge, Solution
  • 40. Customer Case Study: Global Shipping & Transportation Co. • Large amounts of data in different formats from various business units. Focused on their e-commerce self-service site, with semi-structured (XML) and unstructured log data • Looking to optimize their current ways of analyzing this data regardless of format • They wanted to understand which devices were accessing their self-service site, in order to measure usage patterns and enhance the user experience on their e-commerce site
  • 41. Customer Case Study: Global Shipping & Transportation Co. • Using Isilon with HDFS as the central storage for their Hadoop environment, they eliminated any ETL steps, as data could simply be copied over standard protocols • Created a single, shared storage resource for analytics, supporting queries across their entire data set regardless of whether the data is structured, semi-structured, or unstructured • Delivered a highly reliable and flexible storage infrastructure that enabled mechanisms such as backup and archive to be part of their analytics workflow
  • 42. Questions?
  • 43. Thank You!
  • 44. Provide Feedback & Win! • 125 attendees will receive $100 iTunes gift cards. To enter the raffle, simply complete: – 5 session surveys – The conference survey • Download the EMC World Conference App to learn more: emcworld.com/app