Modern Data Architecture
…for Non-Stop Hadoop

© Hortonworks Inc. 2013

Page 1
Your Presenters
• Jagane Sundar (@jagane)
– CTO of Big Data at WANdisco
–  Co-founder of AltoStor and former Director of
Engineering in Yahoo’s Hadoop group
–  Managed Hadoop 0.20.204 release for Yahoo

• Rohit Bakhshi (@Rohit2b)
– Product Management at Hortonworks
–  Focus on HDP Platform Services, Hadoop
Core and Windows enablement
–  Enjoys live jazz and espresso

Page 2
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• WANdisco’s role in the MDA
• Q&A

Page 3
Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: Build & Test
• OPERATIONAL TOOLS: Manage & Monitor
• DATA SYSTEM
– REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
– Existing Sources (CRM, ERP, Clickstream, Logs)

Page 4
Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DATA SYSTEM
– REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
– Existing Sources (CRM, ERP, Clickstream, Logs)
• Data growth (Source: IDC)
– 2.8 ZB in 2012
– 85% from New Data Types
– 15x Machine Data by 2020
– 40 ZB by 2020

Page 5
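The IDC figures quoted on the preceding slide (2.8 ZB in 2012, 40 ZB by 2020) imply a yearly growth rate that the slide does not state. The sketch below works it out, assuming steady compound growth over the eight years; the function name is illustrative and not part of the original deck.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must all be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted on the slide (source: IDC), in zettabytes.
ZB_2012 = 2.8
ZB_2020 = 40.0

growth_factor = ZB_2020 / ZB_2012
rate = compound_annual_growth_rate(ZB_2012, ZB_2020, years=2020 - 2012)

print(f"Total growth 2012-2020: {growth_factor:.1f}x")
print(f"Implied yearly growth: {rate:.1%}")
```

Under that assumption the data volume grows roughly 14x overall, which corresponds to a little under 40% per year.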
Modern Data Architecture Enabled
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: Build & Test
• OPERATIONAL TOOLS: Manage & Monitor
• DATA SYSTEM
– REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
– Existing Sources (CRM, ERP, Clickstream, Logs)
– Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

© Hortonworks Inc. 2013 - Confidential

Page 6
Drivers of Hadoop Adoption
• Architectural: A Modern Data Architecture
– Complement your existing data systems: the right workload in the right place
• New Business Applications
• Types of Big Data
– CRM, ERP
– Server log
– Clickstream
– Sentiment/Social
– Machine/Sensor
– Geo-locations

Page 7
Opportunity in types of data
1.  Sentiment
Understand how your customers feel about your brand and
products – right now

2.  Clickstream
Capture and analyze website visitors’ data trails and
optimize your website

3.  Sensor/Machine
Discover patterns in data streaming automatically from
remote sensors and machines

4.  Geographic
Analyze location-based data to manage operations where
they occur

5.  Server Logs
Research logs to diagnose process failures and prevent
security breaches

6.  Unstructured (txt, video, pictures, etc.)
Understand patterns in files across millions of web pages,
emails, and documents

Page 8
Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics

Page 9
Requirements for Enterprise Hadoop
1.  Key Services: Platform, Operational and Data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integrated: Engineered with existing data center investments

HORTONWORKS DATA PLATFORM (HDP)
• OPERATIONAL SERVICES: AMBARI, FALCON*, OOZIE
• DATA SERVICES: FLUME, SQOOP, PIG, HIVE & HCATALOG, HBASE; LOAD & EXTRACT (NFS, WebHDFS, KNOX*)
• CORE: HDFS, YARN, MAP REDUCE, TEZ
• PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots)
• OS/VM, Cloud, Appliance
Page 10
Requirements for Enterprise Hadoop
1.  Key Services: Platform, operational and data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integration: Engineered with existing data center investments
• DEVELOP: COLLECT, PROCESS, BUILD
• ANALYZE: EXPLORE, QUERY, DELIVER
• OPERATE: PROVISION, MANAGE, MONITOR

Page 11
Familiar and Existing Tools
1.  Key Services: Platform, operational and data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integration: Interoperable with existing data center investments
• DEVELOP: COLLECT, PROCESS, BUILD
• ANALYZE: EXPLORE, QUERY, DELIVER
• OPERATE: PROVISION, MANAGE, MONITOR
• Tools shown: BusinessObjects BI

Page 12
Requirements for Enterprise Hadoop
3.  Integration: Engineered with existing data center investments
• Integrated with
– Applications: Business Intelligence, Developer IDEs, Data Integration
– Systems: Data Systems & Storage, Systems Management
– Platforms: Operating Systems, Virtualization, Cloud, Appliances
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: Build & Test
• OPERATIONAL TOOLS: Manage & Monitor
• DATA SYSTEM
– REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
– Existing Sources (CRM, ERP, Clickstream, Logs)
– Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 13
WANdisco in the Modern Data Architecture
• APPLICATIONS: BusinessObjects BI
• DEV & DATA TOOLS
• OPERATIONAL TOOLS
• DATA SYSTEM: RDBMS, EDW, MPP, HANA
• INFRASTRUCTURE
• SOURCES
– Existing Sources (CRM, ERP, Clickstream, Logs)
– Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 14
Non-Stop Hadoop for Hortonworks
•  Non-stop technology delivers continuous uptime
with no data loss
•  One Hadoop cluster across data centers at any
distance
•  Eliminates the bottleneck of a single active
NameNode
•  Automatic backup, failover and recovery within
and across data centers
•  LAN-speed read and write

Page 15
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• WANdisco’s role in the MDA
• Q&A

Page 16
WANdisco Background
•  WANdisco: Wide Area Network Distributed Computing
–  Enterprise ready, high availability software solutions that enable globally distributed
organizations to meet today’s data challenges of secure storage, scalability and availability
•  Leader in tools for software engineers – Subversion
–  Apache Software Foundation sponsor
•  Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
•  US patent for active-active replication technology granted, November 2012
•  Global locations
–  San Ramon (CA)
–  Chengdu (China)
–  Tokyo (Japan)
–  Boston (MA)
–  Sheffield (UK)
–  Belfast (UK)

© WANdisco 2013

/ page 17
Customers

WANdisco
•  Overarching theme - We’re enabling global protection against:
–  Data loss
–  Downtime
–  Loss of Intellectual Property
–  Loss of revenue/time to market
–  Falling behind the competition

Non-Stop Hadoop
Extending HDFS across Data Centers
•  Single HDFS that spans multiple Data Centers across the world
•  Provides 100% Uptime for Hadoop
•  Built as an extension on top of Apache Hadoop HDFS
•  100% HDFS / 100% compatibility with Hadoop applications – Applications run unmodified
•  Applications can run in any Data Center
•  Not Simple Mirroring or a Copy

WANdisco DConE
Distributed Coordination Engine
•  WANdisco’s patented WAN capable Paxos implementation
–  Mathematically proven
–  Provides distributed co-ordination of File system metadata
   •  Create, Modify, Delete
   •  Active-Active (All locations)
   •  Share nothing (No Leader)
•  No restrictions on distance between data centers
–  US Patent granted for time independent implementation of Paxos
•  Not based on SAN block device synchronization such as EMC SRDF
–  SAN block replication has distance limits resulting from the inability of file systems such as
NTFS and ext4 to tolerate long RTTs to block storage
–  Possible distribution of corrupted blocks
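The coordination idea on this slide can be illustrated with a small replicated state machine. This is only a sketch under stated assumptions, not DConE itself: the Paxos agreement step is left out and represented by an already agreed, ordered list of operations, and every name here (`MetadataOp`, `NamespaceReplica`, `majority`) is hypothetical. It shows why sites that apply the same agreed sequence of create, modify and delete operations end up with the same namespace, and how many replicas a Paxos majority needs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataOp:
    """One agreed file system metadata operation (hypothetical shape)."""
    kind: str  # "create", "modify" or "delete"
    path: str
    value: str = ""


@dataclass
class NamespaceReplica:
    """A site's copy of the namespace, changed only by agreed operations."""
    name: str
    files: dict = field(default_factory=dict)
    applied: int = 0  # how many agreed operations this site has applied

    def apply(self, op: MetadataOp) -> None:
        if op.kind in ("create", "modify"):
            self.files[op.path] = op.value
        elif op.kind == "delete":
            self.files.pop(op.path, None)
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
        self.applied += 1

    def catch_up(self, agreed_log: list) -> None:
        """Apply every agreed operation this site has not applied yet."""
        for op in agreed_log[self.applied:]:
            self.apply(op)


def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority quorum."""
    return replica_count // 2 + 1


# Assumption: the coordination layer has already agreed on this order.
agreed_log = [
    MetadataOp("create", "/logs/a", "v1"),
    MetadataOp("modify", "/logs/a", "v2"),
    MetadataOp("create", "/tmp/b", "v1"),
    MetadataOp("delete", "/tmp/b"),
]

site_a = NamespaceReplica("dc-a")
site_b = NamespaceReplica("dc-b")

site_a.catch_up(agreed_log)      # site A applies all four operations
site_b.catch_up(agreed_log[:2])  # site B lags and has seen only two so far
site_b.catch_up(agreed_log)      # later it applies the remaining two

print(site_a.files == site_b.files)  # both sites converge on the same state
print(majority(3), majority(5))      # replicas needed to agree with 3 or 5 sites
```

Because each site tracks how far it has applied the agreed sequence, a site that falls behind can resume from its own position without copying the whole namespace.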

Apache Hadoop

/ page 22
Non-Stop Hadoop over WAN
Continuous availability

/ page 26
Non-Stop Hadoop over WAN
Unlimited performance and scalability

/ page 30
Non-Stop Hadoop over WAN
Automated failover and recovery

/ page 31
Non-Stop Hadoop
u  Architecture
–  Non-Intrusive - Not Simple Mirroring or a Copy
–  Does not modify Apache Hadoop
–  Runs on HDP 2 and later

u  Provides

100% Uptime for Hadoop

–  Provides Continuous Availability of HDFS Data
–  Guarantees 100% Uptime of HDFS During all 4 Categories of Failures
u 

Enables HDFS to be Deployed Globally – Across the WAN
–  Extends HDFS Across Multiple Data Centers
–  Unifies the HDFS Namespace
–  Exceeds Business Continuity Requirements for SLAs and Compliance

u 

Load Balances NameNode Traffic for Increased Scalability

DEMO

/ page 40
Use Cases for Non-Stop Hadoop with
Hortonworks
•  Disaster Recovery
–  Data is as current as possible (no periodic synchronizations)
–  Virtually zero downtime to recover from regional data center failure
–  Regulatory compliance
•  Load Balancing
•  Multi Data Center Ingest
–  Information doesn’t need to be sent to one DC and then copied back to the other using DistCp
–  Parallel ingest methods don’t require redirected data streams
•  Global MapReduce
–  Global Click Stream Analysis
–  Global Log Analysis
–  Etc.
•  Maximize Resource Utilization
–  All data centers can be used to run different jobs concurrently
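For context on the Multi Data Center Ingest item, the copy step that the slide says is no longer needed usually looks like a scheduled DistCp run between two clusters. The sketch below only builds such a command line; the NameNode host names and paths are hypothetical, and the helper function is illustrative rather than anything from the deck.

```python
import shlex


def build_distcp_command(source_uri: str, target_uri: str, update: bool = True) -> list:
    """Build the argument list for a `hadoop distcp` copy between clusters."""
    command = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target.
        command.append("-update")
    command += [source_uri, target_uri]
    return command


# Hypothetical NameNode addresses for two data centers.
command = build_distcp_command(
    "hdfs://nn-dc1.example.com:8020/ingest/clickstream",
    "hdfs://nn-dc2.example.com:8020/ingest/clickstream",
)

# Print the command instead of running it, so the sketch needs no cluster.
print(shlex.join(command))
```

A job like this has to be scheduled and monitored, and the target is only as current as the last successful run, which is the gap the "no periodic synchronizations" point refers to.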

/ page 41
Key Takeaways
Non-Stop Hadoop for Hortonworks
•  Non-Stop Hadoop makes Hadoop Enterprise/Production Ready
•  Load balancing eliminates the bottleneck of a single NameNode
•  Active-Active replication solves the Hadoop high availability issue
•  No job restarts or lost time for NameNode failures (Continuous Availability)
•  Single HDFS across multiple data centers
–  No out of sync issues
–  No Load Balancer maintenance problems
•  Data Centers can be located at any distance from each other
•  If any Data Center fails, applications can be run on any other replicated Data Center
•  If a Data Center is completely lost, any other replica of that Data Center can be used to restore it

/ page 42
Next Steps:
More about Non-Stop Hadoop for Hortonworks
http://www.wandisco.com/hadoop/non-stop-hadoophortonworks

Get started on Hadoop with Hortonworks
Sandbox
http://hortonworks.com/hadoop-tutorial/

Try Non-Stop Hadoop for Hortonworks
Contact us: WANdisco@hortonworks.com

Page 43

Mais conteúdo relacionado

Mais procurados

Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudLeons Petražickis
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldDataWorks Summit
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprisesmarkgrover
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsFadi Yousuf
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Cedric CARBONE
 
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionDataWorks Summit
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 

Mais procurados (20)

Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache HadoopHortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
 
Hadoop and Hive in Enterprises
Hadoop and Hive in EnterprisesHadoop and Hive in Enterprises
Hadoop and Hive in Enterprises
 
Introduction to Hadoop - The Essentials
Introduction to Hadoop - The EssentialsIntroduction to Hadoop - The Essentials
Introduction to Hadoop - The Essentials
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
 
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 

Destaque

Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Disaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesDisaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesSpiceworks
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recoverySandeep Singh
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IThhuihhui
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)Bopyo Hong
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityCloudera, Inc.
 
Integrating Docker with Mesos and Marathon
Integrating Docker with Mesos and MarathonIntegrating Docker with Mesos and Marathon
Integrating Docker with Mesos and MarathonRishabh Chaudhary
 
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & MergingWANdisco Plc
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 Chris Almond
 

Destaque (20)

Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Disaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesDisaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup Strategies
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
 
Hadoop disaster recovery
Hadoop disaster recoveryHadoop disaster recovery
Hadoop disaster recovery
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed Hadoop
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Disaster Recovery Plan for IT
Disaster Recovery Plan for ITDisaster Recovery Plan for IT
Disaster Recovery Plan for IT
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Biokimia
BiokimiaBiokimia
Biokimia
 
2012 06 hortonworks paris hug
2012 06 hortonworks paris hug2012 06 hortonworks paris hug
2012 06 hortonworks paris hug
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
 
What the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and VisibilityWhat the Enterprise Requires - Business Continuity and Visibility
What the Enterprise Requires - Business Continuity and Visibility
 
Integrating Docker with Mesos and Marathon
Integrating Docker with Mesos and MarathonIntegrating Docker with Mesos and Marathon
Integrating Docker with Mesos and Marathon
 
Distcp
DistcpDistcp
Distcp
 
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging03.13.13 WANDisco SVN Training: Advanced Branching & Merging
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014 WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
 
HBase Snapshots
HBase SnapshotsHBase Snapshots
HBase Snapshots
 

Semelhante a Modern Data Architecture for Non-Stop Hadoop

Modern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIModern Data Architecture: In-Memory with Hadoop - the new BI
Modern Data Architecture: In-Memory with Hadoop - the new BIKognitio
 
Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Hortonworks kognitio webinar 10 dec 2013
Hortonworks kognitio webinar 10 dec 2013Michael Hiskey
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...Hortonworks
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Modern Data Architecture for Non-Stop Hadoop

  • 1. Modern Data Architecture …for Non-Stop Hadoop © Hortonworks Inc. 2013 Page 1
  • 2. Your Presenters • Jagane Sundar (@jagane) – CTO of Big Data at WANdisco – Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group – Managed the Hadoop 0.20.204 release for Yahoo • Rohit Bakhshi (@Rohit2b) – Product Management at Hortonworks – Focus on HDP Platform Services, Hadoop Core and Windows enablement – Enjoys live jazz and espresso
  • 3. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop in the MDA • WANdisco’s role in the MDA • Q&A
  • 4. Existing Data Architecture (diagram) • APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications • DEV & DATA TOOLS: Build & Test • OPERATIONAL TOOLS: Manage & Monitor • DATA SYSTEM: RDBMS, EDW, MPP (repositories) • SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)
  • 5. Existing Data Architecture (diagram with data growth figures) • APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications • DATA SYSTEM: RDBMS, EDW, MPP (repositories) • SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) • 2.8 ZB in 2012 • 85% from New Data Types • 15x Machine Data by 2020 • 40 ZB by 2020 • Source: IDC
  • 6. Modern Data Architecture Enabled (diagram) • APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications • DEV & DATA TOOLS: Build & Test • OPERATIONAL TOOLS: Manage & Monitor • DATA SYSTEM: RDBMS, EDW, MPP (repositories) • SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  • 7. Drivers of Hadoop Adoption • Architectural: A Modern Data Architecture • New Business Applications • Complement your existing data systems: the right workload in the right place • Types of Big Data: CRM, ERP; Server log; Clickstream; Sentiment/Social; Machine/Sensor; Geo-locations
  • 8. Opportunity in types of data • 1. Sentiment: Understand how your customers feel about your brand and products, right now • 2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website • 3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines • 4. Geographic: Analyze location-based data to manage operations where they occur • 5. Server Logs: Research logs to diagnose process failures and prevent security breaches • 6. Unstructured (text, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
  • 9. 3 Requirements for Hadoop Adoption: Requirements for Hadoop’s Role in the Modern Data Architecture • Integrated: Interoperable with existing data center investments • Key Services: Platform, operational and data services essential for the enterprise • Skills: Leverage your existing skills: development, operations, analytics
  • 10. Requirements for Enterprise Hadoop • 1 Key Services: Platform, operational and data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integrated: Engineered with existing data center investments • Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots • Hortonworks Data Platform (HDP) diagram, service groups: Operational Services, Data Services, Core, Platform Services • HDP diagram, components: Ambari, Falcon*, Oozie, Flume, Sqoop, NFS, WebHDFS, Knox*, Pig, Hive & HCatalog, HBase, MapReduce, Tez, YARN, HDFS (with a Load & Extract label) • Deployment targets: OS/VM, Cloud, Appliance
  • 11. Requirements for Enterprise Hadoop • 1 Key Services: Platform, operational and data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integration: Engineered with existing data center investments • DEVELOP: Collect, Process, Build • ANALYZE: Explore, Query, Deliver • OPERATE: Provision, Manage, Monitor
  • 12. Familiar and Existing Tools • 1 Key Services: Platform, operational and data services essential for the enterprise • 2 Skills: Leverage your existing skills: development, analytics, operations • 3 Integration: Interoperable with existing data center investments • DEVELOP: Collect, Process, Build • ANALYZE: Explore, Query, Deliver (BusinessObjects BI shown as an example tool) • OPERATE: Provision, Manage, Monitor
  • 13. Requirements for Enterprise Hadoop • 3 Integration: Engineered with existing data center investments • Integrated with Applications: Business Intelligence, Developer IDEs, Data Integration • Integrated with Systems: Data Systems & Storage, Systems Management • Integrated with Platforms: Operating Systems, Virtualization, Cloud, Appliances • Diagram: APPLICATIONS (Custom Applications, Business Analytics, Packaged Applications); DEV & DATA TOOLS (Build & Test); OPERATIONAL TOOLS (Manage & Monitor); DATA SYSTEM (RDBMS, EDW, MPP repositories); SOURCES (Existing Sources: CRM, ERP, Clickstream, Logs; Emerging Sources: Sensor, Sentiment, Geo, Unstructured)
  • 14. WANdisco in the Modern Data Architecture (diagram) • APPLICATIONS (BusinessObjects BI) • DEV & DATA TOOLS • OPERATIONAL TOOLS • DATA SYSTEM: RDBMS, EDW, MPP, HANA • INFRASTRUCTURE • SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs) and Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  • 15. Non-Stop Hadoop for Hortonworks • Non-stop technology delivers continuous uptime with no data loss • One Hadoop cluster across data centers at any distance • Eliminates the bottleneck of a single active NameNode • Automatic backup, failover and recovery within and across data centers • LAN-speed read and write
  • 16. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • WANdisco’s role in the MDA • Q&A
  • 17. WANdisco Background • WANdisco: Wide Area Network Distributed Computing – Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability • Leader in tools for software engineers – Subversion – Apache Software Foundation sponsor • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) • US patent for active-active replication technology granted, November 2012 • Global locations – San Ramon (CA) – Chengdu (China) – Tokyo (Japan) – Boston (MA) – Sheffield (UK) – Belfast (UK) © WANdisco 2013
  • 19. WANdisco • Overarching theme: we’re enabling global protection against – Data loss – Downtime – Loss of intellectual property – Loss of revenue/time to market – Falling behind the competition
  • 20. Non-Stop Hadoop: Extending HDFS across Data Centers • Single HDFS that spans multiple Data Centers across the world • Provides 100% Uptime for Hadoop • Built as an extension on top of Apache Hadoop HDFS • 100% HDFS, 100% compatible with Hadoop applications – Applications run unmodified • Applications can run in any Data Center • Not Simple Mirroring or a Copy
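The idea of a single namespace that is not a mirror or a copy can be illustrated with a small replicated-state-machine sketch. This is only an illustration under stated assumptions, not WANdisco's implementation: `NamespaceReplica`, `catch_up` and `agreed_log` are hypothetical names, and the agreed ordering of operations is simply given as a list here rather than produced by a coordination engine.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceReplica:
    """Hypothetical stand-in for one data center's copy of file system metadata."""
    name: str
    files: dict = field(default_factory=dict)  # path -> metadata
    applied: int = 0                           # agreed operations applied so far

    def apply(self, op):
        kind, path, value = op
        if kind in ("create", "modify"):
            self.files[path] = value
        elif kind == "delete":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {kind}")
        self.applied += 1


def catch_up(replica, agreed_log):
    """Apply, in order, every agreed operation the replica has not yet seen."""
    for op in agreed_log[replica.applied:]:
        replica.apply(op)


# Assumed input: a sequence of metadata operations all sites have agreed on.
agreed_log = [
    ("create", "/logs/a", {"size": 10}),
    ("create", "/logs/b", {"size": 20}),
    ("modify", "/logs/a", {"size": 15}),
    ("delete", "/logs/b", None),
]

dc_east = NamespaceReplica("dc-east")
dc_west = NamespaceReplica("dc-west")
catch_up(dc_east, agreed_log)
catch_up(dc_west, agreed_log)
print(dc_east.files == dc_west.files)
```

Because every replica applies the same operations in the same order, each data center ends up with the same namespace, and a replica that falls behind only needs to replay the operations it missed.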
  • 21. WANdisco DConE: Distributed Coordination Engine • WANdisco’s patented WAN-capable Paxos implementation – Mathematically proven – Provides distributed coordination of file system metadata: Active-Active (all locations); Create, Modify, Delete; Shared nothing (no leader) • No restrictions on distance between data centers – US patent granted for time-independent implementation of Paxos • Not based on SAN block device synchronization such as EMC SRDF – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage – Possible distribution of corrupted blocks
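For readers unfamiliar with Paxos, the following is a minimal sketch of textbook single-decree Paxos, run synchronously inside one process. It is not DConE and says nothing about WAN behavior or the patented time-independent variant; the `Acceptor` class and `propose` function are illustrative names, and the string values stand in for metadata operations.

```python
class Acceptor:
    """One voting participant in textbook single-decree Paxos."""

    def __init__(self):
        self.promised = -1         # highest proposal number promised so far
        self.accepted_n = -1       # number of the accepted proposal, if any
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: gather promises from the acceptors.
    replies = [a.prepare(n) for a in acceptors]
    granted = [(acc_n, acc_v) for ok, acc_n, acc_v in replies if ok]
    if len(granted) < majority:
        return None

    # If some acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prev_n, prev_value = max(granted, key=lambda reply: reply[0])
    if prev_n >= 0:
        value = prev_value

    # Phase 2: ask the acceptors to accept the value.
    accepted = sum(1 for a in acceptors if a.accept(n, value))
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "create /data/x"))
print(propose(acceptors, 2, "delete /data/x"))
```

The second call carries a different value but returns the value chosen in the first round: once a majority has accepted a value, later proposals can only confirm it. Any participant may propose, which is the sense in which such a scheme needs no fixed leader.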
  • 22. Apache Hadoop
  • 23. Apache Hadoop
  • 24. Apache Hadoop
  • 25. Apache Hadoop
  • 26. Non-Stop Hadoop over WAN: Continuous availability
  • 27. Non-Stop Hadoop over WAN: Continuous availability
  • 28. Non-Stop Hadoop over WAN: Continuous availability
  • 29. Non-Stop Hadoop over WAN: Continuous availability
  • 30. Non-Stop Hadoop over WAN: Unlimited performance and scalability
  • 31. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 32. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 33. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 34. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 35. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 36. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 37. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 38. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 39. Non-Stop Hadoop • Architecture – Non-Intrusive; Not Simple Mirroring or a Copy – Does not modify Apache Hadoop – Runs on HDP 2 and later • Provides 100% Uptime for Hadoop – Provides Continuous Availability of HDFS Data – Guarantees 100% Uptime of HDFS During all 4 Categories of Failures • Enables HDFS to be Deployed Globally, Across the WAN – Extends HDFS Across Multiple Data Centers – Unifies the HDFS Namespace – Exceeds Business Continuity Requirements for SLAs and Compliance • Load Balances NameNode Traffic for Increased Scalability
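The combination of load balancing and continued service during a failure can be pictured with a small client-side sketch. The slides do not describe how Non-Stop Hadoop actually routes requests, so everything here is an assumption for illustration: `FakeNameNode`, `RoundRobinClient` and `NameNodeUnavailable` are hypothetical names, and the "NameNodes" are in-memory objects rather than real endpoints.

```python
class NameNodeUnavailable(Exception):
    """Raised when a (simulated) NameNode cannot serve a request."""


class FakeNameNode:
    """Hypothetical in-memory stand-in for one active NameNode."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.served = 0

    def handle(self, request):
        if not self.up:
            raise NameNodeUnavailable(self.name)
        self.served += 1
        return f"{self.name} handled {request}"


class RoundRobinClient:
    """Spread requests over all active NameNodes and skip any that are down."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self._next = 0

    def send(self, request):
        # Try each NameNode at most once per request, starting where we left off.
        for _ in range(len(self.namenodes)):
            node = self.namenodes[self._next]
            self._next = (self._next + 1) % len(self.namenodes)
            try:
                return node.handle(request)
            except NameNodeUnavailable:
                continue
        raise NameNodeUnavailable("no NameNode reachable")


nodes = [FakeNameNode("nn-east"), FakeNameNode("nn-west"), FakeNameNode("nn-eu")]
client = RoundRobinClient(nodes)
for i in range(6):
    client.send(f"getFileInfo #{i}")

nodes[1].up = False            # simulate losing one NameNode
print(client.send("getFileInfo after failure"))
```

With every NameNode active, requests are shared among them; when one goes down, the same loop simply moves on to the next node, so the caller sees no interruption as long as at least one NameNode remains reachable.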
  • 41. Use Cases for Non-Stop Hadoop with Hortonworks • Disaster Recovery – Data is as current as possible (no periodic synchronizations) – Virtually zero downtime to recover from regional data center failure – Regulatory compliance • Load Balancing • Multi Data Center Ingest – Information doesn’t need to be sent to one data center and then copied to the other using DistCp – Parallel ingest methods don’t require redirected data streams • Global MapReduce – Global Click Stream Analysis – Global Log Analysis – Etc. • Maximize Resource Utilization – All data centers can be used to run different jobs concurrently
  • 42. Key Takeaways: Non-Stop Hadoop for Hortonworks • Non-Stop Hadoop makes Hadoop Enterprise/Production Ready • Load balancing eliminates the bottleneck of a single NameNode • Active-Active replication solves the Hadoop high availability issue • No job restarts or lost time for NameNode failures (Continuous Availability) • Single HDFS across multiple data centers – No out-of-sync issues – No Load Balancer maintenance problems • Data Centers can be located at any distance from each other • If any Data Center fails, applications can be run on any other replicated Data Center • If a Data Center is completely lost, any other replica of that Data Center can be used to restore it
  • 43. Next Steps • More about Non-Stop Hadoop for Hortonworks: http://www.wandisco.com/hadoop/non-stop-hadoophortonworks • Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/ • Try Non-Stop Hadoop for Hortonworks. Contact us: WANdisco@hortonworks.com