Making the Case for Hadoop in a Large Enterprise
15th April 2015
Alan Spanos
Data Exploitation Manager
Jay Aubby
Architect
Contents
1. How to Approach a Business Case for Hadoop
2. BA as a Case Study
3. Technical Implementation – HDP 2.2
1. How to Approach a Business Case for Hadoop
Things to Consider
When running a project to deploy Hadoop enterprise-wide, consider the following areas:
• Business Strategy
• Enterprise Hadoop
• Additional Complexity
• Current BI Landscape
• Key Stakeholders
• User Base
• Financial Processes
Selecting your ‘Anchor’ Use Case
Choosing a use case which has continual and undeniable value is key. This keeps the focus on solving the technical challenges.
• Suitable use cases include:
  - Data archiving of ‘cold’ data on EDW
  - Move sandboxes from EDW to Hadoop
  - Move ETL / ELT layer to Hadoop from RDBMS
• Doing something new requires a ‘leap of faith’, which is something many finance teams are unwilling to make
• Simpler projects will help pay for platform set-up, which can, in future, enable more speculative projects
Architecture vs Deployment
Hadoop’s modular architecture enables it to quickly and easily
accommodate future use cases
Reuse not Reinvention
Reusing existing staff, processes and concepts is key to minimising the cost of implementing Hadoop

Function          EDW                  Hadoop
Data Governance   EDW Data Governance  EDW Data Governance
Administration    EDW DBA              EDW DBA
Hardware Support  IT Operations        IT Operations
Platform Support  EDW Support          EDW Support
Security          EDW Security         EDW Security
Development       EDW Dev              EDW Dev
Users             BI / Analysts        BI / Analysts
Getting the Balance Right is Key
[Figure: balance diagram. Labels: IT, Information and Data Governance, Processes, Security and Operations; IT and Data Innovation; Cost; Architectural Fit; ROI; TCO; Return]
Proving the business value of Hadoop is often easier once a useable BI / IT environment is already available!
2. BA as a Case Study
Anchor Use Case – Data Archive for Legal Cases
Simple cost-saving project identified as ideal anchor case
Project Objectives
• Agree technical architecture for enabling the use case in Hadoop.
• Ensure agreed technical architecture is scalable for future planned uses of Hadoop in BA.
• Deliver agreed technical architecture.
• Deliver BI and IT processes to support new infrastructure.
Project cash positive after 12 months, with order-of-magnitude Opex savings once implemented.
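"Cash positive after 12 months" is a payback-period statement. The deck gives no figures, so the following is only a minimal Python sketch of the arithmetic; the function name `payback_month` and every amount in it are hypothetical assumptions, not BA's numbers.

```python
def payback_month(setup_cost, monthly_saving, monthly_run_cost, horizon=120):
    """Return the first month in which cumulative cash flow reaches zero or better.

    Returns None if the project never pays back within `horizon` months.
    All inputs are illustrative; use whole currency units to avoid rounding noise.
    """
    net_per_month = monthly_saving - monthly_run_cost
    if net_per_month <= 0:
        return None  # running costs eat the whole saving, so there is no payback
    cumulative = -setup_cost
    month = 0
    while cumulative < 0:
        month += 1
        cumulative += net_per_month
        if month > horizon:
            return None
    return month


# Hypothetical figures: 600,000 set-up, 60,000 saved per month on the EDW,
# 10,000 per month to run the new platform.
print(payback_month(600_000, 60_000, 10_000))  # 12
```

A simple archiving project fits this shape well because both the saving and the running cost are predictable, which is what makes the case easy for a finance team to verify.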
Keeping It Simple!
Hadoop is complex enough, and the biggest risk in any project is people's ability to cope with the change

User Function         EDW                  Hadoop
Logon                 Corporate LDAP       Corporate LDAP
BI Tools              Existing Tools       Existing Tools + Hue
Direct Access Method  SQL                  HiveQL
Fault & Support       KITE                 KITE
Data Definitions      EDW Data Dictionary  EDW Data Dictionary

Data structures are mastered in EDW for the archiving project
3. Technical Implementation
BA Architecture Challenges
Security
• Multiple identity providers across the group
• For data access all security is controlled at the DB level via security roles
Infrastructure Constraints
• Policy of virtual infrastructure only
• Infrastructure Patterns (Puppet)
IT Service Delivery
• Alignment and integration with existing service delivery models
• Lots of application areas but only one equipped to support the Hadoop environment (EDW team)
Information Security
• Policies for information security are based upon actor, role, business unit, department, etc.
• Strong conformance and consistency
Support
• Due to segregated responsibilities, application support is spread across multiple COEs
• No to open source
Challenges of Hadoop Implementation
Hortonworks 2.2
Hadoop for the enterprise
Hadoop Implementation Principles
Reuse don’t reinvent
• Reuse all EDW functions, capabilities and processes, e.g. Data and Information Governance, Database Administration, Support, and Exploitation
• Example of provisioning access: on a Corporate Directory Request, Hadoop access will be just another check box
• Naming standards and data/information definitions
The master of the data structure will remain the source systems
• All data structures must be kept in sync between systems, ensuring alignment with the corporate data/information dictionary.
The data archive/migration process will be in line with the current process, which is managed and controlled by the EDW team
• Modifying the existing approach
Alignment with BA EA vision
• The initial rollout of Hadoop is to introduce the capabilities to BA. It is upon this platform that other business capabilities can be prototyped, validated and built, extending the use of Hadoop.
Capability Governance
• Hadoop has lots of capabilities, so governing and controlling what gets rolled out is critical to its success
• BA Hadoop readiness assessment
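One way to picture capability governance is as an allow-list: a Hadoop component is rolled out only after it has been approved, and anything else is routed to a readiness assessment. The sketch below is an illustration under that assumption, not BA's actual process; the `APPROVED` set and the `review_request` function are hypothetical, seeded with the four components the deck lists for the data archiving use case.

```python
# Hypothetical allow-list of components approved for rollout.
APPROVED = frozenset({"sqoop", "hive", "kerberos", "ranger"})


def review_request(requested, approved=APPROVED):
    """Split requested components into (allowed, needs_assessment), both sorted."""
    wanted = {name.strip().lower() for name in requested}
    return sorted(wanted & approved), sorted(wanted - approved)


allowed, pending = review_request(["Hive", "Storm", "Sqoop"])
print("allowed:", allowed)
print("needs readiness assessment:", pending)
```

Keeping the approved set small and explicit mirrors the principle above: the platform can do a great deal, but each new capability is switched on deliberately.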
Capability Governance Example
Addressing Architectural Challenges with HDP 2.2
Challenge areas: Security, Infrastructure, Service Delivery, Information Security, Support
• Kerberos (Authentication)
• Ranger (Authorisation and Policies)
• BA LDAP (Identity Provider)
• Ambari, Falcon & Zookeeper (Management)
• YARN (Resource Manager)
• Oozie (Scheduler)
• BA Puppet (Provisioning)
• BA BMC (Monitoring)
• Control-M (Scheduler)
• BA Kite (Support Ticket)
Core
• HDFS (File System)
• YARN (Resource Manager)
• Kerberos (Authentication)
• Ranger (Authorisation)
• Ambari, Falcon & Zookeeper (Management)
• Oozie (Scheduler)
Core (as above): HDFS, YARN, Kerberos, Ranger, Ambari, Falcon & Zookeeper, Oozie
Layers: Acquisition, Processing, Persistent, Exploitation
Modes: Batch, Near Real-time
• Sqoop (Bulk Transfer of Data)
• Pig (Transformation)
• Flume (Collect, move log data)
• Kafka (Message queue)
• Storm (Computational Engine)
• Analytics
• Visualisation
• Excel (Analysis)
• Hue (Web Interrogation Client)
• Hive (SQL Datastore)
• Solr (Search)
• HBase (NoSQL)
• HCatalog (Metastore)
Building Business Solutions
Business Initiative: Improve Efficiency & Reduce Costs
IT Initiative: Data Archiving
Architecture Building Blocks:
• Bulk data transfer
• Persistent data store
• Off-line query capability
• Enforce Data and Information Security policies
• Ability to Restore
Solution Building Blocks:
• Sqoop
• Hive
• Kerberos
• Ranger
....
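To make the building blocks concrete: the "bulk data transfer" block is served by Sqoop and the "persistent data store" by Hive. The sketch below shows how an archive job's Sqoop import might be assembled in Python. It is an illustration only; the function name, JDBC URL, table names, user and password-file path are all placeholders, and the command is built and printed rather than executed.

```python
import shlex


def sqoop_archive_command(jdbc_url, source_table, hive_table,
                          username, password_file, mappers=4):
    """Build (but do not run) a Sqoop import that lands an EDW table in Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source EDW
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", source_table,            # 'cold' table to archive
        "--hive-import",                    # load the data into Hive after transfer
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),      # degree of parallelism for the transfer
    ]


# Placeholder values for illustration only.
cmd = sqoop_archive_command(
    jdbc_url="jdbc:edw://edw.example.invalid/warehouse",
    source_table="LEGAL_CASE_HISTORY",
    hive_table="archive.legal_case_history",
    username="svc_archive",
    password_file="/user/svc_archive/.edw_password",
)
print(shlex.join(cmd))
```

Once the data is in Hive, analysts can query it with HiveQL through their existing tools or Hue, while Kerberos and Ranger cover the authentication and policy-enforcement blocks.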
Making the Case for Hadoop in a Large Enterprise-British Airways

More Related Content

What's hot

Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
tipanagiriharika
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
Databricks
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 

What's hot (20)

RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Slides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property GraphsSlides: Knowledge Graphs vs. Property Graphs
Slides: Knowledge Graphs vs. Property Graphs
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Big Data Trends
Big Data TrendsBig Data Trends
Big Data Trends
 
Hive(ppt)
Hive(ppt)Hive(ppt)
Hive(ppt)
 
Large Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache SparkLarge Scale Geospatial Indexing and Analysis on Apache Spark
Large Scale Geospatial Indexing and Analysis on Apache Spark
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Big Data & The Cloud
Big Data & The CloudBig Data & The Cloud
Big Data & The Cloud
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 

Viewers also liked

Hadoop: Making it work for the Business Unit
Hadoop: Making it work for the Business UnitHadoop: Making it work for the Business Unit
Hadoop: Making it work for the Business Unit
DataWorks Summit
 

Viewers also liked (19)

Hadoop: Making it work for the Business Unit
Hadoop: Making it work for the Business UnitHadoop: Making it work for the Business Unit
Hadoop: Making it work for the Business Unit
 
Hadoop con 2015 hadoop enables enterprise data lake
Hadoop con 2015   hadoop enables enterprise data lakeHadoop con 2015   hadoop enables enterprise data lake
Hadoop con 2015 hadoop enables enterprise data lake
 
Hadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouseHadoop Architecture Options for Existing Enterprise DataWarehouse
Hadoop Architecture Options for Existing Enterprise DataWarehouse
 
Enterprise Data Management - Data Lake - A Perspective
Enterprise Data Management - Data Lake - A PerspectiveEnterprise Data Management - Data Lake - A Perspective
Enterprise Data Management - Data Lake - A Perspective
 
Architecting next generation big data platform
Architecting next generation big data platformArchitecting next generation big data platform
Architecting next generation big data platform
 
Building the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architectureBuilding the Enterprise Data Lake: A look at architecture
Building the Enterprise Data Lake: A look at architecture
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
Hadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an exampleHadoop application architectures - using Customer 360 as an example
Hadoop application architectures - using Customer 360 as an example
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Data warehousing with Hadoop
Data warehousing with HadoopData warehousing with Hadoop
Data warehousing with Hadoop
 

Similar to Making the Case for Hadoop in a Large Enterprise-British Airways

Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
Rajesh Jayarman
 

Similar to Making the Case for Hadoop in a Large Enterprise-British Airways (20)

Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Robin_Hadoop
Robin_HadoopRobin_Hadoop
Robin_Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
When Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online LectureWhen Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
When Databases Meet Big data and Hadoop - Uni of Tromso Online Lecture
 
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
Better Total Value of Ownership (TVO) for Complex Analytic Workflows with the...
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 
Hadoop & Data Warehouse
Hadoop & Data Warehouse Hadoop & Data Warehouse
Hadoop & Data Warehouse
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Making the Case for Hadoop in a Large Enterprise-British Airways

  • 1.
  • 2. Making the Case for Hadoop in a Large Enterprise 15th April 2015
  • 3. Making the Case for Hadoop in a Large Enterprise Alan Spanos Data Exploitation Manager Jay Aubby Architect
  • 4. Making the Case for Hadoop in a Large Enterprise 4 Contents 1. How to Approach a Business Case for Hadoop 2. BA as a Case Study 3. Technical Implementation – HDP 2.2
  • 5. Making the Case for Hadoop in a Large Enterprise 5 1. How to Approach a Business Case for Hadoop
  • 6. Making the Case for Hadoop in a Large Enterprise 6 Things to Consider When running a project to deploy Hadoop Enterprise Wide, consider the following areas Business Strategy Enterprise Hadoop Additional Complexity Current BI Landscape Key Stakeholders User Base Financial Processes
  • 7. Making the Case for Hadoop in a Large Enterprise 7 Selecting your ‘Anchor’ Use Case Choosing a use case which has continual and undeniable value is key. This keys the focus on solving the technical challenges. • Suitable use cases include:  Data archiving of ‘cold’ data on EDW  Move sandboxes from EDW to Hadoop  Move ETL / ELT layer to Hadoop from RDBMS • Doing something new requires a ‘leap of faith’, which is something many finance teams are unwilling to make • Simpler projects will help pay for platform set up, which can, in future, enable more speculative projects
  • 8. Making the Case for Hadoop in a Large Enterprise 8 Architecture vs Deployment Hadoop’s modular architecture enables it to quickly and easily accommodate future use cases
  • 9. Making the Case for Hadoop in a Large Enterprise 9 Reuse not Reinvention Reusing existing staff, processes and concepts, is key to minimising the cost of implementing hadoop Function EDW Hadoop Data Governance EDW Data Governance EDW Data Governance Administrations EDW DBA EDW DBA Hardware Support IT Operations IT Operations Platform Support EDW Support EDW Support Security EDW Security EDW Security Development EDW Dev EDW Dev Users BI / Analysts BI / Analysts
  • 10. Making the Case for Hadoop in a Large Enterprise IT, Information and Data Governance, Processes, Security and Operations IT and Data Innovation Cost Architectural Fit ROI TCO Return Getting the Balance Right is Key Proving business value of Hadoop is often easier once BI / IT a useable environment is already available!
  • 11. Section 2: BA as a Case Study
  • 12. Anchor Use Case – Data Archive for Legal Cases
      Simple cost-saving project identified as ideal anchor case
      Project Objectives
      • Agree technical architecture for enabling the use case in Hadoop.
      • Ensure agreed technical architecture is scalable for future planned uses of Hadoop in BA.
      • Deliver agreed technical architecture.
      • Deliver BI and IT processes to support new infrastructure.
      Project cash positive after 12 months, with order-of-magnitude Opex savings once implemented.
  • 13. Keeping It Simple!
      Hadoop is complex enough, and the biggest risk in any project is people’s ability to cope with the change
      User Function | EDW | Hadoop
      Logon | Corporate LDAP | Corporate LDAP
      BI Tools | Existing Tools | Existing Tools + Hue
      Direct Access Method | SQL | HiveQL
      Fault & Support | KITE | KITE
      Data Definitions | EDW Data Dictionary | EDW Data Dictionary
      Data structures are mastered in EDW for the archiving project
  • 14. Section 3: Technical Implementation
  • 15. BA Architecture Challenges
      Security
      • Multiple identity providers across the group
      • For data access, all security is controlled at the DB level via security roles
      Infrastructure Constraints
      • Policy of virtual infrastructure only
      • Infrastructure Patterns (Puppet)
      IT Service Delivery
      • Alignment and integration with existing service delivery models
      • Lots of application areas but only one equipped to support the Hadoop environment (EDW team)
      Information Security
      • Policies for information security are based upon actor, role, business unit, department, etc.
      • Strong conformance and consistency
      Support
      • Due to segregated responsibilities, application support is spread across multiple COEs
      • No to open source
  • 16. Challenges of Hadoop Implementation
  • 17. Hortonworks 2.2: Hadoop for the enterprise
  • 18. Hadoop Implementation Principles
      Reuse don’t reinvent
      • Reuse all EDW functions, capabilities and processes, e.g. Data and Information Governance, Database Administration, Support, and Exploitation
      • Example of provisioning access: on the Corporate Directory Request, Hadoop access will be just another check box
      • Naming standards and data/information definitions
      The master of the data structure will remain the source systems
      • All data structures must be kept in sync between systems, ensuring alignment with the corporate data/information dictionary. The data archive/migration process will be in line with the current process, which is managed and controlled by the EDW team
      • Modifying the existing approach
      Alignment with BA EA vision
      • The initial rollout of Hadoop is to introduce the capabilities to BA. It is upon this platform that other business capabilities can be prototyped, validated and built, extending the use of Hadoop.
      Capability Governance
      • Hadoop has lots of capabilities, so governing and controlling what gets rolled out is critical to its success
      • BA Hadoop readiness assessment
  • 19. Capability Governance Example
  • 20. Hortonworks: Addressing Architectural Challenges with HDP 2.2
      Challenge areas: Security, Infrastructure, Service Delivery, Information Security, Support
      Components (diagram labels): Kerberos (Authentication); Ranger (Authorisation and Policies); BA LDAP (Identity Provider); Ambari, Falcon & Zookeeper (Management); YARN (Resource Manager); Oozie (Scheduler); BA Puppet (Provisioning); BA BMC (Monitoring); Control-M (Scheduler); BA Kite (Support Ticket)
      Core: HDFS (File System); YARN (Resource Manager); Kerberos (Authentication); Ranger (Authorisation); Ambari, Falcon & Zookeeper (Management); Oozie (Scheduler)
  • 21. Core: HDFS (File System); YARN (Resource Manager); Kerberos (Authentication); Ranger (Authorisation); Ambari, Falcon & Zookeeper (Management); Oozie (Scheduler)
      Layers: Acquisition, Processing, Persistent, Exploitation
      Modes: Batch, Near Real-time
      Components (diagram labels): Sqoop (Bulk Transfer of Data); Pig (Transformation); Flume (Collect, move log data); Kafka (Message queue); Storm (Computational Engine); Analytics; Visualisation; Excel (Analysis); Hue (Web Interrogation Client); Hive (SQL Datastore); Solr (Search); HBase (NoSQL); HCatalog (Metastore)
  • 22. Building Business Solutions
      Business Initiative | IT Initiative | Architecture Building Blocks | Solution Building Blocks
      Improve Efficiency & Reduce Costs | Data Archiving | Bulk data transfer; Persistent data store; Off-line query capability; Enforce Data and Information Security policies; Ability to Restore | Sqoop; Hive; Kerberos; Ranger
      .... | .... | ... | ....
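
The data archiving row above pairs "bulk data transfer" and "off-line query capability" with Sqoop and Hive. A minimal sketch of how those two building blocks might fit together is given below in Python. It only assembles a Sqoop 1 import command line and a HiveQL query string; it does not run anything. The JDBC URL, user, password file path, table names, date column and cut-off date are all hypothetical placeholders and are not taken from the presentation. Kerberos authentication and Ranger policies are assumed to be configured on the cluster and are not shown.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ArchiveJob:
    """Hypothetical description of one 'cold' EDW table to archive into Hive."""
    jdbc_url: str        # JDBC connection string for the EDW (placeholder)
    username: str        # EDW account used by Sqoop (placeholder)
    password_file: str   # HDFS path to a file holding the password (placeholder)
    source_table: str    # table in the EDW
    hive_table: str      # target table in Hive
    date_column: str     # column used to decide which rows are 'cold'
    cutoff_date: str     # ISO date; rows older than this are archived


def cold_rows_predicate(job: ArchiveJob) -> str:
    """SQL predicate selecting the rows considered 'cold'."""
    return f"{job.date_column} < '{job.cutoff_date}'"


def sqoop_import_args(job: ArchiveJob, mappers: int = 4) -> List[str]:
    """Build the argument list for a Sqoop 1 bulk import into Hive."""
    return [
        "sqoop", "import",
        "--connect", job.jdbc_url,
        "--username", job.username,
        "--password-file", job.password_file,
        "--table", job.source_table,
        "--where", cold_rows_predicate(job),
        "--hive-import",
        "--hive-table", job.hive_table,
        "--num-mappers", str(mappers),
    ]


def offline_count_query(job: ArchiveJob) -> str:
    """HiveQL query to check how many archived rows landed in Hive."""
    return f"SELECT COUNT(*) FROM {job.hive_table} WHERE {cold_rows_predicate(job)}"


if __name__ == "__main__":
    example = ArchiveJob(
        jdbc_url="jdbc:teradata://edw.example.com/DATABASE=legal",
        username="archive_svc",
        password_file="/user/archive_svc/.edw_password",
        source_table="LEGAL_CASES",
        hive_table="legal_cases_archive",
        date_column="case_closed_date",
        cutoff_date="2010-01-01",
    )
    # Print the shell command and the verification query for review.
    print(shlex.join(sqoop_import_args(example)))
    print(offline_count_query(example))
```

Keeping the command as an argument list rather than a single string avoids quoting problems in the `--where` predicate, and the same predicate is reused for the Hive row count so the archived set can be checked against the source before anything is removed from the EDW.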
