
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Architecture with Enterprise Hadoop)


Mr. Slim Baltagi is a Systems Architect at Hortonworks, with over 4 years of Hadoop experience working on 9 Big Data projects: Advanced Customer Analytics, Supply Chain Analytics, Medical Coverage Discovery, Payment Plan Recommender, Research Driven Call List for Sales, Prime Reporting Platform, Customer Hub, Telematics, Historical Data Platform; with Fortune 100 clients and global companies from Financial Services, Insurance, Healthcare and Retail.

Mr. Slim Baltagi has worked in various architecture, design, development and consulting roles at Accenture, CME Group, TransUnion, Syntel, Allstate, Transamerica, Credit Suisse, Chicago Board Options Exchange, Federal Reserve Bank of Chicago, CNA, Sears, USG, ACNielsen and Deutsche Bahn.

Mr. Baltagi also has over 14 years of IT experience with an emphasis on full life cycle development of enterprise web applications using Java and open-source software. He holds a master’s degree in mathematics and is an ABD in computer science from Université Laval, Québec, Canada.

Languages: Java, Python, JRuby, JEE, PHP, SQL, HTML, XML, XSLT, XQuery, JavaScript, UML, JSON
Databases: Oracle, MS SQL Server, MySQL, PostgreSQL
Software: Eclipse, IBM RAD, JUnit, JMeter, YourKit, PVCS, CVS, UltraEdit, Toad, ClearCase, Maven, iText, Visio, JasperReports, Alfresco, YSlow, Terracotta, SoapUI, Dozer, Sonar, Git
Frameworks: Spring, Struts, AppFuse, SiteMesh, Tiles, Hibernate, Axis, Selenium RC, DWR Ajax, XStream
Distributed Computing/Big Data: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, R, RHadoop, Cloudera CDH4, MapR M7, Hortonworks HDP 2.1

Published in: Technology


  1. Building A Modern Data Architecture (MDA) Using Enterprise Hadoop
     Slim Baltagi, Systems Architect, Hortonworks Inc.
     Open-BDA Hadoop Summit 2014, November 18th, 2014
     © Hortonworks Inc. 2011 – 2014. All Rights Reserved
  2. Your Presenter: Slim Baltagi
     • Currently a Systems Architect in the Professional Services Organization of Hortonworks in the central region (US and Canada).
     • Over 4 years of Hadoop experience working on 9 Big Data projects.
     • Over 16 years of IT experience working in various architecture, design, development and consulting roles.
     • Holds a master’s degree in Mathematics and is an ABD in computer science from Université Laval, Québec, Canada.
     • Twitter: @SlimBaltagi
  3. Outline
     1. Drivers for an MDA
     2. What’s an MDA
     3. Hadoop’s role in an MDA
     4. Use Cases related to an MDA
     5. Learn More
     6. Q&A
  4. Traditional Data Architecture Under Pressure
     [Diagram] Applications: Business Analytics, Custom Applications, Packaged Applications. Data System: RDBMS, EDW, MPP, each a silo. Sources: Existing Sources (CRM, ERP, Clickstream, Logs).
     85% data growth: new data types (Source: IDC)
     • OLTP, ERP, CRM Systems
     • Unstructured docs, emails
     • Server logs
     • Social/Web Data
     • Sensor, Machine Data
     • Geolocation
     • Clickstream
     Pressures on the traditional architecture:
     • Can’t manage new data paradigm
     • Constrains data to specific schema
     • Siloed data
     • Limited scalability
     • Economically unfeasible
     • Limited analytics
  5. A Modern Data Architecture for New Data
     [Diagram] Traditional Data Architecture. Applications: Business Analytics, Custom Applications, Packaged Applications. Data System (repositories): RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP, Clickstream, Logs).
     Data types:
     • OLTP, ERP, CRM Systems
     • Unstructured documents, emails
     • Clickstream
     • Server logs
     • Sentiment, Web Data
     • Sensor, Machine Data
     • Geolocation
     New Data Requirements:
     • Scale
     • Economics
     • Flexibility
  6. Enterprise Goals for the Modern Data Architecture
     ✔ Centrally manage new and existing data
     ✔ Provide single view of the customer, product, supply chain
     ✔ Run batch, interactive & real time analytic applications on shared datasets
     ✔ Assure enterprise-grade security, operations and governance
     ✔ Leverage new and existing data center infrastructure investments
     ✔ Scalable and affordable; low cost per TB
     ✔ Deployment flexibility
     [Diagram] Applications: Business Analytics, Custom Applications, Packaged Applications. Data System: RDBMS, EDW, MPP, plus Hadoop with Batch, Interactive and Real-Time access on YARN: Data Operating System over HDFS (Hadoop Distributed File System), nodes 1 to N. Sources: Existing Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.
  7. 1. Drivers for a Modern Data Architecture (MDA)
     • Semi-Structured and Unstructured – NEW DATA
       Unstructured documents, emails, Sentiment, Web Data, Sensor, Machine Data, Geolocation, ...
     • Enterprise Data Warehouse Optimization – REDUCED COSTS
       Low-value computing tasks such as ETL consume significant EDW resources. When offloaded to Hadoop, these ETL processes can be performed much more efficiently, freeing up your data warehouse to perform high-value functions like analytics and operations.
  8. 1. Drivers for a Modern Data Architecture (Continued)
     • Advanced Analytics – NEW ANALYTICS APPS
       Unlike schema-on-write, which transforms data into a specified schema upon load, Hadoop empowers you to store data in any format, and then create a schema at the moment you choose to analyze your data. This unprecedented flexibility opens up new possibilities for iterative analytics and delivers new business value.
     • Single Cluster, Multiple Workloads – ANY WORKLOAD
       With Apache Hadoop YARN supporting multiple access methods (such as batch, interactive, streaming and real-time) on a common data set, Hadoop enables you to transform and view data in multiple ways simultaneously, dramatically reducing time to insight.
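
The schema-on-read idea in the Advanced Analytics driver above can be illustrated with a minimal sketch. It is plain Python over in-memory JSON strings, not Hadoop or Hive code; the sample events, the `read_with_schema` helper and both schemas are hypothetical names invented for this illustration. The raw records are stored untouched, and each analysis applies its own schema only when it reads them.

```python
import json

# Raw events kept exactly as they arrived: no schema was enforced on write.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": "340", "coupon": "SAVE10"}',
    '{"user": "u1", "page": "/cart"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps a field name to a (type constructor, default) pair.
    Fields not named in the schema are ignored; missing fields take the
    default; a value of None is left as None instead of being cast.
    """
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            row[field] = cast(value) if value is not None else None
        yield row


# Two different analyses project two different schemas onto the same raw data.
latency_schema = {"page": (str, None), "ms": (int, 0)}
coupon_schema = {"user": (str, None), "coupon": (str, None)}

latency_rows = list(read_with_schema(RAW_EVENTS, latency_schema))
coupon_rows = list(read_with_schema(RAW_EVENTS, coupon_schema))
```

Under schema-on-write, the second event (a string-typed `ms` and an unexpected `coupon` field) would have to be fixed or rejected at load time. Here the decision is deferred to each reader.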
  10. 2. What’s a Modern Data Architecture (MDA)?
      • Apache Hadoop is a core component of a Modern Data Architecture, allowing organizations to collect, store, analyze and manipulate massive quantities of data on their own terms, regardless of the source of that data, how old it is, where it is stored, or under what format.
      • The Hortonworks Data Platform (HDP) delivers Enterprise Apache Hadoop, deeply integrated with existing systems to create a highly efficient, highly scalable way to manage all your enterprise data.
      • Integrate new & existing data sets, with existing tools & skills.
      • Make all data available for shared access and processing in multitenant infrastructure
      • Batch, interactive & real-time use cases
  11. 3. Hadoop’s role in an MDA
  12. 2. What’s a Modern Data Architecture (MDA)?
  16. Key Drivers of Hadoop
      A. Unlock New Approach to Analytics
         • Agile analytics via “Schema on Read” with ability to store all data in native format
         • Create new apps from new types of data
      B. Optimize Investments, Cut Costs
         • Focus EDW on high value workloads
         • Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration
      C. Enable a Modern Data Architecture
         • Integrate new & existing data sets
         • Make all data available for shared access and processing in multitenant infrastructure
         • Batch, interactive & real-time use cases
         • Integrated with existing tools & skills
      [Diagram] Applications: Business Analytics, Custom Applications, Packaged Applications. Data System: repositories (RDBMS, EDW, MPP) plus Hadoop with Batch, Interactive and Real-Time access on YARN: Data Operating System over HDFS (Hadoop Distributed File System). Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. Dev & Data Tools: Build & Test. Operations Tools: Provision, Manage & Monitor.
  17. Hadoop: It’s About Scale & Structure
      Axis: SCALE (storage & processing)

                      Traditional RDBMS              Hadoop
      schema          Required on write              Required on read
      governance      Standards and structured       Multiple structures
      processing      Limited, no data processing    Processing coupled with data
      data types      Structured                     Multi and unstructured

      best fit use: Complex ACID Transactions; Operational Data Store; Data Discovery; Processing unstructured data; Interactive Analytics
      Also shown: Optimized, reliable transactions; Optimized for analytics
  18. YARN and HDP Enable the Modern Data Architecture
      YARN is the architectural center of Hadoop and HDP
      • YARN enables a common data set across all applications
      • Batch, interactive & real-time workloads
      • Support multi-tenant access & processing
      HDP enables Apache Hadoop to become Enterprise Viable Data Platform with centralized services
      • Security
      • Governance
      • Operations
      • Productization
      Enabled broad ecosystem adoption
      Hortonworks drove this innovation of Hadoop through YARN
      [Diagram] Hortonworks Data Platform 2.2
      • Governance: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
      • Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Tez, Stream (Storm), NoSQL (HBase, Accumulo), Slider, In-Memory (Spark), Search (Solr), Others (ISV Engines)
      • YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System)
      • Security: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox)
      • Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
      • Deployment Choice: Linux, Windows, On-Premises, Cloud
  21. Shift to Data-driven Means Treating Data like Capital
      Hadoop enables organizations to cost effectively store and use all of the data available in a way that shifts the business from Reactive to Proactive.
      • A shift in Advertising: From mass branding …to 1x1 targeting
      • A shift in Financial Services: From educated investing …to automated algorithms
      • A shift in Healthcare: From mass treatment …to designer medicine
      • A shift in Retail: From static branding …to real-time personalization
      • A shift in Manufacturing: From break then fix …to repair before break
  22. Create New Applications from New Types of Data
      Data types (columns): Sentiment & Web; Clickstream & Behavior; Machine & Sensor; Geographic; Server Logs; Structured & Unstructured
      Use cases by industry (each ✔ marks one applicable data type):
      Financial Services
      • New Account Risk Screens ✔ ✔
      • Trading Risk ✔ ✔
      • Insurance Underwriting ✔ ✔ ✔
      Telecom
      • Call Detail Records (CDR) ✔ ✔
      • Infrastructure Investment ✔ ✔
      • Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔
      Retail
      • 360° View of the Customer ✔ ✔ ✔
      • Localized, Personalized Promotions ✔
      • Website Optimization ✔
      Manufacturing
      • Supply Chain and Logistics ✔
      • Assembly Line Quality Assurance ✔
      • Crowd-sourced Quality Assurance ✔
      Healthcare
      • Use Genomic Data in Medical Trials ✔ ✔
      • Monitor Patient Vitals in Real-Time ✔ ✔
      Pharmaceuticals
      • Recruit and Retain Patients for Drug Trials ✔ ✔
      • Improve Prescription Adherence ✔ ✔ ✔ ✔
      Oil & Gas
      • Unify Exploration & Production Data ✔ ✔ ✔ ✔
      • Monitor Rig Safety in Real-Time ✔ ✔ ✔
      Government
      • ETL Offload/Federal Budgetary Pressures ✔ ✔
      • Sentiment Analysis for Government Programs ✔
  23. 4.1 Advertising
      • Mine Grocery & Drug Store POS Data to Identify High-Value Shoppers
      • Target Ads to Customers in Specific Cultural or Linguistic Segments
      • Syndicate Videos According to Behavior, Demographics & Channel
      • ETL Toy Market Research Data for Longer Retention & Deeper Insight
      • Optimize Online Ad Placement for Retail Websites
  24. 4. Use Cases related to an MDA (Continued)
  25. 4.2 Financial Services
      • Screen New Account Applications for Risk of Default
      • Monetize Anonymous Banking Data in Secondary Markets
      • Improve Underwriting Efficiency for Usage-Based Auto Insurance
      • Analyze Insurance Claims with a Shared Data Lake
      • Maintain Sub-Second SLAs with a Hadoop “Ticker Plant”
      • Surveillance of Trading Logs for Anti-Laundering Analysis
  27. 4.3 Healthcare
      • Access Genomic Data for Medical Trials
      • Monitor Patient Vitals in Real-Time
      • Reduce Cardiac Re-Admittance Rates
      • Machine Learning to Screen for Autism with In-Home Testing
      • Store Medical Research Data Forever
      • Recruit Research Cohorts for Pharmaceutical Trials
      • Track Equipment and Medicines with RFID Data
      • Improve Prescription Adherence
  29. 4.4 Manufacturing
      • Assure Just-In-Time Delivery of Raw Materials
      • Control Quality with Real-Time & Historical Assembly Line Data
      • Avoid Stoppages with Proactive Equipment Maintenance
      • Increase Yields in Drug Manufacturing
      • Crowdsource Quality Assurance
  31. 4.5 Oil & Gas
      • Slow Decline Curves with Production Parameter Optimization
      • Define Operational Set Points for Each Well & Receive Alerts on Deviations
      • Optimize Lease Bidding with Reliable Yield Predictions
      • Report on Compliance with Environmental, Health and Safety Regulations
      • Repair Equipment Preventatively with Targeted Maintenance
      • Integrate Exploration with Seismic Image Processing
  33. 4.6 Public Sector
      • Understand Public Sentiment About Government Performance
      • Protect Critical Networks from Threats (Both Internal and External)
      • Prevent Fraud and Waste
      • Analyze Social Media to Identify Terrorist Threats
      • Decrease Budget Pressures by Offloading Expensive SQL Workloads
      • Crowdsource Reporting for Repairs to Roads and Public Infrastructure
      • Fulfill “Open Records” and Freedom of Information Requests
  34. 4.7 Retail
      • Build a 360° View of the Customer
      • Analyze Brand Sentiment
      • Localize & Personalize Promotions
      • Optimize Websites
      • Optimize Store Layouts
  36. 4.8 Telecom
      • Analyze Call Detail Records (CDRs)
      • Service Equipment Proactively
      • Rationalize Infrastructure Investments
      • Recommend Next Product to Buy (NPTB)
      • Allocate Bandwidth in Real-time
      • Develop New Products
  38. What is a Data Lake?
      • An architectural pattern in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
      • But what is it?
        – It is a PLATFORM for your data. (It is not a database)
        – Multipurpose open PLATFORM to land all data in a single place and interact with it many ways.
      • A platform that allows for the ecosystem to provide higher level services (SAS, SAP, Microsoft, Streaming, MPP, In-memory, etc.)
      • Provides first class APIs and frameworks to enable this integration
      • Provides first class data management capabilities (metadata management, security, transformation pipelines, replication, retention, etc.)
  39. HDP Data Lake Reference Architecture
      [Diagram] Data Lake: HDP Grid on YARN (compute & storage)
      • Steps: Step 1: Extract & Load; Step 2: Model/Apply Metadata; Step 3: Transform, Aggregate & Materialize; Step 4: Schedule and Orchestrate; Manage Steps 1-4: Data Management with Falcon
      • Source data: ClickStream Data, Sales Transaction/Data, Product Data, Marketing/Inventory, Social Data, EDW; inputs via File, JMS, REST, HTTP, Streaming
      • Ingestion: SQOOP, FLUME, NFS, Web HDFS, STORM
      • Metadata: HCATALOG (table & user-defined metadata)
      • Data processing: HIVE, PIG, Mahout, MR2, TEZ, SOLR, Storm, SAS
      • Pipeline and scheduling: FALCON (data pipeline & flow management), Oozie (batch scheduler)
      • Management and security: AMBARI; Knox (Perimeter Level Security)
      • Interactive: Hive Server (Tez/Stinger)
      • YARN Apps: Stream Processing, Real-time Search, MPI (opens up Hadoop to many new use cases)
      • Use Case Type 1: Materialize & Exchange. Exchange via HBase Client and Sqoop/Hive to Downstream Data Sources: OLTP, HBase, EDW (Teradata)
      • Use Case Type 2: Explore/Visualize. Query/Analytics/Reporting Tools: Tableau/Excel, Datameer/Platfora/SAP
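
As a rough illustration of how the four steps named on this slide chain together, here is a minimal in-memory sketch. It is plain Python and does not call Sqoop, Flume, HCatalog, Hive, Falcon or Oozie; every function name, the sample `sources` data and the per-product aggregation are assumptions made up for this example.

```python
from collections import defaultdict


def extract_and_load(sources):
    """Step 1 (Extract & Load): land raw records from every source, unmodified."""
    return [(name, rec) for name, recs in sources.items() for rec in recs]


def apply_metadata(landed):
    """Step 2 (Model/Apply Metadata): attach a table name and field list,
    a stand-in for registering the data in a metadata catalog."""
    return [
        {"table": name, "fields": sorted(rec), "data": rec}
        for name, rec in landed
    ]


def transform(cataloged):
    """Step 3 (Transform, Aggregate & Materialize): total sales per product."""
    totals = defaultdict(float)
    for entry in cataloged:
        if entry["table"] == "sales":
            totals[entry["data"]["product"]] += entry["data"]["amount"]
    return dict(totals)


def run_pipeline(sources, steps):
    """Step 4 (Schedule and Orchestrate): run the steps in order,
    a stand-in for a workflow scheduler."""
    result = sources
    for step in steps:
        result = step(result)
    return result


# Hypothetical source data for illustration only.
sources = {
    "sales": [
        {"product": "A", "amount": 10.0},
        {"product": "B", "amount": 5.0},
        {"product": "A", "amount": 2.5},
    ],
    "clickstream": [{"user": "u1", "page": "/home"}],
}

totals = run_pipeline(sources, [extract_and_load, apply_metadata, transform])
```

The clickstream records are landed and cataloged alongside the sales records even though this particular transform ignores them, which mirrors the idea of landing all data in one place and deciding later how to use it.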
  41. 5. Learn More …
      • MDA White Paper: Learn more about Modern Data Architecture (MDA)
        http://info.hortonworks.com/data-lake-hadoop-whitepaper.html
      • MDA Web Page: Explore Use Cases by Industry
        http://hortonworks.com/hadoop-modern-data-architecture/
      • Hortonworks Sandbox: Get Started on Hadoop with Hortonworks Sandbox
        http://hortonworks.com/products/hortonworks-sandbox/
      • Hadoop Tutorials: On-Demand Hadoop Tutorials Delivered to Your Inbox
        http://info.hortonworks.com/On-demand-Tutorials_Sign-Up-Page.html
      • Enterprise Data Lake: Enterprise Hadoop and the journey to Data Lake
        http://hortonworks.com/blog/enterprise-hadoop-journey-data-lake/
  43. 6. Q&A…
      Thank you!
