SlideShare uma empresa Scribd logo
1 de 15
DWH & BigData – 
architecture approaches 
Odessa 
Vladimir Slobodianiuk 
Date: 2014 
www.luxoft.com
Agenda 
www.luxoft.com 
1 
2 
Big Data – what is it 
Hadoop vs RDBMS – pros and cons 
3 Hadoop & Enterprise architecture 
4 Hadoop as ETL engine
Big Data 
– what is it 
www.luxoft.com
Current state 
 Big data - is an all-encompassing term for any collection of data sets so large and 
complex that it becomes difficult to process using traditional data processing 
applications. 
www.luxoft.com
Limitations & Problems 
www.luxoft.com 
 Big data is difficult to work with using 
most relational databases, requiring 
instead massively parallel software 
running on tens, hundreds, or even 
thousands of servers 
 eBay.com uses two data warehouses at 7.5 petabytes 
 Walmart handles more than 1 million customer 
transactions every hour 
 Facebook handles 50 billion photos from its user base 
 In 2012, the Obama administration announced the Big 
Data Research and Development Initiative
Hadoop vs RDBMS 
www.luxoft.com
CORE HADOOP - MapReduce 
In 2004, Google published a paper on a process called MapReduce 
www.luxoft.com 
 DISTRIBUTED 
COMPUTING 
FRAMEWORK 
 Process large jobs in 
parallel across many 
nodes and combine the 
results
Hadoop Structure 
www.luxoft.com 
 HDFS is a distributed file system designed to run on commodity hardware 
 HBase store data rows in labelled tables (sortable key and an arbitrary number of columns) 
 Hive provide data summarization, query, and analysis (SQL-like interface) 
 Pig is a platform for analyzing large data sets that consists of a high-level language
Hadoop vs RDBMS 
www.luxoft.com 
Hadoop RDBMS 
 Performance for relational data 
 Machine query optimization 
 Mature workload management 
 High concurrency interactive query 
processing 
 Schema-less Model 
 Human query optimization 
 Ability to create complex dataflow 
with multiple inputs and outputs 
 Parallelize many Analytic Functions 
How might this change in the future 
 Query Optimization Improvements in Hive 
– Statistics, better join ordering, more join types, etc 
 Startup Time Improvements 
– Simpler query plans to pass out 
 Runtime Performance Improvements
Hadoop & 
Enterprise architecture 
www.luxoft.com
Classic architecture approach 
www.luxoft.com
Hadoop & Enterprise architecture 
www.luxoft.com
Luxoft Big Data R&D 
Hadoop as ETL Data Quality tool 
www.luxoft.com 
BENEFITS 
 Reduced TCO (commodity hardware usage) 
 Traceability of all the data quality issues 
 Hadoop becomes clean data tool. 
PROBLEM 
Traditional tools show poor performance in exception 
and data cleansing. 
SOLUTION 
Hadoop transforms the data into single format and 
processes it using data cleansing workflows.
Summary 
Big Data: 
 
Cutting edge of DI technologies 
 
State-of-the-art design approaches 
 
A bit more than simple development, it's some of art, art 
of data management 
www.luxoft.com
THANK YOU 
www.luxoft.com

Mais conteúdo relacionado

Mais procurados

Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsKamalika Dutta
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Jordan Chung
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopSavvycom Savvycom
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time ApplicationsDataWorks Summit
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Hortonworks
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworksAmal Targhi
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes John Archer
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksAmazon Web Services
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunitiesMohammed Guller
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 

Mais procurados (20)

Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
BigData
BigDataBigData
BigData
 
Introduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & HadoopIntroduction of Big data, NoSQL & Hadoop
Introduction of Big data, NoSQL & Hadoop
 
Big Data Real Time Applications
Big Data Real Time ApplicationsBig Data Real Time Applications
Big Data Real Time Applications
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
 
Big data frameworks
Big data frameworksBig data frameworks
Big data frameworks
 
introduction to big data frameworks
introduction to big data frameworksintroduction to big data frameworks
introduction to big data frameworks
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunities
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 

Semelhante a DWH & big data architecture approaches

FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)GeeksLab Odessa
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...Big Data Spain
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouseStephen Alex
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureKovid Academy
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKRajesh Jayarman
 

Semelhante a DWH & big data architecture approaches (20)

FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
FOSS Sea 2014_DataWarehouse & BigData_Владимир Слободянюк ( Luxoft)
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
Hadoop
HadoopHadoop
Hadoop
 
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 How to use Hadoop for operational and transactional purposes by RODRIGO MERI... How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
How to use Hadoop for operational and transactional purposes by RODRIGO MERI...
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Modern data warehouse
Modern data warehouseModern data warehouse
Modern data warehouse
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
How pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architectureHow pig and hadoop fit in data processing architecture
How pig and hadoop fit in data processing architecture
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 

Mais de Luxoft

Luxoft Horizon - Intuitive, Insightful, Innovative
Luxoft Horizon - Intuitive, Insightful, InnovativeLuxoft Horizon - Intuitive, Insightful, Innovative
Luxoft Horizon - Intuitive, Insightful, InnovativeLuxoft
 
The power of visualization
The power of visualization The power of visualization
The power of visualization Luxoft
 
Prototype that first
Prototype that first Prototype that first
Prototype that first Luxoft
 
Creating a flexible strategy using causal diagrams
Creating a flexible strategy using causal diagrams Creating a flexible strategy using causal diagrams
Creating a flexible strategy using causal diagrams Luxoft
 
When and why do we need agile
When and why do we need agile When and why do we need agile
When and why do we need agile Luxoft
 
Coach me if you can
Coach me if you can Coach me if you can
Coach me if you can Luxoft
 

Mais de Luxoft (6)

Luxoft Horizon - Intuitive, Insightful, Innovative
Luxoft Horizon - Intuitive, Insightful, InnovativeLuxoft Horizon - Intuitive, Insightful, Innovative
Luxoft Horizon - Intuitive, Insightful, Innovative
 
The power of visualization
The power of visualization The power of visualization
The power of visualization
 
Prototype that first
Prototype that first Prototype that first
Prototype that first
 
Creating a flexible strategy using causal diagrams
Creating a flexible strategy using causal diagrams Creating a flexible strategy using causal diagrams
Creating a flexible strategy using causal diagrams
 
When and why do we need agile
When and why do we need agile When and why do we need agile
When and why do we need agile
 
Coach me if you can
Coach me if you can Coach me if you can
Coach me if you can
 

Último

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

DWH & big data architecture approaches

  • 1. DWH & BigData – architecture approaches Odessa Vladimir Slobodianiuk Date: 2014 www.luxoft.com
  • 2. Agenda www.luxoft.com 1 2 Big Data – what is it Hadoop vs RDBMS – pros and cons 3 Hadoop & Enterprise architecture 4 Hadoop as ETL engine
  • 3. Big Data – what is it www.luxoft.com
  • 4. Current state  Big data - is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications. www.luxoft.com
  • 5. Limitations & Problems www.luxoft.com  Big data is difficult to work with using most relational databases, requiring instead massively parallel software running on tens, hundreds, or even thousands of servers  eBay.com uses two data warehouses at 7.5 petabytes  Walmart handles more than 1 million customer transactions every hour  Facebook handles 50 billion photos from its user base  In 2012, the Obama administration announced the Big Data Research and Development Initiative
  • 6. Hadoop vs RDBMS www.luxoft.com
  • 7. CORE HADOOP - MapReduce In 2004, Google published a paper on a process called MapReduce www.luxoft.com  DISTRIBUTED COMPUTING FRAMEWORK  Process large jobs in parallel across many nodes and combine the results
  • 8. Hadoop Structure www.luxoft.com  HDFS is a distributed file system designed to run on commodity hardware  HBase store data rows in labelled tables (sortable key and an arbitrary number of columns)  Hive provide data summarization, query, and analysis (SQL-like interface)  Pig is a platform for analyzing large data sets that consists of a high-level language
  • 9. Hadoop vs RDBMS www.luxoft.com Hadoop RDBMS  Performance for relational data  Machine query optimization  Mature workload management  High concurrency interactive query processing  Schema-less Model  Human query optimization  Ability to create complex dataflow with multiple inputs and outputs  Parallelize many Analytic Functions How might this change in the future  Query Optimization Improvements in Hive – Statistics, better join ordering, more join types, etc  Startup Time Improvements – Simpler query plans to pass out  Runtime Performance Improvements
  • 10. Hadoop & Enterprise architecture www.luxoft.com
  • 12. Hadoop & Enterprise architecture www.luxoft.com
  • 13. Luxoft Big Data R&D Hadoop as ETL Data Quality tool www.luxoft.com BENEFITS  Reduced TCO (commodity hardware usage)  Traceability of all the data quality issues  Hadoop becomes clean data tool. PROBLEM Traditional tools show poor performance in exception and data cleansing. SOLUTION Hadoop transforms the data into single format and processes it using data cleansing workflows.
  • 14. Summary Big Data:  Cutting edge of DI technologies  State-of-the-art design approaches  A bit more than simple development, it's some of art, art of data management www.luxoft.com