SlideShare a Scribd company logo
1 of 16
Download to read offline
Infrastructure and Stack




              Presented by John Dougherty, Viriton
                          10/03/2012
                  john.dougherty@viriton.com
What is Hadoop?

• Apache’s implementation of Google’s BigTable
• Uses a Java VM in order to parse instructions
• Uses sequential writes & column based file
  structures with HDFS
• Grants the ability to read/write/manipulate very
  large data sets/structures.
What is Hadoop? (cont.)



         VS.
What is BigTable
• Contains the framework that was based on,
  and is used in, hadoop

• Uses a commodity approach to hardware

• Extreme scalability and redundancy
Commodity Perspective
• Commercial Hardware cost vs. failure rate
  – Roughly double the cost of commodity
  – Roughly 5% failure rate


• Commodity Hardware cost vs. failure rate
  – Roughly half the cost of commodity
  – Rougly 10-15% failure rate
Breaking Down the Complexity
What is HDFS
• Backend file system for the Hadoop platform

• Allows for easy operability/node management

• Certain technologies can replace or augment
  – Hbase (Augments HDFS)
  – Cassandra (Replaces HDFS)
What works with Hadoop?
• Middleware and connectivity tools improve functionality

• Hive, Pig, Cassandra (all sub-projects of Apache’s Hadoop)
  help to connect and utilize

• Each application set has different uses




                       Pig
Layout of Middleware
Schedulers/Configurators
• Zookeeper
  – Helps you in configuring many nodes
  – Can be integrated easily
• Oozie
  – A job resource/scheduler for hadoop
  – Open source
• Flume
  – Concatenator/Aggregator (Dist. log collection)
Middleware
• Hive
  – Data warehouse, connects natively to hadoop’s internals
  – Uses HiveQL to create queries
  – Easily extendable with plugins/macros
• Pig
  – Hive-like in that it uses its own query language (pig latin)
  – Easily extendable, more like SQL than Hive
• Sqoop
  – Connects databases and datasets
  – Limited, but powerful
How can Hadoop/Hbase/MapReduce help?

• You have a very large data set(s)
• You require results on your data in a timely
  manner
• You don’t enjoy spending millions on
  infrastructure
• Your data is large enough to cause a classic
  RDBMS headaches
Column Based Data
• Developer woes
  – Extract/Transfer/Load is a concern for complicated
    schemas
  – Egress/Ingress between existing queries/results
    becomes complicated
  – Solutions are deployed with walls of functionality
  – Hard questions turn into hard queries
Column Based Data (cont.)
• Developer joys
  – You can now process PB, into EB, and beyond
  – Your extended datasets can be aggregated, not
    easily; but also unlike ever before
  – You can extend your daily queries to include
    historical data, even incorporating into existing
    real-time data usage
Future Projects/Approaches

• Cross discipline data sharing/comparisons
• Complex statistical models re-constructed
• Massive data set conglomeration and
  standardization (Public sector data, mostly)
How some software makes it easier
• Alteryx
   – Very similar to Talend for interface, visual
   – Allows easy integration into reporting (Crystal Reports)
• Qubole
   – This will be expanded on shortly
   – Easy to use interface and management of data
• Hortonworks (Open Source)
   – Management utility for internal cluster deployments
• Cloudera (Open, to an extent)
   – Management utility from Cloudera, also for internal deployments

More Related Content

What's hot

SpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012Joydeep Sen Sarma
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseDavid Lauzon
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
Hadoop online trainings
Hadoop online trainingsHadoop online trainings
Hadoop online trainingsGeek Trainings
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataCloudera, Inc.
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 

What's hot (20)

Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
SpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache Hadoop
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
 
AWS Database Services
AWS Database ServicesAWS Database Services
AWS Database Services
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBase
HBaseHBase
HBase
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Introduction to hbase
Introduction to hbaseIntroduction to hbase
Introduction to hbase
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
Google cloud certification data engineer
Google cloud certification data engineerGoogle cloud certification data engineer
Google cloud certification data engineer
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
Hadoop online trainings
Hadoop online trainingsHadoop online trainings
Hadoop online trainings
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
Sql over hadoop ver 3
Sql over hadoop ver 3Sql over hadoop ver 3
Sql over hadoop ver 3
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 

Viewers also liked

Pre cd and artist research
Pre cd and artist researchPre cd and artist research
Pre cd and artist research61141
 
Rosalia de Castro
Rosalia de CastroRosalia de Castro
Rosalia de CastroDeni-sa
 
Vir’s ib educators ankeeta
Vir’s ib educators ankeetaVir’s ib educators ankeeta
Vir’s ib educators ankeetaparth_damania
 
Aca advocacy
Aca advocacyAca advocacy
Aca advocacyschacctf
 
Evolución de los avances tecnológicos
Evolución de los avances tecnológicosEvolución de los avances tecnológicos
Evolución de los avances tecnológicosMerlys Escarpeta
 
Top 150 global design firms
Top 150 global design firmsTop 150 global design firms
Top 150 global design firmsSamar Momin
 
Rosalia de Castro
Rosalia de CastroRosalia de Castro
Rosalia de CastroDeni-sa
 
Catedra virtual de cultura ciudadana
Catedra virtual de cultura ciudadanaCatedra virtual de cultura ciudadana
Catedra virtual de cultura ciudadanaLuisa Paternina
 
페차쿠차
페차쿠차페차쿠차
페차쿠차ekwjd4793
 
Jiit 2013 14 project presentation aniket mishra
Jiit 2013 14 project presentation aniket mishraJiit 2013 14 project presentation aniket mishra
Jiit 2013 14 project presentation aniket mishraAniket Mishra
 
페차쿠차_ 조연진
페차쿠차_ 조연진페차쿠차_ 조연진
페차쿠차_ 조연진연진 조
 

Viewers also liked (20)

Pre cd and artist research
Pre cd and artist researchPre cd and artist research
Pre cd and artist research
 
Qtllb
QtllbQtllb
Qtllb
 
Applying Big Data
Applying Big DataApplying Big Data
Applying Big Data
 
Usability ppt
Usability pptUsability ppt
Usability ppt
 
Rosalia de Castro
Rosalia de CastroRosalia de Castro
Rosalia de Castro
 
Vir’s ib educators ankeeta
Vir’s ib educators ankeetaVir’s ib educators ankeeta
Vir’s ib educators ankeeta
 
Big Data ROI
Big Data ROIBig Data ROI
Big Data ROI
 
Aca advocacy
Aca advocacyAca advocacy
Aca advocacy
 
SEO Pricing & Cost
SEO Pricing & CostSEO Pricing & Cost
SEO Pricing & Cost
 
Evolución de los avances tecnológicos
Evolución de los avances tecnológicosEvolución de los avances tecnológicos
Evolución de los avances tecnológicos
 
Enc 3241 document_design1
Enc 3241 document_design1Enc 3241 document_design1
Enc 3241 document_design1
 
PRUEBA TOEFL
PRUEBA TOEFLPRUEBA TOEFL
PRUEBA TOEFL
 
Enc 3241 color
Enc 3241 colorEnc 3241 color
Enc 3241 color
 
Top 150 global design firms
Top 150 global design firmsTop 150 global design firms
Top 150 global design firms
 
Subculture hippie
Subculture hippieSubculture hippie
Subculture hippie
 
Rosalia de Castro
Rosalia de CastroRosalia de Castro
Rosalia de Castro
 
Catedra virtual de cultura ciudadana
Catedra virtual de cultura ciudadanaCatedra virtual de cultura ciudadana
Catedra virtual de cultura ciudadana
 
페차쿠차
페차쿠차페차쿠차
페차쿠차
 
Jiit 2013 14 project presentation aniket mishra
Jiit 2013 14 project presentation aniket mishraJiit 2013 14 project presentation aniket mishra
Jiit 2013 14 project presentation aniket mishra
 
페차쿠차_ 조연진
페차쿠차_ 조연진페차쿠차_ 조연진
페차쿠차_ 조연진
 

Similar to Hadoop Infrastructure (Oct. 3rd, 2012)

Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Global Business Events
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basicssaili mane
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadooplarsgeorge
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPKrishna Sujeer
 
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Alex Gorbachev
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 

Similar to Hadoop Infrastructure (Oct. 3rd, 2012) (20)

Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
Justin Sheppard & Ankur Gupta from Sears Holdings Corporation - Single point ...
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
INTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOPINTRODUCTION TO BIG DATA HADOOP
INTRODUCTION TO BIG DATA HADOOP
 
Hadoop presentation
Hadoop presentationHadoop presentation
Hadoop presentation
 
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Hadoop Infrastructure (Oct. 3rd, 2012)

  • 1. Infrastructure and Stack Presented by John Dougherty, Viriton 10/03/2012 john.dougherty@viriton.com
  • 2. What is Hadoop? • Apache’s implementation of Google’s BigTable • Uses a Java VM in order to parse instructions • Uses sequential writes & column based file structures with HDFS • Grants the ability to read/write/manipulate very large data sets/structures.
  • 3. What is Hadoop? (cont.) VS.
  • 4. What is BigTable • Contains the framework that was based on, and is used in, hadoop • Uses a commodity approach to hardware • Extreme scalability and redundancy
  • 5. Commodity Perspective • Commercial Hardware cost vs. failure rate – Roughly double the cost of commodity – Roughly 5% failure rate • Commodity Hardware cost vs. failure rate – Roughly half the cost of commodity – Rougly 10-15% failure rate
  • 6. Breaking Down the Complexity
  • 7. What is HDFS • Backend file system for the Hadoop platform • Allows for easy operability/node management • Certain technologies can replace or augment – Hbase (Augments HDFS) – Cassandra (Replaces HDFS)
  • 8. What works with Hadoop? • Middleware and connectivity tools improve functionality • Hive, Pig, Cassandra (all sub-projects of Apache’s Hadoop) help to connect and utilize • Each application set has different uses Pig
  • 10. Schedulers/Configurators • Zookeeper – Helps you in configuring many nodes – Can be integrated easily • Oozie – A job resource/scheduler for hadoop – Open source • Flume – Concatenator/Aggregator (Dist. log collection)
  • 11. Middleware • Hive – Data warehouse, connects natively to hadoop’s internals – Uses HiveQL to create queries – Easily extendable with plugins/macros • Pig – Hive-like in that it uses its own query language (pig latin) – Easily extendable, more like SQL than Hive • Sqoop – Connects databases and datasets – Limited, but powerful
  • 12. How can Hadoop/Hbase/MapReduce help? • You have a very large data set(s) • You require results on your data in a timely manner • You don’t enjoy spending millions on infrastructure • Your data is large enough to cause a classic RDBMS headaches
  • 13. Column Based Data • Developer woes – Extract/Transfer/Load is a concern for complicated schemas – Egress/Ingress between existing queries/results becomes complicated – Solutions are deployed with walls of functionality – Hard questions turn into hard queries
  • 14. Column Based Data (cont.) • Developer joys – You can now process PB, into EB, and beyond – Your extended datasets can be aggregated, not easily; but also unlike ever before – You can extend your daily queries to include historical data, even incorporating into existing real-time data usage
  • 15. Future Projects/Approaches • Cross discipline data sharing/comparisons • Complex statistical models re-constructed • Massive data set conglomeration and standardization (Public sector data, mostly)
  • 16. How some software makes it easier • Alteryx – Very similar to Talend for interface, visual – Allows easy integration into reporting (Crystal Reports) • Qubole – This will be expanded on shortly – Easy to use interface and management of data • Hortonworks (Open Source) – Management utility for internal cluster deployments • Cloudera (Open, to an extent) – Management utility from Cloudera, also for internal deployments