Infrastructure and Stack

Presented by John Dougherty, Viriton
10/03/2012
john.dougherty@viriton.com
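What is Hadoop?

• Apache’s open-source implementation of the ideas in Google’s MapReduce
  and Google File System (GFS) papers; HBase is the piece modeled on BigTable
• Written in Java; its daemons and jobs run on the Java VM
• Stores data in HDFS as large, replicated blocks that are written
  sequentially; column-based storage comes from HBase on top of HDFS
• Grants the ability to read/write/manipulate very
  large data sets/structures.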
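
The processing model Hadoop applies to those large data sets is MapReduce (named in a later slide title). The sketch below is a single-process illustration of the map, shuffle and reduce steps using a word count; it does not use Hadoop itself, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.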
What is Hadoop? (cont.)

[Graphic omitted: two items compared, "VS."]
What is BigTable
• Google’s distributed storage system for structured data; its design is
  the model for HBase, which runs on top of Hadoop

• Uses a commodity approach to hardware

• Extreme scalability and redundancy
Commodity Perspective
• Commercial Hardware cost vs. failure rate
  – Roughly double the cost of commodity
  – Roughly 5% failure rate


• Commodity Hardware cost vs. failure rate
  – Roughly half the cost of commercial
  – Roughly 10-15% failure rate
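
A rough worked example of the trade-off above, as a sketch only. It takes the slide's figures (commercial hardware at about twice the price with a 5% failure rate, commodity hardware at up to 15%) and assumes three things the slide does not state: a fixed budget of 100 cost units, independent node failures, and three copies of each piece of data. The function names are illustrative.

```python
def nodes_for_budget(budget, unit_cost):
    """Number of whole nodes a budget buys at a given per-node cost."""
    return budget // unit_cost


def expected_working_nodes(nodes, failure_rate):
    """Expected number of nodes still running, given a per-node failure rate."""
    return nodes * (1.0 - failure_rate)


def chance_all_replicas_down(failure_rate, replicas=3):
    """Probability that every copy of a block sits on a failed node.

    Assumes node failures are independent, which real clusters only approximate.
    """
    return failure_rate ** replicas


if __name__ == "__main__":
    budget = 100                           # assumed budget, in arbitrary cost units
    commodity_cost = 1                     # assumed unit cost of a commodity node
    commercial_cost = 2 * commodity_cost   # "roughly double" per the slide
    options = [("commercial", commercial_cost, 0.05),
               ("commodity", commodity_cost, 0.15)]
    for label, cost, rate in options:
        nodes = nodes_for_budget(budget, cost)
        working = expected_working_nodes(nodes, rate)
        loss = chance_all_replicas_down(rate)
        print(f"{label}: {nodes} nodes, {working:.1f} expected up, "
              f"P(all 3 replicas down) = {loss:.6f}")
```

Under these assumptions the same budget leaves about 85 commodity nodes running against about 47.5 commercial ones, which is the argument for pairing cheap hardware with software-level redundancy.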
Breaking Down the Complexity
What is HDFS
• Backend file system for the Hadoop platform

• Allows for easy operability/node management

• Certain technologies can replace or augment it
  – HBase (augments HDFS with a column-oriented store on top)
  – Cassandra (can replace HDFS as the storage layer)
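
To make the file-system role concrete, here is a minimal sketch of how a file can be cut into fixed-size blocks and each block copied to several nodes. The 128 MiB block size, the replication factor of 3 and the round-robin placement are assumptions for illustration: HDFS makes block size and replication configurable, and its real placement policy is rack-aware rather than round-robin. The node names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size in bytes (configurable in HDFS)
REPLICATION = 3                  # assumed number of copies per block


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the byte length of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a simplification; it ignores racks, free space and node load.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + offset) % len(nodes)] for offset in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)        # a 300 MiB file
    nodes = ["node1", "node2", "node3", "node4"]         # hypothetical data nodes
    for block, holders in place_replicas(len(blocks), nodes).items():
        print(f"block {block} ({blocks[block]} bytes) -> {holders}")
```

Because every block lives on several nodes, losing one machine removes only one copy of each block it held, which is what lets the platform tolerate the commodity failure rates discussed earlier.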
What works with Hadoop?
• Middleware and connectivity tools improve functionality

• Hive, Pig, Cassandra help to connect to and utilize Hadoop data
  (Hive and Pig began as Hadoop sub-projects; Cassandra is a separate
  Apache project)

• Each application set has different uses
Layout of Middleware
Schedulers/Configurators
• ZooKeeper
  – Coordination service: keeps configuration, naming and
    synchronization state consistent across many nodes
  – Can be integrated easily
• Oozie
  – A workflow scheduler for Hadoop jobs
  – Open source
• Flume
  – Collector/aggregator for distributed logs (dist. log collection)
Middleware
• Hive
  – Data warehouse, connects natively to Hadoop’s internals
  – Uses HiveQL, an SQL-like language, to create queries
  – Easily extendable with plugins/macros
• Pig
  – Hive-like in that it uses its own query language (Pig Latin)
  – Easily extendable; Pig Latin is a procedural data-flow language,
    so it reads less like SQL than HiveQL does
• Sqoop
  – Transfers data in bulk between relational databases and Hadoop
  – Limited, but powerful
How can Hadoop/HBase/MapReduce help?

• You have one or more very large data sets
• You require results on your data in a timely
  manner
• You don’t enjoy spending millions on
  infrastructure
• Your data is large enough to give a classic
  RDBMS headaches
Column Based Data
• Developer woes
  – Extract/Transform/Load (ETL) is a concern for complicated
    schemas
  – Egress/Ingress between existing queries/results
    becomes complicated
  – Solutions are deployed with walls of functionality
  – Hard questions turn into hard queries
Column Based Data (cont.)
• Developer joys
  – You can now process petabytes (PB), on toward exabytes (EB),
    and beyond
  – Your extended datasets can be aggregated: not easily, but in
    ways that were not possible before
  – You can extend your daily queries to include
    historical data, and even incorporate it into existing
    real-time data usage
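
For readers new to column-based data, the sketch below models the kind of layout HBase and BigTable use: each row key maps to a sparse set of "family:qualifier" columns, and each cell keeps timestamped versions. It is an in-memory toy, not the HBase API; the class ColumnFamilyTable and its methods are invented for this illustration.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Store a new version of a cell, keeping the newest version first."""
        cells = self.rows[row_key][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row_key, column):
        """Return the latest value of a cell, or None if it was never written."""
        cells = self.rows.get(row_key, {}).get(column)
        return cells[0][1] if cells else None

    def scan_column(self, column):
        """Yield (row_key, latest value) in row-key order for rows with the column."""
        for row_key in sorted(self.rows):
            if column in self.rows[row_key]:
                yield row_key, self.rows[row_key][column][0][1]


if __name__ == "__main__":
    table = ColumnFamilyTable()
    table.put("user1", "info:name", "Ann", timestamp=1)
    table.put("user1", "info:name", "Anna", timestamp=2)   # newer version
    table.put("user2", "stats:visits", 7, timestamp=1)     # sparse: no info:name
    print(table.get("user1", "info:name"))
    print(list(table.scan_column("stats:visits")))
```

Rows need not share columns, and a scan touches only the column it asks for. That flexibility is the source of both the joys (wide, historical data in one table) and the woes (queries and ETL must cope with a schema that is not fixed).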
Future Projects/Approaches

• Cross discipline data sharing/comparisons
• Complex statistical models re-constructed
• Massive data set conglomeration and
  standardization (Public sector data, mostly)
How some software makes it easier
• Alteryx
   – Visual interface, very similar to Talend
   – Allows easy integration into reporting (Crystal Reports)
• Qubole
   – This will be expanded on shortly
   – Easy to use interface and management of data
• Hortonworks (Open Source)
   – Management utility for internal cluster deployments
• Cloudera (Open, to an extent)
   – Management utility from Cloudera, also for internal deployments

Hadoop Infrastructure (Oct. 3rd, 2012)
