Hadoop Presentation
       2012

Presenter : Pham Thai Hoa
Email : thaihoabo@gmail.com
Web : http://mobion.com/hoa



                    4/14/2012   Pham Thai Hoa
Topics
• Introduction to Hadoop
• Introduction to Hive
• Introduction to logging
• Using Hadoop at Mobion
• Warehouse at Mobion
• Q&A




What is Hadoop
• A framework for distributed processing
• Inspired by Google's architecture: MapReduce and GFS
• A top-level Apache project
• Open source
• Has two core components:
  + MapReduce (the processing core)
  + Hadoop Distributed File System (HDFS)
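To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word count in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration, and the grouping step that Hadoop performs across machines is simulated with a dictionary.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Combine all values emitted for one key into a single result."""
    return word, sum(counts)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes, and the input lines come from files stored in HDFS.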
Why use Hadoop
• Fault-tolerant hardware is expensive
• Hadoop is designed to run on cheap commodity hardware
• It automatically handles data replication and node failure
• It does the hard work, so you can focus on processing data
• It supports three modes: Local, Pseudo-Distributed, and Fully Distributed
Data Flow into Hadoop




Who uses Hadoop
• Amazon: builds product search indices using the streaming API and pre-existing C++, Perl, and Python tools
• Yahoo: more than 100,000 CPUs in >40,000 computers running Hadoop
• Facebook: uses Hadoop to store copies of internal log and dimension data sources, and as a source for reporting/analytics and machine learning
What is Hive
• Hive is a data warehouse system for Hadoop
• Uses MapReduce for execution
• Uses HDFS for storage
• Keeps metadata in an RDBMS
• Scalability and performance
• Interoperability
• Uses a SQL-like language called HiveQL
Data Flow into Hive




Hive Data Model
• Tables
  + Typed columns (int, float, string, …)
  + Also array/map/struct for JSON-like data
• Partitions
  + e.g., to range-partition tables by date
• Buckets
  + Hash partitions within ranges (useful for sampling, join optimization)
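The relationship between partitions and buckets can be sketched in a few lines of Python. This is an illustration only: the row fields (`date`, `user_id`), the bucket count, and the use of `zlib.crc32` as the hash are assumptions made for the example, and Hive uses its own hash function internally.

```python
import zlib

NUM_BUCKETS = 4  # assumed bucket count for this illustration

def partition_key(row):
    """Partition by date: all rows for one day share a partition."""
    return row["date"]

def bucket_id(row, num_buckets=NUM_BUCKETS):
    """Hash the bucketing column into a fixed number of buckets.

    crc32 is a stand-in here; it is not the hash Hive itself uses.
    """
    return zlib.crc32(row["user_id"].encode("utf-8")) % num_buckets

def place(rows):
    """Group rows as {(partition, bucket): [rows]}."""
    layout = {}
    for row in rows:
        key = (partition_key(row), bucket_id(row))
        layout.setdefault(key, []).append(row)
    return layout
```

Because the same `user_id` always lands in the same bucket number, a sample can read a single bucket, and a join on that column only needs to compare matching buckets.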
Hive Metastore
• Database: a namespace containing a set of tables
• Holds table/partition definitions (column types, mappings to HDFS directories)
• Statistics
• Implemented with the DataNucleus ORM; runs on Derby, MySQL, and many other relational databases
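As a rough picture of what the metastore records, the sketch below models a database as a namespace of tables and maps each partition to an HDFS directory. The class names, the `logs` table, and the `dt` partition column in the usage note are hypothetical; the `/user/hive/warehouse` root and the `column=value` directory naming follow Hive's default layout, but a real deployment may configure a different location.

```python
from dataclasses import dataclass, field

WAREHOUSE_ROOT = "/user/hive/warehouse"  # assumed warehouse location

@dataclass
class Table:
    name: str
    columns: dict            # column name -> type name
    partition_column: str
    partitions: dict = field(default_factory=dict)  # value -> HDFS directory

    def add_partition(self, value):
        """Record a partition and the directory that holds its files."""
        path = f"{WAREHOUSE_ROOT}/{self.name}/{self.partition_column}={value}"
        self.partitions[value] = path
        return path

@dataclass
class Database:
    """A namespace: table names are unique within one database."""
    name: str
    tables: dict = field(default_factory=dict)

    def create_table(self, table):
        if table.name in self.tables:
            raise ValueError(f"table {table.name} already exists")
        self.tables[table.name] = table
```

For example, creating a `logs` table partitioned by `dt` and calling `add_partition("2012-04-14")` records the directory `/user/hive/warehouse/logs/dt=2012-04-14`.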
Introduction to Logging
• A logging system has three broad components:
  + Client code interface
  + Distribution system
  + "Do Something Usefullizer" (the consumer that puts the logs to use)
• Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and to be robust to network and node failures.
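The three components can be sketched as three small Python classes. This is not Scribe's API: the class names and the `available` flag are invented for the illustration, and the in-memory queue only hints at how a distribution layer can hold messages while the downstream store is unreachable.

```python
from collections import defaultdict, deque

class Sink:
    """Consumer: here it just counts messages per category."""
    def __init__(self):
        self.counts = defaultdict(int)
        self.available = True  # toggled to simulate a network failure

    def write(self, category, message):
        if not self.available:
            raise ConnectionError("sink unreachable")
        self.counts[category] += 1

class Distributor:
    """Forwards entries to the sink and buffers them while it is down."""
    def __init__(self, sink):
        self.sink = sink
        self.buffer = deque()

    def send(self, category, message):
        self.buffer.append((category, message))
        self.flush()

    def flush(self):
        while self.buffer:
            category, message = self.buffer[0]
            try:
                self.sink.write(category, message)
            except ConnectionError:
                return  # keep entries queued and retry on a later flush
            self.buffer.popleft()

class Logger:
    """Client code interface: the only part application code touches."""
    def __init__(self, distributor):
        self.distributor = distributor

    def log(self, category, message):
        self.distributor.send(category, message)
```

Application code only calls `Logger.log(category, message)`; whether the message is delivered immediately or after a retry is hidden inside the distributor.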
Why use Scribe
• Scalability and performance
• Event notification library
• Thrift framework
• Hadoop is optional
• Client using
• Distributed Scribe system
• Over 1 million messages per second for logging
• Hierarchical stores

Warehouse at Mobion
• Log Collector
• Log/Data Transformer
• Data Analyzer
• Web Reporter
• Log definition
• Log integration (into applications)
• Log/data analysis
• Report development (API, Mobion, Music, …)
Warehouse at Mobion
• Data mining
• Music recommendation
• Spam detection
• Application performance
• Export data and import it into MySQL for web reports
• Analytics system



Q&A
• Why use Hadoop?
• Why use Hive?
• Why do we need a logging system?
• What is the warehouse system architecture?
• Do we use these systems for voting, chat, messages, and feeds?
• How can we use them for recommendations and suggestions?

Links
• http://facebook.com
• http://highscalability.com/product-scribe-facebooks-scalable-logging-system
• http://hadoop.apache.org/
• http://hive.apache.org/
• http://wiki.apache.org/hadoop/PoweredBy
• http://www.apache.org/foundation/thanks.html
THANK YOU
