The Apache Hadoop HIVE
Omoyayi Ibrahim Omodamilola
Student No.: 20174831
PhD Biomedical Engineering
Outline
• Big Data
• History of Database (NoSQL vs SQL)
• New SQL database
• SQL
• NoSQL
• Factors Affecting the Selection of a database
• Hadoop Hive
– Functions of Hive on Hadoop
– Hive vs Java vs Pig
• Hadoop Distributed File System
• Hive Architecture
• Work Flow of Hive
• List of References
INTRODUCTION
• Facebook began developing Apache Hive in 2007 in response
to its data growth.
• Its existing ETL system began to fail within a few years as
more people joined Facebook.
• In August 2008, Facebook decided to move to a more
scalable open-source Hadoop environment: Hive.
• Facebook, Netflix and Amazon support the Apache Hive
SQL dialect, now known as HiveQL.
SQL (left) vs NoSQL (right)
Source: Google Images
NEW STRUCTURED QUERY LANGUAGE
NewSQL
• Relational + NoSQL
• Designed for Web-scale applications
• Provides many of the traditional SQL
operations
A class of modern relational database management systems that seek to
provide the same scalable performance as NoSQL systems for online
transaction processing (OLTP) read-write workloads while still
maintaining the ACID guarantees of a traditional database system.
RELATIONAL DATABASES SQL
• Structured Query Language (SQL)
• A relational database consists of two or more tables
with columns and rows
• The relationship between tables and field types is
called a schema
• SQL is a programming language used by database
architects to design relational databases (e.g. MySQL,
Sybase, Oracle, or IBM DB2).
• These databases are well understood and widely
supported
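The idea of tables, rows, and a schema that relates them can be sketched in a few lines. The example below is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are hypothetical and do not come from the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE slides (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id)
        );
        INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO slides (id, title, author_id) VALUES
            (10, 'Intro to SQL', 1),
            (11, 'Intro to Hive', 2),
            (12, 'Joins', 1);
        """
    )
    return conn


def titles_by_author(conn, author_name):
    """Join the two tables through the author_id relationship."""
    rows = conn.execute(
        "SELECT s.title FROM slides AS s "
        "JOIN authors AS a ON a.id = s.author_id "
        "WHERE a.name = ? ORDER BY s.title",
        (author_name,),
    ).fetchall()
    return [title for (title,) in rows]
```

Calling titles_by_author(build_demo_db(), "Ada") returns the titles linked to that author through the foreign key, which is the kind of relationship a schema describes.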
Popular SQL databases and RDBMSs
• MySQL: the most popular open-source database.
• Oracle: an object-relational DBMS written in the C++ language.
• IBM DB2: a family of database server products from IBM that are
built to handle advanced “big data” analytics.
• Sybase: a relational model database server product for
businesses, primarily used on Unix and Linux.
• MS SQL Server: a Microsoft-developed RDBMS for enterprise-level
databases that supports both SQL and NoSQL architectures.
• Microsoft Azure: a cloud computing platform that supports any
operating system and lets you store, compute, and scale data.
• MariaDB: an enhanced, drop-in version of MySQL.
• PostgreSQL: an enterprise-level, object-relational DBMS that
supports procedural languages such as Perl and Python.
NOSQL DATABASES
• Easy to access
• Greater flexibility
• Document-oriented data
• Massive amounts of data
• Unclear data requirements
• Data includes: sensor data, social sharing, personal
settings, photos, location-based information, online
activity, usage metrics, etc.
Source: UpWork
POPULAR NOSQL DATABASES
• MongoDB: the most popular NoSQL system.
• Apache CouchDB: a true DB for the web that uses the
JSON data exchange format to store its documents.
• HBase: another Apache project, developed as part of
Hadoop; an open-source, non-relational “column
store”.
• Oracle NoSQL: Oracle’s entry into the NoSQL category.
• Apache Cassandra: born at Facebook, it handles
massive amounts of structured data. Example users:
Instagram, Comcast, Apple, and Spotify (growing app).
• Riak: has fault-tolerant replication and automatic
data distribution built in for excellent performance.
SQL
Pros
• Relational databases work with structured data.
• They support ACID (Atomicity, Consistency,
Isolation, Durability) transactional consistency.
• They come with built-in data integrity and a
large ecosystem.
• Relationships in this system have constraints.
• There is limitless indexing.
• Strong SQL.
Cons
• Relational databases do not scale out
horizontally very well (concurrency and data
size), only vertically.
• Data is normalized, meaning lots of joins, which
affects speed.
• They have problems working with semi-structured data.
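The atomicity part of ACID can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in relational database; the accounts table, its CHECK constraint, and the transfer function are hypothetical names chosen for this example only.

```python
import sqlite3


def make_accounts():
    """Create a hypothetical accounts table that forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the debit would make a balance negative, the earlier credit in the same transaction is undone as well, so the table never shows a half-finished transfer.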
NoSQL
Pros
• They scale out horizontally and work with
unstructured and semi-structured data.
• Some support ACID transactional
consistency.
• Schema-free or schema-on-read options.
• High availability.
• Databases are open source and so “free”, although
training, setup, and development costs remain.
• Numerous commercial products available.
Cons
• Data is denormalized, requiring mass updates
(e.g. a product name change).
• Weaker or eventual consistency instead of
ACID.
• Does not have built-in data integrity (must
be done in code).
• Limited support.
Hadoop
• Facebook, Google, Yahoo, Amazon, and Microsoft
• Exponential growth of data
• Doug Cutting developed an open-source version of
the MapReduce system, called Hadoop
• Hadoop is a software ecosystem that allows for
massively parallel computing
• A large data job that might take 20 hours of
processing time on a relational database may
take only 3 minutes with Hadoop
• Hive's query language, HQL, looks like traditional SQL
Hadoop clusters on Client computers
Hive is not
• A relational database
• A design for online transaction processing
(OLTP)
• A language for real-time queries and row-level
updates
FUNCTIONS OF HIVE ON HADOOP
• Data warehouse system built on top of Hadoop
• Takes advantage of Hadoop's processing power
• Facilitates data summarization, ad-hoc queries, and
analysis of large datasets stored in Hadoop
• Provides a SQL interface (known as HiveQL, or HQL)
that is familiar to most programmers
• Saves time compared with writing Hadoop MapReduce
programs
• Provides a mechanism to project structure onto
Hadoop datasets
• Loads data quickly and allows flexibility, at the cost
of query time
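Projecting structure onto stored data at read time, rather than at load time, can be sketched as follows. This is a plain Python illustration of the idea, not Hive code: the schema, the tab delimiter, and the function names are assumptions made for the example, and skipping malformed rows is a simplification.

```python
# Hypothetical schema: column names and converters applied only when reading.
SCHEMA = [("user_id", int), ("page", str), ("seconds", float)]


def read_with_schema(lines, schema=SCHEMA, delimiter="\t"):
    """Yield one dict per raw text line, applying the schema at read time."""
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(schema):
            continue  # simplification: skip rows that do not fit the schema
        yield {
            name: convert(value)
            for (name, convert), value in zip(schema, fields)
        }


def total_seconds_by_page(rows):
    """Summarize the structured rows, similar in spirit to a GROUP BY."""
    totals = {}
    for row in rows:
        totals[row["page"]] = totals.get(row["page"], 0.0) + row["seconds"]
    return totals
```

The raw lines are stored untouched, which keeps loading fast; the cost of parsing and converting is paid each time the data is queried.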
Apache frameworks
• Sqoop: used to import and export data between
HDFS and an RDBMS.
• Pig: a procedural language platform used
to develop scripts for MapReduce operations.
• Hive: a platform used to develop SQL-type
scripts that perform MapReduce operations.
Hive vs Java and Pig
Java
• Word Count MapReduce example: list the words in a
document and the number of occurrences of each
• Java takes 63 lines of code to write this; Hive takes
only 7 easy lines of code
Pig
• High-level programming language
• Good for ETL
• Powerful transformation capabilities
• Often used in combination with Hive
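The word count example is easier to follow with the three MapReduce phases written out. The sketch below runs in a single Python process and only mimics the map, shuffle, and reduce steps; it is not the Java or Hive version counted on the slide, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(document):
    """Run the three phases in sequence on one document."""
    return reduce_phase(shuffle_phase(map_phase(document)))
```

On a cluster, the map and reduce steps run in parallel on many machines, and a declarative Hive query leaves that plumbing to the framework, which is why it can be so much shorter.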
Hive Architecture
HIVE DIRECTORY STRUCTURE
• Lib directory
– $HIVE_HOME/lib
– Location of the Hive JAR files
– Contains the actual Java code that implements the Hive
functionality
• Bin directory
– $HIVE_HOME/bin
– Location of Hive scripts/services
• Conf directory
– $HIVE_HOME/conf
– Location of configuration files
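The layout above can be resolved programmatically from the HIVE_HOME environment variable. The helper below is a hypothetical convenience function written for this illustration; it only builds the three paths named on the slide and does not check that they exist.

```python
import os
from pathlib import Path


def hive_directories(hive_home=None):
    """Return the lib, bin, and conf paths under a Hive installation root.

    hive_home is optional; when omitted, the HIVE_HOME environment
    variable is used instead.
    """
    home = hive_home or os.environ.get("HIVE_HOME")
    if not home:
        raise ValueError("HIVE_HOME is not set")
    root = Path(home)
    return {name: root / name for name in ("lib", "bin", "conf")}
```

For example, hive_directories("/opt/hive")["conf"] points at /opt/hive/conf, where /opt/hive is just an assumed installation path.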
Summary & Conclusion
• Hive is a data warehouse infrastructure tool to process
structured data in Hadoop.
• It resides on top of Hadoop to summarize Big Data, and
makes querying and analyzing easy.
• Hive was initially developed by Facebook; the
Apache Software Foundation later took it up and
developed it further as open source under the
name Apache Hive.
• It is used by different companies. For example,
Amazon uses it in Amazon Elastic MapReduce.
REFERENCES
• http://www.dataversity.net/review-pros-cons-different-databases-relational-versus-non-relational/
• https://segment.com/blog/choosing-a-database-for-analytics/
• https://www.upwork.com/hiring/data/sql-vs-nosql-databases-whats-the-difference/
DON’T THANK ME THANK HIVE
