SlideShare uma empresa Scribd logo
1 de 30
Motivation
 Analysis of Data made by both engineering
and non-engineering people.
 The data are growing fast. In 2007, the
volume was 15TB and it grew up to 200TB in
2010.
 Current RDBMS can NOT handle it.
 Current solution are not available, not
scalable, Expensive and Proprietary.




                                             2
Map/Reduce -Apache Hadoop
  MapReduce is a programing model and an
associated implementation introduced by
Goolge in 2004.
  Apache Hadoop is a software framework
inspired by Google's MapReduce.




                                           3
Motivation (cont.)
 Hadoop supports data-intensive distributed
applications.
However...
  –Map-reduce hard to program (users know
  sql/bash/python).
  –No schema.




                                              4
What is HIVE?
A data warehouse infrastructure built on top
of Hadoop for providing data
summarization, query, and analysis.
   –ETL.
   –Structure.
   –Access to different storage.
   –Query execution via MapReduce.
Key Building Principles:
   –SQL is a familiar language
   –Extensibility –
   Types, Functions, Formats, Scripts
   –Performance                                5
Hive Applications
Log  processing
Text mining
Document indexing
Customer-facing business intelligence
(e.g., Google Analytics)
Predictive modeling, hypothesis testing




                                           6
Hive Architecture




                    7
Data Units
Databases.
Tables.
Partitions.
Buckets (or Clusters).




                         8
Type System
Primitive types
 –Integers:TINYINT, SMALLINT, INT, BIGINT.
 –Boolean: BOOLEAN.
 –Floating point numbers: FLOAT, DOUBLE .
 –String: STRING.
Complex types
 –Structs: {a INT; b INT}.
 –Maps: M['group'].
 –Arrays: ['a', 'b', 'c'], A[1] returns 'b'.




                                               9
Physical Layout
Warehouse   directory in HDFS
 – e.g., /user/hive/warehouse
Table  row data stored in subdirectories of
warehouse
Partitions form subdirectories of table
directories
Actual data stored in flat files
 – Control char-delimited text, or SequenceFiles
 – With custom SerDe, can use arbitrary format



                                                   1
                                                   0
HiveQL




         1
         1
Examples – DDL Operations

 CREATE TABLE sample (foo INT, bar STRING)
PARTITIONED BY (ds STRING);

 SHOW TABLES '.*s';

 DESCRIBE sample;

 ALTER TABLE sample ADD COLUMNS (new_col INT);

 DROP TABLE sample;




                                                 1
                                                 2
Examples – DML Operations

 LOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO
TABLE sample PARTITION (ds='2012-02-24');


 LOAD DATA INPATH '/user/falvariz/hive/sample.txt'
OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');




                                                           1
                                                           3
Examples – DML Operations
 LOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO
TABLE sample PARTITION (ds='2012-02-24');


 LOAD DATA INPATH '/user/falvariz/hive/sample.txt'
OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24');




                                                           1
                                                           4
SELECTS and FILTERS
SELECT foo FROM sample WHERE ds='2012-02-24';


INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM
sample WHERE ds='2012-02-24';


 INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out'
SELECT * FROM sample;




                                                           1
                                                           5
Aggregations and Groups
SELECT MAX(foo) FROM sample;

 SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds;

 FROM sample s INSERT OVERWRITE TABLE bar SELECT
s.bar, count(*) WHERE s.foo > 0 GROUP BY s.bar;




                                                          1
                                                          6
Extension mechanisms




                       1
                       8
Built-in Functions
 Mathematical: round, floor, ceil, rand, exp...

 Collection:
size, map_keys, map_values, array_contains.

 Type Conversion: cast.

 Date: from_unixtime, to_date, year, datediff...

 Conditional: if, case, coalesce.

 String: length, reverse, upper, trim...



                                                   2
                                                   0
Install and Config Hive




                          2
                          2
Installing Hive
From a Release Tarball:
$ wget http://archive.apache.org/dist/
hadoop/hive/hive-0.5.0/hive-0.5.0-
bin.tar.gz
$ tar xvzf hive-0.5.0-bin.tar.gz
$ cd hive-0.5.0-bin
$ export HIVE_HOME=$PWD
$ export PATH=$HIVE_HOME/bin:$PATH




                                         2
                                         3
Hive Dependencies
 Java 1.6
Hadoop >= 0.17-0.20
Hive *MUST* be able to find Hadoop:
 – $HADOOP_HOME=<hadoop-install-dir>
– Add $HADOOP_HOME/bin to $PATH
Hive needs r/w access to /tmp and
/user/hive/warehouse on HDFS:
$   hadoop   fs   –mkdir /tmp
$   hadoop   fs   –mkdir /user/hive/warehouse
$   hadoop   fs   –chmod g+w /tmp
$   hadoop   fs   –chmod g+w /user/hive/warehouse




                                                    2
                                                    4
Hive Configuration
Default
       configuration in
  $HIVE_HOME/conf/hive-default.xml
   – DO NOT TOUCH THIS FILE!
Re(Define) properties in
  $HIVE_HOME/conf/hive-site.xml
 Use $HIVE_CONF_DIR to specify alternate conf dir
location
You can override Hadoop configuration properties in
Hive’s configuration,
   e.g:– mapred.reduce.tasks=1




                                                       2
                                                       5
Hive CLI




           2
           6
Hive CLI Commands
Start  a terminal and run
   $ hive
Should see a prompt like:
    hive>
Set a Hive or Hadoop conf prop:
    – hive> set propkey=value;
 List all properties and values:
    – hive> set –v;
 Add a resource to the DCache:
   – hive> add [ARCHIVE|FILE|JAR] filename;



                                              2
                                              7
Hive CLI Commands
List tables:
  – hive> show tables;
Describe a table:
   – hive> describe <tablename>;
More information:
  – hive> describe extended <tablename>;
List Functions:
   – hive> show functions;
More information:
    – hive> describe function<functionname>;



                                               2
                                               8
Conclusion
 A easy way to process large scale data.
 Support SQL-based queries.
 Provide more user defined interfaces to extend
Programmability.
 Files in HDFS are immutable. Tipically:
   –Log processing: Daily Report, User Activity
   Measurement
   –Data/Text mining: Machine learning (Training Data)
   –Business intelligence: Advertising Delivery,Spam
   Detection




                                                         2
                                                         9
Apache Hive

Mais conteúdo relacionado

Mais procurados

Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoopChirag Ahuja
 
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010ragho
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hiveReza Ameri
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalogmarkgrover
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentationpuneet yadav
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Edureka!
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentFarzad Nozarian
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopSomeshwar Kale
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopGetInData
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 

Mais procurados (20)

Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Beginning hive and_apache_pig
Beginning hive and_apache_pigBeginning hive and_apache_pig
Beginning hive and_apache_pig
 
Introduction to Hive
Introduction to HiveIntroduction to Hive
Introduction to Hive
 
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Apache Hive | Big Data Hadoop Spark Tutorial | CloudxLab
 
Hive ICDE 2010
Hive ICDE 2010Hive ICDE 2010
Hive ICDE 2010
 
An intriduction to hive
An intriduction to hiveAn intriduction to hive
An intriduction to hive
 
Introduction to Hive and HCatalog
Introduction to Hive and HCatalogIntroduction to Hive and HCatalog
Introduction to Hive and HCatalog
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for HadoopLearning Apache HIVE - Data Warehouse and Query Language for Hadoop
Learning Apache HIVE - Data Warehouse and Query Language for Hadoop
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Apache hive
Apache hiveApache hive
Apache hive
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 

Destaque

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveTapan Avasthi
 
Spring framework
Spring frameworkSpring framework
Spring frameworkAjit Koti
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Apache Mahout
Apache MahoutApache Mahout
Apache MahoutAjit Koti
 
Hive 입문 발표 자료
Hive 입문 발표 자료Hive 입문 발표 자료
Hive 입문 발표 자료beom kyun choi
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destaque (17)

Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Spring framework
Spring frameworkSpring framework
Spring framework
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Apache Mahout
Apache MahoutApache Mahout
Apache Mahout
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Apache hive
Apache hiveApache hive
Apache hive
 
2 fitnesse
2 fitnesse2 fitnesse
2 fitnesse
 
Hive 입문 발표 자료
Hive 입문 발표 자료Hive 입문 발표 자료
Hive 입문 발표 자료
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Semelhante a Apache Hive

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at FacebookS S
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014rpbrehm
 
The Family of Hadoop
The Family of HadoopThe Family of Hadoop
The Family of HadoopNam Nham
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoopFrank Y
 
SQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopSQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopJan Pieter Posthuma
 
field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahoMartin Ferguson
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryIJRESJOURNAL
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopSvetlin Nakov
 
Fundamental of Big Data with Hadoop and Hive
Fundamental of Big Data with Hadoop and HiveFundamental of Big Data with Hadoop and Hive
Fundamental of Big Data with Hadoop and HiveSharjeel Imtiaz
 
Cosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARECosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWAREFernando Lopez Aguilar
 

Semelhante a Apache Hive (20)

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at  FacebookHadoop and Hive Development at  Facebook
Hadoop and Hive Development at Facebook
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
מיכאל
מיכאלמיכאל
מיכאל
 
Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014Recommender.system.presentation.pjug.05.20.2014
Recommender.system.presentation.pjug.05.20.2014
 
The Family of Hadoop
The Family of HadoopThe Family of Hadoop
The Family of Hadoop
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Data analysis on hadoop
Data analysis on hadoopData analysis on hadoop
Data analysis on hadoop
 
SQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopSQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - Hadoop
 
field_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentahofield_guide_to_hadoop_pentaho
field_guide_to_hadoop_pentaho
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
Design and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on RaspberryDesign and Research of Hadoop Distributed Cluster Based on Raspberry
Design and Research of Hadoop Distributed Cluster Based on Raspberry
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Map-Reduce and Apache Hadoop
Map-Reduce and Apache HadoopMap-Reduce and Apache Hadoop
Map-Reduce and Apache Hadoop
 
Fundamental of Big Data with Hadoop and Hive
Fundamental of Big Data with Hadoop and HiveFundamental of Big Data with Hadoop and Hive
Fundamental of Big Data with Hadoop and Hive
 
Cosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARECosmos, Big Data GE implementation in FIWARE
Cosmos, Big Data GE implementation in FIWARE
 

Último

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Apache Hive

  • 1.
  • 2. Motivation Analysis of Data made by both engineering and non-engineering people. The data are growing fast. In 2007, the volume was 15TB and it grew up to 200TB in 2010. Current RDBMS can NOT handle it. Current solution are not available, not scalable, Expensive and Proprietary. 2
  • 3. Map/Reduce -Apache Hadoop MapReduce is a programing model and an associated implementation introduced by Goolge in 2004. Apache Hadoop is a software framework inspired by Google's MapReduce. 3
  • 4. Motivation (cont.) Hadoop supports data-intensive distributed applications. However... –Map-reduce hard to program (users know sql/bash/python). –No schema. 4
  • 5. What is HIVE? A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. –ETL. –Structure. –Access to different storage. –Query execution via MapReduce. Key Building Principles: –SQL is a familiar language –Extensibility – Types, Functions, Formats, Scripts –Performance 5
  • 6. Hive Applications Log processing Text mining Document indexing Customer-facing business intelligence (e.g., Google Analytics) Predictive modeling, hypothesis testing 6
  • 9. Type System Primitive types –Integers:TINYINT, SMALLINT, INT, BIGINT. –Boolean: BOOLEAN. –Floating point numbers: FLOAT, DOUBLE . –String: STRING. Complex types –Structs: {a INT; b INT}. –Maps: M['group']. –Arrays: ['a', 'b', 'c'], A[1] returns 'b'. 9
  • 10. Physical Layout Warehouse directory in HDFS – e.g., /user/hive/warehouse Table row data stored in subdirectories of warehouse Partitions form subdirectories of table directories Actual data stored in flat files – Control char-delimited text, or SequenceFiles – With custom SerDe, can use arbitrary format 1 0
  • 11. HiveQL 1 1
  • 12. Examples – DDL Operations CREATE TABLE sample (foo INT, bar STRING) PARTITIONED BY (ds STRING); SHOW TABLES '.*s'; DESCRIBE sample; ALTER TABLE sample ADD COLUMNS (new_col INT); DROP TABLE sample; 1 2
  • 13. Examples – DML Operations LOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24'); LOAD DATA INPATH '/user/falvariz/hive/sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24'); 1 3
  • 14. Examples – DML Operations LOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24'); LOAD DATA INPATH '/user/falvariz/hive/sample.txt' OVERWRITE INTO TABLE sample PARTITION (ds='2012-02-24'); 1 4
  • 15. SELECTS and FILTERS SELECT foo FROM sample WHERE ds='2012-02-24'; INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM sample WHERE ds='2012-02-24'; INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out' SELECT * FROM sample; 1 5
  • 16. Aggregations and Groups SELECT MAX(foo) FROM sample; SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds; FROM sample s INSERT OVERWRITE TABLE bar SELECT s.bar, count(*) WHERE s.foo > 0 GROUP BY s.bar; 1 6
  • 17.
  • 19.
  • 20. Built-in Functions Mathematical: round, floor, ceil, rand, exp... Collection: size, map_keys, map_values, array_contains. Type Conversion: cast. Date: from_unixtime, to_date, year, datediff... Conditional: if, case, coalesce. String: length, reverse, upper, trim... 2 0
  • 21.
  • 22. Install and Config Hive 2 2
  • 23. Installing Hive From a Release Tarball: $ wget http://archive.apache.org/dist/ hadoop/hive/hive-0.5.0/hive-0.5.0- bin.tar.gz $ tar xvzf hive-0.5.0-bin.tar.gz $ cd hive-0.5.0-bin $ export HIVE_HOME=$PWD $ export PATH=$HIVE_HOME/bin:$PATH 2 3
  • 24. Hive Dependencies  Java 1.6 Hadoop >= 0.17-0.20 Hive *MUST* be able to find Hadoop: – $HADOOP_HOME=<hadoop-install-dir> – Add $HADOOP_HOME/bin to $PATH Hive needs r/w access to /tmp and /user/hive/warehouse on HDFS: $ hadoop fs –mkdir /tmp $ hadoop fs –mkdir /user/hive/warehouse $ hadoop fs –chmod g+w /tmp $ hadoop fs –chmod g+w /user/hive/warehouse 2 4
  • 25. Hive Configuration Default configuration in $HIVE_HOME/conf/hive-default.xml – DO NOT TOUCH THIS FILE! Re(Define) properties in $HIVE_HOME/conf/hive-site.xml  Use $HIVE_CONF_DIR to specify alternate conf dir location You can override Hadoop configuration properties in Hive’s configuration, e.g:– mapred.reduce.tasks=1 2 5
  • 26. Hive CLI 2 6
  • 27. Hive CLI Commands Start a terminal and run $ hive Should see a prompt like: hive> Set a Hive or Hadoop conf prop: – hive> set propkey=value;  List all properties and values: – hive> set –v;  Add a resource to the DCache: – hive> add [ARCHIVE|FILE|JAR] filename; 2 7
  • 28. Hive CLI Commands List tables: – hive> show tables; Describe a table: – hive> describe <tablename>; More information: – hive> describe extended <tablename>; List Functions: – hive> show functions; More information: – hive> describe function<functionname>; 2 8
  • 29. Conclusion A easy way to process large scale data. Support SQL-based queries. Provide more user defined interfaces to extend Programmability. Files in HDFS are immutable. Tipically: –Log processing: Daily Report, User Activity Measurement –Data/Text mining: Machine learning (Training Data) –Business intelligence: Advertising Delivery,Spam Detection 2 9