©2012, Cognizant
Data Warehouse and Query Language for Hadoop
August 2013
By Someshwar Kale
HIVE
• Data warehousing solution built on top of Hadoop
• Provides a SQL-like query language named HiveQL
– Minimal learning curve for people with SQL expertise
– Data analysts are the target audience
• Early Hive development work started at Facebook in 2007
Today, Facebook counts 29% of its employees (and growing!) as Hive users.
https://www.facebook.com/note.php?note_id=114588058858
• Today Hive is an Apache project (it began as a Hadoop subproject)
– http://hive.apache.org
Hive Provides
• Ability to bring structure to various data formats
• Simple interface for ad hoc querying, analyzing, and summarizing large amounts of data
• Access to files on various data stores such as HDFS and HBase
Hive
• Hive does NOT provide low-latency or real-time queries.
• Even querying small amounts of data may take minutes.
• Designed for scalability and ease of use rather than low-latency responses
Hive
• Translates HiveQL statements into a set of MapReduce jobs, which are then executed on a Hadoop cluster.
Hive Metastore
• To support features like schemas and data partitioning, Hive keeps its metadata in a relational database
• Packaged with Derby, a lightweight embedded SQL database
• The default Derby-based metastore is good for evaluation and testing
• The schema is not shared between users, because each user has their own instance of embedded Derby
• The embedded Derby instance is stored in the metastore_db directory, which resides in the directory that Hive was started from
• Can easily switch to another SQL installation such as MySQL
Metastore Deployment Modes : Embedded Mode
• Default metastore deployment mode for CDH.
• Both the database and the metastore service run embedded in the main HiveServer process.
• Both are started for you when you start the HiveServer process.
• Supports only one active user at a time and is not certified for production use.
Metastore Deployment Modes : Local Mode
• The Hive metastore service runs in the same process as the main HiveServer process.
• The metastore database runs in a separate process, and can be on a separate host.
• The embedded metastore service communicates with the metastore database over JDBC.
Metastore Deployment Modes : Remote Mode
Hive Architecture
Hive Interface Options
• Command Line Interface (CLI)
– Used exclusively in these slides
• Hive Web Interface
– https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface
• Java Database Connectivity (JDBC)
– https://cwiki.apache.org/confluence/display/Hive/HiveClient
• Beeline for HiveServer2 (new in CDH4)
– http://sqlline.sourceforge.net/#manual
Data Types
[cts318692@aster4 ~]$ hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.2.1.jar!/hive-log4j.properties
Hive history file=/tmp/cts318692/hive_job_log_cts318692_201308071622_2005272769.txt
hive>
The hive command launches the Hive Command Line Interface (CLI). The "Hive history file" line gives the location of the session’s log file.
hive> !cat data/user-posts.txt;
user1,Funny Story,1343182026191
user2,Cool Deal,1343182133839
user4,Interesting Post,1343182154633
user5,Yet Another Blog,13431839394
hive>
Local commands can be executed within the CLI by placing the command between ! and ;
Data Types
Numeric Types
TINYINT
SMALLINT
INT
BIGINT
FLOAT
DOUBLE
DECIMAL (Note: Only available starting with Hive 0.11.0)
Date/Time Types
TIMESTAMP (Note: Only available starting with Hive 0.8.0)
DATE (Note: Only available starting with Hive 0.12.0)
Misc Types
BOOLEAN
STRING
BINARY (Note: Only available starting with Hive 0.8.0)
Complex Data Types
Check physical storage of hive
[cts318692@aster4 ~]$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/user/hive/warehouse
hive.warehouse.subdir.inherit.perms=true
This is the location where Hive stores its data.
Creating DataBase
hive> CREATE DATABASE IF NOT EXISTS som COMMENT 'my database'
> LOCATION '/user/cts318692/someshwar/hivestore/'
> WITH DBPROPERTIES ('creator'='someshwar kale','date'='2013-06-08');
OK
Time taken: 0.046 seconds
• IF NOT EXISTS: suppresses the error that Hive would otherwise raise if the database already exists
• som: the database name; Hive opens the default database when you open a new session
• LOCATION: overrides the default location ('/user/hive/warehouse') for the new directory; this is the physical storage for the som database
• WITH DBPROPERTIES: key-value properties attached to the database
Exploring Data
STRUCT<street:STRING,
city:STRING,
state:STRING,
zip:INT>
For complex data types: maps, arrays, structs
field
Creating Table
• Separator for the elements of complex data types (maps, arrays, structs)
• Separator between a map key and its value, e.g. 'key' ^C 'value' (\003 = Ctrl-C = ^C)
• Column separator definition
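The slide's own statement is not reproduced in this transcript. As a minimal sketch of the kind of statement these callouts annotate, the following HiveQL creates a table with complex-typed columns and spells out the delimiters. The column list is an assumption (modeled on the som.employees table and the STRUCT shown on the previous slide), and the delimiters are written explicitly even though they are Hive's text-format defaults.

```sql
-- Hypothetical table definition; column names are illustrative assumptions.
CREATE TABLE IF NOT EXISTS som.employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- column separator (Ctrl-A)
  COLLECTION ITEMS TERMINATED BY '\002'  -- separates array elements, struct fields, and map entries (Ctrl-B)
  MAP KEYS TERMINATED BY '\003'          -- separates a map key from its value (Ctrl-C)
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```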
hive> DESCRIBE FORMATTED som.employees;
Creating External Table
CREATE ... LIKE
• If you omit the EXTERNAL keyword and the original table is external, the new table will also be external.
• If you omit EXTERNAL and the original table is managed, the new table will also be managed.
• However, if you include the EXTERNAL keyword and the original table is managed, the new table will be external. Even in this scenario, the LOCATION clause will still be optional.
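The rules above can be sketched with two statements. This assumes som.employees is a managed table; the new table names and the HDFS path are hypothetical placeholders.

```sql
-- Copies only the schema of som.employees (no data); the result is managed
-- because the original is assumed to be managed and EXTERNAL is omitted.
CREATE TABLE IF NOT EXISTS som.employees_copy
LIKE som.employees;

-- Same schema, but forced to be external; LOCATION is optional.
-- '/path/to/external/data' is a placeholder, not a real path.
CREATE EXTERNAL TABLE IF NOT EXISTS som.employees_ext
LIKE som.employees
LOCATION '/path/to/external/data';
```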
Select Clause
Describe External Table
Dropping DataBase and Table
By default, Hive won’t permit you to drop a database if it contains tables. You can either drop the tables first or append the CASCADE keyword to the command, which will cause Hive to drop the tables in the database first.
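Both options described above might look like the following sketch. It assumes, for illustration only, that som.employees is the only table in the som database.

```sql
-- Option 1: drop the tables first, then the (now empty) database.
DROP TABLE IF EXISTS som.employees;
DROP DATABASE IF EXISTS som;

-- Option 2: let Hive drop the contained tables as part of the command.
-- IF EXISTS makes this a no-op when the database is already gone.
DROP DATABASE IF EXISTS som CASCADE;
```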
Partitions
• To increase performance, Hive has the capability to partition data
– The values of the partitioned column divide a table into segments
– Entire partitions can be ignored at query time
– Similar to relational databases’ indexes, but not as granular
• Partitions have to be properly created by users
– When inserting data, you must specify a partition
• At query time, whenever appropriate, Hive will automatically filter out partitions
Creating Partitioned Table
Partition the table based on the values of country and state
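The original statement is not preserved in this transcript. A minimal sketch of a table partitioned by country and state could look like this; the table name and the non-partition columns are assumptions chosen for illustration.

```sql
-- Hypothetical table; the partition columns are declared separately
-- from the data columns and are not stored in the data files.
CREATE TABLE IF NOT EXISTS employees_part (
  name   STRING,
  salary FLOAT
)
PARTITIONED BY (country STRING, state STRING);

-- Filtering on the partition columns lets Hive read only the
-- matching partition directory instead of scanning the whole table.
SELECT name, salary
FROM employees_part
WHERE country = 'US' AND state = 'CA';
```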
Loading data to table
LOAD DATA LOCAL ... copies the local data to the final location in the distributed filesystem, while LOAD DATA ... (i.e., without LOCAL) moves the data to the final location.
• The PARTITION clause is necessary if the table into which we are loading the data is partitioned. This is known as static partitioning, because we provide the partition value in the query.
• Partitions are physically stored under separate directories.
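As a sketch of static partitioning, the following loads a local file into one named partition. The table, its columns, the partition value, and the file path are all hypothetical and only serve the example.

```sql
-- Hypothetical partitioned table for comma-separated post records.
CREATE TABLE IF NOT EXISTS posts_part (
  author    STRING,
  post      STRING,
  posted_at BIGINT
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Static partitioning: the partition value is given in the statement.
-- LOCAL copies the file from the local filesystem; the path is a placeholder.
LOAD DATA LOCAL INPATH 'data/user-posts-us.txt'
OVERWRITE INTO TABLE posts_part
PARTITION (country = 'US');

-- Lists the partitions; each one maps to its own directory under the table.
SHOW PARTITIONS posts_part;
```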
Schema Violations
hive> LOAD DATA LOCAL INPATH
> 'data/user-posts-inconsistentFormat.txt'
> OVERWRITE INTO TABLE posts;
OK
Time taken: 0.612 seconds
hive> select * from posts;
OK
user1 Funny Story 1343182026191
user2 Cool Deal NULL
user4 Interesting Post 1343182154633
user5 Yet Another Blog 13431839394
Time taken: 0.136 seconds
NULL is set for any value that violates the pre-defined schema
External Partitioned Tables
• There is no difference in syntax
• When the partitioned column is specified in the WHERE clause, entire directories/partitions can be ignored
Bucketing
• Break data into a set of buckets based on a hash function of a "bucket column"
– Capability to execute queries on a random subset of the data
• Hive doesn’t automatically enforce bucketing
– The user is required to specify the number of buckets by setting the number of reducers
hive> set mapred.reduce.tasks = 256;
OR
hive> set hive.enforce.bucketing = true;
Either manually set the number of reducers to be the number of buckets, or use ‘hive.enforce.bucketing’, which will set it on your behalf.
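A minimal sketch of the workflow described above, under stated assumptions: the table names, columns, and bucket count are hypothetical, and the hive.enforce.bucketing setting applies to Hive releases of the era these slides target (0.10).

```sql
-- Let Hive pick the number of reducers to match the number of buckets.
SET hive.enforce.bucketing = true;

-- Hypothetical source table and bucketed target table.
CREATE TABLE IF NOT EXISTS posts_src (
  author STRING, post STRING, posted_at BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS posts_bucketed (
  author STRING, post STRING, posted_at BIGINT
)
CLUSTERED BY (author) INTO 4 BUCKETS;

-- Rows are assigned to buckets by hashing the bucket column (author).
INSERT OVERWRITE TABLE posts_bucketed
SELECT author, post, posted_at FROM posts_src;

-- Query a subset of the data: one bucket out of the four.
SELECT * FROM posts_bucketed TABLESAMPLE (BUCKET 1 OUT OF 4 ON author);
```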
Create and Use Table with Buckets
ALTER TABLE
Note: partition columns are not deleted.
Inserting Data into Tables from Queries
Dynamic Partition Inserts
Exporting Data
Functions
Table generating functions
• Return zero to many rows, one row for each element of the input array
• Only a single expression in the SELECT clause is supported with UDTFs.
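To illustrate both points, here is a sketch using explode(), one of Hive's built-in table generating functions. It assumes som.employees has an ARRAY<STRING> column named subordinates, which is an assumption made for this example.

```sql
-- One output row per array element; rows whose array is empty
-- contribute no output rows at all.
SELECT explode(subordinates) AS subordinate
FROM som.employees;

-- Not supported: a UDTF must be the only expression in the SELECT clause.
-- SELECT name, explode(subordinates) AS subordinate FROM som.employees;
```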
LIMIT clause
CASE … WHEN … THEN Statements
Where and Group by .. having clause
Joins
Outer Join
Points to remember
• Only equality joins are allowed.
• More than two tables can be joined in the same query, e.g.
SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
is a valid join.
• Hive uses a single map/reduce job if, for every table, the same column is used in the join clauses, e.g.
ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
• In contrast,
ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
is converted into two map/reduce jobs, because the key1 column from b is used in the first join condition and the key2 column from b is used in the second one.
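One way to observe the difference is to ask Hive for the query plan. The sketch below assumes tables a, b, and c exist with the columns used in the bullets above; the exact plan output varies by Hive version, so none is shown here.

```sql
-- b.key1 is used in both join conditions: expected to compile to
-- a single map/reduce job for the join.
EXPLAIN
SELECT a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key1)
JOIN c ON (c.key = b.key1);

-- b.key1 in the first condition, b.key2 in the second: expected to
-- compile to two map/reduce jobs.
EXPLAIN
SELECT a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key1)
JOIN c ON (c.key = b.key2);
```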
ORDER BY and SORT BY
• ORDER BY uses a single reducer to sort the data, which may take an unacceptably long time to execute for larger data sets.
• Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering: each reducer’s output is sorted, but the combined output is not guaranteed to be totally ordered.
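The two clauses are written the same way, which makes the difference easy to miss. A short sketch, assuming som.employees has name and salary columns (an assumption for illustration) and that more than one reducer is in use:

```sql
-- Use several reducers so the difference is visible.
SET mapred.reduce.tasks = 3;

-- Total order: all rows pass through a single reducer.
SELECT name, salary
FROM som.employees
ORDER BY salary DESC;

-- Local order: each reducer sorts only the rows it receives, so the
-- concatenated result may interleave ranges from different reducers.
SELECT name, salary
FROM som.employees
SORT BY salary DESC;
```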
Casting
• What happens if a salary value is not a valid string for a floating-point number? In this case, Hive returns NULL.
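As a sketch, suppose salary were stored as a STRING in a hypothetical table employees_raw (both the table and the storage type are assumptions for this example):

```sql
-- Hypothetical table in which salary is kept as text.
CREATE TABLE IF NOT EXISTS employees_raw (
  name   STRING,
  salary STRING
);

-- CAST yields NULL for strings that are not valid numbers. A comparison
-- with NULL is not true, so those rows are filtered out by the WHERE clause.
SELECT name, CAST(salary AS FLOAT) AS salary_f
FROM employees_raw
WHERE CAST(salary AS FLOAT) < 100000.0;
```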
UNION ALL and Nested select
• Each subquery of the union query must produce the same number of columns, and for each column, its type must match the types of the columns in the same position in the other subqueries.
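A minimal sketch of a UNION ALL wrapped in a nested select, the form that Hive releases of this era (0.10) expect. The table and column names are assumptions for illustration.

```sql
-- Both branches produce (STRING, FLOAT) in the same positions.
SELECT u.name, u.salary
FROM (
  SELECT e1.name, e1.salary
  FROM som.employees e1
  WHERE e1.salary >= 100000
  UNION ALL
  SELECT e2.name, e2.salary
  FROM som.employees e2
  WHERE e2.salary < 50000
) u;
```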
View
• Similar to writing a function in a programming language.
• Views are virtual.
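A short sketch of the idea: the view stores only a query definition, which Hive expands whenever the view is referenced. The view name and the columns of som.employees are assumptions.

```sql
-- Define the reusable query once; no data is materialized.
CREATE VIEW IF NOT EXISTS som.well_paid AS
SELECT name, salary
FROM som.employees
WHERE salary >= 100000;

-- Use it like a table; the underlying query runs at this point.
SELECT name FROM som.well_paid;

-- Dropping the view removes only its definition, not the base table's data.
DROP VIEW IF EXISTS som.well_paid;
```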
Lateral view
• A lateral view is used in conjunction with user-defined table generating functions (UDTFs) such as explode().
• A lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to the input rows to form a virtual table with the supplied table alias.
• Syntax:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias
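Applying the syntax above, a sketch that pairs each employee with each of their subordinates. It assumes som.employees has a name column and an ARRAY<STRING> column named subordinates; s and subordinate are the table alias and column alias chosen for this example.

```sql
-- Unlike a bare explode() in the SELECT clause, a lateral view lets
-- other columns of the base row appear next to the generated rows.
SELECT e.name, s.subordinate
FROM som.employees e
LATERAL VIEW explode(e.subordinates) s AS subordinate;
```

Employees whose subordinates array is empty produce no rows in this result.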
Lateral view Example
UDF
• Hive uses reflection to find methods named evaluate whose parameters match the arguments used in the HiveQL function call.
• Hive can work with both the Hadoop Writables and the Java primitives, but it’s recommended to work with the Writables since they can be reused.
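• The argument types used in the HiveQL call must match the parameter types of an evaluate method; the return type is the one that method declares and need not equal the argument types.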
UDF vs. GenericUDF
between operator
hive> select name,salary from employees2 where salary between 80000 and 100000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
....
OK
John Doe 100000.0
John Doe 100000.0
Mary Smith 80000.0
Mary Smith 80000.0
Time taken: 14.39 seconds
• Both values (lower and upper) are inclusive.
HiveServer2
• As of CDH4.1, you can deploy HiveServer2, an improved version of HiveServer that supports a new Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency.
• There is also a new CLI for HiveServer2 named Beeline.
• HiveServer2
– Connection URL: jdbc:hive2://<host>:<port>
– Driver class: org.apache.hive.jdbc.HiveDriver
• HiveServer1
– Connection URL: jdbc:hive://<host>:<port>
– Driver class: org.apache.hadoop.hive.jdbc.HiveDriver
BEELINE
$ /usr/lib/hive/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000 username password org.apache.hive.jdbc.HiveDriver
0: jdbc:hive2://localhost:10000>
Connecting database using properties file
References
• Hive: Edward Capriolo, Dean Wampler, Jason Rutherglen. O'Reilly Media, 1st edition (October 3, 2012).
• Chapter about Hive in Hadoop in Action: Chuck Lam. Manning Publications, 1st edition (December 2010).
Thank You

➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 

Learning Apache HIVE - Data Warehouse and Query Language for Hadoop

  • 1. ©2012, Cognizant Data Warehouse and Query Language for Hadoop August 2013 By Someshwar Kale
  • 2. HIVE: A data warehousing solution built on top of Hadoop. Provides a SQL-like query language named HiveQL, with a minimal learning curve for people with SQL expertise; data analysts are the target audience. Early Hive development work started at Facebook in 2007. Today, Facebook counts 29% of its employees (and growing!) as Hive users (https://www.facebook.com/note.php?note_id=114588058858). Today Hive is an Apache project (http://hive.apache.org).
  • 3. Hive Provides: The ability to bring structure to various data formats. A simple interface for ad hoc querying, analyzing, and summarizing large amounts of data. Access to files on various data stores such as HDFS and HBase.
  • 4. Hive: Hive does NOT provide low-latency or real-time queries. Even querying small amounts of data may take minutes. It is designed for scalability and ease of use rather than low-latency responses.
  • 5. Hive: Translates HiveQL statements into a set of MapReduce jobs, which are then executed on a Hadoop cluster.
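    One way to look at this translation is Hive's EXPLAIN statement, which prints the planned stages for a query instead of running it. The sketch below reuses the employees2 query shown later in these slides; the exact plan output depends on the Hive version and is not reproduced here.

```sql
-- EXPLAIN prints the stage plan (including any map/reduce stages)
-- instead of executing the query.
EXPLAIN
SELECT name, salary
FROM employees2
WHERE salary BETWEEN 80000 AND 100000;
```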
  • 6. Hive Metastore: To support features like schemas and data partitioning, Hive keeps its metadata in a relational database. Hive is packaged with Derby, a lightweight embedded SQL database. The default Derby-based metastore is good for evaluation and testing. The schema is not shared between users, because each user has their own instance of embedded Derby, stored in the metastore_db directory, which resides in the directory that Hive was started from. You can easily switch to another SQL installation such as MySQL.
  • 7. Metastore Deployment Modes, Embedded Mode: The default metastore deployment mode for CDH. Both the database and the metastore service run embedded in the main HiveServer process, and both are started for you when you start the HiveServer process. This mode supports only one active user at a time and is not certified for production use.
  • 8. Metastore Deployment Modes, Local Mode: The Hive metastore service runs in the same process as the main HiveServer process. The metastore database runs in a separate process and can be on a separate host. The embedded metastore service communicates with the metastore database over JDBC.
  • 9. Metastore Deployment Modes, Remote Mode
  • 11. Hive Interface Options: Command Line Interface (CLI), used exclusively in these slides. Hive Web Interface: https://cwiki.apache.org/confluence/display/Hive/HiveWebInterface. Java Database Connectivity (JDBC): https://cwiki.apache.org/confluence/display/Hive/HiveClient. Beeline for HiveServer2 (new in CDH4): http://sqlline.sourceforge.net/#manual
  • 12. Data Types
      Launch the Hive command line interface (CLI):
        [cts318692@aster4 ~]$ hive
        Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.10.0-cdh4.2.1.jar!/hive-log4j.properties
        Hive history file=/tmp/cts318692/hive_job_log_cts318692_201308071622_2005272769.txt
        hive>
      The "Hive history file" line gives the location of the session's log file.
      You can execute local commands within the CLI by placing the command between ! and ;
        hive> !cat data/user-posts.txt;
        user1,Funny Story,1343182026191
        user2,Cool Deal,1343182133839
        user4,Interesting Post,1343182154633
        user5,Yet Another Blog,13431839394
        hive>
  • 13. Data Types
      Numeric types: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL (only available starting with Hive 0.11.0)
      Date/time types: TIMESTAMP (only available starting with Hive 0.8.0), DATE (only available starting with Hive 0.12.0)
      Misc types: BOOLEAN, STRING, BINARY (only available starting with Hive 0.8.0)
  • 15. Check physical storage of Hive
        [cts318692@aster4 ~]$ hive -S -e "set" | grep warehouse
        hive.metastore.warehouse.dir=/user/hive/warehouse
        hive.warehouse.subdir.inherit.perms=true
      hive.metastore.warehouse.dir is the location where Hive stores its data.
  • 16. Creating a Database
        hive> CREATE DATABASE IF NOT EXISTS som COMMENT 'my database'
            > LOCATION '/user/cts318692/someshwar/hivestore/'
            > WITH DBPROPERTIES ('creator'='someshwar kale','date'='2013-06-08');
        OK
        Time taken: 0.046 seconds
      IF NOT EXISTS: used to suppress warnings if the database already exists.
      som: the database name. Hive opens the default database when you open a new session.
      LOCATION: the physical storage for the som database. It overrides the default location under '/user/hive/warehouse' for the new directory.
      WITH DBPROPERTIES: properties attached to the database.
  • 17. Exploring Data: STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> is an example of a field with a complex data type (maps, arrays, structures).
  • 18. Creating Table: The table definition specifies the separator for elements of complex data types (maps, arrays, structures), the separator between a map key and its value, e.g. 'key' ^C 'value' (\003 = Ctrl-C = ^C), and the column separator.
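    The separators described above can be sketched as a table definition. This is a minimal illustration, not the definition from the original slide: the table name employees_demo and its column list are assumptions, and the delimiter values are Hive's usual control-character defaults written out explicitly.

```sql
-- Hypothetical table; name and columns are assumptions for illustration.
CREATE TABLE IF NOT EXISTS som.employees_demo (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- column separator (^A)
  COLLECTION ITEMS TERMINATED BY '\002'  -- separates array/struct elements and map entries (^B)
  MAP KEYS TERMINATED BY '\003'          -- separates a map key from its value (^C)
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```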
  • 19. hive> DESCRIBE FORMATTED som.employees;
  • 21. CREATE TABLE ... LIKE: If you omit the EXTERNAL keyword and the original table is external, the new table will also be external. If you omit EXTERNAL and the original table is managed, the new table will also be managed. However, if you include the EXTERNAL keyword and the original table is managed, the new table will be external. Even in this scenario, the LOCATION clause is still optional.
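    As a minimal sketch of the last case, the statement below copies the schema of som.employees into a new external table. The name employees_ext and the LOCATION path are hypothetical.

```sql
-- Copies only the schema of som.employees; no data is copied.
-- employees_ext and the path are hypothetical names for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS som.employees_ext
LIKE som.employees
LOCATION '/path/to/external/data';
```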
  • 24. Dropping Database and Table: By default, Hive won't permit you to drop a database if it contains tables. You can either drop the tables first or append the CASCADE keyword to the command, which will cause Hive to drop the tables in the database first.
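    The two alternatives can be sketched as follows, using the som database and som.employees table from the earlier slides. Treat them as alternatives rather than a script to run top to bottom.

```sql
-- Alternative 1: drop the tables first, then the (now empty) database.
DROP TABLE IF EXISTS som.employees;
DROP DATABASE IF EXISTS som;

-- Alternative 2: let Hive drop the contained tables as part of the command.
DROP DATABASE IF EXISTS som CASCADE;
```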
  • 25. Partitions: To increase performance, Hive has the capability to partition data. The values of the partitioned column divide a table into segments, and entire partitions can be ignored at query time. This is similar to indexes in relational databases, but not as granular. Partitions have to be properly created by users: when inserting data, you must specify a partition. At query time, whenever appropriate, Hive will automatically filter out partitions.
  • 26. Creating Partitioned Table: Partition the table based on the values of country and state.
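    A minimal sketch of such a definition, assuming a hypothetical table employees_part with two data columns. Note that the partition columns appear only in the PARTITIONED BY clause, not in the regular column list.

```sql
-- Hypothetical table; name and data columns are assumptions.
CREATE TABLE IF NOT EXISTS som.employees_part (
  name   STRING,
  salary FLOAT
)
PARTITIONED BY (country STRING, state STRING);
```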
  • 28. Loading data to table: LOAD DATA LOCAL ... copies the local data to the final location in the distributed filesystem, while LOAD DATA ... (i.e., without LOCAL) moves the data to the final location. The PARTITION clause is necessary if the table into which we are loading the data is partitioned. This is known as static partitioning, because we provide the partition value in the query. Partitions are physically stored under separate directories.
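    A static-partition load can be sketched as below. It assumes a hypothetical table som.employees_part partitioned by country and state, and a hypothetical local file path; neither is taken from the original slide.

```sql
-- Static partitioning: the partition values are given in the statement itself.
-- Table name and file path are hypothetical.
LOAD DATA LOCAL INPATH 'data/employees-us-ca.txt'
OVERWRITE INTO TABLE som.employees_part
PARTITION (country = 'US', state = 'CA');

-- List the partitions that now exist for the table.
SHOW PARTITIONS som.employees_part;
```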
  • 29. Schema Violations
        hive> LOAD DATA LOCAL INPATH
            > 'data/user-posts-inconsistentFormat.txt'
            > OVERWRITE INTO TABLE posts;
        OK
        Time taken: 0.612 seconds
        hive> select * from posts;
        OK
        user1 Funny Story 1343182026191
        user2 Cool Deal NULL
        user4 Interesting Post 1343182154633
        user5 Yet Another Blog 13431839394
        Time taken: 0.136 seconds
      NULL is set for any value that violates the pre-defined schema.
  • 30. External Partitioned Tables
  • 31. External Partitioned Tables (continued): There is no difference in syntax. When the partitioned column is specified in the WHERE clause, entire directories/partitions can be ignored.
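    For example, a query like the following sketch only needs to read the directory for one partition. It assumes a hypothetical table som.employees_part partitioned by country and state.

```sql
-- Filtering on the partition columns lets Hive skip all other
-- country/state directories. Table name is hypothetical.
SELECT name, salary
FROM som.employees_part
WHERE country = 'US' AND state = 'CA';
```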
  • 32. Bucketing: Break data into a set of buckets based on a hash function of a "bucket column". This gives the capability to execute queries on a random subset of the data. Hive doesn't automatically enforce bucketing: the user is required to specify the number of buckets by setting the number of reducers.
        hive> set mapred.reduce.tasks = 256;
      OR
        hive> set hive.enforce.bucketing = true;
      Either manually set the number of reducers to be the number of buckets, or use hive.enforce.bucketing, which will set it on your behalf.
  • 33. Create and Use Table with Buckets
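    A minimal sketch of creating, filling, and sampling a bucketed table. The table names posts_bucketed and posts_raw and their columns are assumptions for illustration, not the definitions from the original slide.

```sql
-- Let Hive pick the number of reducers to match the number of buckets.
SET hive.enforce.bucketing = true;

-- Hypothetical bucketed table: rows are assigned to one of 256 buckets
-- by hashing the author column.
CREATE TABLE IF NOT EXISTS posts_bucketed (
  author     STRING,
  post       STRING,
  created_at BIGINT
)
CLUSTERED BY (author) INTO 256 BUCKETS;

-- Populate it from a hypothetical unbucketed table with the same columns.
INSERT OVERWRITE TABLE posts_bucketed
SELECT author, post, created_at
FROM posts_raw;

-- Query a subset of the data: one bucket out of the 256.
SELECT *
FROM posts_bucketed TABLESAMPLE (BUCKET 1 OUT OF 256 ON author);
```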
  • 37. (continued) Partition columns are not deleted.
  • 38. Inserting Data into Tables from Queries
  • 39. Dynamic Partition Inserts
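    A dynamic partition insert can be sketched as below: Hive derives the partition values from the last columns of the SELECT instead of from literals in the PARTITION clause. The tables employees_part and staged_employees and their columns are hypothetical.

```sql
-- Enable dynamic partitioning; nonstrict mode allows every partition
-- column to be determined dynamically.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The last two SELECT columns supply country and state, in that order.
-- Table and column names are hypothetical.
INSERT OVERWRITE TABLE employees_part
PARTITION (country, state)
SELECT se.name, se.salary, se.cnty, se.st
FROM staged_employees se;
```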
  • 45. Table generating functions: Return zero or more rows, one row for each element of the input array.
  • 46. Table generating functions: Only a single expression in the SELECT clause is supported with UDTFs.
  • 48. CASE … WHEN … THEN Statements
  • 49. WHERE and GROUP BY ... HAVING clauses
  • 52. Points to remember: Only equality joins are allowed. More than two tables can be joined in the same query, e.g. SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2) is a valid join. Hive uses a single map/reduce job if, for every table, the same column is used in the join clauses, e.g. ON (a.key = b.key1) JOIN c ON (c.key = b.key1). By contrast, ON (a.key = b.key1) JOIN c ON (c.key = b.key2) is converted into two map/reduce jobs, because the key1 column from b is used in the first join condition and the key2 column from b is used in the second one.
  • 53. ORDER BY and SORT BY: ORDER BY uses a single reducer to sort the data, which may take an unacceptably long time to execute for larger data sets. Hive adds an alternative, SORT BY, that orders the data only within each reducer, thereby performing a local ordering, where each reducer's output will be sorted.
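    The two clauses look the same syntactically; the difference is in how the work is distributed. A short sketch using the employees2 table from these slides:

```sql
-- Total ordering: all rows pass through a single reducer.
SELECT name, salary
FROM employees2
ORDER BY salary DESC;

-- Local ordering: each reducer sorts only the rows it receives,
-- so the combined output is not globally sorted when there are
-- several reducers.
SELECT name, salary
FROM employees2
SORT BY salary DESC;
```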
  • 54. Casting: What if a salary value is not a valid string for a floating-point number? In this case, Hive returns NULL.
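    A minimal sketch of an explicit cast, assuming a hypothetical table employees_raw in which salary is stored as a STRING. Rows whose salary cannot be parsed yield NULL from the cast, so they do not satisfy the comparison.

```sql
-- employees_raw is a hypothetical table with salary stored as STRING.
-- cast(... AS FLOAT) returns NULL for values that are not valid numbers.
SELECT name, salary
FROM employees_raw
WHERE cast(salary AS FLOAT) < 100000.0;
```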
  • 55. UNION ALL and Nested select: Each subquery of the union query must produce the same number of columns, and for each column, its type must match all the column types in the same position.
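    A sketch of a UNION ALL wrapped in a nested select, using the employees2 table from these slides. Both branches produce the same two columns with the same types; the salary thresholds are arbitrary.

```sql
-- Both subqueries return (name, salary) with matching types.
SELECT u.name, u.salary
FROM (
  SELECT e1.name, e1.salary FROM employees2 e1 WHERE e1.salary < 80000
  UNION ALL
  SELECT e2.name, e2.salary FROM employees2 e2 WHERE e2.salary > 100000
) u;
```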
  • 56. View: Similar to writing a function in a programming language. Views are virtual.
  • 57. Lateral view: A lateral view is used in conjunction with user-defined table generating functions such as explode(). A lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to the input rows to form a virtual table having the supplied table alias. Syntax: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
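    A minimal sketch of the syntax in use. It assumes that som.employees has a name column and an ARRAY<STRING> column called subordinates; both column names are assumptions, not confirmed by the slides.

```sql
-- One output row per (employee, subordinate) pair.
-- sub is the table alias, subordinate is the column alias.
SELECT e.name, sub.subordinate
FROM som.employees e
LATERAL VIEW explode(e.subordinates) sub AS subordinate;
```

    Unlike a bare explode() in the SELECT clause, this form allows other columns such as e.name to be selected alongside the generated rows.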
  • 60. UDF: Hive uses reflection to find methods that are named evaluate and whose parameters match the arguments used in the HiveQL function call. Hive can work with both the Hadoop Writables and the Java primitives, but it's recommended to work with the Writables since they can be reused. Input argument types and the return type must be the same.
  • 62. UDF vs. GenericUDF
  • 63. BETWEEN operator
        hive> select name,salary from employees2 where salary between 80000 and 100000;
        Total MapReduce jobs = 1
        Launching Job 1 out of 1
        ....
        OK
        John Doe 100000.0
        John Doe 100000.0
        Mary Smith 80000.0
        Mary Smith 80000.0
        Time taken: 14.39 seconds
      Both values (lower and upper) are inclusive.
  • 64. HiveServer2: As of CDH4.1, you can deploy HiveServer2, an improved version of HiveServer that supports a new Thrift API tailored for JDBC and ODBC clients, Kerberos authentication, and multi-client concurrency. There is also a new CLI for HiveServer2 named Beeline.
      HiveServer2: connection URL jdbc:hive2://<host>:<port>, driver class org.apache.hive.jdbc.HiveDriver
      HiveServer1: connection URL jdbc:hive://<host>:<port>, driver class org.apache.hadoop.hive.jdbc.HiveDriver
  • 65. Beeline
        $ /usr/lib/hive/bin/beeline
        beeline> !connect jdbc:hive2://localhost:10000 username password org.apache.hive.jdbc.HiveDriver
        0: jdbc:hive2://localhost:10000>
  • 66. Connecting database using properties file
  • 67. References: Hive, Edward Capriolo, Dean Wampler, Jason Rutherglen, O'Reilly Media, 1st edition (October 3, 2012). Chapter about Hive: Hadoop in Action, Chuck Lam, Manning Publications, 1st edition (December 2010).