Introduction to Apache Hive
- 1. APACHE HIVE
(Apache Hadoop Sub Project)
Agenda:
Story – Making of Apache Hive
What is Apache Hive
Physical Layout
Hive CLI
Hive QL
- 3. Can Elephants Fly?
Concern: Can Hadoop be used more efficiently and fruitfully by developers?
© 2012 Sabre Holdings Pvt. Ltd. | All rights reserved 3
- 5. Thinking…. ?
Step 1. Give him Wings
Mr. Hadoop energizing himself.
- 6. Thinking… ?
Step 2. Pray to Gravity
Thanks to gravity, sky never fell down on us ;)
But wait 2012 is not yet over. Keep Praying.
Mr. Hadoop enjoying his first air ride.
“God did not create the universe, gravity did” - Stephen Hawking
- 8. Upshot of the down-fall
Victims: Mr. Hadoop – The Flying Elephant
Blame Gravity! The Fall will have a huge impact.
- 10. Saving Life…
Step1. Shrink
BEFORE -
ACME Elephant Shrinker
AFTER -
- 11. Saving Life…
Step2. Genetic Engineering & a bit of magic
BEFORE: Mr. Hadoop
AFTER: Ms. Hive
Injecting Insecto-receptors
- 13. Behind the scenes…?
Hive was initially developed by Facebook.
- 14. Hive is a data warehouse infrastructure built
on top of Hadoop.
It supports analysis of large datasets stored in
Hadoop-compatible file systems such as HDFS
and the Amazon S3 filesystem.
It provides a SQL-like query language called
HiveQL.
To accelerate queries, it provides indexing.
- 15. Warehouse directory in HDFS:
/user/hive/warehouse
Tables ~ subdirectories of the warehouse directory.
Partitions ~ subdirectories of the corresponding
table directory.
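The layout above can be sketched in a few lines of Python. This is only an illustration of how paths nest for tables in the default database: the table name `logs` and the partition columns `dt` and `country` are hypothetical, and the helper functions are not part of Hive.

```python
import posixpath

# Warehouse root as given on the slide.
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_dir(table):
    """Directory holding a table's data: a subdirectory of the warehouse."""
    return posixpath.join(WAREHOUSE_DIR, table)


def partition_dir(table, partition):
    """Directory of one partition: nested 'column=value' subdirectories
    under the table directory, one level per partition column."""
    levels = [f"{col}={val}" for col, val in partition.items()]
    return posixpath.join(table_dir(table), *levels)


if __name__ == "__main__":
    # Hypothetical table and partition values, for illustration only.
    print(table_dir("logs"))
    print(partition_dir("logs", {"dt": "2012-01-01", "country": "US"}))
```

Each partition column adds one directory level, which is why a query that filters on a partition column only needs to read the matching subdirectories.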
- 16. Hive queries are implicitly converted to
map-reduce code by the Hive engine.
The compiler translates each query into a
directed acyclic graph of map-reduce jobs.
These map-reduce jobs are sent to Hadoop
for execution.
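To make the "directed acyclic graph of jobs" idea concrete, here is a minimal Python sketch that orders a toy plan so that every job runs after the jobs it depends on. The stage names are hypothetical and this is not Hive's planner; it only shows the dependency ordering a DAG implies. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical plan for a query with a join, a GROUP BY and an ORDER BY.
# Each job maps to the set of jobs whose output it consumes.
PLAN = {
    "job-1: join": set(),
    "job-2: group by": {"job-1: join"},
    "job-3: order by": {"job-2: group by"},
}


def execution_order(plan):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for job in execution_order(PLAN):
        print("submit", job)
```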
- 17. The /user/hive directory is created automatically the
first time a Hive session is started.
The /user/hive/warehouse directory must be accessible
to all users.
hadoop dfs -chmod -R 1777 /user/hive/warehouse
Setting the sticky bit (the leading 1 in 1777) is recommended if
the Hadoop version installed on the cluster supports it.
The /tmp directory should also be made a sticky
directory.
hadoop dfs -chmod -R 1777 /tmp
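For readers unfamiliar with the octal mode 1777, the short Python sketch below decodes it with the standard `stat` module. It only interprets the number; it does not touch HDFS, and the variable names are illustrative.

```python
import stat

# Mode used in the chmod commands above: sticky bit + rwx for everyone.
MODE = 0o1777

# The leading 1 is the sticky bit: anyone may create entries in the
# directory, but only an entry's owner (or a superuser) may delete it.
has_sticky_bit = bool(MODE & stat.S_ISVTX)

# Permission bits without the sticky bit: read/write/execute for all.
permission_bits = MODE & 0o777

# Symbolic form as a directory listing would show it.
symbolic = stat.filemode(stat.S_IFDIR | MODE)

if __name__ == "__main__":
    print(has_sticky_bit, oct(permission_bits), symbolic)
```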
- 18. The Hive CLI (Command Line Interface) can be
invoked with the hive command.
% hive
- 21. DML
▪ SELECT
DDL
▪ SHOW TABLES
▪ CREATE TABLE
▪ ALTER TABLE
▪ DROP TABLE
- 24. Managed (normal) tables are created under the warehouse
directory; loaded source data is moved into the warehouse.
Managed tables are directly visible when browsing the HDFS
warehouse directory.
Dropping a managed table deletes both the source data and
the table metadata.
External tables read directly from existing HDFS files.
External table data does not appear under the warehouse
directory.
Dropping an external table deletes only the
metadata, not the source data.
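The difference in DROP behaviour can be modelled with a toy in-memory "metastore" and "filesystem". This is a sketch under stated assumptions, not Hive code: the class `ToyHive`, the table names and the external location are all hypothetical.

```python
class ToyHive:
    """Toy model of table metadata plus the data paths it points to."""

    WAREHOUSE_DIR = "/user/hive/warehouse"

    def __init__(self):
        self.metastore = {}  # table name -> (kind, data path)
        self.files = set()   # paths that currently hold data

    def create_managed(self, name):
        # Managed table: data lives under the warehouse directory.
        path = f"{self.WAREHOUSE_DIR}/{name}"
        self.metastore[name] = ("managed", path)
        self.files.add(path)

    def create_external(self, name, location):
        # External table: metadata points at data stored elsewhere.
        self.metastore[name] = ("external", location)
        self.files.add(location)

    def drop(self, name):
        # Metadata is always removed; data only for managed tables.
        kind, path = self.metastore.pop(name)
        if kind == "managed":
            self.files.discard(path)


if __name__ == "__main__":
    hive = ToyHive()
    hive.create_managed("bookings")                      # hypothetical
    hive.create_external("raw_logs", "/data/raw/logs")   # hypothetical
    hive.drop("bookings")
    hive.drop("raw_logs")
    print(sorted(hive.files))  # only the external data remains
```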
- 28. HiveQL supports joins only on equality
conditions. Inequality conditions and other complex
boolean join expressions are not supported.
More than two tables can be joined in one query.
The number of map-reduce jobs (n) generated for a
join depends on the join columns used.
If every table is joined on the same column, then n = 1.
Otherwise n > 1.
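The rule above can be approximated with a small Python heuristic. This is a simplified estimate, not Hive's actual planner: the function `estimate_join_jobs` and the table and column names are hypothetical. It starts a new job whenever a table is joined on a different column than the one it already uses in the current job.

```python
def estimate_join_jobs(clauses):
    """Estimate map-reduce jobs for a chain of equality joins.

    clauses: list of ((table, column), (table, column)) pairs, in query order.
    """
    if not clauses:
        return 0
    jobs = 1
    keys_in_job = {}  # table -> join column it uses within the current job
    for left, right in clauses:
        # A table already joined on another column forces a new job.
        conflict = any(
            table in keys_in_job and keys_in_job[table] != column
            for table, column in (left, right)
        )
        if conflict:
            jobs += 1
            keys_in_job = {}
        for table, column in (left, right):
            keys_in_job[table] = column
    return jobs


if __name__ == "__main__":
    # a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key1
    same = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key1"))]
    # a JOIN b ON a.key = b.key1 JOIN c ON c.key = b.key2
    mixed = [(("a", "key"), ("b", "key1")), (("c", "key"), ("b", "key2"))]
    print(estimate_join_jobs(same), estimate_join_jobs(mixed))
```

In the first query table b is joined on key1 in both clauses, so the estimate is one job; in the second, b switches from key1 to key2, so the estimate is two.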
- 30. HiveQL does not follow the SQL-92 standard.
Missing or limited features:
No materialized views
No transaction support
Limited sub-query support
- 31. Hadoop – Entering into the new world!
- 32. Reach me
Tapan Avasthi
Associate Software Developer Intern, Travelocity Global
tapan.avasthi@travelocity.com
tapan.k.avasthi@gmail.com