Get Started with Hadoop, Hive, and the HiveQL Language
- 2. www.JanBaskTraining.co Copyright © JanBask Training. All rights reserved
Career Options Of Hadoop Big Data Certification
Hadoop to HiveQL
Uses of Hadoop
Hive
Remember that Hive is not
Uses of HiveQL
Major Reasons to use Hadoop for Data Science
Bottom Line
- 3.
Hadoop to HiveQL
Apache Hadoop is an open-source, fault-tolerant, and scalable framework
written in Java. It provides a platform for storing and processing large
amounts of data.
Hadoop is often used as the basis of a data lake, which stores data in its
original format. It is designed to scale up from a single server to thousands
of machines, each of which offers local computation and storage.
- 4.
Uses of Hadoop
• There is no need to preprocess data before storing it: you may store as much
data as you want and decide later how to use it.
• You may grow your system to handle more data simply by adding nodes; only a
little administration is required.
• It is convenient to use for millions or billions of transactions.
• Many cities, states, and countries use Hadoop to analyze data, for example to
identify traffic jams so that they can be controlled (the Smart City concept).
• Many businesses also use big data to optimize their data performance.
- 5.
Hive
Apache Hive is a data warehouse software project built on top of Apache Hadoop
to provide data query and analysis.
It uses a declarative, SQL-like language called HiveQL (HQL).
Hive also allows programmers who are familiar with MapReduce to plug in custom
MapReduce code to perform more sophisticated analysis.
- 7.
HQL
The Hive Query Language (HQL) is a SQL-like interface used to query data
stored in the databases and file systems that are integrated with Hadoop.
It supports simple SQL-like functions such as CONCAT, SUBSTR, and ROUND,
and aggregate functions such as SUM, COUNT, and MAX.
It also supports clauses such as GROUP BY and SORT BY, and user-defined
functions can be registered and called from HQL. In short, it builds on
well-known concepts from the relational database world: tables, rows,
columns, and schemas.
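To make the functions and clauses above concrete, here is a minimal sketch of such a query. It does not run on Hive. It assumes Python's built-in sqlite3 module as a stand-in engine, because SQLite also provides SUBSTR, ROUND, SUM, COUNT, MAX, and GROUP BY. The trips table, its columns, and the sample rows are hypothetical. ORDER BY is used in place of Hive's SORT BY, which SQLite does not have.

```python
import sqlite3

# Hypothetical sample data: (city, fare). Not taken from the slides.
ROWS = [
    ("Austin", 12.50),
    ("Austin", 7.25),
    ("Boston", 20.00),
    ("Boston", 5.10),
    ("Boston", 9.90),
]

# Uses a scalar function (SUBSTR), aggregates (COUNT, SUM, MAX),
# ROUND, and GROUP BY. ORDER BY stands in for Hive's SORT BY here.
QUERY = """
SELECT SUBSTR(city, 1, 3)  AS city_code,
       COUNT(*)            AS trips,
       ROUND(SUM(fare), 2) AS total_fare,
       MAX(fare)           AS max_fare
FROM trips
GROUP BY SUBSTR(city, 1, 3)
ORDER BY city_code
"""


def run_query():
    """Load the sample rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
        conn.executemany("INSERT INTO trips VALUES (?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

The query text itself uses only constructs that HiveQL also documents, but it has been exercised here only against SQLite, so treat it as an illustration rather than a tested Hive script.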
- 8.
Uses of HiveQL
• HQL closely resembles SQL
• HQL allows programmers to plug in custom mappers and reducers
• HQL is scalable, familiar, extensible, and fast to use
• It provides indexes to speed up queries
• HQL offers a large number of user-defined function APIs, which can be used
to build custom behavior into the query engine
• It fits the requirement for a query interface on top of Hadoop's low-level
interface
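Custom mappers and reducers are typically plugged into Hive through its TRANSFORM clause, which streams rows to an external script as tab-separated lines and reads tab-separated lines back. The sketch below shows what such a pair of scripts might do, written as plain Python functions so it can be run without a cluster. The function names map_lines and reduce_lines, the word-count task, and the sample input are all assumptions made for illustration.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Expects its input sorted by key, as it would be after the
    shuffle/sort phase between the map and reduce steps.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    sample = ["hive runs on hadoop", "hadoop stores data"]
    # sorted() stands in for the shuffle/sort that the cluster performs.
    shuffled = sorted(map_lines(sample))
    for out in reduce_lines(shuffled):
        print(out)
```

In an actual deployment each function would live in its own script that reads standard input and writes standard output, and Hive would take care of distributing the work across nodes.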
- 9.
Major Reasons to use Hadoop for Data Science
When you have to deal with a large amount of data, Hadoop is a strong option
to choose. When you are planning to implement Hadoop on your data, the first
step is to understand the complexity of the data and the rate at which the
data is going to grow.
This is where cluster planning is required: the cluster has to be sized
according to the volume of the company's data (GBs or TBs).
Different types of data
Numeric data
Nominal data
Different specific applications
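The cluster planning step described above can be sketched as simple arithmetic. This is a rough estimate only. The function names, the 25 percent headroom, and the example figures are assumptions rather than values from the slides. The replication factor of 3 is the HDFS default.

```python
import math


def raw_storage_needed_tb(initial_tb, monthly_growth_rate, months,
                          replication_factor=3, headroom=0.25):
    """Estimate the raw disk capacity (TB) needed after `months` of growth.

    initial_tb          -- data volume today, before replication
    monthly_growth_rate -- e.g. 0.05 for 5% compound growth per month
    replication_factor  -- copies of each block (HDFS default is 3)
    headroom            -- assumed spare fraction for temporary data
    """
    logical_tb = initial_tb * (1 + monthly_growth_rate) ** months
    replicated_tb = logical_tb * replication_factor
    return replicated_tb * (1 + headroom)


def nodes_needed(raw_tb, disk_per_node_tb):
    """Round up to the number of data nodes needed to hold `raw_tb`."""
    return math.ceil(raw_tb / disk_per_node_tb)


if __name__ == "__main__":
    # Hypothetical company: 10 TB today, growing 5% per month, 12-month plan.
    raw = raw_storage_needed_tb(10, 0.05, 12)
    print(f"raw capacity: {raw:.1f} TB")
    print(f"nodes with 8 TB each: {nodes_needed(raw, 8)}")
```

A real plan would also account for compute, memory, and network needs, which depend on the types of data and the applications involved.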
- 10.
Bottom Line
Hadoop has become a de facto standard in Data Science and a gateway to Big
Data technologies. It serves as a foundation for other Big Data technologies
such as Spark and Hive. As per Forbes: “Hadoop market is expected to reach
$99.31B by 2022 at a CAGR of 42.1 percent.” So, this is the right time to
give a push to your skills in the field of Big Data. Happy Reading!