Introduction to HiveQL
By Kristin Ferrier

About Me – Kristin Ferrier
• 15+ years in IT (software development and BI development)
• 10+ years of experience with SQL Server and 5+ years with Oracle
• Co-founder of OKCSQL
• Currently a Sr. Data Analyst at an energy company
• Social media
  • Twitter: @SQLenergy
  • Blog: http://www.kristinferrier.com

Agenda
• Hadoop – Very High Level
• Hive and HiveQL – High Level
• Getting started with Hive and HiveQL
• HiveQL examples
• Resources for getting started with HiveQL

Hadoop
• Open-source software
• Popular for storing, processing, and analyzing large volumes of data
  • For example, web logs or sensor data
• Main distributions
  • Cloudera
  • Hortonworks
  • MapR (has some proprietary components)
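
Hadoop 2.0 Main Components
• Hadoop Distributed File System (HDFS)
  • Handles the data storage
• YARN
  • Handles cluster resource management and job scheduling (new in Hadoop 2.0)
• MapReduce
  • Handles the processing
  • Works with key-value pairs
  • Often written in Java
  • Can also be written in scripting languages such as Python through Hadoop Streaming, which runs any program that reads from standard input and writes to standard output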
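
Example MapReduce Code
```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCount {
    // Mapper: emits a (word, 1) pair for every whitespace-separated token in a line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // key = byte offset of the line in the input file; value = the line's text
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
}
```
Code from the Hortonworks tutorial at http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/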
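The Mapper above is only half of a word count job. As a minimal sketch that runs without a Hadoop cluster, the plain-Java class below simulates the whole flow on in-memory lines: the map phase emits (word, 1) pairs, the shuffle groups the pairs by key, and the reduce phase sums each group. The class and method names are hypothetical, and the in-memory shuffle is a simplification of what Hadoop does across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of the MapReduce word count flow (no Hadoop needed).
class WordCountSimulation {

    // Map phase: emit a (word, 1) pair for every token, like the Mapper above.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key. A TreeMap keeps keys sorted,
    // mirroring how each reducer receives its keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick fox", "the lazy dog", "the end");
        System.out.println(wordCount(lines)); // {dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

A HiveQL `GROUP BY` with `COUNT(*)` follows the same emit, group, and aggregate pattern, which is why Hive can translate such queries into MapReduce jobs for you.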

Getting Started with Hadoop
• What if I don’t know Java, or a scripting language such as Python that can be used through Hadoop Streaming?
• That’s OK. If you know SQL, then Hive and HiveQL may be a great starting point for your Hadoop learning

Hive
Hive essentially allows us to use tables within Hadoop.
• Built on top of Apache Hadoop
• Can access files stored in HDFS or HBase
• HCatalog (built on the Hive metastore) allows you to apply table structures to the data and share them with other tools such as Pig and MapReduce
• HiveQL is used to query the data

HiveQL
HiveQL is a SQL-like language for querying data in Hive.
• Follows parts of the ANSI SQL-92 standard
• Offers its own extensions
• Queries are implicitly turned into MapReduce jobs (later Hive versions can also run them on Tez or Spark)

HiveQL – Key SQL features it supports
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• JOINs (some kinds)
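HAVING is easy to confuse with WHERE: WHERE filters individual rows before they are grouped, while HAVING filters whole groups after the aggregates are computed. The Java sketch below mimics that order of operations with streams. The `weather` table, its `station` and `temp` columns, and the sample rows are hypothetical, and the HiveQL in the comment only illustrates the query shape.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of WHERE vs HAVING. Roughly the Java equivalent of:
//   SELECT station, COUNT(*) AS readings
//   FROM weather                  -- hypothetical table
//   WHERE temp > 0                -- filters individual rows before grouping
//   GROUP BY station
//   HAVING COUNT(*) >= 2;         -- filters whole groups after aggregation
class HavingSketch {

    static final class Reading {
        final String station;
        final double temp;

        Reading(String station, double temp) {
            this.station = station;
            this.temp = temp;
        }
    }

    static Map<String, Long> countsHaving(List<Reading> rows, double minTemp, long minCount) {
        Map<String, Long> groups = rows.stream()
                .filter((Reading r) -> r.temp > minTemp)                  // WHERE
                .collect(Collectors.groupingBy((Reading r) -> r.station,  // GROUP BY
                        TreeMap::new,
                        Collectors.counting()));                          // COUNT(*)
        groups.values().removeIf(count -> count < minCount);              // HAVING
        return groups;
    }

    public static void main(String[] args) {
        List<Reading> rows = List.of(
                new Reading("OKC", 12.0), new Reading("OKC", 15.5),
                new Reading("TUL", -3.0), new Reading("TUL", 4.0),
                new Reading("LAW", 20.0));
        System.out.println(countsHaving(rows, 0.0, 2)); // {OKC=2}
    }
}
```

TUL loses a row to the WHERE filter before counting, so its group (and LAW's) falls below the HAVING threshold.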
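
HiveQL – Key differences from SQL
• No multi-statement transactions (no BEGIN, COMMIT, or ROLLBACK)
• No materialized views
• UPDATE and DELETE are available only in Hive 0.14 and later, and only on tables configured for ACID transactions
  • Hive 0.14 was released in November 2014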

Accessing Hive
• Hue
  • Web interface for Hadoop
• Beeswax
  • Hive UI within Hue
Hue
Beeswax

Getting Data into Hive Tables
• One way is to import a file into Hive
  • The table can be created at this time
  • The data can be imported at this time
  • The file can even come from a Windows machine

Importing a file
Beeswax → Tables → Create a new table from a file

Importing a file cont.
Enter Table Name and Description → ".." button

Importing a file cont.
Upload a file → Select your Windows file → Open

Importing a file cont.
After the file uploads, double-click your file

Importing a file cont.
Choose a delimiter

Importing a file cont.
Select column data types → Create Table

Importing a file cont.
The table has been created
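Choosing a delimiter and column types in the import wizard gives Hive what it needs to read each line of the file as a typed row. The sketch below models that step in plain Java under simplifying assumptions: it splits a line on the delimiter, converts each field to its declared type, and yields NULL (Java `null`) for numeric fields that fail to parse and for missing trailing columns, which roughly resembles how Hive's default text format behaves. The column layout and class names are hypothetical; this is not the code Hue or Hive runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of applying a delimiter and column types to one line of a text file.
class DelimitedRowParser {

    enum ColumnType { STRING, INT, DOUBLE }

    static Object convert(String field, ColumnType type) {
        try {
            switch (type) {
                case INT:
                    return Integer.parseInt(field);
                case DOUBLE:
                    return Double.parseDouble(field);
                default:
                    return field; // STRING values are kept as-is
            }
        } catch (NumberFormatException e) {
            return null; // unparseable numbers become NULL instead of failing the row
        }
    }

    static List<Object> parseLine(String line, char delimiter, List<ColumnType> schema) {
        // Pattern.quote makes delimiters such as '|' safe; -1 keeps trailing empty fields
        String[] fields = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < schema.size(); i++) {
            // Columns missing from the end of the line are read as NULL
            row.add(i < fields.length ? convert(fields[i], schema.get(i)) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical columns: station STRING, year INT, temp DOUBLE
        List<ColumnType> schema = List.of(ColumnType.STRING, ColumnType.INT, ColumnType.DOUBLE);
        System.out.println(parseLine("OKC,2014,21.5", ',', schema)); // [OKC, 2014, 21.5]
        System.out.println(parseLine("TUL,n/a", ',', schema));       // [TUL, null, null]
    }
}
```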

Query Editor
• Write queries in the Query Editor
Select
SELECT * FROM WEATHER
Where, Group By, Min/Max
Where, Group By, Min/Max - Results
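As a hedged illustration of what a query combining WHERE, GROUP BY, and MIN/MAX computes, the Java sketch below filters rows, groups the survivors by station, and tracks the minimum and maximum per group. The table name, the columns (`station`, `year`, `temp`), and the sample data are hypothetical and are not the ones in the slide's screenshots.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical query shape modeled here:
//   SELECT station, MIN(temp), MAX(temp)
//   FROM weather
//   WHERE year = 2014
//   GROUP BY station;
class MinMaxByGroup {

    static final class Reading {
        final String station;
        final int year;
        final double temp;

        Reading(String station, int year, double temp) {
            this.station = station;
            this.year = year;
            this.temp = temp;
        }
    }

    static Map<String, DoubleSummaryStatistics> minMaxByStation(List<Reading> rows, int year) {
        return rows.stream()
                .filter((Reading r) -> r.year == year)                      // WHERE
                .collect(Collectors.groupingBy((Reading r) -> r.station,    // GROUP BY
                        TreeMap::new,
                        Collectors.summarizingDouble((Reading r) -> r.temp))); // MIN / MAX
    }

    public static void main(String[] args) {
        List<Reading> rows = List.of(
                new Reading("OKC", 2014, 30.1), new Reading("OKC", 2014, -5.0),
                new Reading("OKC", 2013, 40.0), new Reading("TUL", 2014, 12.5));
        minMaxByStation(rows, 2014).forEach((station, stats) ->
                System.out.println(station + " min=" + stats.getMin() + " max=" + stats.getMax()));
        // OKC min=-5.0 max=30.1
        // TUL min=12.5 max=12.5
    }
}
```

The 2013 reading is removed by the WHERE filter, so it never reaches OKC's MAX.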
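
Aliasing, Ordering
• Standard SQL syntax for aliasing
• Ordering: ORDER BY produces a total order by sending all rows through a single reducer; SORT BY sorts rows only within each reducer, which scales better but is not globally sorted when more than one reducer is used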
Aliasing, Ordering - Results
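The difference between ORDER BY and SORT BY shows up as soon as there is more than one reducer. The simplified Java model below sends rows to two hypothetical reducers (using `value mod 2`, an assumption for illustration; Hive's actual row distribution is not modeled) and sorts each reducer's rows, as SORT BY does, while ORDER BY sorts all rows together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of ORDER BY vs SORT BY.
class SortBySketch {

    // ORDER BY: one reducer sees every row, so the output is totally ordered.
    static List<Integer> orderBy(List<Integer> rows) {
        List<Integer> out = new ArrayList<>(rows);
        Collections.sort(out);
        return out;
    }

    // SORT BY: rows are split across reducers and each reducer sorts only its own rows.
    static List<List<Integer>> sortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int value : rows) {
            // Hypothetical partitioning rule, chosen only to make the example deterministic
            partitions.get(Math.floorMod(value, reducers)).add(value);
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    // Reading the reducers' outputs one after another, as you would read their files.
    static List<Integer> concat(List<List<Integer>> partitions) {
        List<Integer> out = new ArrayList<>();
        partitions.forEach(out::addAll);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(7, 2, 9, 4, 1, 6);
        System.out.println(orderBy(rows));           // [1, 2, 4, 6, 7, 9]
        System.out.println(concat(sortBy(rows, 2))); // [2, 4, 6, 1, 7, 9]
    }
}
```

Each reducer's output is sorted ([2, 4, 6] and [1, 7, 9]), but the combined SORT BY result is not, which is the trade-off against the single-reducer ORDER BY.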

Joins
• INNER, LEFT, RIGHT, and FULL OUTER
• Equi-joins only: (table1.key = table2.key) is allowed, but (table1.key <> table2.key) is not (Hive 2.2 later added support for more complex join conditions)
• Extensions exist, such as LEFT SEMI JOIN
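One way to see why an equality condition matters is to model a join the way MapReduce can execute it: rows from both tables are grouped by the join key, and each key's group is joined on its own, roughly like Hive's reduce-side (common) join. An inequality such as `table1.key <> table2.key` gives no single key to group on. The sketch below is a simplified, in-memory model with hypothetical tables and class names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of an equi-join executed as a reduce-side join.
class EquiJoinSketch {

    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // "Shuffle": bring together all rows that share a join key.
    static Map<Integer, List<Row>> groupByKey(List<Row> rows) {
        Map<Integer, List<Row>> groups = new TreeMap<>();
        for (Row row : rows) {
            groups.computeIfAbsent(row.key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    // INNER JOIN ... ON (left.key = right.key), one key group at a time.
    static List<String> innerJoin(List<Row> left, List<Row> right) {
        Map<Integer, List<Row>> rightGroups = groupByKey(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<Row>> group : groupByKey(left).entrySet()) {
            // Only rows with equal keys meet in the same group.
            List<Row> matches = rightGroups.getOrDefault(group.getKey(), List.of());
            for (Row l : group.getValue()) {
                for (Row r : matches) {
                    out.add(l.key + ":" + l.value + "-" + r.value);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row(1, "a1"), new Row(2, "a2"), new Row(3, "a3"));
        List<Row> b = List.of(new Row(2, "b2"), new Row(3, "b3"), new Row(3, "b3x"), new Row(4, "b4"));
        System.out.println(innerJoin(a, b)); // [2:a2-b2, 3:a3-b3, 3:a3-b3x]
    }
}
```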
INNER JOIN
INNER JOIN - Results

LEFT SEMI JOIN
• LEFT SEMI JOINs are less necessary starting with Hive 0.13
  • As of Hive 0.13, the IN, NOT IN, EXISTS, and NOT EXISTS operators are supported with subqueries

```sql
SELECT a.key, a.value
FROM a
WHERE a.key IN
  (SELECT b.key
   FROM b);
```

can be rewritten as

```sql
SELECT a.key, a.value
FROM a LEFT SEMI JOIN b ON (a.key = b.key);
```

Example from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
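Both queries above return each row of `a` at most once when its key appears in `b`, and only columns of `a` can appear in the result; an INNER JOIN would instead repeat a row of `a` once per matching row of `b`. The Java sketch below shows that difference on hypothetical data. It models the semantics only, not how Hive executes either query, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of LEFT SEMI JOIN (equivalently, WHERE a.key IN (SELECT b.key FROM b)).
class LeftSemiJoinSketch {

    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + ":" + value;
        }
    }

    static List<Row> leftSemiJoin(List<Row> a, List<Row> b) {
        Set<Integer> bKeys = new HashSet<>();
        for (Row row : b) {
            bKeys.add(row.key); // only b's join keys are needed; b's columns are never output
        }
        List<Row> out = new ArrayList<>();
        for (Row row : a) {
            if (bKeys.contains(row.key)) {
                out.add(row); // emitted once, even if several rows of b match
            }
        }
        return out;
    }

    // For comparison: an INNER JOIN produces one output row per matching pair.
    static int innerJoinRowCount(List<Row> a, List<Row> b) {
        int count = 0;
        for (Row left : a) {
            for (Row right : b) {
                if (left.key == right.key) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row(1, "x"), new Row(2, "y"), new Row(3, "z"));
        List<Row> b = List.of(new Row(2, "p"), new Row(2, "q"), new Row(3, "r"));
        System.out.println(leftSemiJoin(a, b));      // [2:y, 3:z]
        System.out.println(innerJoinRowCount(a, b)); // 3
    }
}
```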

Performance
• Queries can take minutes to run; Hive's focus is on analyzing large data sets
• Relational databases are still a strong choice for the fast CRUD (create, read, update, and delete) operations that OLTP systems require

Summary
• Hive essentially allows us to use tables in Hadoop
• We can query them using HiveQL, which is similar to SQL
• Knowing how to write MapReduce code is not required, because Hive turns HiveQL into MapReduce jobs for us

Getting Started Yourself
• Hortonworks Sandbox
  • Portable Hadoop environment with tutorials
  • Although the sandbox runs Hadoop on Linux, you can run it as a virtual machine on your Windows machine and access it through a web browser
  • Available at http://hortonworks.com/sandbox

Getting Started Yourself
• Hive DML Reference
  • https://cwiki.apache.org/confluence/display/hive/languageManual+dml
• Apache’s Hive Language Manual
  • https://cwiki.apache.org/confluence/display/Hive/LanguageManual
• Treasure Data’s HiveQL Reference
  • http://docs.treasuredata.com/articles/hive
• Network World – Comparing the Top Hadoop Distributions
  • http://www.networkworld.com/article/2369327/software/comparing-the-top-hadoop-distributions.html

Contact Info
• Social media
  • Twitter: @SQLenergy
  • Blog: http://www.kristinferrier.com

More Related Content

What's hot

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop TechnologyManish Borkar
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaEdureka!
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 

What's hot (20)

Hive presentation
Hive presentationHive presentation
Hive presentation
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
Hive
HiveHive
Hive
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | EdurekaWhat Is Hadoop | Hadoop Tutorial For Beginners | Edureka
What Is Hadoop | Hadoop Tutorial For Beginners | Edureka
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 

Viewers also liked

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureEduardo Castro
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latinknowbigdata
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questionsKalyan Hadoop
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]knowbigdata
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraMeera Raghu
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperknowbigdata
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsMeera Raghu
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsKalyan Hadoop
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder Sandra Morgan
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
Differences between OpenStack and AWS
Differences between OpenStack and AWSDifferences between OpenStack and AWS
Differences between OpenStack and AWSEdureka!
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaEdureka!
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache HiveAvkash Chauhan
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 

Viewers also liked (20)

Introduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage AzureIntroduction to SQL Server Cloud Storage Azure
Introduction to SQL Server Cloud Storage Azure
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Introduction to pig & pig latin
Introduction to pig & pig latinIntroduction to pig & pig latin
Introduction to pig & pig latin
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]Interview questions on Apache spark [part 2]
Interview questions on Apache spark [part 2]
 
Carnatic Music Notations: Alankara
Carnatic Music Notations: AlankaraCarnatic Music Notations: Alankara
Carnatic Music Notations: Alankara
 
Jathiswara
JathiswaraJathiswara
Jathiswara
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Guide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music NotationsGuide to understanding Carnatic Music Notations
Guide to understanding Carnatic Music Notations
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshots
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Recorder lesson
Recorder lessonRecorder lesson
Recorder lesson
 
An introduction to the Recorder
An introduction to the Recorder An introduction to the Recorder
An introduction to the Recorder
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
Differences between OpenStack and AWS
Differences between OpenStack and AWSDifferences between OpenStack and AWS
Differences between OpenStack and AWS
 
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | EdurekaMapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 

Similar to Introduction to HiveQL

ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON Padma shree. T
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive Rupak Roy
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseJonathan Bloom
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010BOSC 2010
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big dealeduarderwee
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 

Similar to Introduction to HiveQL (20)

ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Hive with HDInsight
Hive with HDInsightHive with HDInsight
Hive with HDInsight
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Apache Hive - Introduction
Apache Hive - IntroductionApache Hive - Introduction
Apache Hive - Introduction
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Intro to Hybrid Data Warehouse
Intro to Hybrid Data WarehouseIntro to Hybrid Data Warehouse
Intro to Hybrid Data Warehouse
 
Hive_Pig.pptx
Hive_Pig.pptxHive_Pig.pptx
Hive_Pig.pptx
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Taylor bosc2010
Taylor bosc2010Taylor bosc2010
Taylor bosc2010
 
Apache hive1
Apache hive1Apache hive1
Apache hive1
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Big data and tools
Big data and tools Big data and tools
Big data and tools
 

More from kristinferrier

So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?kristinferrier
 
Intro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and AuthenticationIntro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and Authenticationkristinferrier
 
Demystifying JSON in SQL Server
Demystifying JSON in SQL ServerDemystifying JSON in SQL Server
Demystifying JSON in SQL Serverkristinferrier
 
3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Mapkristinferrier
 

More from kristinferrier (6)

So MANY databases, which one do I pick?
So MANY databases, which one do I pick?So MANY databases, which one do I pick?
So MANY databases, which one do I pick?
 
Intro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and AuthenticationIntro to Firebase Realtime Database and Authentication
Intro to Firebase Realtime Database and Authentication
 
Demystifying JSON in SQL Server
Demystifying JSON in SQL ServerDemystifying JSON in SQL Server
Demystifying JSON in SQL Server
 
SQL to JSON
SQL to JSONSQL to JSON
SQL to JSON
 
T-SQL Treats
T-SQL TreatsT-SQL Treats
T-SQL Treats
 
3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map3D Geospatial Visualization Using Power Map
3D Geospatial Visualization Using Power Map
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Introduction to HiveQL

  • 2. About Me – Kristin Ferrier  15+ Years in IT (Software development and BI development)  10+ years experience with SQL Server and 5+ years experience with Oracle  Co-founder OKCSQL  Currently Sr. Data Analyst at an energy company  Social Media  Twitter: @SQLenergy  Blog: http://www.kristinferrier.com
  • 3. Agenda  Hadoop – Very High Level  Hive and HiveQL - High Level  Getting started with Hive and HiveQL  HiveQL examples  Resources for getting started with HiveQL
  • 4. Hadoop  Open source software  Popular for storing, processing, and analyzing large volumes of data  For example, web logs or sensor data  Main distributions  Cloudera  Hortonworks  MapR (has some proprietary components)
  • 5. Hadoop 2.0 Main Components  Hadoop Distributed File System (HDFS)  Handles the data storage  MapReduce  Handles the processing  Works with key value pairs  Often written in Java  Can be written in any scripting language using the Streaming API of Hadoop
  • 6. Example MapReduce Code public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Code from Hortonworks tutorial found at http://hortonworks.com/hadoop-tutorial/introducing-apache-hadoop-developers/
  • 7. Getting Started with Hadoop
      - What if I don't know Java, or one of the scripting languages usable through Hadoop's Streaming API (for example, Python)?
      - That's OK. If you know SQL, then Hive and HiveQL may be a great starting point for your Hadoop learning.
  • 8. Hive Hive essentially allows us to use tables within Hadoop  Built on top of Apache Hadoop  Can access files stored in HDFS or HBase  HCatalog allows you to apply table structures to the data  HiveQL to query the data
  • 9. HiveQL HiveQL is SQL-like language for querying data from Hive  Follows some of the ANSI SQL-92 standard  Offers its own extensions  Implicitly turned into MapReduce jobs
  • 10. HiveQL – Key SQL Items It Has
      - SELECT
      - FROM
      - WHERE
      - GROUP BY
      - HAVING
      - JOINs (some kinds)
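  • 11. HiveQL – Key Differences from SQL
      - No multi-statement transactions (no BEGIN, COMMIT, or ROLLBACK)
      - No materialized views
      - UPDATE and DELETE are available only in Hive 0.14 and later, and only on tables configured for ACID transactions
          - Hive 0.14 was released in November 2014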
  • 12. Accessing Hive
      - Hue: web interface for Hadoop
      - Beeswax: Hive UI within Hue
  • 13. Hue
  • 15. Getting Data into Hive Tables
      - One way is to import a file into Hive
          - The table can be created at this time
          - The data can be imported at this time
          - The file can even come from a Windows machine
  • 16. Importing a File
      - In Beeswax: Tables > Create a new table from a file
  • 17. Importing a File (cont.)
      - Enter a table name and description
      - Click the .. button
  • 18. Importing a File (cont.)
      - Click Upload a file
      - Select your file on the Windows machine
      - Click Open
  • 19. Importing a File (cont.)
      - After the file uploads, double-click it
  • 20. Importing a File (cont.)
      - Choose a delimiter
  • 21. Importing a File (cont.)
      - Select column data types
      - Click Create Table
  • 22. Importing a File (cont.)
      - The table has been created
  • 23. Query Editor
      - Write queries in the Query Editor
  • 25. Where, Group By, Min/Max
  • 26. Where, Group By, Min/Max - Results
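  • 27. Aliasing, Ordering
      - Standard SQL syntax for aliasing
      - SORT BY instead of ORDER BY for ordering
          - SORT BY orders rows only within each reducer, so the overall output is fully sorted only when a single reducer is used
          - ORDER BY is also supported and guarantees a total ordering by sending all rows through one reducer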
  • 29. Joins
      - INNER, LEFT, RIGHT, and FULL OUTER
      - Equi-joins only: (table1.key = table2.key) is allowed, but (table1.key <> table2.key) is not
      - Extensions exist, such as LEFT SEMI JOIN
  • 31. INNER JOIN - Results
  • 32. LEFT SEMI JOIN
      - LEFT SEMI JOINs are less necessary starting with Hive 0.13
      - As of Hive 0.13, the IN, NOT IN, EXISTS, and NOT EXISTS operators are supported with subqueries. For example,
            SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b);
        can be rewritten as
            SELECT a.key, a.value FROM a LEFT SEMI JOIN b ON (a.key = b.key);
      - Example from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
  • 33. Performance
      - Queries can take minutes to run; the focus is on analyzing large data sets.
      - Relational databases are still a strong solution for the fast CRUD (create, read, update, and delete) operations that OLTP systems require.
  • 34. Summary
      - Hive essentially allows us to use tables in Hadoop
      - We can query them using HiveQL, which is similar to SQL
      - Knowing how to write MapReduce code is not required, because HiveQL is turned into MapReduce for us
  • 35. Getting Started Yourself
      - Hortonworks Sandbox
          - Portable Hadoop environment with tutorials
          - Even though the sandbox runs Hadoop on Linux, you can run it on your Windows machine and access it through a web browser
          - Available at http://hortonworks.com/sandbox
  • 36. Getting Started Yourself
      - Hive DML Reference
          - https://cwiki.apache.org/confluence/display/hive/languageManual+dml
      - Apache's Hive Language Manual
          - https://cwiki.apache.org/confluence/display/Hive/LanguageManual
      - Treasure Data's HiveQL Reference
          - http://docs.treasuredata.com/articles/hive
      - Network World – Comparing the Top Hadoop Distros
          - http://www.networkworld.com/article/2369327/software/comparing-the-top-hadoop-distributions.html
  • 37. Contact Info
      - Social media
          - Twitter: @SQLenergy
          - Blog: http://www.kristinferrier.com
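The Example MapReduce Code slide (slide 6) shows only the map side of the Hortonworks word-count job. As a rough picture of the whole job, the following plain-Java sketch simulates the map, shuffle, and reduce phases in memory without any Hadoop classes. The class and method names are hypothetical, and a real job would implement the reduce step as a Hadoop Reducer. The comment at the top gives roughly equivalent HiveQL against an assumed table docs(line STRING), which is the kind of query Hive turns into such a job for us.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Roughly equivalent HiveQL, assuming a hypothetical table docs(line STRING):
//   SELECT word, COUNT(*) FROM docs
//   LATERAL VIEW explode(split(line, ' ')) w AS word
//   GROUP BY word;
// Plain-Java simulation of the word-count job; no Hadoop classes are used.
class WordCountSimulation {

    // Map phase: as in the slide's Mapper, emit (word, 1) for every token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group emitted values by key; the TreeMap also sorts the keys,
    // standing in for the sort Hadoop performs before the reducers run.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the 1s emitted for a single word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs all three phases over a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cat sat", "the dog sat")));
        // prints {cat=1, dog=1, sat=2, the=2}
    }
}
```

Because the TreeMap keeps keys sorted, the counts come back in key order, much as each reducer receives its keys in sorted order.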
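The file-import slides (slides 20 and 21) ask for a delimiter and for column data types. To make those two choices concrete, the plain-Java sketch below splits a line on a literal delimiter and converts each field to its declared type. The class name and the small set of types are assumptions for illustration, and this is not how Hue or Hive are implemented. It does mirror one behavior of Hive text tables that is worth knowing: a value that cannot be read as the column's type comes back as NULL rather than causing an error, and missing trailing fields are also NULL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the "choose a delimiter" and "select column data types"
// steps; it is not how Hue or Hive are implemented.
class DelimitedImportSketch {

    // A deliberately small, assumed set of column types.
    enum ColumnType { STRING, INT, DOUBLE }

    // Split on the delimiter as a literal string (Pattern.quote handles characters
    // such as '|' that are special in regular expressions); -1 keeps empty trailing fields.
    static String[] splitRow(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    // Convert one field to its declared type. As with Hive text tables, a value that
    // cannot be read as the type becomes null (NULL in Hive) instead of an error.
    static Object convert(String field, ColumnType type) {
        try {
            switch (type) {
                case INT:
                    return Integer.valueOf(field);
                case DOUBLE:
                    return Double.valueOf(field);
                default:
                    return field;
            }
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Parse one line into typed values, one per declared column. Missing trailing
    // fields become null and extra fields are ignored.
    static List<Object> parseRow(String line, String delimiter, List<ColumnType> types) {
        String[] fields = splitRow(line, delimiter);
        List<Object> row = new ArrayList<>();
        for (int i = 0; i < types.size(); i++) {
            row.add(i < fields.length ? convert(fields[i], types.get(i)) : null);
        }
        return row;
    }

    public static void main(String[] args) {
        List<ColumnType> types = List.of(ColumnType.STRING, ColumnType.INT, ColumnType.DOUBLE);
        System.out.println(parseRow("Tulsa,42,1.5", ",", types));  // [Tulsa, 42, 1.5]
        System.out.println(parseRow("Tulsa,42,1.5", "\t", types)); // wrong delimiter: [Tulsa,42,1.5, null, null]
    }
}
```

The second call shows why the delimiter step matters: with the wrong delimiter, the whole line lands in the first column and the remaining columns are empty.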
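The Where, Group By, Min/Max slides (slides 25 and 26) show their query only as screenshots. As a stand-in, the sketch below assumes a hypothetical table sales(region STRING, amount DOUBLE) with the query in the header comment, and computes the same result in plain Java. The comments mark which part would roughly run in the map phase and which in the reduce phase; this is a simplification of the plan Hive actually generates.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical table and query (not from the slides):
//   sales(region STRING, amount DOUBLE)
//   SELECT region, MIN(amount), MAX(amount)
//   FROM sales
//   WHERE amount > 0
//   GROUP BY region;
class GroupByMinMaxSketch {

    // One row of the hypothetical sales table.
    static final class Sale {
        final String region;
        final double amount;

        Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Returns region -> [min, max] over the rows that pass the WHERE filter.
    static Map<String, List<Double>> minMaxByRegion(List<Sale> sales) {
        Map<String, DoubleSummaryStatistics> stats = sales.stream()
                // WHERE amount > 0: applied before grouping (roughly the map phase)
                .filter(s -> s.amount > 0)
                // GROUP BY region, with MIN/MAX per group (roughly the reduce phase)
                .collect(Collectors.groupingBy(s -> s.region, TreeMap::new,
                        Collectors.summarizingDouble(s -> s.amount)));
        Map<String, List<Double>> result = new TreeMap<>();
        stats.forEach((region, st) -> result.put(region, List.of(st.getMin(), st.getMax())));
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("east", 20.0), new Sale("east", 5.0),
                new Sale("west", 7.5), new Sale("west", -1.0));
        System.out.println(minMaxByRegion(sales)); // {east=[5.0, 20.0], west=[7.5, 7.5]}
    }
}
```

A region whose rows are all removed by WHERE does not appear in the result at all; filtering on an aggregate value (for example, keeping only regions whose MAX exceeds some amount) would need HAVING instead.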
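The Aliasing, Ordering slide (slide 27) uses SORT BY for ordering. The difference from ORDER BY is easy to miss, so this plain-Java sketch simulates both: with SORT BY, each reducer sorts only the rows it receives, while ORDER BY sorts all rows in one place. Assigning a row to reducer (value mod reducer count) is an assumption made for illustration; Hive's actual distribution of rows for SORT BY may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simulates the difference between HiveQL's SORT BY and ORDER BY.
class SortByVsOrderBy {

    // SORT BY: rows are split across reducers and each reducer sorts only its own rows.
    // Assigning a row to reducer (value mod reducers) is an assumption for illustration.
    static List<List<Integer>> sortBy(List<Integer> rows, int reducers) {
        List<List<Integer>> outputs = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            outputs.add(new ArrayList<>());
        }
        for (int row : rows) {
            outputs.get(Math.floorMod(row, reducers)).add(row);
        }
        for (List<Integer> output : outputs) {
            Collections.sort(output); // sorted within this reducer only
        }
        return outputs;
    }

    // ORDER BY: all rows pass through a single reducer, giving one totally ordered result.
    static List<Integer> orderBy(List<Integer> rows) {
        List<Integer> all = new ArrayList<>(rows);
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(5, 3, 8, 1, 4, 2);
        System.out.println(sortBy(rows, 2)); // [[2, 4, 8], [1, 3, 5]]: each reducer's output sorted, not the whole
        System.out.println(orderBy(rows));   // [1, 2, 3, 4, 5, 8]
    }
}
```

With a single reducer, SORT BY and ORDER BY produce the same order, which is why SORT BY can look like a full sort on small test data.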
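The Joins slide (slide 29) says only equi-joins are supported. One way to see why: Hive's join strategies rely on matching rows sharing the same key. A common join shuffles both tables to reducers by join key, and a map join loads the smaller table into an in-memory hash table. The plain-Java sketch below is a simple in-memory hash join; the class name, row layout, and data are hypothetical. The lookup by key can only find rows whose key is equal, and a condition such as <> gives no single key to look up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory hash equi-join; each row is a hypothetical {key, value} pair.
class EquiJoinSketch {

    // Equivalent in spirit to: SELECT ... FROM t1 JOIN t2 ON (t1.key = t2.key)
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        // Build: index the right-hand rows by join key.
        Map<String, List<String>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe: each left row looks up exactly one key. This works only because the
        // condition is equality; with <> there is no single key to look up.
        List<String> joined = new ArrayList<>();
        for (String[] row : left) {
            for (String rightValue : index.getOrDefault(row[0], List.of())) {
                joined.add(row[0] + "," + row[1] + "," + rightValue);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> t1 = List.of(new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String[]> t2 = List.of(new String[] {"1", "x"}, new String[] {"1", "y"}, new String[] {"3", "z"});
        System.out.println(innerJoin(t1, t2)); // [1,a,x, 1,a,y, 3,c,z]
    }
}
```

Note that left row 1 appears twice because it matches two right rows; that duplication is what distinguishes an INNER JOIN from a LEFT SEMI JOIN.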
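The two queries on the LEFT SEMI JOIN slide (slide 32) return the same rows. The plain-Java sketch below, with hypothetical names and data, shows the semantics they share. A row of a is kept if its key appears in b. It is kept at most once, even when b has several matching rows. Only a's columns appear in the result. An INNER JOIN, by contrast, would repeat a row of a once per match.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Semantics of:  SELECT a.key, a.value FROM a LEFT SEMI JOIN b ON (a.key = b.key)
// (same rows as:  SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM b))
class LeftSemiJoinSketch {

    // aRows are hypothetical {key, value} rows of a; bKeys are the key values of b.
    static List<String> leftSemiJoin(List<String[]> aRows, List<String> bKeys) {
        Set<String> keys = new HashSet<>(bKeys); // only b's keys matter, and duplicates collapse
        List<String> result = new ArrayList<>();
        for (String[] row : aRows) {
            if (keys.contains(row[0])) {
                result.add(row[0] + "," + row[1]); // each row of a appears at most once
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> a = List.of(new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String> bKeys = List.of("1", "1", "3");
        System.out.println(leftSemiJoin(a, bKeys)); // [1,a, 3,c]: row 1 is not repeated
    }
}
```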