34. Hive Architecture
Figure: Hive architecture. Components: UI, Driver, Compiler, MetaStore (ORM), Execution Engine, Hadoop. Arrow labels: DDL, HQL, Works, Result.
Friday, July 1, 2011
35. Hive Architecture
Figure: Hive architecture (UI, Driver, Compiler, MetaStore (ORM), Execution Engine, Hadoop), with the example query entering at the UI and Driver:
select col1 from tab1 where ...
39. Hive Architecture
Figure: Hive architecture (UI, Driver, Compiler, MetaStore (ORM), Execution Engine, Hadoop), with the query result returned to the user:
a 123344
b 121211
c 342434
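A toy Python model of the flow in the figure, as a sketch only: the Driver passes the query to the Compiler, the Compiler checks the table against the MetaStore and returns a plan, and an execution engine runs the plan over the stored rows and hands back the result. The class names, the pre-parsed query (table, columns, predicate) and the in-memory storage are hypothetical stand-ins; real Hive parses HQL text and runs the plan as jobs on Hadoop. The predicate col3 > 5 is borrowed from the later example query, since the WHERE clause above is elided.

```python
class MetaStore:
    """Hypothetical MetaStore: table name -> list of column names."""

    def __init__(self, schemas):
        self.schemas = schemas

    def columns(self, table):
        return self.schemas[table]


class Compiler:
    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, table, columns, predicate):
        # Check the query against the MetaStore metadata before producing a plan.
        known = self.metastore.columns(table)
        unknown = [c for c in columns if c not in known]
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        return {"table": table, "columns": columns, "predicate": predicate}


class ExecutionEngine:
    def __init__(self, storage):
        # Hypothetical storage: table name -> list of row dicts (stands in for Hadoop/HDFS).
        self.storage = storage

    def run(self, plan):
        rows = self.storage[plan["table"]]
        return [tuple(r[c] for c in plan["columns"])
                for r in rows if plan["predicate"](r)]


class Driver:
    """Receives the query, asks the Compiler for a plan, runs it, returns the result."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def execute(self, table, columns, predicate):
        plan = self.compiler.compile(table, columns, predicate)
        return self.engine.run(plan)


metastore = MetaStore({"tab1": ["col1", "col2", "col3"]})
storage = {"tab1": [{"col1": "a", "col2": 1, "col3": 10},
                    {"col1": "b", "col2": 2, "col3": 1}]}
driver = Driver(Compiler(metastore), ExecutionEngine(storage))
# select col1 from tab1 where col3 > 5  (predicate chosen for illustration)
print(driver.execute("tab1", ["col1"], lambda r: r["col3"] > 5))  # [('a',)]
```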
40. Hive Internal
Figure: Hive internals.
Clients: Web UI, Hive CLI, JDBC (browse, query, DDL)
MetaStore, accessed through the Thrift API
Hive QL: Parser, Plan, Optimizer, Task
Map Reduce: ExecMapper/ExecReducer running operators (TSOperator, SELOperator, FSOperator), user scripts and UDF/UDAF (substr, sum, average)
Storage: SerDe, Input/OutputFormat, StorageHandler; HDFS (RCFile), DB ... HBase
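The SerDe layer in the figure converts between stored bytes and row objects. A minimal sketch of a delimited-text SerDe, assuming Hive's default text layout in which fields are separated by the Ctrl-A character (\x01) and NULL is written as \N; the class and method names are hypothetical and do not follow Hive's Java SerDe interface.

```python
class DelimitedTextSerDe:
    """Hypothetical SerDe: one text line <-> one row dict."""

    FIELD_DELIM = "\x01"  # Ctrl-A, Hive's default field delimiter for text tables
    NULL_TOKEN = "\\N"    # how a NULL column is written in the text file

    def __init__(self, columns):
        self.columns = columns

    def deserialize(self, line):
        values = line.rstrip("\n").split(self.FIELD_DELIM)
        # Missing trailing fields are treated as NULL.
        values += [self.NULL_TOKEN] * (len(self.columns) - len(values))
        # Values stay strings; converting to the column types is left out here.
        return {c: (None if v == self.NULL_TOKEN else v)
                for c, v in zip(self.columns, values)}

    def serialize(self, row):
        return self.FIELD_DELIM.join(
            self.NULL_TOKEN if row[c] is None else str(row[c]) for c in self.columns)


serde = DelimitedTextSerDe(["col1", "col2", "col3"])
row = serde.deserialize("a\x01123344\x01\\N\n")
print(row)  # {'col1': 'a', 'col2': '123344', 'col3': None}
```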
59. Plan
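Query: Select col1,col2 From tab1 Where col3 > 5
The parser output is collected into a query block (QB), and each clause token is turned into an operator:
TOK_FROM → TableScanOperator
TOK_WHERE → FilterOperator
TOK_SELECT → SelectOperator
TOK_DESTINATION → FileSinkOperator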
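The clause-to-operator mapping above can be sketched as a lookup applied in data-flow order (FROM, WHERE, SELECT, destination). The dictionary-based QB, the destination value and the function name are hypothetical simplifications; Hive's semantic analyzer builds real operator objects from the AST in Java.

```python
# Hypothetical simplification: clause token -> operator that implements it.
TOKEN_TO_OPERATOR = {
    "TOK_FROM": "TableScanOperator",
    "TOK_WHERE": "FilterOperator",
    "TOK_SELECT": "SelectOperator",
    "TOK_DESTINATION": "FileSinkOperator",
}

# Operators are chained in data-flow order, not in the order clauses are written.
PLAN_ORDER = ["TOK_FROM", "TOK_WHERE", "TOK_SELECT", "TOK_DESTINATION"]


def build_operator_chain(query_block):
    """query_block: dict of clause token -> clause content (a toy QB)."""
    return [TOKEN_TO_OPERATOR[tok] for tok in PLAN_ORDER if tok in query_block]


qb = {
    "TOK_SELECT": ["col1", "col2"],
    "TOK_FROM": "tab1",
    "TOK_WHERE": "col3 > 5",
    "TOK_DESTINATION": "tmp_output",  # hypothetical destination
}
print(" -> ".join(build_operator_chain(qb)))
# TableScanOperator -> FilterOperator -> SelectOperator -> FileSinkOperator
```

A query without a WHERE clause simply gets no FilterOperator in this model.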
63. Optimizer
Query: Select col1,col2 From tab1 Where col3 > 5
Table: tab1 {col1, col2, col3, col4, col5, col6, col7}
Operator tree: TableScanOperator → FilterOperator → SelectOperator → FileSinkOperator
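Each operator in the tree consumes rows from its parent and hands rows to its child. A minimal sketch with Python generators, assuming an in-memory list of dicts stands in for tab1; the function names and sample rows are hypothetical, and Hive's real operators are Java objects that push rows to their children rather than being pulled.

```python
def table_scan(rows):
    # TableScanOperator: emit every row of the table.
    for row in rows:
        yield row


def filter_op(rows, predicate):
    # FilterOperator: keep rows for which the WHERE predicate holds.
    return (row for row in rows if predicate(row))


def select_op(rows, columns):
    # SelectOperator: project the SELECT list.
    return ({c: row[c] for c in columns} for row in rows)


def file_sink(rows, sink):
    # FileSinkOperator: write the results (here: append to a list).
    for row in rows:
        sink.append(row)
    return sink


tab1 = [
    {"col1": "a", "col2": 1, "col3": 7},
    {"col1": "b", "col2": 2, "col3": 3},
    {"col1": "c", "col2": 3, "col3": 9},
]

# Select col1,col2 From tab1 Where col3 > 5
out = file_sink(select_op(filter_op(table_scan(tab1), lambda r: r["col3"] > 5),
                          ["col1", "col2"]), [])
print(out)  # [{'col1': 'a', 'col2': 1}, {'col1': 'c', 'col2': 3}]
```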
70. Optimizer
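Query: Select col1,col2 From tab1 Where col3 > 5, on tab1 {col1, col2, col3, col4, col5, col6, col7}
The ColumnPruner walks the operator tree (TableScanOperator → FilterOperator → SelectOperator → FileSinkOperator) starting from the sink, and records in its Context the columns each operator needs:
SEL: col1, col2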
72. Optimizer
Query: Select col1,col2 From tab1 Where col3 > 5, on tab1 {col1, col2, col3, col4, col5, col6, col7}
ColumnPruner Context, next step:
FIL: col1, col2, col3 (the columns SEL needs, plus col3 for the predicate col3 > 5)
74. Optimizer
Query: Select col1,col2 From tab1 Where col3 > 5, on tab1 {col1, col2, col3, col4, col5, col6, col7}
ColumnPruner Context, last step:
TS: col1, col2, col3, so the table scan reads only 3 of tab1's 7 columns.
Operator tree after the optimizer: TableScanOperator → FilterOperator → FilterOperator → SelectOperator → FileSinkOperator
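The ColumnPruner walk above can be sketched as a single pass from the sink back to the scan: each operator needs the columns required after it plus the columns its own expression references, except a projection, which only needs what it selects. This is a simplified model with a hypothetical plan encoding, not Hive's ColumnPruner code.

```python
# Operator chain in data-flow order. Each entry: (operator, columns its own
# expression references). Hypothetical encoding of this one query.
PLAN = [
    ("TS", set()),              # TableScanOperator
    ("FIL", {"col3"}),          # FilterOperator: col3 > 5
    ("SEL", {"col1", "col2"}),  # SelectOperator: col1, col2
    ("FS", set()),              # FileSinkOperator: writes what SEL produces
]

TAB1 = ["col1", "col2", "col3", "col4", "col5", "col6", "col7"]


def prune_columns(plan):
    """Walk from the sink back to the scan; return {operator: needed input columns}."""
    context = {}
    needed_below = set()  # columns required by the operators after this one
    for name, referenced in reversed(plan):
        if name == "SEL":
            # A projection defines its own output, so it only needs what it references.
            needed = set(referenced)
        else:
            # Pass-through operators need what later operators need plus their own columns.
            needed = needed_below | referenced
        context[name] = sorted(needed)
        needed_below = needed
    return context


context = prune_columns(PLAN)
print(context["SEL"])  # ['col1', 'col2']
print(context["FIL"])  # ['col1', 'col2', 'col3']
scan_columns = [c for c in TAB1 if c in context["TS"]]
print(f"TS reads {len(scan_columns)} of {len(TAB1)} columns: {scan_columns}")
```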
79. Task
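Query: Select col1,col2 From tab1 Where col3 > 5
Task generation walks the operator tree of the QB and applies a rule per operator type; tasks are created through TaskFactory:
TS - GenMRTableScan1
FS - GenMRFileSink1
Operator tree: TableScanOperator → FilterOperator → FilterOperator → SelectOperator → FileSinkOperator
Task created so far: FetchTask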
87. Task
Query: Select col1,col2 From tab1 Where col3 > 5
Result: a MapRedTask holding the operator tree TableScanOperator → FilterOperator → FilterOperator → SelectOperator → FileSinkOperator, plus a FetchTask.
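The rule matching above (TS - GenMRTableScan1, FS - GenMRFileSink1) can be pictured as a walker that dispatches on operator type. In this hypothetical sketch the TS rule opens a map-reduce task, every operator is attached to the current task, and the FS rule records the output that a fetch task will read. The rule bodies are simplified stand-ins for Hive's Java processors, and the output path is made up.

```python
def gen_mr_table_scan(op, ctx):
    # Stand-in for GenMRTableScan1: start a new map-reduce task at a table scan.
    task = {"type": "MapRedTask", "operators": []}
    ctx["tasks"].append(task)
    ctx["current"] = task


def gen_mr_file_sink(op, ctx):
    # Stand-in for GenMRFileSink1: remember where the final rows are written.
    ctx["fetch_from"] = "hypothetical/tmp/output"


RULES = {"TableScanOperator": gen_mr_table_scan, "FileSinkOperator": gen_mr_file_sink}


def generate_tasks(operators):
    """Walk a linear operator tree; assumes it starts with a TableScanOperator."""
    ctx = {"tasks": [], "current": None, "fetch_from": None}
    for op in operators:
        rule = RULES.get(op)
        if rule:
            rule(op, ctx)
        ctx["current"]["operators"].append(op)
    if ctx["fetch_from"]:
        # The client reads the query result through a fetch task.
        ctx["tasks"].append({"type": "FetchTask", "source": ctx["fetch_from"]})
    return ctx["tasks"]


tree = ["TableScanOperator", "FilterOperator", "FilterOperator",
        "SelectOperator", "FileSinkOperator"]
tasks = generate_tasks(tree)
for task in tasks:
    print(task["type"], task.get("operators", task.get("source")))
```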
88. Hive Internal
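Figure: Hive internals, showing the example query's operators in the Map Reduce layer.
Clients: Web UI, Hive CLI, JDBC (browse, query, DDL)
MetaStore, accessed through the Thrift API
Hive QL: Parser, Plan, Optimizer, Task
Map Reduce: ExecMapper/ExecReducer running TSOperator, FILOperator, FILOperator, SELOperator, FSOperator, plus user scripts and UDFs
Storage: SerDe, Input/OutputFormat, StorageHandler; HDFS (RCFile), DB ... HBase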
94. Understand Oracle SQL
• More than 3,000 ETL SQL statements
• Understand the data flow
• Group similar SQL patterns
• Investigate which Oracle functions are used
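Investigating which Oracle functions the ETL SQL uses can be roughly automated. A sketch that counts identifiers followed by "(" across SQL strings, assuming the statements are available as plain text; it is only a heuristic (the keyword list is an assumption and incomplete, and user-defined functions are counted too), and the sample statements are made up.

```python
import re
from collections import Counter

# Identifier immediately followed by "(" (optionally after spaces): a rough
# heuristic for a function call in SQL text.
CALL_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")

# Keywords that can be followed by "(" but are not functions (assumed list).
NOT_FUNCTIONS = {"IN", "EXISTS", "VALUES", "AS", "OVER", "AND", "OR", "ON", "FROM", "INTO"}


def count_functions(statements):
    counts = Counter()
    for sql in statements:
        for name in CALL_RE.findall(sql):
            upper = name.upper()
            if upper not in NOT_FUNCTIONS:
                counts[upper] += 1
    return counts


sample = [
    "SELECT NVL(a, 0), TO_CHAR(d, 'YYYY-MM-DD') FROM t",
    "SELECT DECODE(x, 1, 'a', 'b'), nvl(b, 1) FROM t WHERE c IN (SELECT c FROM u)",
]
print(count_functions(sample).most_common())
```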
111. Data Type Conversion
Oracle         Hive
NUMBER(n)      TINYINT / INT / BIGINT (depending on n)
NUMBER(n,m)    FLOAT / DOUBLE
VARCHAR2       STRING
DATE           STRING in “yyyy-MM-dd HH:mm:ss” format
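The mapping can be applied mechanically when generating Hive DDL. A sketch, assuming these thresholds: the integer type is the smallest of TINYINT/INT/BIGINT that holds every n-digit value, and NUMBER(n,m) becomes FLOAT only up to 6 significant digits. SMALLINT is left out to follow the table, NUMBER without a precision is rejected, and the function names are hypothetical. DATE values are rendered as “yyyy-MM-dd HH:mm:ss” strings.

```python
import re
from datetime import datetime

HIVE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # "yyyy-MM-dd HH:mm:ss"


def oracle_to_hive_type(oracle_type):
    """Map an Oracle column type to a Hive type following the table above."""
    t = oracle_type.strip().upper()
    if t.startswith("VARCHAR2") or t == "DATE":
        return "STRING"
    m = re.fullmatch(r"NUMBER\((\d+)(?:\s*,\s*(\d+))?\)", t)
    if not m:
        raise ValueError(f"unsupported type: {oracle_type}")
    precision = int(m.group(1))
    scale = int(m.group(2) or 0)
    if scale > 0:
        # Assumption: FLOAT reliably keeps 6 significant digits, DOUBLE about 15.
        return "FLOAT" if precision <= 6 else "DOUBLE"
    # Smallest integer type whose range covers every n-digit number.
    if precision <= 2:
        return "TINYINT"  # max 127
    if precision <= 9:
        return "INT"      # max 2147483647
    if precision <= 18:
        return "BIGINT"   # max 9223372036854775807
    raise ValueError(f"NUMBER({precision}) does not fit in BIGINT")


def oracle_date_to_hive(value):
    """Render an Oracle DATE (modelled as a Python datetime) as a Hive STRING."""
    return value.strftime(HIVE_DATE_FORMAT)


print(oracle_to_hive_type("NUMBER(10)"), oracle_to_hive_type("NUMBER(12,2)"))  # BIGINT DOUBLE
print(oracle_date_to_hive(datetime(2011, 7, 1, 9, 30)))  # 2011-07-01 09:30:00
```

A side benefit of this fixed-width, zero-padded format is that comparing the strings orders them chronologically, so date range filters keep working on a STRING column.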
112. Hive DML
• Hive QL supports a subset of ANSI SQL
• Sub-queries are supported only in the FROM clause
• Joins: equi-joins only (inner join, outer join, self-join)
116. IN Clause
IN sub-query
Oracle:
SELECT * FROM Employee e WHERE e.DeptNo
IN (SELECT d.DeptNo FROM Dept d)
Hive:
SELECT e.* FROM Employee e
LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)
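The rewrite is safe because a left semi join emits each left row at most once, as soon as one match exists, which is what IN does. A small Python model of both semantics on made-up rows (dicts, with None for NULL); a plain inner join is included for contrast, since it would repeat an employee when Dept contains duplicate DeptNo values.

```python
def sql_in(value, candidates):
    """SQL three-valued IN: True, False or None (unknown)."""
    if value is None:
        return None
    if any(c == value for c in candidates if c is not None):
        return True
    return None if None in candidates else False


def where_in(employees, depts):
    # Oracle: WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d); WHERE keeps only True.
    dept_nos = [d["DeptNo"] for d in depts]
    return [e for e in employees if sql_in(e["DeptNo"], dept_nos) is True]


def left_semi_join(employees, depts):
    # Hive: LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo).
    # Equality never matches NULL, and each left row is emitted at most once.
    keys = {d["DeptNo"] for d in depts if d["DeptNo"] is not None}
    return [e for e in employees if e["DeptNo"] is not None and e["DeptNo"] in keys]


def inner_join(employees, depts):
    # A plain inner join repeats an employee for every matching Dept row.
    return [e for e in employees for d in depts
            if e["DeptNo"] is not None and e["DeptNo"] == d["DeptNo"]]


employees = [{"name": "A", "DeptNo": 10}, {"name": "B", "DeptNo": 20},
             {"name": "C", "DeptNo": None}]
depts = [{"DeptNo": 10}, {"DeptNo": 10}, {"DeptNo": None}]  # duplicate 10 on purpose

print([e["name"] for e in where_in(employees, depts)])        # ['A']
print([e["name"] for e in left_semi_join(employees, depts)])  # ['A']
print([e["name"] for e in inner_join(employees, depts)])      # ['A', 'A']
```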
120. NOT IN Clause
NOT IN sub-query
Oracle:
SELECT * FROM Employee e WHERE e.DeptNo
NOT IN (SELECT d.DeptNo FROM Dept d)
Hive:
SELECT e.* FROM Employee e
LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo)
WHERE d.DeptNo IS NULL
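The outer-join rewrite matches NOT IN only when neither side has NULL join keys. In Oracle, if any d.DeptNo is NULL, e.DeptNo NOT IN (...) is never true and the query returns no rows, while the LEFT OUTER JOIN ... IS NULL form still returns the unmatched employees; an employee whose own DeptNo is NULL is dropped by NOT IN but kept by the outer join. A Python model of both on made-up rows (None stands for NULL):

```python
def sql_not_in(value, candidates):
    """SQL three-valued NOT IN: True, False or None (unknown)."""
    if not candidates:
        return True  # NOT IN over an empty set is always true
    if value is None:
        return None
    if any(c == value for c in candidates if c is not None):
        return False
    return None if None in candidates else True


def where_not_in(employees, depts):
    # Oracle: WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)
    dept_nos = [d["DeptNo"] for d in depts]
    return [e for e in employees if sql_not_in(e["DeptNo"], dept_nos) is True]


def outer_join_is_null(employees, depts):
    # Hive: LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL.
    # Unmatched rows get NULL for d.DeptNo; NULL keys never match.
    keys = {d["DeptNo"] for d in depts if d["DeptNo"] is not None}
    return [e for e in employees if e["DeptNo"] is None or e["DeptNo"] not in keys]


employees = [{"name": "A", "DeptNo": 10}, {"name": "B", "DeptNo": 20},
             {"name": "C", "DeptNo": None}]
depts_clean = [{"DeptNo": 10}]
depts_with_null = [{"DeptNo": 10}, {"DeptNo": None}]

for depts in (depts_clean, depts_with_null):
    print([e["name"] for e in where_not_in(employees, depts)],
          [e["name"] for e in outer_join_is_null(employees, depts)])
# ['B'] ['B', 'C']
# [] ['B', 'C']
```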
133. Functions
Math functions
Oracle: round, ceil, mod, power, sqrt, sin/cos
Hive: round, ceil, pmod, power, sqrt, sin/cos
Character functions
Oracle: substr, trim, lpad/rpad, ltrim/rtrim, replace
Hive: substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace
NULL functions
Oracle: coalesce, nvl, nvl2
Hive: coalesce (no NVL, NVL2)
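Two of these substitutions change semantics, not just names. Oracle MOD keeps the sign of the dividend (MOD(-7, 3) = -1), while Hive's pmod gives the positive modulus for a positive divisor (pmod(-7, 3) = 2); Oracle REPLACE matches its search string literally, while regexp_replace treats it as a Java regular expression, so characters such as "." must be escaped. NVL(a, b) maps to COALESCE(a, b), and NVL2(a, b, c) can be written as CASE WHEN a IS NOT NULL THEN b ELSE c END. A Python model of these behaviours; the function names only mirror the SQL ones, pmod is modelled for a non-zero divisor, and Python's re stands in for Java regex syntax.

```python
import re


def oracle_mod(m, n):
    # Oracle MOD: remainder with the sign of the dividend; MOD(m, 0) = m.
    # (Java's % operator truncates the same way for n != 0.)
    if n == 0:
        return m
    r = abs(m) % abs(n)
    return -r if m < 0 else r


def hive_pmod(a, b):
    # Modelled as ((a % b) + b) % b with a truncating %, for b != 0;
    # for b > 0 the result lies in [0, b).
    return oracle_mod(oracle_mod(a, b) + b, b)


def oracle_replace(s, search, replacement):
    # Oracle REPLACE: the search string is matched literally.
    return s.replace(search, replacement)


def regexp_replace(s, pattern, replacement):
    # Hive regexp_replace: the pattern is a regular expression.
    return re.sub(pattern, replacement, s)


def coalesce(*args):
    # First non-NULL argument, or NULL; NVL(a, b) is COALESCE(a, b).
    return next((a for a in args if a is not None), None)


print(oracle_mod(-7, 3), hive_pmod(-7, 3))          # -1 2
print(oracle_replace("1.5.0", ".", "_"))             # 1_5_0
print(regexp_replace("1.5.0", ".", "_"))             # _____ ("." matches any character)
print(regexp_replace("1.5.0", re.escape("."), "_"))  # 1_5_0
print(coalesce(None, 0))                             # 0
```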
134. Custom UDF Function
• Condition functions: DECODE, GREATEST
• NULL comparison functions: NVL / NVL2
• Type conversion: TO_NUMBER, TO_CHAR, TO_DATE
• INSTR4
• DATE_FORMAT
• LAST_DAY
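In Hive releases of this period, a custom UDF is typically a Java class extending org.apache.hadoop.hive.ql.exec.UDF with one or more evaluate methods. The Python model below only pins down the Oracle behaviour such UDFs have to reproduce for DECODE (which treats two NULLs as equal), GREATEST (NULL if any argument is NULL) and LAST_DAY; the function names mirror the Oracle ones but are illustrative, and LAST_DAY is modelled on dates without a time part.

```python
import calendar
from datetime import date


def decode(expr, *args):
    """Oracle DECODE(expr, s1, r1, s2, r2, ..., [default])."""
    pairs, default = args, None
    if len(args) % 2 == 1:  # odd count: the last argument is the default
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        # Unlike SQL "=", DECODE treats two NULLs as equal; Python's
        # None == None is True, which gives exactly that behaviour.
        if search == expr:
            return result
    return default


def greatest(*args):
    """Oracle GREATEST: NULL if any argument is NULL, else the largest."""
    return None if any(a is None for a in args) else max(args)


def last_day(d):
    """Oracle LAST_DAY: the last day of d's month (date part only here)."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month)


print(decode(2, 1, "one", 2, "two", "other"))  # two
print(greatest(3, None, 5))                     # None
print(last_day(date(2011, 2, 10)))              # 2011-02-28
```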
140. Analytic Function
RANK
Oracle:
SELECT name, dept, salary, RANK() OVER (PARTITION BY dept
ORDER BY salary DESC) FROM emp
Hive, with RANK(arg1, arg2) implemented as a custom UDF:
SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary)
FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept
SORT BY dept, salary DESC) e
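Why the rewrite can work: DISTRIBUTE BY dept sends all rows of a department to the same reducer, and SORT BY dept, salary DESC orders them there, so a stateful UDF can compute the rank by remembering the previous row. A Python model of that reducer-side logic, assuming Oracle RANK semantics (ties share a rank, the next rank skips) and one UDF instance per reducer; the class is a hypothetical stand-in for the custom RANK(arg1, arg2) UDF, whose code the slides do not show, and the partitioner is a simple deterministic substitute for Hive's hash partitioning.

```python
class RankUdf:
    """Stateful rank over rows already grouped by key and sorted by value."""

    def __init__(self):
        self.prev_key = object()  # sentinel: no previous row yet
        self.prev_value = None
        self.row_number = 0
        self.rank = 0

    def evaluate(self, key, value):
        if key != self.prev_key:  # new partition: reset the counters
            self.prev_key, self.prev_value = key, None
            self.row_number, self.rank = 0, 0
        self.row_number += 1
        if self.row_number == 1 or value != self.prev_value:
            self.rank = self.row_number  # ties keep the previous rank
        self.prev_value = value
        return self.rank


def simulate(emp, reducers=2):
    # DISTRIBUTE BY dept: route each row to a reducer by its dept.
    buckets = [[] for _ in range(reducers)]
    for row in emp:
        buckets[sum(map(ord, row["dept"])) % reducers].append(row)
    out = []
    for bucket in buckets:
        # SORT BY dept, salary DESC within each reducer.
        bucket.sort(key=lambda r: (r["dept"], -r["salary"]))
        udf = RankUdf()  # one UDF instance per reducer (assumption)
        for r in bucket:
            out.append((r["name"], r["dept"], r["salary"],
                        udf.evaluate(r["dept"], r["salary"])))
    return out


emp = [
    {"name": "a", "dept": "x", "salary": 300},
    {"name": "b", "dept": "x", "salary": 200},
    {"name": "c", "dept": "x", "salary": 300},
    {"name": "d", "dept": "y", "salary": 100},
    {"name": "e", "dept": "y", "salary": 150},
]
for row in sorted(simulate(emp)):
    print(row)
```

The state in the UDF is only meaningful because of the distribution and ordering; without DISTRIBUTE BY and SORT BY the same UDF would return arbitrary numbers.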
145. Analytic Aggregation Function
MIN
Oracle:
SELECT dept, MIN(salary) OVER (PARTITION BY dept)
FROM emp
Hive (aggregation + JOIN):
SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m
FROM emp GROUP BY dept) tmp ON emp.dept = tmp.dept
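The aggregation-plus-join rewrite in plain Python: compute MIN(salary) per dept (the GROUP BY sub-query), then attach it to every emp row through the join on dept, which gives one output row per employee, as the windowed MIN does. The sample rows are made up and salaries are assumed non-NULL. One caveat the model makes visible: rows whose dept is NULL get a value from the window version (NULL forms its own partition) but are dropped by the inner join, because NULL = NULL never matches.

```python
def window_min(emp):
    # Oracle: MIN(salary) OVER (PARTITION BY dept); NULL dept is its own partition.
    out = []
    for e in emp:
        same = [x["salary"] for x in emp if x["dept"] == e["dept"]]
        out.append((e["dept"], min(same)))
    return out


def group_min_join(emp):
    # Hive: SELECT emp.dept, tmp.m FROM emp JOIN
    #       (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON emp.dept = tmp.dept
    tmp = {}
    for e in emp:
        d = e["dept"]
        tmp[d] = e["salary"] if d not in tmp else min(tmp[d], e["salary"])
    # Equality join: NULL keys never match.
    return [(e["dept"], tmp[e["dept"]]) for e in emp if e["dept"] is not None]


emp = [
    {"dept": "x", "salary": 300},
    {"dept": "x", "salary": 200},
    {"dept": "y", "salary": 150},
    {"dept": None, "salary": 50},
]
print(window_min(emp))      # [('x', 200), ('x', 200), ('y', 150), (None, 50)]
print(group_min_join(emp))  # [('x', 200), ('x', 200), ('y', 150)]
```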
149. Merge Join Tree Bug
• select * from a join b on a.v1 = b.v1
join c on a.v1 = c.v1
join d on a.v1 = d.v1
join e on a.v2 = e.v2
MapReduce jobs: 3
• select * from a join e on a.v2 = e.v2
join c on a.v1 = c.v1
join d on a.v1 = d.v1
join b on a.v1 = b.v1
MapReduce jobs: 2
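Hive can evaluate several joins in one MapReduce job when they join on the same column. A simplified counting model, assuming a left-deep chain in which a join is merged into the previous job whenever it uses the same key of table a: under this model both queries need two jobs (one for the a.v1 joins, one for the a.v2 join), so the three jobs observed for the first query look like a missed merge. This is a sketch of the idea only, not Hive's join-tree merging code.

```python
def count_join_jobs(join_keys):
    """join_keys: the column of table a used by each join, in query order.

    Consecutive joins on the same key are merged into one job.
    """
    jobs = 0
    previous = None
    for key in join_keys:
        if key != previous:
            jobs += 1
            previous = key
    return jobs


query1 = ["v1", "v1", "v1", "v2"]  # b, c, d on a.v1, then e on a.v2
query2 = ["v2", "v1", "v1", "v1"]  # e on a.v2, then c, d, b on a.v1
print(count_join_jobs(query1), count_join_jobs(query2))  # 2 2
```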
174. Wrap-Up
Oracle 2 Hive
Gain insight into the data flow and data model
Modify Oracle SQL to Hive query syntax
Use built-in functions
Develop custom UDF/UDAF/UDTF
Support analytic functions
- distribute by + sort by + UDF
- join + UDF (aggregation)
Modify Hive internals
Hadoop + Hive tuning