SQL In/On/Around Hadoop
1. SQL In/On/Around Hadoop
Hadoop Summit, 2015
Chris Twogood, Vice President Product and Services Marketing
Fawad Qureshi, Principal Consultant, Big Data
9.
Shift from a Single Platform to an Ecosystem
“We will abandon the old models based on the desire to implement for high-value analytic applications.”
"Logical" Data Warehouse
10.
• Pick Your Best-of-Breed Technology:
– Data types
– Analytic engines
– Economic options
– File systems
– Operating systems
• With Different Characteristics:
– CPU centric
– I/O centric
– Data volume centric
– Workload characteristics and volume
– Availability/DR
– Service Level Agreements
Data Fabric Vision Enabled by QueryGrid
Analytic Flexibility to meet your business needs
Users direct their queries to a single cohesive data fabric
Focus on data and business questions, not integrating separate systems
11.
Customer Value Based on Social Influence
Use Case
[Diagram: Hadoop, Teradata Aster Database, Teradata Database]
• Determine high value customers based on history
• Determine customer value based on social influence
• Determine customer sentiment
• Determine customer sphere of influence
13.
[Diagram: Integrated Data Warehouse (Teradata Database); Data Platform (Hadoop)]
Teradata Database 15 – Teradata QueryGrid
Leverage analytic resources, reduce data movement
• Parallel Bi-directional data transfer
• Push-down processing
• Native Analytics on Target system
• Easy configuration of server connections
• Simplified Server Grammar
• Adaptive Optimizer
14.
Deep History – QueryGrid Teradata 15.00
Use Case
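-- Recent transactions held in the Teradata Database
SELECT Trans.Trans_ID
      ,Trans.Trans_Amount
FROM TD_Transactions Trans
WHERE Trans.Trans_Amount > 5000
UNION
-- Older transactions: the inner SELECT is pushed down to Hive
SELECT *
FROM FOREIGN TABLE
    (SELECT Trans_ID
           ,Trans_Amount
     FROM Transaction_Hist
     WHERE Trans_Amount > 5000)@Hadoop Hist;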
[Diagram: Hadoop and Teradata Database]
– Pushes the "Foreign Table" SELECT to Hive, which executes the query.
– Imports only the required columns into Teradata.
– Allows predicate processing of conditions on non-partitioned columns.
– The Hadoop cluster resources are used for data qualification.
[Diagram labels: Years 5-10 and Years 1-5]
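The push-down behavior in this use case can be illustrated with a small, self-contained sketch. It is not QueryGrid itself: two in-memory SQLite databases stand in for the Teradata and Hadoop systems, and the `Store` column, the sample rows, and the helper names `fetch_remote` and `combined` are assumptions made for illustration. The point it shows is that the filter and the column list run on the remote side, so only qualifying rows and the two needed columns are transferred.

```python
import sqlite3

# Hypothetical stand-ins: two in-memory SQLite databases play the roles of
# the Teradata system (recent data) and the Hadoop/Hive system (history).
local_db = sqlite3.connect(":memory:")
remote_db = sqlite3.connect(":memory:")

local_db.execute(
    "CREATE TABLE TD_Transactions (Trans_ID INTEGER, Trans_Amount REAL, Store TEXT)"
)
local_db.executemany(
    "INSERT INTO TD_Transactions VALUES (?, ?, ?)",
    [(1, 7500.0, "A"), (2, 120.0, "B")],
)

remote_db.execute(
    "CREATE TABLE Transaction_Hist (Trans_ID INTEGER, Trans_Amount REAL, Store TEXT)"
)
remote_db.executemany(
    "INSERT INTO Transaction_Hist VALUES (?, ?, ?)",
    [(101, 9000.0, "A"), (102, 45.0, "C"), (103, 5000.0, "B")],
)


def fetch_remote(threshold):
    """Run the inner SELECT on the remote system.

    The WHERE clause and the column list are evaluated remotely, so the
    unneeded Store column and non-qualifying rows never leave that system.
    """
    cur = remote_db.execute(
        "SELECT Trans_ID, Trans_Amount FROM Transaction_Hist WHERE Trans_Amount > ?",
        (threshold,),
    )
    return cur.fetchall()


def combined(threshold):
    """Combine local and remote results with UNION (duplicate-free) semantics."""
    local_rows = local_db.execute(
        "SELECT Trans_ID, Trans_Amount FROM TD_Transactions WHERE Trans_Amount > ?",
        (threshold,),
    ).fetchall()
    return sorted(set(local_rows) | set(fetch_remote(threshold)))


result = combined(5000)
print(result)
```

In this sketch the remote table has three rows and three columns, but only one row with two columns crosses the boundary between the two systems.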
15.
Incremental planning & execution of smaller query fragments
• Most efficient overall query plan derived from reliable statistics
– Statistics dynamically collected from foreign data
• Incremental query plans generated for single and multi-system queries
– Consistent Optimizer approach for queries within and between systems
– Teradata systems “transfer” query plans between systems
• A fully automatic optimizer feature – users don’t have to change anything
Adaptive Optimizer
Better Query Plan
Foreign and Sub-Queries
Why?
Unreliable statistics can result in less-than-optimal query plans
Some analytic systems, like Hadoop, don’t keep data statistics
Statistics not designed for compatibility between databases
How?
Pulls out remote server requests and single-row and scalar non-correlated sub-queries from a main query
Plans and executes them
Plugs the results into the main query
Plans and executes the main query
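The four "How?" steps can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Teradata optimizer: the `Fragment` and `MainQuery` classes, the `plan_incrementally` function, the table names, and the sample values are all hypothetical. It shows fragments being executed first, their observed row counts being recorded as exact statistics, and their results being plugged into the main query before that query is planned.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Fragment:
    """A remote request or non-correlated scalar sub-query (hypothetical model)."""
    name: str
    run: Callable[[], object]


@dataclass
class MainQuery:
    template: str  # main query text with {name} placeholders for fragments
    fragments: List[Fragment] = field(default_factory=list)


def plan_incrementally(query: MainQuery) -> Tuple[str, Dict[str, int]]:
    """Execute each fragment, then plug its result into the main query."""
    bindings: Dict[str, object] = {}
    stats: Dict[str, int] = {}
    for frag in query.fragments:
        value = frag.run()  # step 2: plan and execute the fragment
        if isinstance(value, list):
            # Row set: keep it as a named intermediate result and record
            # its exact row count instead of relying on an estimate.
            stats[frag.name] = len(value)
            bindings[frag.name] = frag.name
        else:
            # Scalar: substitute the literal value directly.
            stats[frag.name] = 1
            bindings[frag.name] = value
    # Step 3: plug results in; step 4 (planning the final text) is left to the engine.
    return query.template.format(**bindings), stats


query = MainQuery(
    template=(
        "SELECT o.* FROM orders o JOIN {hist_rows} h "
        "ON o.cust_id = h.cust_id WHERE o.amount > {avg_amount}"
    ),
    fragments=[
        Fragment("hist_rows", lambda: [(1,), (2,), (3,)]),  # pretend remote request
        Fragment("avg_amount", lambda: 250),                # pretend scalar sub-query
    ],
)

final_sql, stats = plan_incrementally(query)
print(final_sql)
print(stats)
```

Because the main query is planned only after the fragments have run, the planner in this sketch works from observed row counts rather than missing or incompatible statistics.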
18.
DATAMART, 1990’s: Just Give Me Some Data and Fast!
EDW/IDW, 2000’s: Give Me Good Data But Do It Efficiently!
LOGICAL DATA WAREHOUSE, 2010’s: Give Me All Data Fast, Simple & Effectively!