4. Agenda
• Part 1:
• Low-latency and Hadoop: a missing piece of the puzzle
• Impala: Goals, non-goals and features
• Demo
• Q+A
• Part 2:
• Impala Internals
• Comparing Impala to other systems
• Q+A
7. About Me
• Hi!
• Software Engineer at Cloudera since 2009
• Apache ZooKeeper
• First version of Flume
• Cloudera Enterprise
• Working on Impala since the beginning
of 2012
12. The Hadoop Landscape
• Hadoop MapReduce is a batch processing
system
• Ideally suited to long-running, high-latency data processing workloads
• But not as suitable for interactive queries,
data exploration or iterative query
refinement
• All of which are keystones of data
warehousing
16. Bringing Low-Latency to
Hadoop
• HDFS and HBase make data storage cheap
and flexible
• SQL / ODBC are industry-standards
• Analyst familiarity
• BI tool integration
• Legacy systems
• Can we get the advantages of both?
• With acceptable performance?
20. Impala Overview: Goals
• General-purpose SQL query engine
• should work both for analytical and transactional workloads
• will support queries that take from milliseconds to hours
• Runs directly within Hadoop:
• Reads widely-used Hadoop file formats
• talks to widely used Hadoop storage managers like HDFS and HBase
• runs on same nodes that run Hadoop processes
• High performance
• C++ instead of Java
• runtime code generation via LLVM
• completely new execution engine that doesn’t build on MapReduce
27. User View of Impala
• Runs as a distributed service in cluster: one
Impala daemon on each node with data
• User submits query via ODBC/Beeswax Thrift
API to any daemon
• Query is distributed to all nodes with relevant
data
• If any node fails, the query fails
• Impala uses Hive’s metadata interface
• Supported file formats:
• text files (GA: with compression, including lzo)
• sequence files with snappy / gzip compression
• GA: Avro data files / columnar format (more on this later)
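• For illustration, a minimal sketch (table, columns and path are hypothetical): because Impala reads the Hive metastore, a table defined through Hive is immediately queryable from Impala without reloading the data.
-- HiveQL: define a text-file table over data already in HDFS
CREATE EXTERNAL TABLE web_logs (
  ts     BIGINT,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/web_logs';
-- From the Impala shell or ODBC, the same table is then visible:
SELECT status, count(*) FROM web_logs GROUP BY status;
(Depending on the version, Impala may need its metadata refreshed before newly created tables appear.)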
30. User View of Impala: SQL
• SQL support:
• patterned after Hive’s version of SQL
• limited to Select, Project, Join, Union, Subqueries, Aggregation and Insert
• only equi-joins, no non-equi-joins, no cross products
• ORDER BY only with LIMIT
• GA: DDL support (CREATE, ALTER)
• Functional Limitations
• no custom UDFs, file formats, Hive SerDes
• only hash joins: joined table has to fit in memory of a single node (beta) / aggregate memory of all executing nodes (GA)
• join order = FROM clause order
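• To make the supported subset concrete, a hedged sketch (table and column names are hypothetical):
-- Allowed: select/project, equi-join, aggregation, ORDER BY with LIMIT
SELECT c.region, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id)  -- equi-join only
GROUP BY c.region
ORDER BY total DESC
LIMIT 10;
-- Not supported in the beta:
--   JOIN ... ON (o.amount > c.credit_limit)   -- non-equi-join
--   ORDER BY total DESC                       -- ORDER BY without LIMIT
-- Because join order = FROM clause order and the joined table must fit in
-- memory, the larger table typically goes first in the FROM clause.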
33. User View of Impala: HBase
• HBase functionality
• uses Hive’s mapping of HBase table into metastore table
• predicates on rowkey columns are mapped into start / stop row
• predicates on other columns are mapped into SingleColumnValueFilters
• HBase functional limitations
• no nested-loop joins
• all data stored as text
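• A hedged sketch of the Hive-side mapping this relies on (table, column family and qualifier names are hypothetical):
-- HiveQL: map an existing HBase table into the metastore
CREATE EXTERNAL TABLE hbase_users (
  rowkey STRING,
  name   STRING,
  age    STRING   -- all values handled as text
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age")
TBLPROPERTIES ("hbase.table.name" = "users");
-- In Impala, a rowkey predicate becomes an HBase start/stop row:
SELECT name FROM hbase_users WHERE rowkey = 'user_0042';
-- A predicate on another column becomes a SingleColumnValueFilter:
SELECT rowkey FROM hbase_users WHERE age = '29';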
39. TPC-DS
• TPC-DS is a benchmark dataset designed to model decision support systems
• We generated 500MB of data (not a lot, but enough to be illustrative!)
• Let’s run a sample query against Hive 0.9, and against Impala 0.3
• Single node (VM! - caveat emptor), so we’re testing execution engine speeds
40. TPC-DS Sample Query
select
i_item_id,
s_state,
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk =
customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where
cd_gender = 'M' and
cd_marital_status = 'S' and
cd_education_status = 'College' and
d_year = 2002 and
s_state in ('TN','SD', 'SD', 'SD', 'SD', 'SD')
group by
i_item_id,
s_state
order by
i_item_id,
s_state
limit 100;
42. Impala is much faster
• Why?
• No materialisation of intermediate data
- less I/O
• No multi-phase queries - much smaller
startup / teardown overhead
• Faster execution engine: generates fast code for each individual query
43. Part 2:
Impala Internals /
Roadmap
47. Impala Architecture
• Two binaries: impalad and statestored
• Impala daemon (impalad)
• handles client requests and all internal requests related to query execution over Thrift
• runs on every datanode
• Statestore daemon (statestored)
• provides membership information and
metadata distribution
• only one per cluster
50. Query Execution
• Query execution phases:
• Request arrives via Thrift API (perhaps
from ODBC, or shell)
• Planner turns request into collections of plan fragments
• ‘Coordinator’ initiates execution on
remote impalad daemons
• During execution:
• Intermediate results are streamed
between impalad daemons
• Query results are streamed to client
58. The Planner
• Two-phase planning process:
• single-node plan: left-deep tree of plan operators
• plan partitioning: partition single-node plan to maximise scan locality,
minimise data movement
• Plan operators: Scan, HashJoin, HashAggregation, Union, TopN, Exchange
• Distributed aggregation: pre-aggregate in individual nodes, merge aggregation at root
• GA: rudimentary cost-based optimiser
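• A hedged way to see the two-phase plan for yourself (assuming an EXPLAIN statement is available in your Impala version; the table is hypothetical):
EXPLAIN
SELECT state, SUM(revenue)
FROM sales
GROUP BY state;
-- Expect a scan plus pre-aggregation fragment on the data nodes and a merge
-- aggregation fragment at the coordinator, connected by an Exchange.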
59. Plan Partitioning
• Example: query with join and aggregation
SELECT state, SUM(revenue)
FROM HdfsTbl h JOIN HbaseTbl b ON (...)
GROUP BY 1 ORDER BY 2 desc LIMIT 10
[Plan diagram: single-node plan (TopN over Agg over Hash Join over HDFS and HBase Scans), partitioned so the Scans run at the DataNodes / HBase region servers and feed the Hash Join and pre-aggregation there, with Exchange operators sending partial results to the coordinator for the final Agg and TopN]
63. Execution Engine
• Heavy-lifting component of each Impalad
• Written in C++
• runtime code generation for “big loops”
• Internal in-memory tuple format puts fixed-width data at fixed offsets
• Hand-optimised assembly where
needed
66. More on Code Generation
• For example: Inserting tuples into a hash-
table
• We know ahead of time the maximum
number of tuples (in a batch), the tuple
layout, what fields might be null and so
on.
• Pre-bake all this information into an unrolled loop that avoids branches and dead code
• Function calls are inlined at compile-
time
• Result: significant speedup in real queries
70. Statestore
• Central system state repository
• Membership / failure-detection
• GA: metadata
• GA: diagnostics, scheduling information
• Soft-state
• All data can be reconstructed from the rest of
the system
• Impala continues to run when statestore fails,
but per-node state becomes increasingly stale
• Sends periodic heartbeats
• Pushes new data
• Checks for liveness
74. Why not ZooKeeper?
• Apache ZooKeeper is not a good publish-
subscribe system
• API is awkward, and requires a lot of client logic
• Multiple round-trips required to get data for changes to node’s children
• Push model is more natural for our use case
• Don’t need all the guarantees ZK provides
• Serializability
• Persistence
• Avoid complexity where possible!
• ZK is bad at the things we care about, and
good at the things we don’t
78. Comparing Impala to Dremel
• Google’s Dremel
• Columnar storage for data with nested
structures
• Distributed scalable aggregation on top
of that
• Columnar storage coming to Hadoop via
joint project between Cloudera and Twitter
• Impala plus columnar format: a superset of the published version of Dremel (which had no joins)
81. Comparing Impala to Hive
• Hive: MapReduce as an execution engine
• High latency, low throughput queries
• Fault-tolerance based on MapReduce’s on-
disk checkpointing: materialises all
intermediate results
• Java runtime allows for extensibility: file
formats and UDFs
• Impala:
• Direct, process-to-process data exchange
• No fault tolerance
• Designed for low runtime overhead
• Not nearly as extensible
83. Impala and Hive: Performance
• No published benchmarks yet, but from the development process:
• I/O-bound workloads: Impala can get full disk throughput, faster by 3-4x
• Multiple phase Hive queries see larger
speedup in Impala
• Queries against in-memory data can be
up to 100x faster
91. Impala Roadmap to GA
• GA planned for second-quarter 2013
• New data formats
• LZO-compressed text
• Avro
• Columnar format
• Better metadata handling through statestore
• JDBC support
• Improved query execution, e.g. partitioned joins
• Production deployment guidelines
• Load-balancing across Impalad daemons
• Resource isolation within Hadoop cluster
• More packages: RHEL 5.7, Ubuntu, Debian
96. Impala Roadmap: Beyond GA
• Coming in 2013
• Improved HBase support
• Composite keys, Avro data in columns
• Indexed nested-loop joins
• INSERT / UPDATE / DELETE
• Additional SQL
• UDFs
• SQL authorisation and DDL
• ORDER BY without LIMIT
• Window functions
• Support for structured data types
• Runtime optimisations
• Straggler handling
• Join order optimisation
• Improved cache management
• Data co-location for improved join performance
98. Impala Roadmap: 2013
• Resource management
• Cluster-wide quotas
• “User X can never have more than 5 concurrent queries running”
• Goal: run exploratory and production
workloads in same cluster without
affecting production jobs
102. Try it out!
• Beta version available since October 2012
• Get started at www.cloudera.com/impala
• Questions / comments?
• impala-user@cloudera.org
• henry@cloudera.com