7. Data type compatibility
C++:
struct AnyVal {
  bool is_null;
};

struct StringVal : public AnyVal {
  int len;
  uint8_t* ptr;
};

LLVM IR:
%AnyVal = type { i8 }
%StringVal = type { %AnyVal, i32, i8* }

; or

%StringVal = type { { i8 }, i32, i8* }
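The field-by-field correspondence between the C++ structs and the IR types can be checked outside of LLVM. The sketch below mirrors the two structs with Python's standard ctypes module and reports where each field lands. Using c_uint8 for is_null (to match the i8 in the IR) and the helper name field_offsets are choices made for this illustration; neither is part of Impala's API.

```python
import ctypes


class AnyVal(ctypes.Structure):
    # bool is_null; stored as a single byte, like i8 in the IR
    _fields_ = [("is_null", ctypes.c_uint8)]


class StringVal(AnyVal):
    # ctypes places these fields after the inherited is_null field,
    # mirroring the C++ "struct StringVal : public AnyVal" declaration
    _fields_ = [
        ("len", ctypes.c_int32),
        ("ptr", ctypes.POINTER(ctypes.c_uint8)),
    ]


def field_offsets():
    """Map each StringVal field name to its byte offset in the struct."""
    return {
        name: getattr(StringVal, name).offset
        for name in ("is_null", "len", "ptr")
    }


if __name__ == "__main__":
    print(field_offsets())
    print("sizeof(StringVal) =", ctypes.sizeof(StringVal))
```

The padding inserted after the one-byte is_null field is why the flattened form { { i8 }, i32, i8* } describes the same memory as the nested form.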
8. Register and execute the function
CREATE FUNCTION StringEq(STRING, STRING)
RETURNS BOOLEAN
LOCATION '/path/to/bitcode.ll'
SYMBOL='StringEq';

SELECT StringEq(a, b) FROM mytable;
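The same two statements could be issued from Python. The sketch below builds the CREATE FUNCTION statement from its parts and runs both statements through a DB API cursor. The helper names create_function_sql and register_and_query are hypothetical, and the cursor is assumed to come from a connection to a running Impala daemon, which is not shown here.

```python
def create_function_sql(name, arg_types, return_type, location, symbol):
    """Build a CREATE FUNCTION statement in the form shown on the slide."""
    args = ", ".join(arg_types)
    return (
        f"CREATE FUNCTION {name}({args}) "
        f"RETURNS {return_type} "
        f"LOCATION '{location}' "
        f"SYMBOL='{symbol}'"
    )


def register_and_query(cursor, table="mytable"):
    """Register StringEq, then call it from a SELECT.

    `cursor` is assumed to be a PEP 249 cursor connected to Impala.
    """
    cursor.execute(create_function_sql(
        "StringEq", ["STRING", "STRING"], "BOOLEAN",
        "/path/to/bitcode.ll", "StringEq"))
    cursor.execute(f"SELECT StringEq(a, b) FROM {table}")
    return cursor.fetchall()
```

Keeping the statement builder separate from execution makes it easy to inspect the generated SQL before sending it to the cluster.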
10. Impyla: Python Library for Impala
• pip install impyla
• DB API v2.0 (PEP 249) compatible
• Prototype sklearn API for Impala ML
• Numba integration (described here)
• See blog post: http://blog.cloudera.com/blog/2014/04/a-new-python-client-for-impala/
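Because the interface follows PEP 249, code written against it looks like code for any other DB API driver. The sketch below shows the connect / cursor / execute / fetchall pattern using the standard library's sqlite3 module as a stand-in, so that it runs without a cluster. The table contents and the host name in the comment are made up for illustration, and the impyla call in the comment is an assumption about how the connection would be opened.

```python
import sqlite3


def fetch_rows(conn, query, params=()):
    """Run a query through the standard PEP 249 cursor interface."""
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        return cur.fetchall()
    finally:
        cur.close()


def demo():
    # Stand-in connection. With impyla this would instead be roughly:
    #   from impala.dbapi import connect
    #   conn = connect(host="impalad-host", port=21050)  # hypothetical host
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE mytable (a TEXT, b TEXT)")
    cur.executemany("INSERT INTO mytable VALUES (?, ?)",
                    [("x", "x"), ("x", "y")])
    conn.commit()
    rows = fetch_rows(conn, "SELECT a, b, a = b FROM mytable ORDER BY b")
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```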
14. Example: 100 Node Decision Tree
def predict_income(impala_function_context, age, workclass, final_weight,
                   education, education_num, marital_status, occupation,
                   relationship, race, sex, hours_per_week, native_country,
                   income):
    if (marital_status is None):
        return '<=50K'
    if (marital_status == 'Married-civ-spouse'):
        if (education_num is None):
            return '<=50K'
        if (education_num > 12):
            if (hours_per_week is None):
                return '>50K'
            if (hours_per_week > 31):
                if (age is None):
                    return '>50K'
                if (age > 28):
                    if (education_num > 13):
                        if (age > 58):
                            return '>50K'
                        if (age <= 58):
                            return '>50K'
                    if (education_num <= 13):
                        if (occupation is None):
                            return '>50K'
                        if (occupation == 'Exec-managerial'):
                            return '>50K'
                        if (occupation != 'Exec-managerial'):
                            return '>50K'
                if (age <= 28):
                    if (age > 24):
                        if (occupation is None):
                            return '<=50K'
                        if (occupation == 'Tech-support'):
                            return '>50K'
                        if (occupation != 'Tech-support'):
                            return '<=50K'
                    if (age <= 24):
                        if (final_weight is None):
                            return '<=50K'
                        if (final_weight > 492053):
                            return '>50K'
                        if (final_weight <= 492053):
                            return '<=50K'
            if (hours_per_week <= 31):
                if (sex is None):
                    return '<=50K'
                if (sex == 'Male'):
                    if (age is None):
                        return '<=50K'
                    if (age > 29):
                        if (age > 62):
                            # ...
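Nested code like this is normally produced mechanically from a fitted tree, not written by hand. The sketch below shows one way such a generator might look; it is not the code used to produce the slide. The dict-based tree format, the function names emit and generate_udf, and TOY_TREE are all hypothetical, and unlike the slide it emits a None check at every split, even for a feature already checked higher up.

```python
def emit(node, depth=1):
    """Return source lines for one tree node, indented `depth` levels."""
    pad = "    " * depth
    if isinstance(node, str):  # leaf: a class label
        return [f"{pad}return {node!r}"]
    feat, thr = node["feature"], node["threshold"]
    lines = [
        f"{pad}if ({feat} is None):",
        f"{pad}    return {node['default']!r}",
        f"{pad}if ({feat} > {thr}):",
    ]
    lines += emit(node["gt"], depth + 1)
    lines.append(f"{pad}if ({feat} <= {thr}):")
    lines += emit(node["le"], depth + 1)
    return lines


def generate_udf(name, arg_names, tree):
    """Return Python source for a UDF-style function that walks `tree`."""
    header = f"def {name}(impala_function_context, {', '.join(arg_names)}):"
    return "\n".join([header] + emit(tree)) + "\n"


# A made-up two-split tree, far smaller than the 100-node example.
TOY_TREE = {
    "feature": "education_num", "threshold": 12, "default": "<=50K",
    "gt": {
        "feature": "age", "threshold": 28, "default": ">50K",
        "gt": ">50K",
        "le": "<=50K",
    },
    "le": "<=50K",
}

if __name__ == "__main__":
    print(generate_udf("predict", ["age", "education_num"], TOY_TREE))
```

Every path through the generated function ends in a return, so the result is a flat sequence of comparisons with no loops or data structures, which is the kind of code a compiler such as Numba handles well.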
21. Current Status
• Support for all Impala UDF data types (e.g., IntVal, StringVal, etc.)
• Support for casts to/from primitive types:
  • Any operations on primitives should work on Impala types
• Support for NULL types as Python None
• Proof-of-principle support for Python string module
  • len
  • split
  • Concatenation
• Call out to any extern C functions
• Proposed directions
  • Array handling
  • Numpy support
  • What else?
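The features listed above map onto very ordinary Python. The two functions below are a sketch of the kind of UDF body this status would allow, using None for NULL plus len, split, and concatenation. They are shown as plain Python only; the function names are hypothetical, and whether split with an explicit separator is covered by the proof-of-principle string support is an assumption. The step that compiles and registers them with Impala is not shown.

```python
def first_field_length(impala_function_context, line):
    """Length of the text before the first comma, or None for NULL input."""
    if line is None:          # SQL NULL arrives as Python None
        return None
    fields = line.split(",")  # split
    return len(fields[0])     # len


def full_name(impala_function_context, first, last):
    """Join two strings with a space; NULL in either argument gives NULL."""
    if first is None or last is None:
        return None
    return first + " " + last  # concatenation
```

The first argument follows the convention of the decision-tree example, where each UDF receives an impala_function_context before its data arguments.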
22. UDFs with Impala + Numba
• Simplicity of Python interface/syntax
• Performance of compiled language like C++
• Developed at: https://github.com/cloudera/impyla
• Please try it and tell us what features would be useful
• Please contribute!
pip install impyla