An introduction to Cloudera Impala
1. Impala
● What is it ?
● How does it work ?
● Performance
● Formats
● Architecture
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. Impala – What is it ?
● Ad hoc, real-time queries for Hadoop
● Open source
● Developed by Cloudera
● Based on Google's 2010 Dremel paper
● Direct data access via Impala engine
● A future Hadoop Parquet update will
– Add columnar binary storage to Hadoop
– Improve Impala performance
3. Impala – How does it work ?
● Direct data access
● Query planning / coordination on data nodes
● Node based query engine
● Low latency
● Performance improvement
● Queries data in HDFS or HBase
● Uses the same HiveQL syntax (SQL-like)
● Has the Hue GUI
● Allows table joins and aggregation
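To make the join and aggregation bullet concrete, here is a minimal sketch of the kind of SQL-like query it refers to. The tables `customers` and `orders` and their columns are hypothetical, and Python's built-in `sqlite3` module is used only as a local stand-in so the example runs without a cluster. It is not an Impala client.

```python
import sqlite3

# Hypothetical tables and columns, chosen only for illustration.
# The query uses plain SELECT / JOIN / GROUP BY, the SQL-like style
# that the slide attributes to HiveQL.
QUERY = """
SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
"""

def run_demo():
    # sqlite3 is a stand-in engine (an assumption of this sketch),
    # used so the query can be executed locally.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, region TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'north'), (2, 'south');
        INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    return sorted(rows)

if __name__ == "__main__":
    for region, order_count, total_amount in run_demo():
        print(region, order_count, total_amount)
```

The query text is the part that matters here. Against a real cluster it would be submitted through Hue or another Impala client rather than through `sqlite3`.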
4. Impala – Performance
Impala delivers performance gains
● IO-bound queries (limited by hardware)
– At least 3 times faster
● Complex queries (multiple MapReduce stages)
– At least 7 times faster
● Cached queries
– At least 20 times faster
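As a rough worked example of what these minimum factors imply, the sketch below turns a baseline runtime into an upper bound on the expected Impala runtime. The slide does not name the baseline. The function name, the dictionary keys and the 420-second figure are all hypothetical, and the factors are simply the minimums quoted above, not measurements.

```python
# Minimum speedup factors as quoted on the slide, keyed by query type.
# The key names are hypothetical labels chosen for this sketch.
MIN_SPEEDUP = {
    "io_bound": 3,      # IO-bound queries
    "multi_stage": 7,   # complex queries with multiple MapReduce stages
    "cached": 20,       # cached queries
}

def max_expected_runtime(baseline_seconds, query_type):
    """Upper bound on runtime, assuming the quoted minimum speedup holds."""
    return baseline_seconds / MIN_SPEEDUP[query_type]

if __name__ == "__main__":
    baseline = 420.0  # hypothetical baseline runtime in seconds
    for query_type in MIN_SPEEDUP:
        print(query_type, max_expected_runtime(baseline, query_type))
```

With a 420-second baseline, the bounds come out at 140, 60 and 21 seconds respectively.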
5. Impala – Formats
Supported formats
– Text files and SequenceFiles, which can be compressed with
● Snappy
● GZIP
● BZIP2
– Future support for
● Avro
● RCFile
● LZO text file
● Parquet
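To illustrate what a compressed text file means in practice, the following sketch builds a small tab-delimited text payload and compresses it with two of the codecs listed above, using only Python's standard library. The sample rows and function names are hypothetical, Snappy is left out because it is not in the standard library, and the sketch says nothing about how Impala itself reads such files.

```python
import bz2
import gzip

# Hypothetical sample rows: id, region, amount.
ROWS = [("1", "north", "10.0"), ("2", "south", "7.5")]

def to_delimited_text(rows, sep="\t"):
    """Encode rows as newline-terminated, delimiter-separated UTF-8 text."""
    return "".join(sep.join(row) + "\n" for row in rows).encode("utf-8")

def compress(data, codec):
    """Compress bytes with one of the standard-library codecs."""
    if codec == "gzip":
        return gzip.compress(data)
    if codec == "bzip2":
        return bz2.compress(data)
    raise ValueError("unsupported codec: " + codec)

def decompress(data, codec):
    """Reverse of compress() for the same codec name."""
    if codec == "gzip":
        return gzip.decompress(data)
    if codec == "bzip2":
        return bz2.decompress(data)
    raise ValueError("unsupported codec: " + codec)

if __name__ == "__main__":
    text = to_delimited_text(ROWS)
    for codec in ("gzip", "bzip2"):
        packed = compress(text, codec)
        # The round trip should return the original delimited text.
        print(codec, len(packed), decompress(packed, codec) == text)
```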
7. Impala – Requirements
What does Impala need to run ?
– CentOS 6.2
– or RHEL (Red Hat Enterprise Linux)
– CDH 4.1 (Cloudera Hadoop Distribution)
– Cloudera Manager (recommended)
8. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve them