7:30 SQL-on-Accumulo - Don Miner, ClearEdge IT
Running SQL queries over data in Accumulo is easier said than done and has several nuanced design challenges that don't have clear answers. This talk will give an outline of the current state of the art in SQL-on-Accumulo technologies, while giving a realistic view on what is doable and what is not doable today.
8. WWHBD? - Hive
• Hive
Runs in MapReduce
Map col family and col qualifiers to columns
Maintained by Hive community
• Impala and Shark inherit functionality from Hive
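To make the column-mapping idea concrete, here is a minimal sketch of how column family and qualifier pairs might be pivoted into relational columns. It is not Hive's storage handler code; the mapping, the sample cells, and the function name `cells_to_rows` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: SQL column name -> (column family, column qualifier).
COLUMN_MAPPING = {
    "name": ("person", "name"),
    "age": ("person", "age"),
    "city": ("address", "city"),
}


def cells_to_rows(cells, mapping):
    """Pivot (row_id, family, qualifier, value) cells into one dict per row ID."""
    lookup = {fam_qual: col for col, fam_qual in mapping.items()}
    rows = defaultdict(dict)
    for row_id, family, qualifier, value in cells:
        column = lookup.get((family, qualifier))
        if column is not None:  # cells with no mapped column are ignored
            rows[row_id][column] = value
    # Columns with no cell become None, mirroring SQL NULL.
    return [
        {"rowid": row_id, **{col: found.get(col) for col in mapping}}
        for row_id, found in sorted(rows.items())
    ]


# Made-up sample cells in key order.
cells = [
    ("u1", "address", "city", "Columbia"),
    ("u1", "person", "age", "34"),
    ("u1", "person", "name", "alice"),
    ("u2", "misc", "note", "unmapped"),
    ("u2", "person", "name", "bob"),
]
table = cells_to_rows(cells, COLUMN_MAPPING)
for row in table:
    print(row)
```

The sketch shows the basic tension: the key-value store is sparse and schema-free, while the SQL side needs a fixed column list declared up front.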
9. WWHBD? - next level
Problem:
Hive, Impala, and Shark don’t know how HBase works
… and don’t care
• Apache Phoenix
Specifically SQL-on-HBase
Currently Apache incubator project
Client-embedded JDBC driver
Uses series of scans and coprocessors
• Pivotal’s HAWQ and PXF
PXF is external table functionality in HAWQ
Native support for HAWQ: uses push down filters, range
scans, etc. to efficiently slurp data into HAWQ
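The benefit of pushing filters down as range scans can be illustrated with a toy sorted store. This is only a sketch of the general technique, not PXF or Phoenix code; `SortedTable`, its methods, and the sample keys are assumptions made for illustration.

```python
import bisect


class SortedTable:
    """Toy stand-in for a key-sorted store such as HBase or Accumulo."""

    def __init__(self, rows):
        self._rows = sorted(rows, key=lambda kv: kv[0])
        self._keys = [key for key, _ in self._rows]
        self.rows_read = 0  # counts rows the "server" had to touch

    def full_scan(self):
        for key, value in self._rows:
            self.rows_read += 1
            yield key, value

    def range_scan(self, start, stop):
        """Yield rows with start <= key < stop, seeking directly to start."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key, value in self._rows[lo:hi]:
            self.rows_read += 1
            yield key, value


def select_without_pushdown(table, start, stop):
    # The SQL engine pulls every row and filters on its own side.
    return [(k, v) for k, v in table.full_scan() if start <= k < stop]


def select_with_pushdown(table, start, stop):
    # The key predicate is handed to the store as a range scan.
    return list(table.range_scan(start, stop))


data = [(f"user{i:04d}", i) for i in range(1000)]
naive, pushed = SortedTable(data), SortedTable(data)
a = select_without_pushdown(naive, "user0100", "user0110")
b = select_with_pushdown(pushed, "user0100", "user0110")
print(len(a), naive.rows_read)
print(len(b), pushed.rows_read)
```

Both calls return the same rows, but the engine that does not know how the store works reads the whole table to get them.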
11. SQL-on-Accumulo Status
Hive (and somewhat Impala and Shark)
• Github project by Brian Femiano [1]
Doesn’t work on new versions
Hasn’t been touched in 9 months
Wasn’t committed into trunk
• Some rumors that some orgs have done it
themselves (but no public information)
people. technology. integrity. 11
[1] https://github.com/bfemiano/accumulo-hive-storage-manager (google for “accumulo hive”)
12. SQL-on-Accumulo Status
Phoenix
• Discussion on mailing list last week
• Some differences between iterators and
coprocessors make this interesting
Pivotal’s HAWQ and PXF
• In development
• Will support visibility labels
• Pushdown and optimizations with iterators
13. Visibility Design Problems
These problems are unique to Accumulo
• SELECT and visibility labels
Assume two cells whose keys differ only in their visibility label…
Which do I pick in a SELECT?
Timestamps have this problem, but have a logical
assumption (most recent)
• Authorizations in SQL
How do you tell the execution engine which
authorizations to use?
Table definition? (hard to change)
SQL statement? (extend SQL language?)
Based on login? (how do you downgrade?)
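Both problems above can be sketched in a few lines. The evaluator below handles a simplified visibility syntax (labels, `&`, `|`, parentheses, with `&` binding tighter than `|`); it is not Accumulo's `ColumnVisibility` implementation, and the cells, labels, and the `select` helper are hypothetical. Authorizations are passed as an explicit argument here only because the sketch has to get them from somewhere, which is exactly the open design question.

```python
import re


def visible(expression, auths):
    """Evaluate a simplified visibility expression against a set of auths."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    if not tokens:
        return True  # an empty visibility is readable by everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return token in auths

    return parse_or()


# Two cells: same row, family, qualifier and timestamp; only visibility differs.
# Layout: (row, family, qualifier, visibility, timestamp, value)
cells = [
    ("u1", "person", "salary", "hr", 10, "50000"),
    ("u1", "person", "salary", "hr&exec", 10, "75000"),
]


def select(cells, auths):
    """Return every value the given authorizations are allowed to see."""
    return [cell[5] for cell in cells if visible(cell[3], auths)]


print(select(cells, {"hr"}))
print(select(cells, {"hr", "exec"}))
```

With only `hr`, one value comes back. With both authorizations, two values compete for the same SQL column, and unlike timestamps there is no obvious rule for picking one.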
14. What are the next steps?
I guess that’s up to the community
15. QUIZ: What is this definition trying to say?
Big Data:
• Volume
• Variety
• Velocity
• Veracity
A warning about SQL-on-Accumulo
17. QUIZ: What is this definition trying to say?
Big Data:
• Volume
• Variety
• Velocity
• Veracity
Answer: RDBMS/SQL suck at all these things
What does SQL-on-Accumulo still suck at?
*Added context for my internet viewers, since this could be controversial if taken literally and I’m not talking to my
slides: I’m trying to say that SQL-on-X can’t solve all of the world’s problems, but it can solve a good number of
them very well. It also tees up the conversation that SQL is not the end-all-be-all… there are ways that it could be
made better to adapt to “the big data use case”. Don’t take this the wrong way: SQL-on-Hadoop and
SQL-on-Accumulo would be incredibly useful, but they don’t solve 100% of the problems.