2. Cloudera Hue
• Web Interface for making Hadoop easier to use
• Aggregation of apps for each Hadoop component (e.g. Hive, Pig, Impala, Oozie, Solr, Sqoop, HBase...)
3. Product Mission - HBase Usage Size
• Industry Usage
• Widespread HBase Usage
• Meetups, HBaseCon
• Community Need
Figure: Organic Example, Community Expresses Need for HBase UI
4. Product Mission - Problem & Competition
Lack of Accessibility
• Hard for beginners
Lack of Familiarity
• Unfamiliarity with the key-value model
Usability Challenges
• Lack of any web UI
• Lack of an imaginative interface
• Low-level access
• CDH: the command line is the easiest access point
Figure: Competing Application, HBase Manager's Tabular View
5. Project Purpose - Addressing this Need
• Open HBase usage to non-technical people
• Drive HBase adoption in startups/organizations
• Solve a pain point with a good product
• Knockout/JS/jQuery
• Django
• Thrift HBase
7. Technical Challenges - Design
• Design Innovation
• Problem: HBase is a key-value store, not a traditional RDB
• Solution: Collapse Sparse Data across HBase Tables
Figure: Tabular View and HBase Browser SmartView
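The collapsing idea is easier to see in a small example. Below is a minimal sketch. It assumes each row arrives as a mapping from row key to `{b"family:qualifier": (value, timestamp)}`. That shape is hypothetical and is not Hue's internal format. The sketch contrasts an RDB-style grid, which needs one column for every qualifier seen in any row, with a collapsed view that keeps only the cells each row actually has.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: row key -> {b"family:qualifier": (value, timestamp)}
Cell = Tuple[bytes, int]
Rows = Dict[bytes, Dict[bytes, Cell]]


def tabular_view(rows: Rows) -> Tuple[List[bytes], List[List[Optional[bytes]]]]:
    """RDB-style grid: one column per qualifier seen in ANY row, None where absent."""
    columns = sorted({col for cells in rows.values() for col in cells})
    grid = [
        [cells[col][0] if col in cells else None for col in columns]
        for _, cells in sorted(rows.items())
    ]
    return columns, grid


def collapsed_view(rows: Rows) -> List[Tuple[bytes, List[Tuple[bytes, bytes, bytes, int]]]]:
    """Collapsed view: each row lists only its own cells as (family, qualifier, value, ts)."""
    out = []
    for row_key, cells in sorted(rows.items()):
        row_cells = []
        for col, (value, ts) in sorted(cells.items()):
            family, _, qualifier = col.partition(b":")
            row_cells.append((family, qualifier, value, ts))
        out.append((row_key, row_cells))
    return out


if __name__ == "__main__":
    rows = {
        b"user1": {b"info:name": (b"Ann", 1), b"stats:clicks": (b"3", 2)},
        b"user2": {b"info:email": (b"b@example.com", 3)},
    }
    columns, grid = tabular_view(rows)
    empty = sum(v is None for r in grid for v in r)
    print(f"tabular: {len(columns)} columns, {empty} empty cells")  # prints "tabular: 3 columns, 3 empty cells"
    for row_key, cells in collapsed_view(rows):
        print(row_key, cells)
```

Even with two rows, half of the grid's six cells are empty. With millions of sparse columns, that waste is why a collapsed, per-row view scales better than a table.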
8. Technical Highlight - Scale
• Scale
• Problem: Performance on Millions of Columns
• Solution: Lazy loading & Truncation using Thrift FilterString
• Stream raw data, cache it in memory, and generate DOM elements dynamically
• b64encode binary data to preserve it during the ASCII dump
• Detect schema on preview by reading byte headers
• Capped, Lazy Loaded & Bound to DOM via MVVM Pattern
Figure: 100MB Currently Streamed (Thrift, 5TB HBase Cluster)
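Truncation through a filter string can be illustrated with the HBase filter language's `ColumnPaginationFilter (limit, offset)`. This filter returns at most `limit` columns per row, starting at column `offset`. The slide does not say which filter Hue actually uses. The helper names and the page size of 100 below are hypothetical. The sketch only builds the strings. HBase's Thrift scan struct can carry such a string (the `filterString` field of `TScan` in the Thrift 1 API).

```python
from typing import Iterator


def column_page_filter(limit: int, offset: int) -> str:
    """Build an HBase filter-language string returning `limit` columns from `offset`."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return f"ColumnPaginationFilter ({limit}, {offset})"


def lazy_column_pages(total_columns: int, page_size: int = 100) -> Iterator[str]:
    """Yield one filter string per page, so columns can be requested as the user scrolls."""
    for offset in range(0, total_columns, page_size):
        yield column_page_filter(page_size, offset)
```

For example, `list(lazy_column_pages(250))` yields three strings, with offsets 0, 100, and 200. Because the generator is lazy, a UI would request the next page only when the user scrolls right. The initial load therefore stays capped no matter how wide the row is.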
9. Technical Highlight - Flexible Searchbar
• Flexible Searchbar
• Problem: Need a high-level tool for complex data fetching & analysis
• Solution: Simple custom query language
• Supports HBase filter language
• Supports selection & copy/paste; degrades gracefully in IE
• Autocomplete Help Menu
• Regenerate Rendered Position - Firefox Bug
Figure: Searchbar Syntax Breakdown (Row Key, Scan Length, Prefix Scan, Column/Family Filters, Thrift Filterstring)
10. Roadmap, Timeline & Next Steps
Current
• Ramp up for CDH 4.4 release
• Tutorial Video & Blog Post
• Tons of JIRAs
Timeline: Initial Release in Hue 2.5, Ships in CDH 4.4
11. Any Questions?
Kevin, Platform Intern, Cloudera
Follow me: @Kevinverse
www.gethue.com
Hue HBase Browser
250+ commits
20000+ lines of code
∞ cups of coffee
Figure: Pageviews
Hey everyone, my name is Kevin Wang. I'm a Platform Intern on Hue, and today I'm here to talk about my project, Hue HBase Browser: the first accessible and sophisticated interface that lets you explore HBase data directly in your browser.
HBase is big and lots of people use it, yet we still need a great, easy UI. Slide point: HBase is huge. HBase is hard: key-value isn't the way we think about a lot of databases. 1. Huge industry usage (list companies). Large community behind it (one of the most highly used projects in the Hadoop ecosystem, HBaseCon, etc.). Big need for a simple UI. Address the figure on the right. Transition: so why is this need so big?
It's not very accessible, people are much more familiar with RDBs, and most HBase usage happens at a very low level. In fact, in CDH the easiest access we provide is through the command line. HBase data is sparse and inconsistent by nature, with no defined schema. HBase is hard: it's not accessible or beginner friendly. Because of its nature, HBase is misunderstood; people don't think in key-value, they are used to tables. This is also true for many of the UIs available: much of their usability suffers because the developers try to represent HBase as a table. It's also inconvenient. UIs right now provide a lot of low-level access, which means sophisticated ways to view and analyze your data are impossible through a UI. UIs in the browser are significantly simpler. HBase access is limited to code and the shell. It's time for something to simplify and revolutionize the way we use HBase. HBase Browser aims to be the phpMyAdmin of HBase.
In Hue 2.5, the Hue team released the most sophisticated and accessible UI for HBase today, and the first UI native to the browser. It opens up possibilities for new roles in the HBase community and brings accessibility to non-technical people.
Cluster view: I'm inside an HBase cluster right now; let's go inside the analytics table.
SmartView: We're in the analytics table, and I'd like to introduce you to the SmartView, an innovative view that is a nice break from the tabular view of most database browsers. It collapses sparse data across the HBase table: here are the rows and columns, each labeled by a family and timestamp. The cell values are inside; you can click to edit, and it's that simple. Other controls let you sort, filter columns, or pick a few and collapse them. Of course, all of this is available in bulk.
Scale (basically skip and say I'll talk about it later): We've loaded over 100,000 cells in a matter of seconds. You might have noticed that this table has loaded tens of thousands of columns and it's still performant. This is because cells are truncated and then lazily loaded. For instance, when I browse a row by scrolling to the right, you can see the row get populated with more cells. Sorting and filters still apply to the entire row, not only the cells that are visible.
Searchbar (skip the rendering talk, mention it on the slide and just do the demo: complex data queries on top of your HBase clusters using our simple query language): Show basic rows and scans. It's beautiful to interact with: it automatically renders and tags your input, and you can even copy and paste into it. An intuitive help menu guides the user. Extremely powerful data queries are at your fingertips: combine, mix and match queries, add HBase filter strings, etc. You can do prefix scans, column filters, and filter strings; there's an incredible amount of power at your fingertips with this module.
Schema: To end this demo, I'm going to go into one last table called `events`, and I'd like to show you one of the most powerful features of HBase Browser. Here I've got a lot of data loaded, of many different types. One of the hardest things to do via the command line is just to preview data. However, we detect the schema and MIME types stored. We've got JSON, XML, images, and even PDFs.
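Previewing typed data depends on the two tricks from the Scale slide: reading byte headers to detect the type, and base64-encoding binary data so it survives an ASCII transfer. Below is a minimal sketch of both. It assumes a small, hypothetical table of well-known file signatures (PNG, JPEG, GIF, PDF) plus simple text checks for XML and JSON. Hue's actual detection rules may differ, and `preview_payload` is an illustrative name.

```python
import base64
import json

# Common file signatures ("magic numbers") found at the start of a value.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def sniff_mime(value: bytes) -> str:
    """Guess a cell's MIME type from its leading bytes, falling back to text checks."""
    for signature, mime in MAGIC:
        if value.startswith(signature):
            return mime
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"  # undecodable bytes: treat as opaque binary
    stripped = text.lstrip()
    if stripped.startswith("<"):
        return "application/xml"
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "application/json"
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    return "text/plain"


def preview_payload(value: bytes) -> dict:
    """Package a cell for an ASCII/JSON response; binary types are base64-encoded."""
    mime = sniff_mime(value)
    if mime.startswith(("image/", "application/pdf", "application/octet-stream")):
        return {"mime": mime, "encoding": "base64",
                "data": base64.b64encode(value).decode("ascii")}
    return {"mime": mime, "encoding": "utf-8", "data": value.decode("utf-8")}
```

Checking byte signatures first is cheap and only needs the first few bytes of a value. That suits previews, where the full cell may not have been loaded yet.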
(Optional Slide) HBase is a key-value store, not a traditional RDB. A tabular representation. Maybe people won't care about this...
Two levels of streaming: stream from the server, then stream from memory. As stated earlier, this app was built to scale, because HBase Browser uses two levels of streaming. First we stream from the server, then we cache the data in memory and generate DOM elements on the fly. This gives a really smooth UX. You can take a look at some of the optimizations I've made here.
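The two levels of streaming can be modeled in a few lines. This is a minimal, hypothetical sketch, not Hue's code, and it is written in Python only to keep the examples in one language (per the notes, the memory cache feeds DOM generation in the browser).

- `fetch` stands in for a server call, such as a Thrift scan, that returns a batch of cells.
- Level 1 fills an in-memory LRU cache in batches.
- Level 2 hands the renderer only the visible slice.

The batch size, cache size, and all names are assumptions.

```python
from collections import OrderedDict
from typing import Callable, List

# Hypothetical server call: (row, offset, limit) -> up to `limit` cells starting at `offset`.
FetchBatch = Callable[[str, int, int], List[str]]


class TwoLevelStream:
    """Level 1 streams batches from the server into an in-memory cache;
    level 2 streams small slices from that cache to the renderer."""

    def __init__(self, fetch: FetchBatch, batch_size: int = 500, max_rows: int = 50):
        self.fetch = fetch
        self.batch_size = batch_size
        self.max_rows = max_rows
        self.cache: "OrderedDict[str, List[str]]" = OrderedDict()

    def _ensure(self, row: str, upto: int) -> List[str]:
        """Level 1: fetch server batches until `upto` cells are cached or the row ends."""
        cells = self.cache.setdefault(row, [])
        self.cache.move_to_end(row)  # mark this row as most recently used
        while len(cells) < upto:
            batch = self.fetch(row, len(cells), self.batch_size)
            if not batch:  # server has no more cells for this row
                break
            cells.extend(batch)
        while len(self.cache) > self.max_rows:  # evict least recently used rows
            self.cache.popitem(last=False)
        return cells

    def visible(self, row: str, start: int, count: int) -> List[str]:
        """Level 2: return only the slice the user can currently see."""
        return self._ensure(row, start + count)[start:start + count]
```

Scrolling within an already-cached range costs no server round trip. Crossing into an uncached range triggers exactly one more batch, which is what keeps scrolling smooth on very wide rows.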
Solution: build our own query language. This was especially hard to make work cross-browser and to render correctly. As you saw in the demo, it's really powerful, and it goes beyond what I can show you today. You can view a syntax breakdown below, but for now we have to move on.