Lily is a repository made for the age of data. It combines CDH, HBase and Solr into a powerful, high-level, developer-friendly backing store for content-centric applications with the ambition to scale. In this session, we highlight why we chose HBase as the foundation for Lily, and how Lily will allow users not only to store, index and search vast quantities of data, but also to track audience behaviour and generate recommendations, all in real time.
17. lily data model
» adds a high-level data model on top of HBase’s byte[ ]’s
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17
18. value types
» basic value types
  » string
  » integer
  » long
  » double
  » decimal
  » boolean
  » date
  » datetime
  » blob
  » uri
» parametrized value types
  » list
  » path
  » link
  » record
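To make the split between basic and parametrized value types concrete, here is a minimal sketch in Python. It is not Lily's API: the `BASIC_TYPES` mapping, the `ListType` class and the `conforms` function are hypothetical names chosen for illustration, and only the `list` parametrized type is modelled (`path`, `link` and `record` are left out).

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

# Hypothetical mapping from the basic value type names on the slide
# to Python types; Lily's real type system is not shown in the slides.
BASIC_TYPES = {
    "string": str,
    "integer": int,
    "long": int,
    "double": float,
    "decimal": Decimal,
    "boolean": bool,
    "date": date,
    "datetime": datetime,
    "blob": bytes,
    "uri": str,
}


@dataclass(frozen=True)
class ListType:
    """Parametrized value type: a list whose items all share one value type."""
    item_type: object  # a basic type name or another parametrized type


def conforms(value, value_type):
    """Return True if value matches the given (possibly nested) value type."""
    if isinstance(value_type, ListType):
        return isinstance(value, list) and all(
            conforms(item, value_type.item_type) for item in value
        )
    expected = BASIC_TYPES[value_type]
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    if isinstance(value, bool) and expected is not bool:
        return False
    # datetime is a subclass of date, so keep the two types apart as well.
    if isinstance(value, datetime) and expected is date:
        return False
    return isinstance(value, expected)
```

Because `ListType` takes another value type as its parameter, nesting such as `ListType(ListType("string"))` falls out of the same recursive check.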
21. record version schema version
22. Versioning scopes
» data gets overwritten
» versioned data
» version status data
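One way to read the three scopes is sketched below in Python. This is an interpretation, not Lily's implementation: the `Record` class, its method names and the rule that every versioned write creates a new version are assumptions made for illustration.

```python
class Record:
    """Toy record with three field scopes (names and rules are assumptions)."""

    def __init__(self):
        self.non_versioned = {}    # scope 1: data gets overwritten
        self.versions = []         # scope 2: versioned data, one snapshot per version
        self.version_status = []   # scope 3: version status data, one dict per version

    def set_non_versioned(self, name, value):
        # Overwrites in place; no history is kept.
        self.non_versioned[name] = value

    def set_versioned(self, name, value):
        # Copy the latest snapshot, apply the change, store it as a new version.
        snapshot = dict(self.versions[-1]) if self.versions else {}
        snapshot[name] = value
        self.versions.append(snapshot)
        self.version_status.append({})
        return len(self.versions)  # 1-based version number

    def set_version_status(self, version, name, value):
        # Changes data attached to an existing version without creating a new one.
        self.version_status[version - 1][name] = value

    def read(self, version=None):
        """Return all fields visible for a version (latest if omitted)."""
        if version is None:
            version = len(self.versions)
        fields = dict(self.non_versioned)
        if version:
            fields.update(self.versions[version - 1])
            fields.update(self.version_status[version - 1])
        return fields
```

For example, after two calls to `set_versioned("title", ...)`, reading version 1 still returns the first title, while a non-versioned field such as an owner shows only its latest value in every version.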
33. consistency
» many things can go wrong
» service failure: HBase, HDFS, Solr, Lily
» network failure, time skew
» node failure
» no Lily master node
» services can pick up where failed ones left off
» Zookeeper as service coordinator + lookup
35. High-level data model / easy API
[Diagram: layers built on top of the CDH stack. Labels recovered from the figure:]
» high-level data model / easy API
» indexes
» search
» usage metrics, analytics & recommendations (PIG, HIVE)
» Dev2Dev tutoring, integrated deployment and enterprise support
» CDH
  » UI Framework (HUE), SDK (HUE SDK)
  » Workflow (OOZIE), Scheduling (OOZIE), Metadata (HIVE)
  » Languages / Compilers (PIG, HIVE)
  » Data Integration (FLUME, SQOOP)
  » Fast Read/Write Access (HBASE)
  » Coordination (ZOOKEEPER)
36. falling in love with HBase
HBase = Google BigTable, open source
» datamodel with column families and cell versioning: flexible, for sparse data
» ordered tables with range scans
» HDFS for blob storage
» Apache
» consistent!
» atomic single-row updates
» automatic scaling to large data sets
» fault-tolerance
» commodity hardware
» efficient random access
» M/R for index regeneration
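The "ordered tables with range scans" point can be illustrated with a small in-memory sketch in Python. It does not use the HBase client API: `OrderedTable`, `put` and `scan` are hypothetical names, and the sketch only shows why keeping row keys sorted makes a scan over a key range cheap.

```python
import bisect


class OrderedTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted byte order
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, columns):
        # Insert the key at its sorted position the first time it is seen.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan(self, start, stop):
        """Return rows with start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]
```

With keys such as `b"user:1"` and `b"user:2"`, calling `scan(b"user:", b"user;")` returns every row whose key starts with `user:`, because `;` is the byte directly after `:`. Designing row keys so that related rows sort next to each other is what makes this kind of scan useful.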
39. HBase indexing & RowLog Library
HBase indexing
» building and querying indexes, GAE-style
content table
  rowkey  col   col
  A       val3  foo6
  B       val2  foo7
order index table
  rowkey  col
  val2-B
  val3-A
RowLog library
» need for sync/async operations
» updating of secondary indexes (e.g. link tables)
» feeding of Indexer (= indexes Lily-content into Solr)
» not: transactions
» need for distribution and durability
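The content table and index table from this slide can be mimicked with a short Python sketch. It is a toy model under stated assumptions, not the Lily or HBase API: the names `put`, `process_row_log` and `query` are hypothetical, the `deque` is a stand-in for a durable, distributed row log, only the column `col1` is indexed, and indexed values are assumed not to contain `-`.

```python
import bisect
from collections import deque

content = {}        # content table: rowkey -> {column: value}
index_keys = []     # index table: sorted row keys of the form "<value>-<rowkey>"
row_log = deque()   # pending index updates (stand-in for a durable row log)


def put(rowkey, columns):
    """Write a content row and queue the index update for later processing."""
    old = content.get(rowkey, {}).get("col1")
    content[rowkey] = dict(columns)
    row_log.append((rowkey, old, columns.get("col1")))


def process_row_log():
    """Apply queued updates to the index table (the asynchronous step)."""
    while row_log:
        rowkey, old, new = row_log.popleft()
        if old is not None:
            index_keys.remove(f"{old}-{rowkey}")
        if new is not None:
            bisect.insort(index_keys, f"{new}-{rowkey}")


def query(value):
    """Return content row keys whose col1 equals value, via a prefix scan."""
    prefix = f"{value}-"
    start = bisect.bisect_left(index_keys, prefix)
    result = []
    for key in index_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key[len(prefix):])
    return result
```

Writing rows `A` (`val3`) and `B` (`val2`) and then calling `process_row_log()` leaves the index ordered by value, as on the slide. Until the log is processed the index lags behind the content table, which is the trade-off of asynchronous updates compared with transactions.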
43. from analytics to recommendations
» shorten distance between transactional and analytical aspects
» (near) real-time analytics > Insights
» "people are buying this now"
» algorithmic feedback > data-backed feedback
» based on Insights > recommendations
» shorten feedback cycle
» store data + metadata + attention data in one data system
» basis for analytical queries
» single point of growth
» incremental insights
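A "people are buying this now" insight can be approximated with a sliding-window counter. The Python sketch below is only an illustration of incremental, near real-time counting: the `TrendingWindow` class is hypothetical, it runs in memory on a single node, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingWindow:
    """Counts events per item over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()     # (timestamp, item) in arrival order
        self.counts = Counter()   # item -> number of events still in the window

    def record(self, timestamp, item):
        # Update the count incrementally instead of recomputing from scratch.
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, now, n=3):
        """Return the n most frequent items within the window ending at `now`."""
        self._expire(now)
        return self.counts.most_common(n)
```

Each purchase event touches only one counter, so the ranking stays current without a batch job. Keeping attention data in the same store as the content is what would let such a counter be fed directly from writes.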
44. challenges ahead
» solving real-time aspects
» in a distributed environment
» terabyte-sized snapshots & backups?
» build resilience into data architecture
(i.e. against operator malfunction)
» presenting bigdata insights: new UIs
» less static / report-driven
» more 'Minority Report'
» from near-real-time data to real-time decision systems
45. conclusion: 4 trends
» real-time is rapidly becoming really important
» agility with complex/dynamic information
» predictive analysis
» one store for operational data + analytics
46. Thank you!
for your attention
for your questions
» stevenn@outerthought.org
» @stevenn