HUGUK Lily
1. Lily
Smart data at scale
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
2. big data, big problems
3. MOORE vs data
» IDC says the Digital Universe will reach 35 zettabytes by 2020
» 20% = enterprise data (structured, curated, $$$)
» Facebook, Yahoo!, Google, Rapleaf and Amazon show us how the remaining 80% can be monetized
» some of them even rent out their data platform
» ... at the cost of infrastructure lock-in
1 zettabyte = 1,000,000,000,000,000,000,000 bytes, or 1 billion terabytes
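The figures on this slide can be checked with a little arithmetic. This sketch uses decimal (SI) units, as the footnote does, and simply applies the slide's 20% enterprise share to IDC's 35 ZB projection:

```python
# Decimal (SI) units, as in the slide's footnote: 1 ZB = 10**21 bytes, 1 TB = 10**12 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

def zb_to_tb(zettabytes):
    """Convert a whole number of zettabytes to terabytes (decimal units)."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_TB

digital_universe_zb = 35      # IDC projection for 2020, as quoted on the slide
enterprise_share_pct = 20     # share the slide attributes to enterprise data

enterprise_zb = digital_universe_zb * enterprise_share_pct / 100
remaining_zb = digital_universe_zb - enterprise_zb

print(zb_to_tb(1))                  # 1000000000, i.e. one billion terabytes
print(enterprise_zb, remaining_zb)  # 7.0 28.0
```

So the "remaining 80%" the slide talks about amounts to roughly 28 ZB.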
4. MOORE vs data
» coping with volume + need for timeliness = parallel processing
» data becomes business-critical = resilience through distributed architectures
» Hadoop, MapReduce, HBase: the future data platform
5. the CHALLENGES
» process ALL data
» process data in REAL-TIME
» derive INSIGHTS
» provide INSTANT FEEDBACK
6. current thinking
[diagram: data → ETL → data warehouse (STORE) → analytics; batched, off-line, overnight]
7. 1. store and manage all YOUR data
[diagram: DATA]
8. 2. store user behaviour, nearby
[diagram: DATA, USER Behavior]
9. 3. analyze usage patterns
[diagram: DATA, USER Behavior → data processing]
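The slide does not say how usage patterns are found. A common starting point is item co-occurrence ("users who viewed X also viewed Y"); the sketch below assumes a behaviour log of hypothetical (user, item) view events and is an illustration, not Lily's implementation:

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_occurrence(events):
    """Count how often two items were viewed by the same user.

    events: iterable of (user_id, item_id) pairs (hypothetical log format).
    Returns {item: Counter({other_item: number_of_shared_users})}.
    """
    items_per_user = defaultdict(set)
    for user, item in events:
        items_per_user[user].add(item)

    counts = defaultdict(Counter)
    for items in items_per_user.values():
        # every unordered pair of items seen by this user counts once, in both directions
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

log = [("u1", "hbase"), ("u1", "solr"), ("u2", "hbase"),
       ("u2", "solr"), ("u2", "hdfs"), ("u3", "hdfs")]
patterns = co_occurrence(log)
print(patterns["hbase"].most_common(1))  # [('solr', 2)]
```

This is a batch computation over the whole log; the later "in real-time" step is about updating such patterns per event instead.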
10. 4. add domain knowledge
[diagram: DATA, USER Behavior → data processing; domain knowledge: patterns, rules, keywords, lists, ...]
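Domain knowledge in the form of keyword lists and patterns can be applied as simple tagging rules. The lists, tag names and the patent-number regex below are hypothetical examples chosen for illustration:

```python
import re

# Hypothetical domain knowledge: keyword lists and regex patterns, each mapped to a tag.
KEYWORD_LISTS = {
    "storage": {"hbase", "hdfs"},
    "search": {"solr", "lucene"},
}
PATTERN_RULES = {
    "patent-number": re.compile(r"\b[A-Z]{2}\d{6,}\b"),  # e.g. EP1234567 (assumed format)
}

def apply_domain_knowledge(text):
    """Return the sorted tags whose keyword list or pattern matches the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    tags = {tag for tag, keywords in KEYWORD_LISTS.items() if words & keywords}
    tags |= {tag for tag, rx in PATTERN_RULES.items() if rx.search(text)}
    return sorted(tags)

print(apply_domain_knowledge("Indexing HBase rows into Solr, cf. EP1234567"))
# ['patent-number', 'search', 'storage']
```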
11. 5. process, in real-time
[diagram: DATA, USER Behavior → data processing → recommendations, semantic augmentation, Analytics; domain knowledge: patterns, rules, keywords, lists, ...]
12. 6. augment data
[diagram: DATA, USER Behavior → data processing → recommendations, semantic augmentation, Analytics; domain knowledge: patterns, rules, keywords, lists, ...]
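Processing "in real time" means handling each behaviour event as it arrives, rather than in the overnight batch of the ETL/warehouse approach, and writing the results back into the data. A minimal in-memory sketch, assuming records are plain dicts and that co-occurrence counts drive the recommendations (both assumptions, not Lily internals):

```python
from collections import Counter, defaultdict

class RealTimeAugmenter:
    """Toy incremental processor: updates co-occurrence counts per event and
    writes recommendations back into the affected records (hypothetical design)."""

    def __init__(self, records, top_n=2):
        self.records = records              # record_id -> dict of fields (assumed in-memory store)
        self.seen = defaultdict(set)        # user_id -> items this user has viewed so far
        self.counts = defaultdict(Counter)  # item -> Counter of items co-viewed with it
        self.top_n = top_n

    def on_event(self, user, item):
        """Handle one view event immediately instead of in a nightly batch."""
        if item in self.seen[user]:
            return                          # repeated view: no new pattern
        touched = {item}
        for other in self.seen[user]:
            self.counts[item][other] += 1
            self.counts[other][item] += 1
            touched.add(other)
        self.seen[user].add(item)
        for rid in touched:                 # augment only the records this event affected
            top = self.counts[rid].most_common(self.top_n)
            self.records.setdefault(rid, {})["recommendations"] = [r for r, _ in top]

records = {"hbase": {}, "solr": {}, "hdfs": {}}
aug = RealTimeAugmenter(records)
for user, item in [("u1", "hbase"), ("u1", "solr"), ("u2", "hbase"),
                   ("u2", "solr"), ("u2", "hdfs")]:
    aug.on_event(user, item)
print(records["hbase"]["recommendations"])  # ['solr', 'hdfs']
```

The augmented records become immediately visible to readers, which is the "instant feedback" the deck contrasts with overnight processing.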
13. data insights
[diagram: SMARTER DATA, relations → data processing → recommendations, semantic augmentation, Analytics; domain knowledge: patterns, rules, keywords, lists, ...]
14. data insights
[diagram: as on slide 13]
SMART DATA, at SCALE
... and in real time
18. [diagram: (dis)SIMILARITY between patents, companies, people, materials and processes; labels: competitive, innovation, IP research, insights]
19. outerthought
“ The world is moving from content as a cost
to data as an opportunity. We provide the tools
and the platform to let organisations maximally
benefit from the data they grow and collect. ”
20. Lily (now)
» Large-scale content storage, indexing and search
» Current pilots: e-retail, mobile, media, ISP, e-gov, IP research
» up to now: 3 man-years of investment (since Sept. 2009)
21. Lily 1.0 (CR)
[diagram: data → data STORE + warehouse + analytics, real time; the combination is bracketed as Lily 2.0]
22. lily USPs
» Integrated approach, one-stop data shop
» No more flat-file processing (Hadoop) ➙ interactive database (HBase) ➡ all data
» Real-time (vs. overnight)
» instant feedback loops ➡ real-time
» designed for online, interactive use ➡ easy
» Available in-house, SaaS possible
» Data Insights = data + customer retention
26. falling in love with HBase: phase 1
» automatic scaling to large data sets
» fault-tolerance
» flexible data model for sparse data
» commodity hardware
» efficient random access
» community-based open source
» Java if possible
27. falling in love with HBase: phase 2
» need for consistency
» atomic single-row updates
» MapReduce (M/R) for index regeneration
28. falling in love with HBase: phase 3
» data model with column families and cell versioning
» ordered tables with range scans
» HDFS for blob storage
» Apache
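To make "column families and cell versioning" and "ordered tables with range scans" concrete, here is a toy in-memory imitation of HBase's logical data model. It is not the HBase client API: class and method names are hypothetical, and it only mimics sorted row keys, (family, qualifier) cells holding timestamped versions, and scans with an inclusive start row and exclusive stop row:

```python
import bisect

class ToyTable:
    """In-memory imitation of HBase's logical model (not the HBase client API):
    rows sorted by key; each cell is (family, qualifier) -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.keys = []                  # row keys, kept sorted for range scans
        self.rows = {}                  # row key -> {(family, qualifier): {ts: value}}
        self.max_versions = max_versions

    def put(self, row, family, qualifier, value, ts):
        if row not in self.rows:
            bisect.insort(self.keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault((family, qualifier), {})
        versions[ts] = value
        for old in sorted(versions)[:-self.max_versions]:  # evict versions beyond the limit
            del versions[old]

    def get(self, row, family, qualifier, ts=None):
        """Newest value, or the newest value at or before ts; None if absent."""
        versions = self.rows.get(row, {}).get((family, qualifier), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start, stop):
        """Row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return self.keys[lo:hi]

t = ToyTable(max_versions=2)
t.put("user#002", "d", "name", "ann", ts=1)
t.put("user#001", "d", "name", "bob", ts=1)
t.put("user#001", "d", "name", "bobby", ts=2)
t.put("user#001", "d", "name", "robert", ts=3)  # evicts the ts=1 version
t.put("item#001", "d", "title", "lily", ts=1)
print(t.get("user#001", "d", "name"))        # robert
print(t.get("user#001", "d", "name", ts=2))  # bobby
print(t.scan("user#", "user$"))              # ['user#001', 'user#002']
```

Because rows are ordered, a key prefix such as `user#` turns into a cheap range scan ('$' sorts right after '#'), which is why row-key design matters so much in HBase.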
37. HBase RowLog Library
» need for sync/async operations
» updating of secondary indexes (i.e. tables)
» feeding the Indexer (= bridge to SOLR index maintenance)
» not: transactions
» need for distribution and durability
38. HBase RowLog Library
» WAL
   » guaranteed execution of synchronous actions
   » call doesn’t return before secondary action finishes
   » e.g. update secondary index tables
   » if all goes well, size = #concurrent ops
» Queue
   » triggering of async actions
   » e.g. (re)index (updated) record with SOLR back-end
   » size depends on speed of back-end process
» useful outside of Lily context as well!
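The WAL/queue split can be sketched as follows. Everything here is hypothetical (class, method and field names are not Lily's RowLog API), and the in-memory structures stand in for what a real implementation must keep durable, e.g. in HBase itself. Synchronous actions run before `update()` returns and the WAL entry is removed once they finish, so in the normal case the WAL only holds in-flight operations; queued messages wait for an async consumer such as an indexer:

```python
from collections import deque

class ToyRowLog:
    """Hypothetical sketch of the WAL + queue idea; not Lily's RowLog API."""

    def __init__(self):
        self.wal = {}            # op_id -> (row, change): synchronous work not yet confirmed
        self.queue = deque()     # (row, change): messages waiting for async consumers
        self.sync_actions = []   # callables run before update() returns, e.g. index-table updaters
        self.next_id = 0

    def _apply(self, table, row, change):
        table.setdefault(row, {}).update(change)   # primary update
        for action in self.sync_actions:           # secondary actions, e.g. index tables
            action(row, change)

    def update(self, table, row, change):
        op_id = self.next_id
        self.next_id += 1
        self.wal[op_id] = (row, change)            # 1. record intent first
        self._apply(table, row, change)            # 2. primary + synchronous secondary work
        del self.wal[op_id]                        # 3. done: leaves the WAL
        self.queue.append((row, change))           # 4. async work happens later

    def recover(self, table):
        """After a crash, replay WAL entries whose secondary actions may not have run."""
        for op_id, (row, change) in sorted(self.wal.items()):
            self._apply(table, row, change)
            del self.wal[op_id]

    def drain(self, consumer):
        """Async side: hand queued messages to a consumer (e.g. a SOLR indexer)."""
        while self.queue:
            consumer(*self.queue.popleft())

table, index, indexed = {}, {}, []
log = ToyRowLog()
log.sync_actions.append(lambda row, ch: index.setdefault(ch.get("email"), row))
log.update(table, "rec1", {"email": "a@example.org"})
print(index, len(log.wal), len(log.queue))  # {'a@example.org': 'rec1'} 0 1
log.drain(lambda row, ch: indexed.append(row))
print(indexed)                              # ['rec1']
```

Replaying a WAL entry re-runs its actions, so in this design the synchronous actions should be idempotent.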
39. The Lily Indexer
» denormalization
» indexing of multiple versions of a record
» incremental index updating
» batch index building
» blob content extraction
» sharding towards multiple SOLR instances
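Two of these features, sharding and denormalization, are easy to illustrate. The slide does not say how records are assigned to SOLR instances; this sketch assumes a stable hash of the record id modulo the shard count, and models denormalization as copying fields of a linked record into the index document. Shard URLs, the link field and the `author.name` naming scheme are all hypothetical:

```python
import hashlib

# Hypothetical SOLR instances.
SHARDS = ["http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr"]

def select_shard(record_id, shards=SHARDS):
    """Stable hash-based choice: the same record id always maps to the same instance."""
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

def denormalize(record, linked_records, link_field="author"):
    """Copy the fields of a linked record into the index document."""
    doc = dict(record)
    linked = linked_records.get(record.get(link_field))
    if linked:
        for key, value in linked.items():
            doc[f"{link_field}.{key}"] = value
    return doc

people = {"p1": {"name": "Alice"}}
doc = denormalize({"id": "r1", "title": "Lily", "author": "p1"}, people)
print(doc["author.name"])                     # Alice
print(select_shard("r1") == select_shard("r1"))  # True
```

A consequence of denormalization is that a change to the linked record (here `p1`) must trigger re-indexing of every record that links to it, which is one reason incremental index updating needs a reliable change feed.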
45. Lily and HBase
» adds high-level content model
» data types
» versioning
» blob storage on HDFS
» focus on sparse (efficient) storage
» RowLog for synchronous cross-table updates and async message queues
46. Lily and SOLR
» provides flexible mapping between HBase content model and SOLR index fields
» interactive and batch (M/R) index maintenance
» sharding
» use(s) SOLR as-is: loose, flexible, extensible coupling
» search access via SOLR (HTTP) API
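A sketch of both sides of the SOLR coupling: mapping record fields to index fields, and building a search request for SOLR's standard `/select` handler over HTTP. The mapping, the core name and the `*_t` / `*_ss` / `*_dt` dynamic-field suffixes (common in SOLR example schemas) are assumptions; the function only builds the URL and does not send it:

```python
from urllib.parse import urlencode

# Hypothetical mapping from record fields to SOLR index fields.
FIELD_MAPPING = {"title": "title_t", "tags": "tags_ss", "created": "created_dt"}

def to_solr_doc(record_id, record, mapping=FIELD_MAPPING):
    """Build a SOLR document, keeping only the mapped fields."""
    doc = {"id": record_id}
    for field, solr_field in mapping.items():
        if field in record:
            doc[solr_field] = record[field]
    return doc

def select_url(base, query, rows=10):
    """URL for SOLR's /select request handler, asking for a JSON response."""
    return f"{base}/select?" + urlencode({"q": query, "rows": rows, "wt": "json"})

doc = to_solr_doc("r1", {"title": "Lily", "tags": ["hbase", "solr"], "internal": 42})
print(doc)  # {'id': 'r1', 'title_t': 'Lily', 'tags_ss': ['hbase', 'solr']}
print(select_url("http://localhost:8983/solr/lily", "title_t:lily", rows=5))
# http://localhost:8983/solr/lily/select?q=title_t%3Alily&rows=5&wt=json
```

Keeping the mapping as configuration, rather than baking SOLR field names into the content model, is what lets SOLR be used as-is with a loose coupling.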
47. Lily and CDH
» we intend to rely on CDH-‘blessed’ versions of HBase/HDFS/ZK
» 700 patches and testing
» next: adopting a similar distribution lay-out
» since we contribute patches to ASF HBase trunk, we would expect CDH to track it closely (until HBase 1.0)
» some Lily users could be interested in ‘CDH-level’ services
48. goodbye
» It’s open source !
» Content Repository: available now
(Lily model + HBase + SOLR + RowLog)
» Lily 1.0 soon: will mainly focus on differentiating the open source and enterprise editions
» “HBase is totally the best, mate.” (Flemish original: “HBase is wa de max maat.”)
49. Thank you !
for your attention
for your questions
» stevenn@outerthought.org
» @stevenn