This document summarizes a technical presentation about storing user content after JCR. It discusses the limitations of Jackrabbit for social content and introduces Sparse, a highly concurrent, lock-free content store built on sparse maps and backed by storage such as the distributed database Cassandra. Driven by the requirements of social content, Sparse's key features include high concurrency, no synchronization, lightweight sessions and flat hierarchies.
6. Jackrabbit Cluster
Each cluster node runs its own single-threaded persistence manager (PM). All nodes share a single-threaded journal, which each node replays in lock-step sequence.
7. Social Content Store Requirements
[Chart: session footprint in K, Jackrabbit vs. social content]
Requirements: high concurrency, no synchronisation, lock-free operation, lightweight and short-lived sessions, simplicity, flat hierarchies, clustering and scaling, versioning and ACLs, storage agnosticism.
8. Sparse Map Concept
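Map addressing: every map is addressed by KeySpace, ColumnFamily and RowID.
Map storage: each row, identified by its RowID, holds values only in the columns it uses. Of the columns cA, cB, cC and cD, one example row holds the values v1, v2 and v9, another holds only v1 and v2; absent columns are not stored at all.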
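A minimal in-memory sketch of this addressing and sparse row layout, assuming string addresses of the form keyspace:columnFamily:rowId (the n:au:ieb form used for authorizables). SparseStore and its methods are hypothetical names, not Sparse's API, and the column placement in the demo is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: rows addressed by keyspace:columnFamily:rowId, each row
// storing only the columns it actually has (no placeholders for absent ones).
public class SparseStore {
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Builds an address such as "n:au:ieb".
    public static String address(String keySpace, String columnFamily, String rowId) {
        return keySpace + ":" + columnFamily + ":" + rowId;
    }

    // Stores a copy of the given columns; columns not passed in are simply not stored.
    public void put(String keySpace, String columnFamily, String rowId, Map<String, Object> columns) {
        rows.put(address(keySpace, columnFamily, rowId), new TreeMap<>(columns));
    }

    // Returns the row's columns, or an empty map if the row does not exist.
    public Map<String, Object> get(String keySpace, String columnFamily, String rowId) {
        return rows.getOrDefault(address(keySpace, columnFamily, rowId), Map.of());
    }

    public static void main(String[] args) {
        SparseStore store = new SparseStore();
        // Two rows in the same column family with different column sets.
        store.put("n", "cf", "row1", Map.of("cA", "v1", "cC", "v2", "cD", "v9"));
        store.put("n", "cf", "row2", Map.of("cA", "v1", "cB", "v2"));
        System.out.println(store.get("n", "cf", "row1")); // {cA=v1, cC=v2, cD=v9}
        System.out.println(store.get("n", "cf", "row2")); // {cA=v1, cB=v2}
    }
}
```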
9. Sparse Map Concept updates
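Updates are incremental: an update carries only the columns that change. Applying the update {cC: v8, cD: del} to the row holding v1, v2 (in cC) and v9 (in cD) overwrites cC with v8 and deletes cD, leaving a row that holds v1 and v8. Columns not named in the update are untouched.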
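The update rule can be sketched as a merge, assuming a hypothetical REMOVE sentinel in place of the slide's del marker; SparseUpdate and apply are illustrative names, not part of Sparse.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of applying an incremental update to one sparse row.
public class SparseUpdate {
    // Hypothetical stand-in for the slide's "del" marker.
    public static final Object REMOVE = new Object();

    // Only columns present in 'update' are touched; every other column keeps its value.
    public static Map<String, Object> apply(Map<String, Object> row, Map<String, Object> update) {
        Map<String, Object> result = new TreeMap<>(row);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            if (e.getValue() == REMOVE) {
                result.remove(e.getKey());            // delete the column entirely
            } else {
                result.put(e.getKey(), e.getValue()); // overwrite or add the column
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("cA", "v1", "cC", "v2", "cD", "v9");
        Map<String, Object> update = Map.of("cC", "v8", "cD", REMOVE);
        System.out.println(apply(row, update)); // {cA=v1, cC=v8}
    }
}
```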
10. Hierarchy Model
JCR: the parent holds a list-of-children property; listing a node's children iterates that property. Fast listing.
Sparse: the parent holds no list; each child stores hash(parent) as a property, so adding a child never modifies the parent. Listing finds all nodes with a matching hash(parent). Slow listing.
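A sketch of the Sparse side of this comparison, assuming SHA-1 as the hash and a linear scan in place of a real index; FlatHierarchy and its methods are hypothetical. Note that add() writes only the new child, while children() has to search.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: children carry hash(parent); parents carry no child list.
public class FlatHierarchy {
    private final Map<String, Map<String, String>> nodes = new HashMap<>();

    // SHA-1 is an assumed choice of hash, rendered as 40 hex characters.
    public static String hash(String path) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(path.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    public static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Adding a child writes only the child's own map; the parent is never touched.
    public void add(String path) {
        Map<String, String> props = new HashMap<>();
        props.put("parenthash", hash(parentOf(path)));
        nodes.put(path, props);
    }

    // Listing is a find over all nodes by parenthash (a real driver would use an index).
    public List<String> children(String parent) {
        String wanted = hash(parent);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : nodes.entrySet()) {
            if (wanted.equals(e.getValue().get("parenthash"))) {
                result.add(e.getKey());
            }
        }
        result.sort(null); // natural order, for stable output
        return result;
    }

    public static void main(String[] args) {
        FlatHierarchy tree = new FlatHierarchy();
        for (String p : new String[] {"/a", "/a/b", "/a/c", "/a/b/d"}) {
            tree.add(p);
        }
        System.out.println(tree.children("/a")); // [/a/b, /a/c]
    }
}
```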
12. Threading Model
Repository: shared between threads.
Session: not thread safe and not shared between threads; no synchronization, no locks, about 1K in size. A session provides the AccessControlManager, ContentManager and AuthorizableManager, which sit on the Storage Client API.
StorageClientPool: shared.
StorageClient: thread bound and long lived, holding a persistence connection.
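The threading model can be sketched with a ThreadLocal, assuming one StorageClient per worker thread. The class names mirror the slide, but these are hypothetical stand-ins, not Sparse's classes, and the persistence connection itself is left out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the threading model; names mirror the slide, not Sparse's classes.
public class ThreadingSketch {
    // Shared and thread-safe: hands each thread its own long-lived client.
    public static class StorageClientPool {
        public final AtomicInteger created = new AtomicInteger();
        private final ThreadLocal<StorageClient> perThread =
                ThreadLocal.withInitial(() -> new StorageClient(created.incrementAndGet()));

        public StorageClient getClient() {
            return perThread.get();
        }
    }

    // Thread bound; a real client would own one persistence connection.
    public static class StorageClient {
        public final int id;

        StorageClient(int id) {
            this.id = id;
        }
    }

    // Not thread safe and never shared: no locks, just a reference to this thread's client.
    public static class Session {
        public final StorageClient client;

        Session(StorageClient client) {
            this.client = client;
        }
    }

    // Shared: login() is cheap because a session carries almost no state.
    public static class Repository {
        public final StorageClientPool pool = new StorageClientPool();

        public Session login() {
            return new Session(pool.getClient());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Repository repo = new Repository();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            workers.submit(() -> {
                Session session = repo.login(); // short-lived, confined to this thread
                // ... serve one request with session, then drop it
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("clients created: " + repo.pool.created.get()); // at most 4
    }
}
```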
14. Objects
Exposed objects: Content, User, Group and Authorizable, with the internal classes InternalContent, UserInternal and GroupInternal.
Manipulation objects: AclModification.
15. Data Formats (Authorizables)
Authorizable: id (string), name (string), type (string), principals (string[]).
User: adds password (string).
Group: adds members (string[]).
Map addressing: keyspace n, column family au, row ID = the authorizable's id, e.g. n:au:ieb for user ieb.
Notation: members(string[]) is a key-value pair named members whose value is a String[]; versionId(string) is a key-value pair whose key is versionId and whose value is a String.
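A sketch of these formats as plain Java maps, with field names from the slide; the type values "u" and "g", the principal name and the helper names are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the authorizable map formats; field names follow the slide.
public class AuthorizableMaps {
    // Keyspace "n", column family "au", row ID = authorizable id, e.g. "n:au:ieb".
    public static String rowAddress(String id) {
        return "n:au:" + id;
    }

    private static Map<String, Object> base(String id, String name, String type, String[] principals) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("id", id);
        m.put("name", name);
        m.put("type", type);             // "u" / "g" are assumed values
        m.put("principals", principals); // string[]
        return m;
    }

    public static Map<String, Object> user(String id, String name, String passwordHash, String[] principals) {
        Map<String, Object> m = base(id, name, "u", principals);
        m.put("password", passwordHash); // store a hash here, never the plaintext
        return m;
    }

    public static Map<String, Object> group(String id, String name, String[] principals, String[] members) {
        Map<String, Object> m = base(id, name, "g", principals);
        m.put("members", members); // string[] of member ids
        return m;
    }

    public static void main(String[] args) {
        Map<String, Object> ieb = user("ieb", "Ian", "hashed-password", new String[] {"everyone"});
        Map<String, Object> devs = group("devs", "Developers", new String[0], new String[] {"ieb", "ib236"});
        // n:au:ieb [id, name, type, principals, password]
        System.out.println(rowAddress("ieb") + " " + ieb.keySet());
        // n:au:devs members=[ieb, ib236]
        System.out.println(rowAddress("devs") + " members=" + Arrays.toString((String[]) devs.get("members")));
    }
}
```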
16. Data Formats (Content)
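Content map addressing uses two maps in keyspace n, column family cn:
Structure map, addressed by path (e.g. n:cn:a/path/to/content): _:cid (string), _path (string), parenthash (string). _:cid points to the content map.
Content map, addressed by content ID (e.g. n:cn:d4f3s3g3sft): _:id (string), _path (string), _blockId.
Bodies are stored in rows addressed by _blockId and stream name (e.g. _blockId/streamA), through one of two helpers: StreamContentHelper stores bodies as files, BlockContentHelper as maps of byte[].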
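A sketch of the two-map lookup, assuming random UUID-based IDs and omitting parenthash for brevity; ContentLayout and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the structure-map / content-map layout in keyspace n, column family cn.
public class ContentLayout {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // ID format is an assumption.
    private static String newId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // Writes a structure map (by path) and a content map (by id); returns the content id.
    public String create(String path) {
        String id = newId();
        Map<String, String> structure = new HashMap<>();
        structure.put("_:cid", id);
        structure.put("_path", path);
        rows.put("n:cn:" + path, structure);

        Map<String, String> content = new HashMap<>();
        content.put("_:id", id);
        content.put("_path", path);
        content.put("_blockId", newId());
        rows.put("n:cn:" + id, content);
        return id;
    }

    // Resolves path -> structure map -> _:cid -> content map (two lookups).
    public Map<String, String> getByPath(String path) {
        Map<String, String> structure = rows.get("n:cn:" + path);
        return structure == null ? null : rows.get("n:cn:" + structure.get("_:cid"));
    }

    // Row key for one body stream of a content item, e.g. "<blockId>/streamA".
    public static String bodyRowKey(Map<String, String> content, String streamName) {
        return content.get("_blockId") + "/" + streamName;
    }

    public static void main(String[] args) {
        ContentLayout store = new ContentLayout();
        String id = store.create("a/path/to/content");
        Map<String, String> content = store.getByPath("a/path/to/content");
        System.out.println(id.equals(content.get("_:id"))); // true
        System.out.println(bodyRowKey(content, "streamA"));
    }
}
```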
18. Content Versioning
Content map addressing is as for plain content: a structure map (_:cid, _path, parenthash), addressed by path (n:cn:a/path/to/content), points to a content map (_:id, _path) addressed by content ID (n:cn:d4f3s3g3sft). For versioning, the content map also holds _versionHistoryId and _previousVersion.
Each version is stored as its own map with _:id, _path, versionId (string, a ms timestamp), _versionHistoryId, _previousVersion and _nextVersion, so the versions of an item form a doubly linked list.
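A sketch of version linking, assuming the live content map's _previousVersion points at the latest saved version and the content ID doubles as _versionHistoryId; a counter stands in for the millisecond clock so the output is deterministic. VersionChain and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of versions as a doubly linked list of frozen maps.
public class VersionChain {
    private final Map<String, Map<String, String>> versions = new HashMap<>();
    private long clock = 1_000L; // stand-in for System.currentTimeMillis(), deterministic for the demo

    // Freezes the live content map as a new version and links it into the chain.
    public String saveVersion(Map<String, String> content) {
        content.putIfAbsent("_versionHistoryId", content.get("_:id"));
        String versionId = Long.toString(clock++);
        Map<String, String> frozen = new HashMap<>(content); // also copies _previousVersion
        frozen.put("versionId", versionId);
        String previous = content.get("_previousVersion");
        if (previous != null) {
            versions.get(previous).put("_nextVersion", versionId); // forward link
        }
        versions.put(versionId, frozen);
        content.put("_previousVersion", versionId); // live content points at its latest version
        return versionId;
    }

    public Map<String, String> version(String versionId) {
        return versions.get(versionId);
    }

    // Walks back from the live content through _previousVersion, newest first.
    public List<String> history(Map<String, String> content) {
        List<String> ids = new ArrayList<>();
        for (String id = content.get("_previousVersion"); id != null;
                id = versions.get(id).get("_previousVersion")) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        VersionChain chain = new VersionChain();
        Map<String, String> content = new HashMap<>();
        content.put("_:id", "d4f3s3g3sft");
        content.put("_path", "a/path/to/content");
        chain.saveVersion(content);
        chain.saveVersion(content);
        System.out.println(chain.history(content)); // [1001, 1000]
    }
}
```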
19. Content Linking
Content map addressing is as before: a structure map (_:cid, _path, parenthash), addressed by path (n:cn:a/path/to/content), points to a content map (_:id, _path, _versionHistoryId, _previousVersion) addressed by content ID (n:cn:d4f3s3g3sft). A link is a second structure map (_:cid, _path, parenthash) at another path whose _:cid points to the same content map.
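A sketch of linking, with structure maps reduced to a path -> _:cid entry (_path and parenthash omitted); ContentLinks is a hypothetical name. Writes through either path land in the same content map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: two structure entries whose _:cid points at one content map.
public class ContentLinks {
    private final Map<String, String> structure = new HashMap<>();            // path -> _:cid
    private final Map<String, Map<String, String>> content = new HashMap<>(); // _:id -> content map

    public void create(String path, String id) {
        Map<String, String> map = new HashMap<>();
        map.put("_:id", id);
        map.put("_path", path);
        content.put(id, map);
        structure.put(path, id);
    }

    // A link is just another structure entry reusing an existing content id.
    public void link(String existingPath, String linkPath) {
        String id = structure.get(existingPath);
        if (id == null) {
            throw new IllegalArgumentException("no content at " + existingPath);
        }
        structure.put(linkPath, id);
    }

    public Map<String, String> get(String path) {
        String id = structure.get(path);
        return id == null ? null : content.get(id);
    }

    public static void main(String[] args) {
        ContentLinks store = new ContentLinks();
        store.create("a/path/to/content", "d4f3s3g3sft");
        store.link("a/path/to/content", "b/linked/content");
        store.get("b/linked/content").put("title", "shared");
        System.out.println(store.get("a/path/to/content").get("title")); // shared
    }
}
```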
20. Cassandra Driver
Addressing by keyspace, columnFamily, key.
Values stored as byte[] in columns, with incremental updates.
Bodies stored as rows of byte[], 64 x 1MB per row.
Find operations via lookup.
Indexing and finding example: the query n:au:? user=Ian resolves through the index entry user:au:ieb to ieb(ieb), ib236(ib236).
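The body layout arithmetic can be sketched as follows, reading "64x1MB per row" as 64 columns of at most 1 MB each (an assumption); BlockLayout is a hypothetical name. A 150 MB body needs three rows, and byte offset 70 MB falls in row 1, column 6.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of splitting a body into rows of 64 x 1 MB columns.
public class BlockLayout {
    public static final int BLOCK_SIZE = 1024 * 1024;                          // 1 MB per column
    public static final int BLOCKS_PER_ROW = 64;                                // 64 columns per row
    public static final long ROW_CAPACITY = (long) BLOCK_SIZE * BLOCKS_PER_ROW; // 64 MB per row

    public static long rowsNeeded(long bodyLength) {
        return (bodyLength + ROW_CAPACITY - 1) / ROW_CAPACITY; // ceiling division
    }

    public static long rowOf(long offset) {
        return offset / ROW_CAPACITY;
    }

    public static int columnOf(long offset) {
        return (int) ((offset % ROW_CAPACITY) / BLOCK_SIZE);
    }

    // Splits a body into rows; each row maps column index -> block bytes.
    public static List<Map<Integer, byte[]>> split(byte[] body) {
        List<Map<Integer, byte[]>> rows = new ArrayList<>();
        for (long off = 0; off < body.length; off += BLOCK_SIZE) {
            if (rowOf(off) == rows.size()) {
                rows.add(new TreeMap<>()); // start a new row every 64 blocks
            }
            int end = (int) Math.min(body.length, off + BLOCK_SIZE);
            rows.get((int) rowOf(off)).put(columnOf(off), Arrays.copyOfRange(body, (int) off, end));
        }
        return rows;
    }

    public static void main(String[] args) {
        long body = 150L * 1024 * 1024;                             // a 150 MB upload
        System.out.println(rowsNeeded(body));                       // 3
        long offset = 70L * 1024 * 1024;                            // byte 70 MB into the body
        System.out.println(rowOf(offset) + " " + columnOf(offset)); // 1 6
    }
}
```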
21. Memory Driver
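Same model as the Cassandra driver, held in a ConcurrentHashMap: addressing by keyspace, columnFamily, key; values stored as byte[] in columns, with incremental updates; bodies stored as rows of byte[], 64 x 1MB per row; find operations via lookup, with the same indexing and finding scheme (n:au:? user=Ian, index entry user:au:ieb, results ieb(ieb), ib236(ib236)).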
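A sketch of such a driver, assuming a column=value index maintained on each write to serve finds by lookup; MemoryDriver, its methods and the index key format are hypothetical, and values are kept as Strings rather than byte[] for readability.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an in-memory driver on ConcurrentHashMap with a value index.
public class MemoryDriver {
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Set<String>> index = new ConcurrentHashMap<>();

    private static String indexKey(String ks, String cf, String column, String value) {
        return ks + ":" + cf + ":" + column + "=" + value;
    }

    // Incremental update: merges only the given columns into the row.
    // compute() runs the merge atomically per row; this code takes no explicit lock.
    public void insert(String ks, String cf, String key, Map<String, String> update) {
        rows.compute(ks + ":" + cf + ":" + key, (address, old) -> {
            Map<String, String> merged = new HashMap<>(); // copy-on-write: readers see whole rows
            if (old != null) {
                merged.putAll(old);
            }
            for (Map.Entry<String, String> e : update.entrySet()) {
                String previous = merged.put(e.getKey(), e.getValue());
                if (previous != null) { // drop the index entry for the overwritten value
                    Set<String> stale = index.get(indexKey(ks, cf, e.getKey(), previous));
                    if (stale != null) {
                        stale.remove(key);
                    }
                }
                index.computeIfAbsent(indexKey(ks, cf, e.getKey(), e.getValue()),
                        k -> ConcurrentHashMap.newKeySet()).add(key);
            }
            return merged;
        });
    }

    public Map<String, String> get(String ks, String cf, String key) {
        return rows.getOrDefault(ks + ":" + cf + ":" + key, Map.of());
    }

    // Find by lookup: keys whose column currently holds the given value.
    public Set<String> find(String ks, String cf, String column, String value) {
        return index.getOrDefault(indexKey(ks, cf, column, value), Set.of());
    }

    public static void main(String[] args) {
        MemoryDriver driver = new MemoryDriver();
        driver.insert("n", "au", "ieb", Map.of("user", "Ian"));
        driver.insert("n", "au", "ib236", Map.of("user", "Ian"));
        System.out.println(new TreeSet<>(driver.find("n", "au", "user", "Ian"))); // [ib236, ieb]
    }
}
```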
22. JDBC Driver
Addressing by keyspace, columnFamily, hash(key).
The whole map is serialized to byte[] in a column.
The column family selects the table.
Bodies are stored on a shared filesystem.
Find operations via query on (rowid, column, value).
Each database is set up by a DDL and SQL file: Derby, Oracle, MySQL, PostgreSQL.
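A sketch of the hashed row ID and whole-map serialization, assuming SHA-1 for hash(key) and a DataOutputStream encoding for the byte[]; neither is known to be Sparse's actual format, and JdbcRowCodec is a hypothetical name. No database calls are shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: hashed row id plus whole-map serialization into one byte[].
public class JdbcRowCodec {
    // Fixed-length (40 hex chars) row id from keyspace, column family and key.
    public static String rowHash(String ks, String cf, String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest((ks + ":" + cf + ":" + key).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Whole map -> byte[]: entry count, then key/value pairs (each at most 65535 encoded bytes).
    public static byte[] encode(Map<String, String> map) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, String> e : map.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // byte[] -> map, reading back exactly what encode() wrote.
    public static Map<String, String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int n = in.readInt();
            Map<String, String> map = new LinkedHashMap<>();
            for (int i = 0; i < n; i++) {
                String k = in.readUTF();
                map.put(k, in.readUTF());
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("id", "ieb");
        user.put("name", "Ian");
        byte[] blob = encode(user);
        System.out.println(rowHash("n", "au", "ieb") + " -> " + blob.length + " bytes");
        System.out.println(decode(blob)); // {id=ieb, name=Ian}
    }
}
```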
24. Core Sparse Performance
100% concurrent, no waits; 1K sessions.
User adds per second: Memory 33,000/s, Cassandra 3,500/s, MySQL 100/s.
Pure jar; can be used without OSGi.
25. Nakamura
Jackrabbit: application content, enterprise content, updated every month.
Sparse: user content, social content, updated every ms.