The MyHeritage backend group was built to scale: it supports 77 million users, 27 million family trees containing over 1.6 billion individuals, and over 6 billion historical records.
With big data come big challenges, and this presentation explains the structure, the methodology, and the technologies that support scaling up.
The presentation covers:
• How cross R&D continuous deployment and R&D structure supports scalability
• Sharding techniques
• Cassandra usage at MyHeritage
• Our search engine scaling structure
1. Backend Group - built to scale
Ran Levy, Backend Director
ranl@myheritage.com
2. Agenda
• Introduction to MyHeritage
• R&D structure to support scaling
• R&D methodology to support scaling up
• Scaling up technologies and solutions
– Micro-services architecture
– Relational DB scaling out
– Data storing for low latency
– SOLR scaling up
– Queuing services
– File servers
– Caching services
– Statistics services
3. Family history for Families
Building next generation tools for family history
enthusiasts and their families
Discover Preserve Share
4. Challenge: Scale
77 million registered users
1.7 billion tree profiles in 27 million trees
6 billion historical records
200 million photos
42 languages
1 million daily emails
5. R&D structure to support scaling up – guilds and bands
[Diagram: matrix structure. Bands (Delivery) are led by a Band Master and a Product Owner and organized around missions; Guilds (Expertise and Quality) are led by a Guild Manager, with guild members grouped by skill.]
6. R&D Methodology to support scaling up
• Full continuous deployment
–All developers work on trunk
–Every commit triggers a flow that ends in a production update
7. R&D Methodology to support scaling up
• Procedure is backed up with:
–Exposure flag (controlled by external UI)
–Code reviews
–Unit/integration tests (over 80% coverage)
–Sensors for each released feature
–Automatic logs and stats scanning
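The exposure-flag idea above can be sketched roughly as follows. This is an illustrative Python sketch, not MyHeritage's actual implementation; the class and flag names are hypothetical, and the only assumption taken from the slide is that flags are set externally and gate new code paths.

```python
# Hypothetical sketch of an exposure-flag check: new code ships dark and
# is enabled gradually, with the rollout percentage controlled externally.
import hashlib

class ExposureFlags:
    """In-memory stand-in for the externally controlled flag store."""

    def __init__(self, flags):
        # flags: flag name -> rollout percentage (0-100)
        self.flags = flags

    def is_exposed(self, flag_name, user_id):
        pct = self.flags.get(flag_name, 0)
        # Stable per-user bucketing, so a user keeps the same experience
        # across requests while the rollout percentage grows.
        digest = hashlib.md5(f"{flag_name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < pct

flags = ExposureFlags({"new_tree_renderer": 50})
if flags.is_exposed("new_tree_renderer", user_id=12345):
    pass  # new code path; the old path stays as the fallback
```

Combined with trunk-based development, a flag like this lets a commit reach production while the feature itself stays off until the external UI turns it on.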
11. Agenda
Introduction to MyHeritage
R&D structure to support scaling
R&D methodology to support scaling up
• Scaling up technologies and solutions
– Micro-services architecture
– Relational DB scaling out
– Data storing for low latency
– SOLR scaling up
– Queuing services
– File servers
– Caching services
– Statistics services
12. Micro-services architecture
• Monolithic code can’t scale for long – it hinders:
– Localization of changes
– Concurrency of development
– Variety of coding languages
– Scaling up of specific services
13. Micro-services architecture
• Solution:
– Micro-services architecture
• Migration from monolithic code is gradual
– Starting with isolated service
– Gradual replacement of core services
14. Relational DB scaling out techniques
• Data sharding
• Master – slaves
• Using Percona Server (MySQL 5.5)
15. Relational DB scaling out techniques –
approaches for data sharding
• Consistent hashing based on key
• Used for MyHeritage Historical Records (6B records)
[Diagram: a hash function Func(ABCD) maps key ABCD to the shard that serves Read(ABCD)]
16. Relational DB scaling out techniques –
approaches for data sharding
• Consistent hashing pros & cons
– Pros:
• Supports high performance lookup
• “Infinite scale”
– Cons:
• Re-sharding is not trivial and requires a code change.
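For illustration, a minimal consistent-hash ring with virtual nodes might look like the sketch below. This is a hypothetical Python example of the general technique, not MyHeritage's code; the shard names and vnode count are assumptions.

```python
# Minimal consistent-hash ring (illustrative only). Each shard owns many
# points ("virtual nodes") on a hash ring; a key is served by the first
# shard point found clockwise from the key's own hash.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        self.ring = []  # sorted list of (point, shard)
        for shard in shards:
            for v in range(vnodes):
                self.ring.append((self._hash(f"{shard}:{v}"), shard))
        self.ring.sort()
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        # md5 gives a stable, well-spread integer for any key
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, record_key):
        # Walk clockwise to the first vnode at or after the key's hash;
        # wrap around the ring at the end.
        i = bisect.bisect(self.points, self._hash(record_key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["db1", "db2", "db3", "db4"])
shard = ring.shard_for("ABCD")  # every caller maps "ABCD" to the same shard
```

The lookup is a pure function of the key, which is what gives the high-performance access the slide mentions; the flip side is that changing the shard layout changes the function, which is why re-sharding needs a code change.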
17. Relational DB scaling out techniques –
approaches for data sharding
• Mapping table
• Use case: Users’ data in MyHeritage
[Diagram: Read(XYZ) first looks up key XYZ in the mapping table, then reads XYZ from the specific DB instance it maps to]
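The mapping-table approach can be sketched as follows. This is a hypothetical Python illustration; in practice the map would itself live in a (replicated) database table, and the function and variable names are invented for the example.

```python
# Hypothetical sketch of mapping-table sharding: a small lookup table maps
# each user key to the DB instance that holds that user's data.
shard_map = {}  # in production this is itself a DB table, not a dict

def assign_shard(user_key, instances, load):
    # Place new users on the least-loaded instance. Re-sharding a user is
    # just an update of this row (plus a data copy) - no code change.
    instance = min(instances, key=lambda i: load[i])
    shard_map[user_key] = instance
    load[instance] += 1
    return instance

def read_user(user_key):
    instance = shard_map[user_key]        # DB lookup before every data access
    return f"SELECT ... FROM {instance}"  # placeholder for the real query
```

The extra lookup per access is the cost the pros/cons slide points out; the payoff is that moving a user between instances never touches application code.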
18. Relational DB scaling out techniques –
approaches for data sharding
• Mapping table pros & cons
– Pros:
• Easy re-sharding and scaling up.
– Cons:
• Requires DB lookup prior to data access.
• Limited scalability.
19. Relational DB scaling out techniques –
Master Slave
[Diagram: the R/W flow goes to the Master, the R/O flow goes to the slaves; an active-standby master is available for failover]
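The read/write split in the diagram can be sketched like this. It is an illustrative Python example, not MyHeritage's routing layer; the class name and the statement-prefix heuristic are assumptions for the sketch.

```python
# Hypothetical sketch of master-slave routing: writes go to the master,
# reads are spread round-robin across read-only slaves.
import itertools

class MasterSlaveRouter:
    def __init__(self, master, slaves):
        self.master = master
        self._reads = itertools.cycle(slaves)  # simple round-robin over slaves

    def connection_for(self, statement):
        # Crude classification by statement prefix, good enough for a sketch.
        is_write = statement.lstrip().upper().startswith(
            ("INSERT", "UPDATE", "DELETE", "REPLACE"))
        return self.master if is_write else next(self._reads)

router = MasterSlaveRouter("master-db", ["slave-1", "slave-2"])
conn = router.connection_for("SELECT name FROM users WHERE id = 7")
```

A real router also has to handle replication lag (read-your-own-writes) and failover to the standby master, which the sketch leaves out.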
20. Data Storing for low latency
• (Berkeley DB, MapDB)
• Cassandra
– Account Store
– People Store
– (Counters system, A/B testing data)
21. Data Storing for low latency – Account Store
• Motivations
– Access account data in sub 1 msec
– High scale (~400M rows)
– Online schema changes
– Reduce OPEX
– Linear Scaling out architecture
22. Data Storing for low latency – Account Store
• Solution:
– Cassandra
– Apache Cassandra is an open source, distributed, decentralized,
elastically scalable, highly available, fault-tolerant, tuneably
consistent, column-oriented database.
23. Data Storing for low latency – Account Store
• Cluster main characteristics:
– 5 nodes, 500GB SSD, Replication Factor - 3
– Community Edition 2.0.13
• Very low maintenance (no nodetool repair -pr)
• Using counters
• Using secondary indexes
• Using VNodes for easier maintenance
• Using SizeTieredCompactionStrategy compactions (writes optimized)
• Achieved performance
– Avg. local read latency: 0.108 ms
– Avg. local write latency: 0.022 ms
24. Data Storing for low latency – People Store
(in progress)
• Main Motivations
– Access data rapidly
• Avoiding the need to access multiple partitions
– High scale (scaling to 2B rows)
25. Search technologies
• Motivations
– Search billions of records in sub 200 msec.
– Cope with differences: languages, spellings,
inaccuracies, missing data.
– Ranking of results.
26. Search technologies
• Solution:
– SOLR
– Solr is highly reliable, scalable and fault tolerant, providing
distributed indexing, replication and load-balanced querying,
automated failover and recovery, centralized configuration and
more.
27. Search technologies - SOLR
• Solr distributed search allows sharding a big index into smaller chunks
running on multiple hosts. We do not utilize Solr 4’s SolrCloud feature.
• Indexing: Client app is responsible to index each document on a specific shard
(using some hashing of document ID)
• Search: Client app sends request to aggregator Solr instance, which in turn
queries all shards, and merges the results into one response (sort, paging)
[Diagram: the application indexes documents directly into each Solr index shard; search requests go to an aggregator Solr instance, which fans out to all shards and merges the responses]
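The client-side sharding and aggregator merge described above can be sketched as follows. This is an illustrative Python sketch of the technique, not MyHeritage's code; the shard count and the result-record shape are assumptions.

```python
# Hypothetical sketch of client-side Solr sharding: the client hashes the
# document ID to pick an index shard, and an aggregator merges per-shard
# results into one globally sorted, paged response.
import heapq
import zlib

NUM_SHARDS = 4  # illustrative; the real shard count is not on the slide

def shard_for(doc_id):
    # Indexing: a stable hash of the document ID, so the same document
    # always lands on (and is updated on) the same shard.
    return zlib.crc32(str(doc_id).encode()) % NUM_SHARDS

def aggregate(shard_results, start=0, rows=10):
    # Search: each shard returns hits already sorted by descending score;
    # the aggregator merge-sorts them and applies global paging.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit["score"])
    return list(merged)[start:start + rows]

hits_shard_a = [{"id": "r1", "score": 0.9}, {"id": "r2", "score": 0.4}]
hits_shard_b = [{"id": "r3", "score": 0.7}]
page = aggregate([hits_shard_a, hits_shard_b], start=0, rows=10)
```

Doing the shard routing in the client (rather than in SolrCloud) matches the slide's note that SolrCloud is not used: the application owns the hashing, and the aggregator Solr instance only fans out and merges.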
30. Queuing System – Kafka High Level Overview
[Diagram: producers (Web, Daemons, Face recog.) publish to topics – Family Tree changes, Activity, Notifications – each split into 32 partitions. Broker 1 and Broker 2 each host topic partitions and also carry a DRBD replica of the other broker. Consumers: Indexing, RecordMatching, Logstash reader, Notifications sys.]
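The 32-partition layout in the diagram implies a partitioning rule on the producer side. The sketch below is a hypothetical Python illustration of one common choice (keying by family-tree ID); the actual MyHeritage partitioner is not shown on the slide.

```python
# Hypothetical sketch of producer-side partitioning for the Family Tree
# changes topic: hash the tree ID into one of the 32 partitions.
import zlib

PARTITIONS = 32  # each topic has 32 partitions, per the diagram

def partition_for(tree_id):
    # Keying by family-tree ID keeps all changes to one tree on a single
    # partition, so the consumer reading that partition sees them in order.
    return zlib.crc32(str(tree_id).encode()) % PARTITIONS
```

Ordering in Kafka is only guaranteed within a partition, so choosing the key this way is what makes per-tree ordering possible for downstream consumers such as Indexing and RecordMatching.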
31. Kafka @MyHeritage – Consumers (Indexing)
[Diagram: one EventProcessor per consumer type, with a reader per partition, pulls events from Broker 1 and Broker 2, gets/updates its position via KafkaWatermark, and adds events to an IndexingQueue; IndexingWorkers fetch work from the queue and update items in SOLR.]
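The watermark pattern in the diagram can be sketched as follows. This is an illustrative Python sketch under the assumption that the watermark is the last fully processed offset per (topic, partition); class and function names are invented, and the partition log is modeled as a plain list.

```python
# Hypothetical sketch of the KafkaWatermark flow: one reader per partition
# pulls events, hands them to the indexing queue, and persists the last
# processed offset so a restart resumes where it left off.
from queue import Queue

class KafkaWatermark:
    """Stand-in for the watermark store (a DB table in practice)."""

    def __init__(self):
        self.offsets = {}

    def get(self, topic, partition):
        return self.offsets.get((topic, partition), -1)

    def update(self, topic, partition, offset):
        self.offsets[(topic, partition)] = offset

def event_processor(log, topic, partition, watermark, indexing_queue):
    # Resume just past the last fully processed offset.
    start = watermark.get(topic, partition) + 1
    for offset in range(start, len(log)):
        indexing_queue.put(log[offset])  # IndexingWorkers fetch from here
        watermark.update(topic, partition, offset)
```

Owning the offsets this way (instead of relying on broker-side commits) gives the consumer exact control over at-least-once delivery into SOLR: after a crash, re-running the processor replays only events past the stored watermark.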
32. File servers
• Traditional – File Servers
–~30 file servers
–Total storage: 80 TB
–HTTP(s) accessible with REST APIs
33. File servers
• CEPH
– Use cases:
• SEO serving
• OpenStack
– Version in production: FireFly
– Using 40TB
– Lessons learnt:
• Do not use large buckets without index sharding (supported from Hammer onward)
• If you can’t use Hammer, shard your buckets (or bad things WILL happen)
• Don’t use high-density nodes
35. Statistics Services
• In-house MySQL
• Graphite usage for Infrastructure
• In research for app metrics:
– Graphite over InfluxDB
– Cyanite (Graphite over Cassandra)
• Automated Anomaly Detection for infrastructure (Anodot)
36. Logging Services
• Central logging (including app logging + infrastructure):
in-house in MySQL + ELK stack