2. About me
• Martin Tepper
• Lead Developer at Travel IQ
• http://monogreen.de
MongoDB as a queryable cache · Martin Tepper, monogreen.de · 2011-03-25
3. Contents
• About Travel IQ
• The problem
• The solution
• The headaches
4. About Travel IQ
• Meta Search Engine for Flights and Hotels
• 9 Hotel Providers
• 21 Flight Providers
• ~ 6000 searches per day
• ~ 64k provider queries per day
6. About Travel IQ
• Real-Time Aggregation
• Ruby/Rails based
• API-Driven
7. Quick aside
• Ruby: object-oriented scripting language
• Rails: MVC Web application framework
• ActiveRecord: ORM framework
12. Strongly Normalized
• Very organized
• Reuse of models
• Saves disk space
• But …
13. sql = <<-SQL
  SELECT MIN(outerei.id) FROM
  (
    SELECT
      OBJ1.starts_at AS OBJ1_starts_at,
      OBJ1.ends_at AS OBJ1_ends_at,
      OBJ1.origin_id AS OBJ1_origin_id,
      OBJ1.destination_id AS OBJ1_destination_id,
      MIN(P1.price) AS the_price
    FROM packages P1
    LEFT JOIN journeys OBJ1 ON (P1.outbound_journey_id = OBJ1.id)
    LEFT JOIN results R1 ON (R1.package_id = P1.id)
    LEFT JOIN packagings PA1a ON (PA1a.package_id = P1.id AND PA1a.position = 1)
    LEFT JOIN offers O1a ON (PA1a.offer_id = O1a.id)
    WHERE R1.search_id IN (#{search_id})
      AND R1.search_type = 'FlightSearch'
      AND O1a.expires_at > #{expiring_after}
    GROUP BY
      OBJ1.starts_at, OBJ1.ends_at,
      OBJ1.origin_id, OBJ1.destination_id
  ) AS innerei JOIN (
    SELECT P2.id,
      OBJ2.starts_at AS OBJ2_starts_at,
      OBJ2.ends_at AS OBJ2_ends_at,
      OBJ2.origin_id AS OBJ2_origin_id,
      OBJ2.destination_id AS OBJ2_destination_id,
      P2.price
    FROM packages P2
    LEFT JOIN results R2 ON (R2.package_id = P2.id)
    LEFT JOIN journeys OBJ2 ON (P2.outbound_journey_id = OBJ2.id)
    LEFT JOIN packagings PA2a ON (PA2a.package_id = P2.id AND PA2a.position = 1)
    LEFT JOIN offers O2a ON (PA2a.offer_id = O2a.id)
    WHERE R2.search_id IN (#{search_id})
14. The problem
• Strongly normalized database
• Complex query requirements
• Lots of joins
• ActiveRecord and rendering overhead
• Slow API calls
16. Solution 1: Schema
• Redo the schema
• Migration hard
• Some relationships hard to denormalize
17. Solution 2: Memcached
• Memcached
• Very fast response times
• But no real queries
→ Horrible abstraction layer
18. Memcached response times over time
[Chart: response time of API call in seconds (y-axis, 0 to 10) against seconds after search start (x-axis, 1 to 45)]
19. Solution 3: MongoDB
• Document-oriented – less render overhead
• Grouping of offers
• Proper queries and counts
• Still quite fast
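As a rough illustration of the bullets above, a denormalized result could be stored as one document per package with its offers embedded, so that filtering and counting need no joins. The field names and values below are hypothetical, and plain Ruby hashes stand in for a MongoDB collection; this is a sketch under those assumptions, not Travel IQ's actual schema.

```ruby
# Hypothetical denormalized documents: one per package, offers embedded.
# Field names and values are illustrative, not the real schema.
PACKAGES = [
  { search_id: 1, price: 199.0, origin: "BER", destination: "LHR",
    offers: [{ provider: "A", price: 199.0 }, { provider: "B", price: 215.0 }] },
  { search_id: 1, price: 249.0, origin: "BER", destination: "LHR",
    offers: [{ provider: "C", price: 249.0 }] },
  { search_id: 2, price: 89.0, origin: "HAM", destination: "CDG",
    offers: [{ provider: "A", price: 89.0 }] }
].freeze

# In-memory stand-in for a filtered find: every condition must match.
def find_packages(docs, conditions)
  docs.select { |doc| conditions.all? { |key, value| doc[key] == value } }
end

results = find_packages(PACKAGES, search_id: 1)
cheapest = results.min_by { |doc| doc[:price] }

puts results.size             # number of packages for search 1
puts cheapest[:price]         # lowest package price for search 1
puts cheapest[:offers].size   # offers grouped inside that package
```

Because each document already carries its offers, the result can be handed to the API renderer without assembling it from several tables first.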
21. How we use MongoDB
• Replica set with 2 nodes and 2 arbiters
• Two servers with 16 cores / 64GB RAM
→ run MySQL and MongoDB
• ~ 600 writes/s and reads/s under normal load
• ~ 6000 writes/s doable
22. MongoDB response times over time
[Chart: response time of API call in seconds (y-axis, 0 to 10) against seconds after search start (x-axis, 1 to 45)]
25. Problems with MongoDB
• Segmentation Faults
• Only in production
→ Replica Set helped a lot
→ Fixed with nightly build
27. Problems with MongoDB
• Write performance during peak load
• Lots of small concurrent writes
→ Solved by bundling writes
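Bundling writes could look roughly like the following sketch: documents are buffered in memory and handed to the database in batches rather than one at a time. The `WriteBundler` and `FakeCollection` classes and the batch size are hypothetical; the sink only needs to respond to a bulk-insert method, named `insert_many` here on the assumption that it maps onto the driver's bulk insert.

```ruby
# Buffers documents and writes them in batches instead of one by one.
# `sink` is any object that responds to `insert_many(docs)`; wiring it
# to a real MongoDB collection is left to the caller (assumption).
class WriteBundler
  def initialize(sink, batch_size: 100)
    @sink = sink
    @batch_size = batch_size
    @buffer = []
    @mutex = Mutex.new
  end

  # Queue one document; write a batch once the buffer is full.
  def add(doc)
    batch = @mutex.synchronize do
      @buffer << doc
      @buffer.size >= @batch_size ? drain : nil
    end
    @sink.insert_many(batch) if batch
  end

  # Write whatever is still buffered (e.g. at the end of a search).
  def flush
    batch = @mutex.synchronize { drain }
    @sink.insert_many(batch) unless batch.empty?
  end

  private

  # Return the buffered documents and start a fresh buffer.
  # Must be called while holding the mutex.
  def drain
    batch = @buffer
    @buffer = []
    batch
  end
end

# Hypothetical stand-in for a collection; records each bulk write size.
class FakeCollection
  attr_reader :calls

  def initialize
    @calls = []
  end

  def insert_many(docs)
    @calls << docs.size
  end
end

collection = FakeCollection.new
bundler = WriteBundler.new(collection, batch_size: 3)
7.times { |i| bundler.add(offer_id: i) }
bundler.flush
p collection.calls
```

Seven small writes turn into three bulk writes here. The trade-off is that buffered documents become visible slightly later and are lost if the process dies before a flush, which is usually tolerable for a cache.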
29. Problems with MongoDB
• Hotel data too big to denormalize
• Kept in a separate collection
→ Solved with app-level “join”
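An app-level "join" of this kind might be sketched as follows: collect the hotel ids referenced by the offers, fetch those hotels in a single lookup, and merge in memory. The data, field names, and helper methods are hypothetical, and arrays of hashes stand in for the two collections.

```ruby
# Hypothetical data: offers reference hotels by id; hotel details live
# in a separate collection because they are too big to embed.
OFFERS = [
  { offer_id: 1, hotel_id: 10, price: 120.0 },
  { offer_id: 2, hotel_id: 11, price: 95.0 },
  { offer_id: 3, hotel_id: 10, price: 110.0 }
].freeze

HOTELS = [
  { hotel_id: 10, name: "Hotel Alpha", stars: 4 },
  { hotel_id: 11, name: "Hotel Beta", stars: 3 },
  { hotel_id: 12, name: "Hotel Gamma", stars: 5 }
].freeze

# Stand-in for one query that fetches all hotels whose id is in `ids`
# (against MongoDB this would be a single `$in` lookup).
def fetch_hotels(ids)
  HOTELS.select { |hotel| ids.include?(hotel[:hotel_id]) }
end

# App-level "join": one lookup for all referenced hotels, then merge
# each offer with its hotel in memory.
def join_offers_with_hotels(offers)
  ids = offers.map { |offer| offer[:hotel_id] }.uniq
  hotels_by_id = fetch_hotels(ids).to_h { |hotel| [hotel[:hotel_id], hotel] }
  offers.map { |offer| offer.merge(hotel: hotels_by_id[offer[:hotel_id]]) }
end

joined = join_offers_with_hotels(OFFERS)
joined.each { |row| puts "#{row[:offer_id]}: #{row[:hotel][:name]} #{row[:price]}" }
```

Fetching all referenced hotels at once keeps the cost at two queries per request, instead of one extra query per offer.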
31. Problems with MongoDB
• Data consistency
• Typical caching problem
• Updates to MySQL must also be applied in MongoDB
→ Solved with callbacks in ActiveRecord
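The callback idea can be sketched in plain Ruby as below. To keep the example runnable without Rails, a minimal `Hotel` class calls its sync methods from `save` and `destroy` directly; in a Rails model the same methods would instead be registered with ActiveRecord's `after_save` and `after_destroy` callbacks. The `CacheStore` class, the `Hotel` model, and all method names are hypothetical stand-ins, not the talk's actual code.

```ruby
# Stand-in for the MongoDB cache: documents keyed by record id.
class CacheStore
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def upsert(id, doc)
    @docs[id] = doc
  end

  def remove(id)
    @docs.delete(id)
  end
end

CACHE = CacheStore.new

# Minimal stand-in for a model with save/destroy hooks. In Rails, the
# two private sync methods would be registered via `after_save` and
# `after_destroy` rather than called by hand.
class Hotel
  attr_reader :id
  attr_accessor :name

  def initialize(id, name)
    @id = id
    @name = name
  end

  def save
    # ... the write to MySQL would happen here ...
    sync_to_cache
    true
  end

  def destroy
    # ... the delete from MySQL would happen here ...
    remove_from_cache
    true
  end

  private

  # Mirror the current state of the record into the cache.
  def sync_to_cache
    CACHE.upsert(id, { id: id, name: name })
  end

  # Drop the cached copy when the record is deleted.
  def remove_from_cache
    CACHE.remove(id)
  end
end

hotel = Hotel.new(10, "Hotel Alpha")
hotel.save
hotel.name = "Hotel Alpha Berlin"
hotel.save
p CACHE.docs[10][:name]
hotel.destroy
p CACHE.docs.key?(10)
```

Tying the sync to the model's lifecycle means every code path that changes a record also refreshes the cache, as long as it goes through the model rather than raw SQL.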