13. High Availability
Problem: Summarization is slow.
Solution: Build a caching layer.
Cache
• 3x Replication
• "Dumb" load balancing
• Server Affinity (via Zookeeper)
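The three cache bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the deck's actual implementation: the server names, the `replicas_for` and `pick_server` helpers, and the choice of rendezvous hashing are all hypothetical, and a static list stands in for the membership that Zookeeper would coordinate.

```python
import hashlib
import random

# Hypothetical cache servers; a static list stands in for Zookeeper-managed membership.
SERVERS = ["cache-a", "cache-b", "cache-c", "cache-d", "cache-e"]
REPLICATION = 3  # "3x Replication"


def _score(server, table_id):
    """Deterministic score for a (server, table) pair (rendezvous hashing)."""
    digest = hashlib.sha256(f"{server}:{table_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(table_id, servers=SERVERS, n=REPLICATION):
    """Server affinity: the same table always maps to the same n servers."""
    ranked = sorted(servers, key=lambda s: _score(s, table_id), reverse=True)
    return ranked[:n]


def pick_server(table_id, rng=random):
    """Naive ("dumb") load balancing: pick any replica uniformly at random."""
    return rng.choice(replicas_for(table_id))
```

Because the mapping depends only on the table ID and the server list, any front end can compute it without consulting shared state, which is one reason the load balancer can stay "dumb".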
14. Metaphor Shear
Why PostgreSQL?
Pros
• End-user expectations map to RDBMS world
• Indexing on common operations (ORDER BY, WHERE)
• Full-text search
• Latitude/longitude/geo functions with PostGIS
• Aggregation on summarized results
• Built-in persistence
15. Metaphor Shear
Why PostgreSQL?
Cons
• No built-in "versioning"
• Re-summarization, though infrequent, is expensive
• Need to map lisp-based query language to SQL
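To make the last bullet concrete, here is a minimal sketch of translating a prefix-notation filter into a parameterized SQL fragment. The deck does not show the query language, so the nested-list expression form, the operator set, and the `to_sql` helper are all assumptions made for illustration. The `%s` placeholders follow the DB-API `format` paramstyle used by psycopg2.

```python
COMPARISONS = {"=", "<", ">", "<=", ">=", "!="}
CONNECTIVES = {"and": " AND ", "or": " OR "}


def to_sql(expr, columns):
    """Translate a nested-list expression into (sql_fragment, params).

    expr    -- e.g. ["and", ["=", "city", "Austin"], [">", "rating", 3]]
    columns -- allowlist of column names; values are never inlined into SQL
    """
    op, *args = expr
    if op in CONNECTIVES:
        if not args:
            raise ValueError(f"'{op}' needs at least one operand")
        parts, params = [], []
        for sub in args:
            sql, sub_params = to_sql(sub, columns)
            parts.append(sql)
            params.extend(sub_params)
        return "(" + CONNECTIVES[op].join(parts) + ")", params
    if op in COMPARISONS:
        column, value = args
        if column not in columns:
            raise ValueError(f"unknown column: {column}")
        return f"{column} {op} %s", [value]
    raise ValueError(f"unsupported operator: {op}")


sql, params = to_sql(
    ["and", ["=", "city", "Austin"], [">", "rating", 3]],
    columns={"city", "rating"},
)
# sql    == "(city = %s AND rating > %s)"
# params == ["Austin", 3]
```

Checking column names against an allowlist and passing values as parameters keeps the translation layer from becoming an injection path.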
16. High Availability
Why PostgreSQL?
Other considerations
• Must proactively store attributes
• Schema changes are expensive
• Handling "upsert" operations is awkward
• Deletes are difficult (but infrequent)
• (related) No concept of row merge
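One common workaround for the "upsert" bullet is to try an UPDATE and fall back to an INSERT when no row matched. The sketch below shows that pattern only; the `summary` table, its columns, and the `upsert` helper are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3


def upsert(conn, row_id, value):
    """Update the row if it exists, otherwise insert it."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE summary SET value = ? WHERE id = ?", (value, row_id)
        )
        if cur.rowcount == 0:  # nothing matched, so the row is new
            conn.execute(
                "INSERT INTO summary (id, value) VALUES (?, ?)", (row_id, value)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summary (id TEXT PRIMARY KEY, value TEXT)")
upsert(conn, "r1", "first")   # takes the INSERT path
upsert(conn, "r1", "second")  # takes the UPDATE path
rows = conn.execute("SELECT id, value FROM summary").fetchall()
```

Part of the awkwardness is that two concurrent writers can both see zero updated rows and both attempt the INSERT, so a real deployment would also need locking or a retry on the resulting unique-key violation.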
22. Consistency Challenges
Cache Invalidation
• How do I handle new inputs?
o Shield the Input Store
Low-priority - shield the input store
Row-level invalidations
o Lazy
Fetch updated rows on summary request
Leverage PostgreSQL to track invalidations
o Decouple From Input API call
Async notification
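The lazy, row-level approach above might look something like this. It is a sketch under assumptions: the `LazySummaryCache` class and the `fetch_row` callable are hypothetical, and an in-memory set stands in for the invalidation tracking that the slide assigns to PostgreSQL.

```python
class LazySummaryCache:
    """Marks rows stale on new input; refetches only when a row is requested."""

    def __init__(self, fetch_row):
        self._fetch_row = fetch_row  # hypothetical callable that reads the input store
        self._rows = {}              # row_id -> cached summary
        self._stale = set()          # row-level invalidations

    def invalidate(self, row_id):
        """Record that a row changed. Cheap, so it can run as an async notification."""
        self._stale.add(row_id)

    def get(self, row_id):
        """Serve from cache, fetching from the input store only if missing or stale."""
        if row_id in self._stale or row_id not in self._rows:
            self._rows[row_id] = self._fetch_row(row_id)
            self._stale.discard(row_id)
        return self._rows[row_id]
```

Since `invalidate` does no fetching, a burst of new inputs for a row that nobody reads never touches the input store, which is the shielding effect the slide describes.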
23. Consistency Challenges
Cache Instance Management
• How do we handle query changes?
o filtering out spam inputs
o change the aggregation function
o give more weight to table owner's votes
25. Consistency Challenges
Cache Instance Management
• Better solution: Double Buffering
o Reload new version in background
o Continue to serve current table
"closest match" warning
o Allow switch-back
Continue to accept invalidations against old table
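A minimal sketch of the double-buffering idea, assuming one in-memory table per version. The `DoubleBufferedTable` class and its method names are hypothetical, and dropping an invalidated row from both buffers is only one possible reading of "continue to accept invalidations against old table".

```python
import threading


class DoubleBufferedTable:
    """Serves the current version while a new one is built, and allows switch-back."""

    def __init__(self, version, rows):
        self._lock = threading.Lock()
        self._active = (version, dict(rows))
        self._previous = None

    def read(self, row_id):
        """Return (version, value) so a caller can flag a version mismatch."""
        with self._lock:
            version, rows = self._active
            return version, rows.get(row_id)

    def reload(self, version, build_rows):
        """Build the new version outside the lock, then swap it in."""
        rows = dict(build_rows())  # slow step; reads keep hitting the current table
        with self._lock:
            self._previous = self._active
            self._active = (version, rows)

    def switch_back(self):
        """Swap the previous version back in."""
        with self._lock:
            if self._previous is None:
                raise RuntimeError("no previous version to switch back to")
            self._active, self._previous = self._previous, self._active

    def invalidate(self, row_id):
        """Apply the invalidation to both buffers so either one stays safe to serve."""
        with self._lock:
            for buf in (self._active, self._previous):
                if buf is not None:
                    buf[1].pop(row_id, None)
```

To reload in the background, `reload` could be started on a worker, for example `threading.Thread(target=table.reload, args=("v2", build_rows)).start()`, where `build_rows` is whatever re-runs the changed query.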
26. Performance
Encoding-compliant tablespaces
• Support UTF-8, non-Latin sort orders
Select Tables get SSD-based PostgreSQL caching
• See Jignesh Shah's terrific slides from PgEast 2009
• http://blogs.sun.com/jkshah/entry/effects_of_flash_ssd_on
• 20x improvement in random reads (IO pattern for unclustered index reads)
• 2x improvement on sequential writes (generally pretty smooth)
27. What's next?
29. How can I use Factual?
Web UI
• Dataset Creation
• Workbench
http://www.factual.com/
APIs
• Server API
http://wiki.developer.factual.com/FrontPage
• Visualizations
http://wiki.developer.factual.com/Factual-Visualization-Documentation