In our webinar, representatives from TiVo, creator of a digital recording platform for television content, will explain how they implemented a new big data and analytics platform that dynamically scales in response to changing demand. You’ll learn how the solution enables TiVo to easily orchestrate big data clusters using Amazon Elastic Compute Cloud (Amazon EC2) and Amazon EC2 Spot Instances that read data from a data lake on Amazon Simple Storage Service (Amazon S3), and how this reduces the development cost and effort needed to support its network and advertiser users. TiVo will share lessons learned and best practices for quickly and affordably ingesting, processing, and making available for analysis terabytes of streaming and batch viewership data from millions of households.
34. Why Presto?
• Storage/Compute Separation
• Easy to add and remove worker nodes
• Query many different data sources (inside our VPC) without a separate load step
• Good performance for analytical queries; not so good for transactional and simple queries
• Managed (e.g., Qubole, Starburst)
37. Memory Pools:
• System memory pool (40% of Java heap space)
• Reserved memory pool (largest query’s memory usage)
• General memory pool (the rest of the memory)
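The split between the three pools can be sketched as simple arithmetic. This is only an illustration: the 40% system share comes from the slide, while the function name, the parameter names, and the example sizes are hypothetical and are not Presto configuration keys.

```python
def memory_pools(heap_gb, reserved_gb, system_fraction=0.40):
    """Split a worker's Java heap into system, reserved, and general pools.

    Assumptions (not from any Presto API): sizes are in GB, the system pool
    is a fixed fraction of the heap, the reserved pool is sized for the
    largest query, and the general pool is whatever remains.
    """
    system_gb = heap_gb * system_fraction
    general_gb = heap_gb - system_gb - reserved_gb
    if general_gb < 0:
        raise ValueError("reserved pool does not fit in the remaining heap")
    return {"system": system_gb, "reserved": reserved_gb, "general": general_gb}


if __name__ == "__main__":
    # Hypothetical worker with a 100 GB heap and a 20 GB largest query.
    for name, size in memory_pools(heap_gb=100, reserved_gb=20).items():
        print(f"{name}: {size:.1f} GB")
```

Under these assumptions, every gigabyte added to the reserved pool is taken away from the general pool, which is why one unusually large query can squeeze all the others.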
38. Working with reserved memory pool
• What if memory usage varies a lot between different queries?
Conceptually, the reserved memory pool should be the “high water mark” while most queries complete in the general pool. How do we achieve that?
Solution: multiple clusters based on workload
• Use many inexpensive instances, or a few expensive instances?
Empirical testing found the large instance type was slightly faster
• Compute optimized or memory optimized?
Solution: Cost/Benefit Analysis
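One way to picture "multiple clusters based on workload" is a router that sends each query to the smallest cluster sized for it. The sketch below is a toy model, not TiVo's implementation: the cluster names, the memory limits, and the idea of routing on an estimated memory figure are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadCluster:
    name: str                   # hypothetical cluster name
    max_query_memory_gb: float  # largest query this cluster is sized for


def route(estimated_memory_gb, clusters):
    """Return the smallest cluster whose limit covers the query's estimate."""
    for cluster in sorted(clusters, key=lambda c: c.max_query_memory_gb):
        if estimated_memory_gb <= cluster.max_query_memory_gb:
            return cluster
    raise ValueError("no cluster is sized for this query")


# Hypothetical fleet: small ad hoc queries, reports, and heavy batch jobs.
CLUSTERS = [
    WorkloadCluster("adhoc", 10),
    WorkloadCluster("reporting", 50),
    WorkloadCluster("batch", 400),
]

if __name__ == "__main__":
    for estimate in (2, 35, 120):
        print(estimate, "GB ->", route(estimate, CLUSTERS).name)
```

Keeping similar queries together means each cluster's reserved pool only has to cover the high water mark of its own workload.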
39. Choosing the Right Instance Type
r4.4xlarge
• r: instance class
• 4: generation
• 4xlarge: multiplier for CPU and memory
Other examples: t2.2xlarge, c5.16xlarge
Over 100 to choose from!
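The naming scheme above can be decomposed mechanically. The following is a minimal sketch that assumes names shaped like the three examples on the slide (class letters, a generation number, a dot, then a size); the function name and the returned field names are illustrative, and unusual instance names may not match the pattern.

```python
import re

# Assumed shape: <class letters><generation digits><optional letters>.<size>
INSTANCE_RE = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$")


def parse_instance_type(name):
    """Split an EC2-style instance type name into its parts."""
    match = INSTANCE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    instance_class, generation, _attributes, size = match.groups()
    # "4xlarge" -> 4, "xlarge" -> 1; sizes such as "large" have no multiplier.
    xl = re.fullmatch(r"(\d*)xlarge", size)
    multiplier = int(xl.group(1) or 1) if xl else None
    return {
        "class": instance_class,
        "generation": int(generation),
        "size": size,
        "xlarge_multiplier": multiplier,
    }


if __name__ == "__main__":
    for example in ("r4.4xlarge", "t2.2xlarge", "c5.16xlarge"):
        print(example, parse_instance_type(example))
```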
50. My big fat Presto query
[Diagram: one Presto coordinator running 1 query on two Presto workers, each at 100% CPU]
When will queries complete at current rate?
Not fast enough!
51. [Diagram: the same Presto coordinator and 1 query; the two original Presto workers stay at 100% CPU while two newly added workers sit idle]
When will queries complete at current rate?
Not fast enough!
Not so fast… Upscaling only works for new queries
Maybe we should have sent this query to a more powerful cluster?
Autoscaling is for concurrency
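The point that upscaling only helps new queries can be shown with a toy model. Everything here is an assumption made for illustration: the class names, the idea that a query is pinned to the workers present when it was submitted, and the CPU-minute figures. It is not a description of Presto's scheduler internals.

```python
from dataclasses import dataclass


@dataclass
class ToyQuery:
    cpu_minutes: float     # total work left, in worker CPU-minutes
    assigned_workers: int  # workers the query was scheduled on

    def eta_minutes(self):
        """Time to finish if the assigned workers are fully dedicated."""
        return self.cpu_minutes / self.assigned_workers


class ToyCluster:
    def __init__(self, workers):
        self.workers = workers

    def submit(self, cpu_minutes):
        # Toy assumption: a query only ever uses the workers that existed
        # when it was submitted.
        return ToyQuery(cpu_minutes, assigned_workers=self.workers)

    def add_workers(self, count):
        self.workers += count


if __name__ == "__main__":
    cluster = ToyCluster(workers=2)
    big_query = cluster.submit(cpu_minutes=120)
    cluster.add_workers(2)  # autoscaling reacts to the 100% CPU load
    new_query = cluster.submit(cpu_minutes=120)
    print("running query ETA:", big_query.eta_minutes(), "minutes")
    print("new query ETA:", new_query.eta_minutes(), "minutes")
```

In this model the running query finishes no sooner after the scale-up, while an identical query submitted afterwards does, which is the sense in which autoscaling serves concurrency and not the speed of a single large query.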
52. Results
Elastic scaling: Spin the nodes up/down based on demand
Benefit: Cost savings
Specialized clusters: Different clusters for different workloads
Benefit: Efficiency
Storage/Compute separation: Store on Amazon S3, serve using Presto
Benefit: Scalability and data availability