2. Realtime Data System Challenges
- System Design
- Data is big
- People want fresh data
- People want fast queries
- Data Development
- Frequent and ad hoc new requirements
- Many small data applications with slightly different data logic
- Data Operation
- Data could be delayed
- Application restart causes data loss or data duplication
- Need to do backfill
3. Data Delivery Guarantee
- Best Effort
- At Most Once
- At Least Once
- Most widely used
- Exactly Once
- Very challenging to achieve end to end (E2E)
- High Cost
- Common Practice
- At Least Once - Streaming Processing
- Idempotent - Data Storage
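
A minimal sketch of the common practice above, assuming each event carries a unique key: the stream side may deliver an event more than once (for example after a restart), and the storage side absorbs the repeats with a keyed upsert. The names `Event`, `event_id`, `IdempotentStore`, and `deliver_at_least_once` are hypothetical and chosen only for illustration; a real system would use its own message key and storage engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique key assigned by the producer
    user: str
    amount: int


class IdempotentStore:
    """Keyed upsert: writing the same event twice still leaves one row."""

    def __init__(self):
        self._rows = {}

    def upsert(self, event):
        # Overwrite by key, so a replayed event replaces itself.
        self._rows[event.event_id] = event

    def total(self):
        return sum(e.amount for e in self._rows.values())

    def __len__(self):
        return len(self._rows)


def deliver_at_least_once(events, store, replay_from=0):
    """Send every event, then simulate a restart that replays a suffix."""
    for e in events:
        store.upsert(e)
    for e in events[replay_from:]:
        store.upsert(e)  # duplicates caused by the simulated restart


events = [
    Event("e1", "alice", 10),
    Event("e2", "bob", 5),
    Event("e3", "alice", 7),
]
store = IdempotentStore()
deliver_at_least_once(events, store, replay_from=1)
```

Here `e2` and `e3` are written twice, yet the store ends up with three rows and an unchanged total. With an append-only sink the same replay would inflate the total, which is why at-least-once delivery is usually paired with idempotent storage.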
8. Realtime Data Ingestion/Query - Challenges
- Continuous high throughput write
- Concurrent write and read (queries)
- Large table scan and aggregation
- Historical data + Recent data
9. Realtime OLAP Technologies
- Continuous high throughput write: Ingestion in memory
- Concurrent write and read (queries): Read-only segment
- Large table scan and aggregation: Columnar Storage + Inverted Index
- Historical data + Recent data: Lambda Architecture
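
To make the techniques on this slide concrete, here is a toy sketch, not modeled on any specific OLAP engine. New rows land in an in-memory buffer, the buffer is sealed into a read-only segment once it reaches a threshold, each sealed segment stores its data column by column with an inverted index on one column, and a query combines sealed segments with the still-mutable buffer. The class names, `seal_threshold`, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


class SealedSegment:
    """Read-only, column-oriented segment with an inverted index on one column."""

    def __init__(self, rows, index_column):
        # Columnar layout: one list per column instead of one dict per row.
        self.columns = defaultdict(list)
        for row in rows:
            for name, value in row.items():
                self.columns[name].append(value)
        # Inverted index: column value -> list of row ids holding that value.
        self.inverted = defaultdict(list)
        for row_id, value in enumerate(self.columns[index_column]):
            self.inverted[value].append(row_id)

    def sum_where(self, value, metric):
        # Touch only the matching row ids of the single metric column.
        col = self.columns[metric]
        return sum(col[i] for i in self.inverted.get(value, []))


class RealtimeTable:
    def __init__(self, index_column, seal_threshold=3):
        self.index_column = index_column
        self.seal_threshold = seal_threshold
        self.buffer = []  # in-memory ingestion buffer, still mutable
        self.sealed = []  # read-only segments

    def ingest(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.seal_threshold:
            self.sealed.append(SealedSegment(self.buffer, self.index_column))
            self.buffer = []

    def sum_where(self, value, metric):
        # Older data comes from sealed segments, recent data from the buffer.
        total = sum(seg.sum_where(value, metric) for seg in self.sealed)
        total += sum(
            r[metric] for r in self.buffer if r[self.index_column] == value
        )
        return total


table = RealtimeTable(index_column="country")
for country, clicks in [("US", 3), ("DE", 1), ("US", 4), ("FR", 2), ("US", 5)]:
    table.ingest({"country": country, "clicks": clicks})

us_clicks = table.sum_where("US", "clicks")
```

After five rows the first three sit in one sealed segment and the last two are still in the buffer, so the query for `"US"` has to read both. Sealed segments never change, which lets readers scan them without coordinating with the writer.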
11. Takeaways
- Table Scan is Key
- Columnar Storage
- Inverted Index
- Data Ingestion Latency
- Balance of speed and cost
- Choose the proper tool for your needs