9. The Numbers Are Growing
• 42M file their own taxes with TurboTax
• 2.3M run their small businesses with QuickBooks
• 7M manage their personal finances with Mint
• 65+ Applications, 25% of US GDP
10. Intuit - An Evolution Case Study
Eras: Era of DOS, Era of Windows, Era of Web, Era of the Cloud, Mobile First
Timeline: 1980s, 1990s, 2000s, 2010, 2016
Data: Regulatory data; Transactional data; Batch data; Real time data; Complex, secure data; Compliant data
Company snapshots, in slide order:
• Employees: 150; Customers: 1.3M; Revenue: $33M
• Employees: 4,500; Customers: 5.6M; Revenue: $1.04B
• Employees: 7,700; Customers: 37M; Revenue: $4.2B
12. Evolution of Big Data Pipelines – The Need
• Secure Cloud Environment
• Single Cohesive Data Pipeline
• Support Varied Use Cases: A/B Testing, Personalization, Streaming, Profile Store, Fraud Detection, and more
13. Evolution of Big Data Pipelines
Thin Slices - Minimal Viable Product
14. Evolution of Big Data Pipelines – The Recipe
• Taking the Data In
• Transforming Data
• Handling The Indigestion With Scale
15. Evolution of Big Data Pipelines – The Recipe
• No Snowflake Solutions
• Getting Vested Stakeholders' Agreement
• Establishing The Standards
16. Evolution of Big Data Pipelines – The Recipe
• Breaking The Silos
• Moving The Organization In One Direction
17. Evolution of Big Data Pipelines – The Recipe
● Making The Configuration Knobs Work
● At Scale
o Latency
o Throughput
● Schema, PII, Metadata, Changes, Audit, Governance
● Controlled Access ←→ Innovation
● Error Monitoring
● Cluster Deployment
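One common way the latency and throughput knobs interact is micro-batching: larger batches raise throughput, while a maximum wait time caps how long any event sits in a buffer. The sketch below is a generic illustration, not the pipeline described in these slides; the class `MicroBatcher` and its parameters `max_batch_size` and `max_wait_sec` are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Hypothetical batcher: flush when the batch is full or has waited too long."""

    def __init__(
        self,
        sink: Callable[[List[Any]], None],
        max_batch_size: int = 500,      # throughput knob: bigger batches, fewer writes
        max_wait_sec: float = 0.05,     # latency knob: upper bound on buffering time
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.sink = sink
        self.max_batch_size = max_batch_size
        self.max_wait_sec = max_wait_sec
        self.clock = clock
        self._batch: List[Any] = []
        self._first_event_at: Optional[float] = None

    def add(self, event: Any) -> None:
        """Buffer one event and flush if either limit has been reached."""
        if not self._batch:
            self._first_event_at = self.clock()
        self._batch.append(event)
        self._flush_if_due()

    def tick(self) -> None:
        """Call periodically so a slow trickle of events still gets flushed."""
        self._flush_if_due()

    def flush(self) -> None:
        """Hand the current batch to the sink and start a new one."""
        if self._batch:
            self.sink(self._batch)
            self._batch = []
            self._first_event_at = None

    def _flush_if_due(self) -> None:
        if not self._batch:
            return
        full = len(self._batch) >= self.max_batch_size
        stale = self.clock() - self._first_event_at >= self.max_wait_sec
        if full or stale:
            self.flush()
```

Raising `max_batch_size` favors throughput; lowering `max_wait_sec` favors latency. Injecting `clock` keeps the timing behavior testable without sleeping.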
24. Data Pipeline (an example)
• Collect event stream data into one location
• Handle ~200k events/sec
• Payload ~3-5 KB per event
• Enrich each message and load it into Hive within a defined SLA
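To make the figures above concrete, the following back-of-the-envelope calculation converts the event rate and payload size into raw bandwidth and daily volume. It assumes decimal units (1 KB = 1000 bytes), ignores compression and replication, and treats ~200k events/sec as a sustained rate, which the slide does not state; the function names are illustrative only.

```python
# Back-of-the-envelope sizing for the example pipeline.
# Rates and payload sizes come from the slide; everything else is an assumption.

EVENTS_PER_SEC = 200_000        # ~200k events/sec (from the slide)
PAYLOAD_KB_RANGE = (3, 5)       # ~3-5 KB per event (from the slide)
SECONDS_PER_DAY = 86_400


def ingest_rate_mb_per_sec(events_per_sec: int, payload_kb: float) -> float:
    """Raw ingest bandwidth in MB/s, before compression or replication."""
    return events_per_sec * payload_kb / 1000


def daily_volume_tb(events_per_sec: int, payload_kb: float) -> float:
    """Raw volume per day in TB, assuming the rate is sustained all day."""
    mb_per_day = ingest_rate_mb_per_sec(events_per_sec, payload_kb) * SECONDS_PER_DAY
    return mb_per_day / 1_000_000


if __name__ == "__main__":
    for kb in PAYLOAD_KB_RANGE:
        rate = ingest_rate_mb_per_sec(EVENTS_PER_SEC, kb)
        volume = daily_volume_tb(EVENTS_PER_SEC, kb)
        print(f"{kb} KB payload: {rate:.0f} MB/s, {volume:.2f} TB/day")
```

Under these assumptions the pipeline has to absorb roughly 600 MB/s to 1 GB/s, or about 52 to 86 TB of raw data per day.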