Incremental transformation of transactional data models to analytical data models in near real time
Transactional systems are designed with data models that maximize write throughput across multiple parallel business flows. They evolve iteratively with the business and must react quickly to a changing business landscape to minimize time to market.
Analytical systems, on the other hand, require data models to maximize query throughput over broad, deep and large data volumes.
The need for a platform that transforms the transactional data model into an analytical data model is well established in the industry. This is currently achieved through two different paradigms: stream processing at lower latencies and batch processing at higher latencies.
We have solved the same problem through a third paradigm: incremental processing, for intermediate latencies (5 minutes to 1 hour).
We considered and dropped implementations of the streaming paradigm either because of a lack of completeness guarantees or the absence of complex join capabilities across a large number of entities.
Our incremental processing platform transforms transactional data models to analytical data models. It provides expressibility for complex joins across multiple entities (live with 30) through a Transformation Definition Language (TDL). These complex joins are evaluated incrementally as transactional data changes, periodically updating the analytical data model. For near-real-time use cases, this is done every 5-10 minutes.
Changes to transactional data models are handled through version support in the TDL. These changes are absorbed with a pause and resume of the transformations.
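As a rough illustration of what such a definition could look like, here is a hedged sketch in Python; the structure, field names, and cadence below are assumptions for exposition, not the platform's actual TDL syntax:

```python
# Hypothetical sketch of a TDL-style transformation definition for the
# Pick_To_Dispatch example used later in the deck. Every field name here
# is an illustrative assumption, not the platform's real syntax.
pick_to_dispatch_v2 = {
    "target": "Pick_To_Dispatch",
    "version": 2,                  # bumped when a source model changes;
                                   # absorbed via pause/resume of the job
    "key": ["order", "shipment"],  # composite key of the analytical row
    "sources": ["Picklists", "Shipments", "Vans"],
    "joins": [
        {"left": "Picklists", "right": "Shipments",
         "on": "order", "type": "left_outer"},
        {"left": "Shipments", "right": "Vans",
         "on": "shipment", "type": "inner"},
    ],
    "window": "10 minutes",        # near-real-time cadence (5-10 min)
}
```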
The Flipkart Fulfillment Services Group serves over a million shipments a day at its peak. Customer delight through reliable and fast delivery of orders is our primary goal. To succeed in this endeavour, our ground operations depend on live and accurate visibility into the journey of all shipments pan-India. Overall data volumes range in the tens of terabytes, with a change frequency of over 25k QPS at peak. All our transactional systems combined generate mutations with volumes close to 200 GB every second. Our incremental platform is built to handle this scale.
With this platform, we have achieved analytics at low latencies with high completeness without compromising on business agility.
In this talk, I will cover the specifics of our evaluations and our learnings from the journey of building the platform.
7. Grocery fulfillment journey
● Multiple systems participate in the journey
● Time to act drives latency requirements
● 100% accuracy expected
● Minimal effort for business monitoring
8. Characteristics of transactional data models
● Normalized
● Complex, directed, deterministic relationships
● Longer life cycles
● Tumbling time windows
9. Normalization helps parallelism and fast writes
[Entity diagram: Order (1) fans out into order_items; Picklists (2) link order_items to a picker; Shipments (3) link order_items to a shipment; Vans (4) link a shipment to a van.]
10. Denormalization helps fast reads

Picklists:
  time  order  picker
  t1    o1     p1

Shipments:
  order  shipment
  o1     s1

Vans:
  shipment  van
  s1        v1

Pick_To_Dispatch (composite key: order+shipment):
  key    order  picker  pick_time  shipment  van  location
  o1+s1  o1     p1      t1         s1        v1   <lat, long>
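A minimal Python sketch of the denormalizing join above, assuming in-memory lists of rows and assuming the location column comes from the van record; the real platform evaluates the equivalent joins through its TDL over persistent stores:

```python
# Minimal sketch of the denormalizing join from the tables above.
# Assumes in-memory rows; location is assumed to ride with the van record.
picklists = [{"time": "t1", "order": "o1", "picker": "p1"}]
shipments = [{"order": "o1", "shipment": "s1"}]
vans      = [{"shipment": "s1", "van": "v1", "location": "<lat, long>"}]

ship_by_order = {s["order"]: s for s in shipments}
van_by_ship   = {v["shipment"]: v for v in vans}

pick_to_dispatch = []
for p in picklists:
    s = ship_by_order.get(p["order"])                  # left outer join on order
    v = van_by_ship.get(s["shipment"]) if s else None  # lookup van via shipment
    pick_to_dispatch.append({
        "key": p["order"] + "+" + (s["shipment"] if s else "null"),
        "order": p["order"], "picker": p["picker"], "pick_time": p["time"],
        "shipment": s and s["shipment"],
        "van": v and v["van"],
        "location": v and v["location"],
    })
# One wide row per pick: reads need no runtime joins, at the cost of writes.
```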
15. At 10:05

Picklists:
  time   order  picker
  10:00  o1     p1
  10:00  o2     p2

Shipments: (no rows yet)
  order  shipment

Vans: (no rows yet)
  shipment  van

Run at 10:05 over window start 9:50, end 10:00.

Pick_To_Dispatch:
  key      order  picker  pick_time  shipment  van  location
  o1+null  o1     p1      10:00      -         -    -
  o2+null  o2     p2      10:00      -         -    -
16. At 10:10

Picklists:
  time   order  picker
  10:00  o1     p1
  10:00  o2     p2
  10:10  o3     p2
  10:10  o4     p3

Shipments:
  order  shipment
  o1     s1
  o1     s2

Vans: (no rows yet)
  shipment  van

Previous window: start 9:50, end 10:00. Current window: start 10:00, end 10:10.
18. At 10:10

Picklists:
  time   order  picker
  10:00  o1     p1
  10:00  o2     p2
  10:10  o3     p2
  10:10  o4     p3

Shipments:
  order  shipment
  o1     s1
  o1     s2

Vans: (no rows yet)
  shipment  van

Current window: start 10:00, end 10:10.
Joins: Picklists LEFT OUTER JOIN Shipments (on order); Shipments INNER JOIN Vans (on shipment).
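A hedged Python sketch of one such incremental run, assuming every mutation row carries its mutation timestamp and the analytical model is an upsertable map keyed by the composite key; run_window and the store shapes are illustrative, not the platform's API:

```python
# Hedged sketch of one incremental run, e.g. at 10:10 over the mutation
# window (10:00, 10:10]. All names are illustrative; rows are assumed to
# carry a "time" field with their last mutation timestamp.
def run_window(picklists, shipments, vans, start, end, target):
    in_window = lambda t: start < t <= end

    pick_by_order, ship_by_order = {}, {}
    for p in picklists:
        pick_by_order.setdefault(p["order"], []).append(p)
    for s in shipments:
        ship_by_order.setdefault(s["order"], []).append(s)
    van_by_ship = {v["shipment"]: v for v in vans}

    # Re-derive every order touched by a mutation on either side of the join.
    affected = {p["order"] for p in picklists if in_window(p["time"])}
    affected |= {s["order"] for s in shipments if in_window(s["time"])}

    for order in affected:
        target.pop(order + "+null", None)  # retract a stale shipment-less row
        for p in pick_by_order.get(order, []):       # no picklist yet: no row
            for s in ship_by_order.get(order) or [None]:  # left outer join
                # Vans lookup kept optional here for brevity; the deck
                # models Shipments-Vans as an inner join.
                v = van_by_ship.get(s["shipment"]) if s else None
                key = order + "+" + (s["shipment"] if s else "null")
                target[key] = {                      # upsert by composite key
                    "order": order, "picker": p["picker"],
                    "pick_time": p["time"],
                    "shipment": s and s["shipment"],
                    "van": v and v["van"],
                    "location": v and v["location"],
                }
```

Run at 10:10 over (10:00, 10:10], and assuming the two shipment mutations landed inside that window, this retracts the o1+null row from the 10:05 run, emits o1+s1 and o1+s2, and adds o3+null and o4+null, matching the tables above.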
33. Batch - High accuracy, high latency, low cost
● Bulk writes leverage cheaper disks
● Range queries take longer for scans
● Compute can be shared
● Replays are simpler but take as long
34. Stream - Lower accuracy, low latency, low cost
● Record-level updates need fast writes
● Range scans slow down processing
● Compute is consistently engaged
● Replay is complex (Kappa) or infeasible (Lambda)
35. Incremental - High accuracy, mid latency, mid cost
● Replication needs fast writes
● Joins need fast scans
● Compute is shared
● Replays addressed by design
36. Data processing implementation trade-offs

               Accuracy   Latency   Cost
  Stream       Lower      Low       Low
  Batch        High       High      Low
  Incremental  High       Medium    Medium
37. Applications of Incremental Processing
Positive Indicators
● Time to act is 30 minutes or higher
● Accuracy is crucial
● Incremental visibility is acceptable
● Multiple systems come together with complex join criteria
Negative Indicators
● Low infrastructure cost is a constraint
● Independent systems
38. Thick, Medium and Thin Slices - Choose yours
[Figure: the three paradigms pictured as slices of data per run, with BATCH the thick slice, INCREMENTAL the medium slice, and STREAM the thin slice.]
40. THANK YOU!
41. Example - Out-of-order mutations

Picklists:
  time_d  order  picker
  10:00   o1     p1
  10:00   o2     p2
  10:10   o3     p2
  10:10   o4     p3

Shipments:
  order  shipment
  o1     s1
  o6     s6

Vans: (no rows yet)
  shipment  van

Joins: Picklists LEFT OUTER JOIN Shipments; Shipments INNER JOIN Vans.
The shipment mutation (o6, s6) arrives before any picklist row for o6 exists, so the left outer join has no row to attach it to yet (the "?" on the original slide).
42. Example - Out-of-order mutations

Picklists:
  time_d  order  picker
  10:00   o1     p1
  10:00   o2     p2
  10:10   o3     p2
  10:10   o4     p3
  10:20   o6     p6

Shipments:
  order  shipment
  o1     s1
  o6     s6

Vans: (no rows yet)
  shipment  van

Joins: Picklists LEFT OUTER JOIN Shipments.
When the picklist row for o6 lands at 10:20, the left outer join in that window's run finds the earlier shipment mutation (o6, s6) already in place, and the out-of-order change flows into Pick_To_Dispatch.
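Tying the two slides together, here is a hedged walkthrough reusing the run_window sketch from the slide 18 section; the shipment-side timestamps are assumptions added so mutations on either side can be windowed:

```python
# Walkthrough of the out-of-order example above, reusing the run_window
# sketch shown earlier. Shipment timestamps are assumed for illustration.
picklists = [
    {"time": "10:00", "order": "o1", "picker": "p1"},
    {"time": "10:00", "order": "o2", "picker": "p2"},
    {"time": "10:10", "order": "o3", "picker": "p2"},
    {"time": "10:10", "order": "o4", "picker": "p3"},
]
shipments = [
    {"time": "10:10", "order": "o1", "shipment": "s1"},
    {"time": "10:10", "order": "o6", "shipment": "s6"},  # no picklist row yet
]
vans, target = [], {}

run_window(picklists, shipments, vans, "10:00", "10:10", target)
assert "o6+s6" not in target   # out-of-order shipment waits for its picklist

picklists.append({"time": "10:20", "order": "o6", "picker": "p6"})
run_window(picklists, shipments, vans, "10:10", "10:20", target)
assert "o6+s6" in target       # late picklist re-joins the earlier shipment
```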