Apache Apex Fault Tolerance and Processing Semantics

Apache Apex
Fault Tolerance and Processing Semantics
San Francisco Apache Apex Meetup
April 6th 2016
Thomas Weise, Apache Apex PPMC
@thweise thw@apache.org

Stream Processing
• Paradigm shift in big data processing
• Data from a variety of sources (Kafka, files, social media etc.)
• Unbounded data sources
• A batch can be processed as stream (but a stream is not a batch)
• Processing with temporal boundaries (windows)
• Results stored to a variety of sinks or destinations
ᵒ Streaming application can also serve data with very low latency
2
Browser
Web Server
Kafka Input
(logs)
Decompress,
Parse, Filter
Dimensions
Aggregate Kafka
Logs
Kafka

Apache Apex Features
• In-memory Stream Processing
• Scale out, Distributed, Parallel, High Throughput
• Windowing (temporal boundary)
• Reliability, Fault Tolerance
• Operability
• YARN native
• Compute Locality
• Dynamic updates
3

Native Hadoop Integration
5
• YARN is
the
resource
manager
• HDFS used
for storing
any
persistent
state

Streaming Windows
7
 Application window
 Sliding window and tumbling window
 Checkpoint window
 No artificial latency

Fault Tolerance
8
• Operator state is checkpointed to persistent store
ᵒ Automatically performed by engine, no additional coding needed
ᵒ Asynchronous and distributed
ᵒ In case of failure operators are restarted from checkpoint state
• Automatic detection and recovery of failed containers
ᵒ Heartbeat mechanism
ᵒ YARN process status notification
• Buffering to enable replay of data from recovered point
ᵒ Fast, incremental recovery, spike handling
• Application master state checkpointed
ᵒ Snapshot of physical (and logical) plan
ᵒ Execution layer change log

Checkpointing Operator State
9
• Save state of operator so that it can be recovered on failure
• Pluggable storage handler
• Default implementation
ᵒ Serialization with Kryo
ᵒ All non-transient fields serialized
ᵒ Serialized state written to HDFS
ᵒ Writes asynchronous, non-blocking
• Possible to implement custom handlers for alternative approach to
extract state or different storage backend (such as IMDG)
• For operators that rely on previous state for computation
ᵒ Operators can be marked @Stateless to skip checkpointing
• Checkpoint frequency tunable (by default 30s)
ᵒ Based on streaming windows for consistent state

• In-memory PubSub
• Stores results emitted by operator until committed
• Handles backpressure / spillover to local disk
• Ordering, idempotency
Operator
1
Container 1
Buffer
Server
Node 1
Operator
2
Container 2
Node 2
Buffer Server
10

Application Master State
11
• Snapshot state on plan change
ᵒ Serialize Physical Plan (includes logical plan)
ᵒ Infrequent, expensive operation
• WAL (Write-ahead-Log) for state changes
ᵒ Execution layer changes
ᵒ Container, operator state, property changes
• Containers locate master through DFS
ᵒ AM can fail and restart, other containers need to find it
ᵒ Work preserving restart
• Recovery
ᵒ YARN restarts application master
ᵒ Apex restores state from snapshot and replays log

• Container process fails
• NM detects
• In case of AM (Apex Application Master), YARN launches replacement
container (for attempt count < max)
• Node Manager Process fails
• RM detects NM failure and notifies AM
• Machine fails
• RM detects NM/AM failure and recovers or notifies AM
• RM fails - RM HA option
• Entire YARN cluster down – stateful restart of Apex application
Failure Scenarios
12

NM NM
Resource
Manager
Apex AM
3
2
1
Apex AM
1 2
3
NM
Failure Scenarios
NM
13

Failure Scenarios
… EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1
sum
0
… EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1
sum
7
… EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1
sum
10
… EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1
sum
7
14

Processing Guarantees
15
At-least-once
• On recovery data will be replayed from a previous checkpoint
ᵒ No messages lost
ᵒ Default, suitable for most applications
• Can be used to ensure data is written once to store
ᵒ Transactions with meta information, Rewinding output, Feedback from
external entity, Idempotent operations
At-most-once
• On recovery the latest data is made available to operator
ᵒ Useful in use cases where some data loss is acceptable and latest data is
sufficient
Exactly-once
ᵒ At-least-once + idempotency + transactional mechanisms (operator logic) to
achieve end-to-end exactly once behavior

End-to-End Exactly Once
16
• Becomes important when writing to external systems
• Data should not be duplicated or lost in the external system even in case of
application failures
• Common external systems
ᵒ Databases
ᵒ Files
ᵒ Message queues
• Platform support for at least once is a must so that no data is lost
• Data duplication must still be avoided when data is replayed from checkpoint
ᵒ Operators implement the logic dependent on the external system
• Aid of platform features such as stateful checkpointing and windowing
• Three different mechanisms with implementations explained in next slides

Files
17
• Streaming data is being written to file on a continuous basis
• Failure at a random point results in file with an unknown amount of data
• Operator works with platform to ensure exactly once
ᵒ Platform responsibility
• Restores state and restarts operator from an earlier checkpoint
• Platform replays data from the exact point after checkpoint
ᵒ Operator responsibility
• Replayed data doesn’t get duplicated in the file
• Accomplishes by keeping track of file offset as state
ᵒ Details in next slide
• Implemented in operator AbstractFileOutputOperator in apache/incubator-
apex-malhar github repository available here
• Example application AtomicFileOutputApp available here

Exactly Once Strategy
18
File Data
Offset
• Operator saves file offset during
checkpoint
• File contents are flushed before
checkpoint to ensure there is no
pending data in buffer
• On recovery platform restores the file
offset value from checkpoint
• Operator truncates the file to the
offset
• Starts writing data again
• Ensures no data is duplicated or lost
Chk

Transactional databases
19
• Use of streaming windows
• For exactly once in failure scenarios
ᵒ Operator uses transactions
ᵒ Stores window id in a separate table in the database
ᵒ Details in next slide
• Implemented in operator AbstractJdbcTransactionableOutputOperator in
apache/incubator-apex-malhar github repository available here
• Example application streaming data in from kafka and writing to a JDBC
database is available here

Exactly Once Strategy
20
d11 d12 d13
d21 d22 d23
lwn1 lwn2 lwn3
op-id wn
chk wn wn+1
Lwn+11 Lwn+12 Lwn+13
op-id wn+1
Data Table
Meta Table
• Data in a window is written out in a single
transaction
• Window id is also written to a meta table
as part of the same transaction
• Operator reads the window id from meta
table on recovery
• Ignores data for windows less than the
recovered window id and writes new data
• Partial window data before failure will not
appear in data table as transaction was not
committed
• Assumes idempotency for replay

Stateful Message Queue
21
• Data is being sent to a stateful message queue like Apache Kafka
• On failure data already sent to message queue should not be re-sent
• Exactly once strategy
ᵒ Sends a key along with data that is monotonically increasing
ᵒ On recovery operator asks the message queue for the last sent message
• Gets the recovery key from the message
ᵒ Ignores all replayed data with key that is less than or equal to the recovered key
ᵒ If the key is not monotonically increasing then data can be sorted on the key at the end
of the window and sent to message queue
• Implemented in operator AbstractExactlyOnceKafkaOutputOperator in
apache/incubator-apex-malhar github repository available here

Resources
22
• Exactly-once processing: https://www.datatorrent.com/blog/end-to-end-
exactly-once-with-apache-apex/
• Examples with Kafka and Files: https://github.com/tweise/apex-
samples/tree/master/exactly-once
• Learn more: http://apex.incubator.apache.org/docs.html
• Subscribe - http://apex.incubator.apache.org/community.html
• Download - http://apex.incubator.apache.org/downloads.html
• Apex website - http://apex.incubator.apache.org/
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/topics/apache-apex

Apache Apex Fault Tolerance and Processing Semantics

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (12)

Semelhante a Apache Apex Fault Tolerance and Processing Semantics

Semelhante a Apache Apex Fault Tolerance and Processing Semantics (19)

Mais de Apache Apex

Mais de Apache Apex (18)

Último

Último (20)

Apache Apex Fault Tolerance and Processing Semantics