Extending Flink for anomaly detection with Hierarchical Temporal Memory (HTM). Presented at the Bay Area Apache Flink Meetup in San Jose on June 27, 2016.
https://github.com/htm-community/flink-htm
10. Benefits
Good fit for Apache Flink
• Automated model-building
• Continuous learning
• Temporal awareness
Contrast with:
github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine
11. Benefits (cont’d)
Good fit for HTM
• Integration w/ data pipeline
• Data connectivity (e.g. Kafka, Twitter, HDFS, AWS Kinesis)
• DSL for stream pre- and post-processing (e.g. aggregation, transformation)
• Distributed, reliable processing
• Event-Time Awareness
12. Features
`Learn` Operator
• Feeds input data to an HTM model
• Emits predictions and anomaly scores
• Supports keyed and non-keyed streams
Checkpoint Integration
• Models are serialized
• Facilitates exactly-once processing
Numenta RiverView Connector
• Public-domain temporal datasets
15. General Approach
1. Define Input Type
2. Add Data Source
3. Apply Learn Operator
• w/ HTM Network Definition
• w/ Field Encoders
4. Define Select Function
• Process the inference data (predictions & anomaly scores)
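The four steps above can be sketched without any Flink or HTM dependency. This is only an illustration of the data flow: `Reading`, `Inference`, `learn`, and `select` are hypothetical names, and the "model" is a naive last-value predictor standing in for an HTM network, not the flink-htm API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Library-free sketch of the four steps; every name here is hypothetical. */
final class GeneralApproachSketch {

    /** Step 1: the input type. */
    static final class Reading {
        final long timestamp;
        final double value;
        Reading(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** What the learn step emits per input record. */
    static final class Inference {
        final Reading input;
        final double prediction;    // predicted next value
        final double anomalyScore;  // in [0, 1]
        Inference(Reading input, double prediction, double anomalyScore) {
            this.input = input;
            this.prediction = prediction;
            this.anomalyScore = anomalyScore;
        }
    }

    /**
     * Step 3: stand-in for the learn step. NOT HTM: it predicts that the next
     * value equals the current one and scores the miss relative to `scale`.
     */
    static List<Inference> learn(List<Reading> input, double scale) {
        List<Inference> out = new ArrayList<>();
        Double predicted = null; // no prediction exists before the first record
        for (Reading r : input) {
            double score = predicted == null
                    ? 0.0
                    : Math.min(1.0, Math.abs(r.value - predicted) / scale);
            predicted = r.value;
            out.add(new Inference(r, predicted, score));
        }
        return out;
    }

    /** Step 4: a select function turns each inference into the output type. */
    static <R> List<R> select(List<Inference> inferences, Function<Inference, R> fn) {
        List<R> out = new ArrayList<>();
        for (Inference i : inferences) {
            out.add(fn.apply(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 2: the data source, here a fixed in-memory list.
        List<Reading> source = Arrays.asList(
                new Reading(1L, 10.0), new Reading(2L, 10.0),
                new Reading(3L, 10.0), new Reading(4L, 50.0));
        List<Double> scores = select(learn(source, 20.0), i -> i.anomalyScore);
        System.out.println(scores); // prints [0.0, 0.0, 0.0, 1.0]
    }
}
```

In a real job the list would be a `DataStream`, and the learn step would carry the HTM network definition and field encoders listed above.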
18. Advanced Topics
`Reset` Function
• Indicates the start of a temporal sequence
• For example: A,B,C,D,E, (reset), A,B,C,D,E
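The reset idea can be illustrated with a small, library-free sketch. A predicate flags the records that start a new temporal sequence, and the model's sequence context is cleared before such a record is processed. `SequenceMemory` and `split` are hypothetical names, and the "model" only remembers symbols; it is a stand-in, not an HTM network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Library-free illustration of a reset predicate; not the flink-htm API. */
final class ResetSketch {

    /** Stand-in for a temporal model: it only remembers the current sequence. */
    static final class SequenceMemory {
        private final List<String> current = new ArrayList<>();
        private final List<List<String>> completed = new ArrayList<>();

        /** Forget the sequence context, as a reset would. */
        void reset() {
            if (!current.isEmpty()) {
                completed.add(new ArrayList<>(current));
                current.clear();
            }
        }

        void observe(String symbol) {
            current.add(symbol);
        }

        List<List<String>> sequences() {
            List<List<String>> all = new ArrayList<>(completed);
            if (!current.isEmpty()) {
                all.add(new ArrayList<>(current));
            }
            return all;
        }
    }

    /** Feeds the input to the memory, resetting before each flagged record. */
    static List<List<String>> split(List<String> input, Predicate<String> isSequenceStart) {
        SequenceMemory memory = new SequenceMemory();
        for (String symbol : input) {
            if (isSequenceStart.test(symbol)) {
                memory.reset();
            }
            memory.observe(symbol);
        }
        return memory.sequences();
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A", "B", "C", "D", "E", "A", "B", "C", "D", "E");
        // Assumption for this example: every "A" starts a new sequence.
        System.out.println(split(input, s -> s.equals("A")));
        // prints [[A, B, C, D, E], [A, B, C, D, E]]
    }
}
```

Without the reset, the model would treat the second "A" as a continuation of "E" rather than as the start of a fresh sequence.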
Stateful Functions
• Use `mapWithState` to store predictions for the future
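One way to read "store predictions for the future": the prediction made at one record is kept in per-key state and compared with the value that actually arrives next. The sketch below mimics that with a plain `HashMap` standing in for Flink's keyed state. `PredictionStateSketch`, `Comparison`, and `onRecord` are hypothetical names, and the prediction values are supplied by the caller rather than by a model.

```java
import java.util.HashMap;
import java.util.Map;

/** Library-free sketch of per-key prediction state; not a Flink API. */
final class PredictionStateSketch {

    /** Pairs the value that arrived with the prediction stored one step earlier. */
    static final class Comparison {
        final String key;
        final double actual;
        final Double predicted; // null for the first record of a key

        Comparison(String key, double actual, Double predicted) {
            this.key = key;
            this.actual = actual;
            this.predicted = predicted;
        }

        boolean hasPrediction() {
            return predicted != null;
        }

        /** Absolute error; only meaningful when hasPrediction() is true. */
        double error() {
            return Math.abs(actual - predicted);
        }
    }

    // Stand-in for keyed state: one pending prediction per key.
    private final Map<String, Double> pendingPrediction = new HashMap<>();

    /**
     * Stores the prediction for the next record and returns the comparison
     * between this record and the prediction stored previously for its key.
     */
    Comparison onRecord(String key, double actual, double predictionForNext) {
        Double previous = pendingPrediction.put(key, predictionForNext);
        return new Comparison(key, actual, previous);
    }

    public static void main(String[] args) {
        PredictionStateSketch state = new PredictionStateSketch();
        Comparison first = state.onRecord("sensor-1", 10.0, 12.0);
        Comparison second = state.onRecord("sensor-1", 11.5, 13.0);
        System.out.println(first.hasPrediction()); // prints false
        System.out.println(second.error());        // prints 0.5
    }
}
```

In Flink's Scala API, `mapWithState` on a keyed stream has a similar shape: the function receives the record plus the optional previous state and returns the output plus the new state.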
23. Learn Operator
Extend `AbstractStreamOperator`
Respect Flink’s type system
• Use the `TypeInformation` class
Use the State Handle abstraction
• Keyed streams only
Instrument your code
• Accumulators
24. RiverView Connector
Extend `RichParallelSourceFunction`
• Parallelism is user-defined
• Must handle partition assignment
Mix in `Checkpointed`
• Synchronize on checkpoint lock
Support cancel/stop
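The three concerns on this slide can be sketched without Flink on the classpath. The class below is a hypothetical stand-in, not the RiverView connector: it assumes round-robin partition assignment by subtask index, a single integer offset as checkpoint state, and a private lock object playing the role of Flink's checkpoint lock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Library-free sketch of a parallel, checkpointed, cancellable source. */
final class PartitionedSourceSketch {

    private final Object checkpointLock = new Object();
    private volatile boolean running = true; // flipped by cancel()
    private int offset = 0;                  // the state that gets checkpointed

    /** Round-robin assumption: partition p belongs to subtask p % parallelism. */
    static List<Integer> assignedPartitions(int partitionCount, int subtaskIndex, int parallelism) {
        List<Integer> mine = new ArrayList<>();
        for (int p = subtaskIndex; p < partitionCount; p += parallelism) {
            mine.add(p);
        }
        return mine;
    }

    /** Emits records until the input is exhausted or cancel() is called. */
    void run(List<String> records, List<String> out) {
        while (running) {
            // Emit and advance the offset atomically, so a snapshot never
            // sees a record emitted without its offset update.
            synchronized (checkpointLock) {
                if (offset >= records.size()) {
                    return;
                }
                out.add(records.get(offset));
                offset++;
            }
        }
    }

    /** In Flink the runtime already holds the lock here; this sketch takes it itself. */
    int snapshotState() {
        synchronized (checkpointLock) {
            return offset;
        }
    }

    void restoreState(int restoredOffset) {
        synchronized (checkpointLock) {
            offset = restoredOffset;
        }
    }

    void cancel() {
        running = false;
    }

    public static void main(String[] args) {
        System.out.println(assignedPartitions(5, 1, 2)); // prints [1, 3]

        PartitionedSourceSketch source = new PartitionedSourceSketch();
        source.restoreState(1); // resume after the first record
        List<String> out = new ArrayList<>();
        source.run(Arrays.asList("a", "b", "c"), out);
        System.out.println(out); // prints [b, c]
    }
}
```

Because the offset only changes while the lock is held, restoring a snapshot and replaying from that offset re-emits exactly the records that followed the checkpoint.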