O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Carregando em…3
×

Confira estes a seguir

1 de 64 Anúncio

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Baixar para ler offline

At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.

At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.

Anúncio
Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Semelhante a Event-Driven Messaging and Actions using Apache Flink and Apache NiFi (20)

Anúncio

Mais de DataWorks Summit (20)

Mais recentes (20)

Anúncio

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

  1. 1. EVENT-DRIVEN MESSAGING AND ACTIONS USING APACHE FLINK AND APACHE NIFI Dave Torok Distinguished Architect Comcast Corporation 23 May, 2019 DataWorks Summit – Washington, DC – 2019
  2. 2. 3 COMCAST CUSTOMER RELATIONSHIPS 30.7 MILLION OVERALL CUSTOMER RELATIONSHIPS AS OF Q1 2019 INCLUDING: 27.6 MILLION HIGH-SPEED INTERNET 21.9 MILLION VIDEO 11.4 MILLION VOICE ONE MILLION CUSTOMER NET ADDITIONS IN 2018
  3. 3. 4 DELIVER THE ULTIMATE CUSTOMER EXPERIENCE IS THE CUSTOMER HAVING A GOOD EXPERIENCE WITH OUR PRODUCTS AND SERVICE? IF THE CUSTOMER ENGAGES US DIGITALLY, CAN WE OFFER A SELF-SERVICE EXPERIENCE? GUIDE THE CUSTOMER THROUGH A JOURNEY WITH DIGITAL COMMUNICATIONS KEEP THE CUSTOMER INFORMED WITH THE RIGHT MESSAGE TO THE RIGHT PERSON AT THE RIGHT TIME REDUCE TIME AND COST TO THE BUSINESS AND THE CUSTOMER
  4. 4. How do we personalize the conversation? Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
  5. 5. 6 EXAMPLE ONE-TIME MESSAGE
  6. 6. 7 EXAMPLE – NEW SERVICE INSTALL
  7. 7. 8 EXAMPLE - APPOINTMENT REMINDERS
  8. 8. 9 FOLLOW UP SATISFACTION AND SURVEY
  9. 9. 1 0 EXAMPLE WITH SMS RESPONSES FOLLOWING UP ON THE INTERACTION: Is the problem resolved? If so, great! If not, offer to talk with an agent.
  10. 10. APACHE Apache®, Apache NiFi®, and the NiFi logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
  11. 11. 1 2 WHAT IS APACHE NIFI? ENTERPRISE DATA FLOW…. GET STUFF FROM SOMEWHERE TO SOMEWHERE ELSE Source Systems FTP HTTP SFTP Kafka RabbitMQ JDBC Kinesis S3 …. Destination Systems FTP HTTP SFTP Kafka RabbitMQ JDBC Kinesis S3 …. Do Stuff! Transform Validate Enrich Protocol Conversion …. 350+ Processors, Controllers, and Reporting Tasks
  12. 12. 1 3 EXAMPLE NIFI FLOW
  13. 13. 1 4 WHAT IS NIFI GOOD FOR? ASYNCHRONOUS AND STATELESS STREAM PROCESSING PROTOCOL CONVERSION FORMAT CONVERSION AND TRANSFORMATION PUSH AND PULL SCENARIOS E.G. FTP LOTS OF DIFFERENT SOURCE AND SINK TYPES MILD CONTENT ENRICHMENT SERVICE CALLS / REST CALLS JDBC / CACHE LOOKUP RAPIDLY CHANGING BUSINESS LOGIC*** RAPID PROTOTYPING*** CONFIGURE RATHER THAN CODE *** EXTENSIBILITY (SCRIPTING PROCESSORS, CUSTOM (JAVA) PROCESSORS)
  14. 14. 1 5 OUR TEAM’S HISTORY WITH NIFI FIRST PRODUCTION WORKFLOW MAY 2016 RECENT SNAPSHOT: • 65+ USE CASES • 900+ PROCESS GROUPS • 7400+ PROCESSORS • 44000+ THREADS • 12 NODE PRIMARY PRODUCTION CLUSTER (16VCPU/32GB)
  15. 15. 1 6 NIFI –TOP LEVEL
  16. 16. 1 7 TOP PROCESSORS IN OUR NIFI CLUSTER PROCESSING 1114 UpdateAttribute 923 RouteOnAttribute 732 JSON-related (incl. 240 JOLTTransformJson) 729 ReplaceText 527 ExecuteScript (many for HTTP Retry Logic) 516 LogAttribute 162 ControlRate 98 AVRO-related 87 ExtractText COMMUNICATION 207 InvokeHTTP 128 PutSql / ExecuteSql 39 ConsumeKafka 10 PublishKafka/PutKafka 41 GetKinesisStream 6 PutKinesisStream 2 PutSFTP 2 Consume AMQP
  17. 17. APACHE FLINK Apache®, Apache Flink®, and the squirrel logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
  18. 18. 1 9 WHAT IS APACHE FLINK? REAL-TIME STREAM PROCESSING FRAMEWORK DISTRIBUTED PARALLEL COMPUTE ENGINE SIMILAR API STYLE TO APACHE SPARK LOW LATENCY, HIGH PERFORMANCE STATEFUL SOURCE Reduce Filter Join SOURCE Map Sum SINK
  19. 19. 2 0 FLINK STREAMING API STYLES TABLE / SQL API SQL PROVIDED BY APACHE CALCITE SELECTS, JOINS, GROUP-BY, AGGREGATIONS WINDOWS TIME AND COUNT WINDOW-BASED JOINS WINDOW-BASED AGGREGATIONS TEMPORAL TABLES UDF (USER-DEFINED FUNCTIONS) DATASTREAM API MAP / REDUCE / FOLD FILTER AGGREGATIONS (SUM, MIN, MAX) WINDOWS TIME AND COUNT TUMBLING, SLIDING STREAM UNION, JOIN, CO-MAP ITERATIONS NOTE: THERE IS ALSO A BATCH API
  20. 20. 2 1 EXAMPLE “WORD COUNT” CODE DataStream<WordWithCount> windowCounts = textInputStream .flatMap(new FlatMapFunction<String, WordWithCount>() { public void flatMap(String value, Collector<WordWithCount> out) { for (String word : value.split("s")) { out.collect(new WordWithCount(word, 1L)); }} }) .keyBy("word") .timeWindow(Time.seconds(5)) .reduce(new ReduceFunction<WordWithCount>() { public WordWithCount reduce(WordWithCount a, WordWithCount b) { return new WordWithCount(a.word, a.count + b.count); } });
  21. 21. 2 2 WHAT IS FLINK GOOD FOR? HIGH THROUGHPUT STREAM PROCESSING “MAP / REDUCE” STYLE PARALLEL COMPUTING STATEFUL PROCESSING AGGREGATIONS AND TIME WINDOWS MULTIPLE-STREAM OPERATIONS SQL-ON-STREAM HOWEVER… LIMITED “ORCHESTRATION” LIMITED SOURCE / SINK TYPES
  22. 22. 2 3 FLINK CONNECTORS ALSO VIA APACHE BAHIR: APACHE ACTIVEMQ (SOURCE/SINK) APACHE FLUME (SINK) REDIS (SINK) AKKA (SINK) NETTY (SOURCE) FLINK PROJECT: APACHE KAFKA (SOURCE/SINK) AMAZON KINESIS STREAMS (SOURCE/SINK) RABBITMQ (SOURCE/SINK) APACHE NIFI (SOURCE/SINK) APACHE CASSANDRA (SINK) ELASTICSEARCH (SINK) HADOOP FILESYSTEM – HDFS (SINK) TWITTER STREAMING API (SOURCE)
  23. 23. 2 4 OUR TEAM’S HISTORY WITH FLINK USED FOR 4+ DIFFERENT KINDS OF USE CASES FIRST DEV – NOV 2016 FIRST PRODUCTION – MAY 2018 CUSTOMER EXPERIENCE USE CASE: • 7 BILLION DATA POINTS PER DAY PRODUCTION SIZE FOR ABOVE: • 14 FLINK APPLICATION CLUSTERS • 150 VMS • 1100 VCPU • 5.8 TB RAM
  24. 24. 2 5 NIFI / FLINK MAJOR DIFFERENCES NiFi Flink Distributed-capable Distributed by nature Lineage, queues, buffering Straight-through processing 100’s of processor types Stream-oriented operators Limited state processing Natively stateful if desired UI-driven visual development Code / compiled / deployed
  25. 25. 2 6 “CONFIGURE NOT CODE” Scratch Website - http://scratch.mit.edu/
  26. 26. MESSAGING USE CASE
  27. 27. 2 8 START SIMPLE (EVENT, CONDITION, ACTION) Trigger Event Producers Notification Services Action
  28. 28. 2 9 START SIMPLE (EVENT, CONDITION, ACTION) Trigger Event Producers Notification Services Action NEED MORE INFORMATION
  29. 29. 3 0 STATELESS USE CASE Trigger Enrich Filter Enterprise Services (REST) Event Producers Notification Services Action
  30. 30. 3 1 EXAMPLE: VIDEO ON DEMAND EVENT: RECEIVE “VIDEO ON DEMAND” MESSAGE TRIGGER: IF (PRICE > 5) AND (TYPE = ‘RENTAL’) ENRICH: PREFERRED COMMUNICATION (EMAIL OR SMS) ACTION: SEND CONFIRMATION EMAIL OR SMS
  31. 31. 3 2 NIFI VERSION Consume Events Extract Attributes Call Customer Pref Service Set SMS Parameters Set Email Parameters Logging Metrics Send to Communication Handlers
  32. 32. TRIGGERS
  33. 33. 3 4 SQL ON STREAM – APACHE CALCITE FLINK APPROACH - SQL // SQL query with an inlined (unregistered) table Table table = tableEnv.fromDataStream(ds, "user, product, amount"); Table result = tableEnv.sqlQuery( "SELECT SUM(amount) FROM " + table + " WHERE product LIKE '%Rubber%'"); NIFI APPROACH – TRADITIONAL • EVALUATEJSONPATH / EXTRACTTEXT • NIFI EXPRESSION LANGUAGE + ROUTEONATTRIBUTE NIFI APPROACH - CALCITE • QUERYRECORD PROCESSOR • RECORDREADER / RECORDWRITER PATTERN
  34. 34. ENRICHMENT AND ACTIONS
  35. 35. 3 6 ACTIONS Send SMS Action Request Action Handler Send Email Other Notification Methods Communication Preferences
  36. 36. 3 7 ENRICHMENT DATA PLANE Streaming Compute Pipeline AWS S3 HDFS Data File Abstraction Databases MODEL Streaming State Sum Avg Time Buckets Stream Data QUERY Enterprise Services Data Sets at Rest
  37. 37. 3 8 CALLING SERVICES - NIFI INVOKEHTTP PROCESSOR NIFI GOOD FOR • REQUEST PREPARATION • RESULT TRANSFORMATION • HTTP ATTRIBUTE HANDLING • FAILURE AND RETRY LOGIC
  38. 38. 3 9 NIFI - RETRY LOGIC
  39. 39. 4 0 FLINK METHOD FOR CALLING SERVICES ASYNC I/O OPERATOR WORKS WITH ASYNC-CAPABLE POOLS • HTTP • JDBC CODE-YOUR-OWN NO BUILT-IN RETRY CAPABILITY TIMEOUTS CAN LEAD TO FLOW FAILURE
  40. 40. 4 1 FLINK CONNECTED STREAM PATTERN REST Service Connected Stream Operator 5 Minute Global Window Enrichment Handler
  41. 41. STATEFUL FLOWS
  42. 42. 4 3 WHAT IS “STATE”? STATE 1 STATE 2 STATE 3 ACTION ON ENTRY ACTION ON EXIT TRANSITION CONDITION TRANSITION CONDITION STATE TIMEOUT
  43. 43. 4 4 EXAMPLE STATEFUL JOURNEY ORDER PLACED IN TRANSIT OUT FOR DELIVERY “YOUR ORDER IS ON ITS WAY” “SORRY WE MISSED YOU” SHIPPED PLACED ON LOCAL TRUCK 11PM EXPIRE
  44. 44. 4 5 NIFI STATE PROCESSOR STATE (LOCAL AND CLUSTERED) BACKED BY ZOOKEEPER PROCESSORS: UPDATEATTRIBUTE (LOCAL ONLY) ATTRIBUTEROLLINGWINDOW “DISTRIBUTED” MAP CACHE IN-MEMORY OR REDIS-BACKED (NEW IN 1.8) NODE-LOCAL OR “SINGLE NODE” CENTRAL CACHE PROCESSORS: PUTDISTRIBUTEMAPCACHE, GETDISTRIBUTEDMAPCACHE BEFORE NIFI 1.8: NO EASY PARTITIONING / SHARDING 1.8 AND LATER: NODE BALANCED CONNECTIONS PARTITION BY ATTRIBUTE CACHE != STATE (but you can store state in a cache)
  45. 45. 4 6 USING EXTERNAL STATE WITH NIFI USE EXTERNAL DATABASE (E.G. MYSQL) PERIODIC QUERY TO FIND EXPIRED TIMERS BEWARE OF RACE CONDITIONS / FREQUENT UPDATES
  46. 46. 4 7 NIFI – SQL BASED STATE (STATE UPDATE)
  47. 47. 4 8 NIFI – SQL BASED STATE (TIMER EXPIRATION)
  48. 48. 4 9 FLINK APPROACH TO STATE KEYED (NODE LOCAL) STATE WINDOWED OPERATIONS (E.G. 10 MINUTE WINDOW SLIDING BY 1 MINUTE) EVERY OPERATOR CAN HAS ITS OWN STATE QUERYABLE STATE ROCKSDB (IN-MEMORY + DISK STORAGE) CHECKPOINTS AND SAVEPOINTS TO DURABLE FILESYSTEM (HDFS, S3)
  49. 49. 5 0 NETWORK IN DISTRIBUTED FLINK STATE KAFKABROKERS PARTITION 1 PARTITION 2 PARTITION 3 PARTITION 4 PARTITION 5 PARTITION 6 FlinkKafkaConsumer FlinkKafkaConsumer FlinkKafkaConsumer FlinkKafkaConsumer FlinkKafkaConsumer FlinkKafkaConsumer NODE 1 NODE 2 NODE 3 KeyedStreamOperator KeyedStreamOperator KeyedStreamOperator KeyedStreamOperator KeyedStreamOperator KeyedStreamOperator P1 P2 P3 P4 NETWORK OUT keyBy() SHUFFLE/SORT STATE STATE STATE STATE STATE STATE Local STATE
  50. 50. 5 1 WORKING WITH FLINK STATE private transient MapState<String, String> myState; public void open(Configuration config) { MapStateDescriptor<String, String> descriptor = new MapStateDescriptor<String, String>( “myStateName", // the state name String.class, String.class); // K/V types //get the mapstate for the key myState = getRuntimeContext().getMapState(descriptor); } public String map(String myField) { String myValue = myState.get(myField); myState.put(myField, myValue + “ another one”); } DECLARE VARIABLE DESCRIPTOR WITH TYPE INFORMATION INITIALIZE STATE READ/WRITE STATE
  51. 51. INTEGRATING NIFI AND FLINK
  52. 52. 5 3 OPTION 1: BUILT-IN FLINK-NIFI CONNECTOR USES NIFI “SITE TO SITE” PROTOCOL ENABLES PASSING “FLOWFILE ATTRIBUTE” AND “FLOWFILE CONTENT” INTACT public interface NiFiDataPacket { byte[] getContent(); Map<String, String> getAttributes(); } FLINK BACKPRESSURE CONCERNS HTTPS://GITHUB.COM/APACHE/FLINK/TREE/MASTER/FLINK-CONNECTORS/FLINK-CONNECTOR-NIFI
  53. 53. 5 4 OPTION 2: KAFKA TOPIC LOOSER COUPLING ALLOWS MANY “ACTION HANDLERS” (NOT JUST NIFI) MORE BUFFERING / REDUCE BACKPRESSURE RISK JSON AS STANDARD PAYLOAD
  54. 54. NIFI + FLINK SOLUTION APPROACH
  55. 55. 5 6 SOLUTION APPROACH FLINK AS THE HIGH VOLUME EVENT PROCESSOR • MANY USE CASES WITH ONE STREAM • SQL ON STREAM FLINK-BASED TRIGGER, FILTER, ENRICHMENT REQUEST, AND ACTION REQUEST FLINK MANAGES CUSTOMER JOURNEY STATE NIFI FOR: NAMED “PROFILES” FOR ENRICHMENT SERVICES NAMED “PROFILES” FOR NOTIFICATIONS AND ACTIONS Configuration-based use cases in Flink Library of handlers in NiFi
  56. 56. 5 7 HIGH LEVEL SOLUTION Trigger Enrichment Orchestration Filter Enterprise Services (REST) Event Producers Notification Services Action Orchestration Enrich Action EVENT AND MESSAGE ORCHESTRATION
  57. 57. 5 8 USE CASE CONFIGURATION (SIMPLIFIED) { "source": { "type": “kafka", “name": "vod_event_stream“ }, "triggerSql": "data.price > 5 AND data.order_type = 'Rental'“ “enrichment": {“profileName": “communicationprefs"}, "actions":[ {“profileName": "email", "templateId":"1234", “fieldMapping":[ {“field” : “cost”, “source” : “data.price”}] }, {“profileName": “sms", "templateId":“5678", “fieldMapping":[ {“field” : “cost”, “source” : “data.price”}] } }
  58. 58. 5 9 HIGH LEVEL SOLUTION (WITH STATE) Trigger Enrichment Orchestration Filter Enterprise Services (REST) Event Producers Notification Services Action Orchestration Enrich Action EVENT AND MESSAGE ORCHESTRATION FLINK LOCAL STATE Journey State Management FLINK STATE
  59. 59. 6 0 NIFI + FLINK SOLUTION SUMMARY NIFI FOR SERVICES, DATAFLOW, AND TEXT HANDLING FLINK FOR HIGH-PERFORMANCE STREAM PROCESSING FLINK FOR COMMON PATTERNS – CONFIG DRIVEN FLINK FOR STATE MANAGEMENT DECOUPLED LIBRARY OF ENRICHMENT HANDLERS AND ACTION HANDLERS
  60. 60. 6 1 FUTURE WORK FLINK + NIFI SELF-SERVICE USE CASE PORTAL INCREASE CATALOG OF ACTIONS AND ENRICHMENT PROFILES MOVE MORE COMMON CAPABILITIES TO FLINK
  61. 61. 6 2 WE’RE HIRING! PHILADELPHIA WASHINGTON, D.C. SUNNYVALE DENVER
  62. 62. THANK YOU!

×