Building High-Throughput, Low-Latency Pipelines in Kafka

  1. Building High-Throughput, Low-Latency Pipelines in Kafka Ben Abramson & Robert Knowles
  2. Introduction How a development department in a well-established enterprise company with no prior knowledge of Apache Kafka® built a real-time data pipeline in Kafka, learning it as we went along. This is the story of what happened, what we learned, and what we got wrong.
  3. Who are we and what do we do? • We are William Hill, one of the oldest and most well-established companies in the gaming industry • We work in the Trading department of William Hill, where we “trade” what happens in a sports event • We manage odds for c. 200k sports events a year; we publish odds for the company and result the markets once events have concluded • We cater for both traditional pre-match markets and in-play markets • We have been building applications on messaging technology for a long time, as it suits our event-based use cases
  4. What do we build? (In the simplest terms…)
  5. Kafka as MOM (message-oriented middleware) • Message Persistence • messages are not removed when read • Consumer Position Control • consumers can replay data • Minimal Overhead • many consumers can read from the same “durable” topic
  6. Kafka Scalability • partitions are load-balanced evenly across consumers on rebalancing
  7. Kafka Consumer Groups • Partitions are Distributed Evenly Amongst Consumers in a Consumer Group • A Partition can Only be Consumed by One Consumer in a Consumer Group • Each Partition has an Offset per Consumer Group
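The even partition distribution described above can be sketched as a toy assignment function (our own round-robin-style illustration, not code from the talk; Kafka's real assignors are pluggable and more sophisticated):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of spreading partitions evenly across the consumers
// in one consumer group. Consumer names are hypothetical.
public class AssignDemo {
    static Map<String, List<Integer>> assign(int partitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Round-robin: each partition goes to exactly one consumer in the group.
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(6, Arrays.asList("c1", "c2", "c3")));
        // {c1=[0, 3], c2=[1, 4], c3=[2, 5]}
    }
}
```

Each partition is owned by exactly one consumer, so parallelism is capped by the partition count, which matters for the topic-configuration choices later in the deck.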
  8. Kafka Throughput • load is distributed across brokers
  9. Legacy Monoliths
  10. Legacy Monoliths (cont.)
  11. Kafka Development Considerations • Relatively new; the community is still growing and maturity can be an issue • Know your use case – is it suited to Kafka? • Know your implementation: • Native Kafka • Spring Kafka • Kafka Streams • Camel • Spring Integration
  12. 2016 – Our journey begins • Rapidly evolving industry = new requirements & use cases • Upgrade the tech stack: Kafka, microservices, Docker, cloud
  13. Java vs Scala Java • More mature • More in-house knowledge of it • More disciplined Scala • More functional • More flexible • Better suited to data crunching
  14. Microservices • 70+ unique microservices
  15. Standardization Common approach allows many people to work with any part of the platform • Language – Java over Scala/Erlang • Messaging – Kafka over ActiveMQ/RabbitMQ • Libraries – Spring Boot, Kafka Implementation • Environments - Docker • Releases - Versioning and Deployment Strategy • Distributed Logging and Monitoring - Central UI, Format, Correlation
  16. Architectural considerations • Architectural steer to avoid using persistent data stores to keep latency short • We had to think about where to keep or cache data • We started to have to think about Kafka as a data store • This is where we started trying to use Kafka Streams
  17. Architectural Options We looked at a number of ways to solve our problems with data access in apps, given our architectural steer • Kafka Streams • Creating our own abstractions on native Kafka • Using some kind of data store
  18. Kafka Streams • Use case: historical data needed to be visible in certain UIs • UIs would subscribe to a specific event, but topics carry messages for all events • We needed to be able to read data as if it was a distributed cache • Streams solved many of those problems • Fault tolerance was an issue; we had difficulty recovering from a rebalance, and we had problems starting up, mainly caused by not being able to use a persistent data store • The tech was still in development at the time, and Kafka 1.0 came a little too late for us
  19. Message Format • Bad message formats can wreck your system • Common principles of Kafka messages: • A message is basically an event • Messages are of a manageable size • Messages are simple and easy to process • Messages are idempotent (i.e. a fact) • Data should be organised by resources, not by specific service needs • Backward compatibility
  20. Full state messages • Big & unwieldy • Resource heavy • Can affect latency • Wasteful • Lots of boilerplate code • Resilient – doesn’t matter if you drop it • Stateless • Don’t need to cache anything • Can gzip big messages
  21. Processing Full State Messages • Message reading and message processing done asynchronously • While the latest message is being processed, subsequent messages are pushed on to a stack • When the first message is processed, the next one is taken from the top of the stack, and the rest of the stack is emptied • Effectively a drop buffer
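The drop buffer described above can be sketched in a few lines of Java (a minimal illustration of the idea, not the talk's actual code; class and method names are ours). Because each message carries full state, the processor only ever needs the newest one:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of a "drop buffer" for full-state messages: the reading
// thread offers every message; the processing thread takes the newest one
// and all superseded messages are discarded.
public class DropBuffer<T> {
    private final Deque<T> stack = new ArrayDeque<>();

    /** Called by the reading thread for each incoming full-state message. */
    public synchronized void offer(T message) {
        stack.push(message);
    }

    /** Called by the processing thread: newest message wins, the rest are dropped. */
    public synchronized T take() {
        T newest = stack.poll(); // top of the stack, or null if empty
        stack.clear();           // drop every older, superseded message
        return newest;
    }

    public static void main(String[] args) {
        DropBuffer<String> buffer = new DropBuffer<>();
        buffer.offer("state-v1");
        buffer.offer("state-v2");
        buffer.offer("state-v3");
        System.out.println(buffer.take()); // state-v3 (v1 and v2 are dropped)
        System.out.println(buffer.take()); // null (buffer now empty)
    }
}
```

This trades completeness for latency: intermediate states are lost, which is safe only because every message is resilient full state, as the previous slide notes.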
  22. Testing… • We wanted unit, integration and system level testing • Unit testing is straightforward • Integration and system testing with large distributed tools is a challenge
  23. The Integration Testing Elephant • There is a lot of talk in IT about DevOps and shift-left testing • There is a lot of talk around Big Data style distributed systems • Doing early integration testing with Big Data tools is difficult, and there is a gap in this area • Giving developers the tools to do local integration testing is very difficult • Kafka is not the only framework with this problem
  24. Developer Integration Testing • Embedded Kafka from Spring allows a local ’virtual’ Kafka • Great for unit tests and low level integration tests
  25. Using Embedded Kafka • Proceed with caution when trying to ensure execution order • Most tests will need to pre-load topics with messages • Quick & dirty, do it statically • Wrapper for Embedded Kafka with additional utilities • Based on JUnit ExternalResource
  26. Using Kafka in Docker for testing • An alternative to embedded Kafka is to spin up a Docker instance which acts as a ‘Kafka-in-a-box’ – we’re still prototyping this • Single Docker instance that hosts 1-n Kafka instances and a Zookeeper instance. This means no need for a Docker swarm • Start Docker with a Maven exec on pre-integration tests • Start Docker programmatically on test set up using our JDock utility. This is more configurable • This approach is better for NFR & resiliency testing than embedded Kafka
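As a rough sketch of the ‘Kafka-in-a-box’ idea (the talk uses Maven exec or their own JDock utility; this hand-run example with the community `spotify/kafka` single-container image, which bundles Kafka and Zookeeper, is our assumption of one way to do it):

```shell
# Start one container hosting both a Kafka broker and Zookeeper -
# no Docker swarm needed for local integration tests.
docker run -d --name kafka-in-a-box \
  -p 2181:2181 -p 9092:9092 \
  --env ADVERTISED_HOST=localhost \
  --env ADVERTISED_PORT=9092 \
  spotify/kafka

# Tear it down after the integration-test phase.
docker rm -f kafka-in-a-box
```

Unlike embedded Kafka, the broker here is a real separate process, which is why this setup suits NFR and resiliency testing (you can kill and restart it mid-test).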
  27. Caching Problem • Source Topic and Recovery Topic have Same Number of Partitions • Data with Same Key Needs to be in Same Partition • Recovery Topic is Compacting • Only the Latest Data for a Given Key is Needed Flow 1. Microservices Subscribe with Same Consumer Group to Source Topic • Rebalance Operation Dynamically Assigns Partitions Evenly 2. Microservice Manually Assigns to Same Partitions in Recovery Topic 3. Microservice Clears Cache 4. Microservice Loads All Aggregated Data in Recovery Topic to Cache 5. Microservice Consumes Data from Source Topic • MD5 Check Ignores any Duplicate Data 6. Consumed Data is Aggregated with that in the Cache 7. Aggregated Data is Stored in Recovery Topic 8. Aggregated Data is Sent to Destination Topic
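Step 5's MD5 duplicate check can be sketched as follows (a toy, in-memory illustration; the class name and sample keys/payloads are ours, not from the talk). A message is ignored when its digest matches the last message seen for the same key:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the MD5 check in step 5: skip a consumed message when it is
// byte-for-byte identical to the previous message for the same key.
public class Md5Dedup {
    private final Map<String, String> lastDigest = new HashMap<>();

    /** Returns true when the payload is identical to the previous one for this key. */
    public boolean isDuplicate(String key, String payload) {
        String digest = md5Hex(payload);
        boolean duplicate = digest.equals(lastDigest.get(key));
        lastDigest.put(key, digest); // remember the latest digest per key
        return duplicate;
    }

    private static String md5Hex(String payload) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(payload.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        Md5Dedup dedup = new Md5Dedup();
        System.out.println(dedup.isDuplicate("event-1", "odds=2.5")); // false: first message
        System.out.println(dedup.isDuplicate("event-1", "odds=2.5")); // true: unchanged state
        System.out.println(dedup.isDuplicate("event-1", "odds=3.0")); // false: state changed
    }
}
```

Comparing digests rather than whole payloads keeps the per-key memory footprint small, which matters when the cache covers many events.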
  28. Considerations • SLA - 1 second for message end to end • Message time for each microservice is much less • Failover and Scaling • Rebalancing • Time to Load Cache • Message ordering • Idempotent • No duplicates • Dismissed Solutions • Dual Running Stacks • Kafka-Streams - standby replicas (only for failover)
  29. Revised Kafka-Only Solution • Recovery Offset Topic has the Same Number of Partitions as the Recovery Topic • When Data is Stored in the Recovery Topic for a Given Key • The Offset of that Data in the Recovery Topic is Stored in the Recovery Offset Topic with the Same Key • On a Rebalance Operation the Microservice Loads Only the Data in the Recovery Offset Topic • A Much Smaller Set of Data (Essentially an Index) • When the Microservice Consumes Data from the Source Topic • The Data in the Recovery Topic that it Needs to be Aggregated With is Lazily Retrieved into the Cache, Directly, Using the Cached Offset
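The revised design can be simulated in memory (our own toy model, not the talk's code): the recovery topic behaves like an append-only log indexed by offset, and the recovery offset topic is a small key-to-offset index that is all a microservice loads on rebalance:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of the revised Kafka-only solution: recoveryTopic stands in
// for the compacted recovery topic (a log addressed by offset) and offsetIndex
// for the recovery offset topic (the small key -> offset index).
public class OffsetIndexDemo {
    static final List<String> recoveryTopic = new ArrayList<>();
    static final Map<String, Integer> offsetIndex = new HashMap<>();

    /** Store aggregated data and record where its latest copy lives. */
    static void store(String key, String aggregated) {
        recoveryTopic.add(aggregated);
        offsetIndex.put(key, recoveryTopic.size() - 1);
    }

    /** Lazily fetch the latest state for a key via a direct offset lookup - no full scan. */
    static String lazyFetch(String key) {
        Integer offset = offsetIndex.get(key);
        return offset == null ? null : recoveryTopic.get(offset);
    }

    public static void main(String[] args) {
        store("match-42", "score=0-0");
        store("match-42", "score=1-0"); // newer full state for the same key
        System.out.println(lazyFetch("match-42")); // score=1-0
    }
}
```

The win on rebalance is that only `offsetIndex` (the small index) is loaded eagerly; the bulky aggregated records are seeked to one at a time, only when a source message for that key actually arrives.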
  30. With Cassandra Solution • Aggregated Data is Stored in Cassandra (Key-Value Store) • No Data is loaded on a Rebalance Operation • When the Microservice Consumes Data from the Source Topic • The Data that it needs to be Aggregated With in Cassandra is Lazily Retrieved to the Cache Comparison • Revised Kafka and Cassandra Solution have comparable performance • Cassandra Solution Introduces Another Technology • Cassandra Solution Is Less Complex Enhancements • Sticky Assignor (Partition Assignor) • Preserves as many existing partition assignments as possible on a rebalance • Transactions • Exactly Once Message Processing
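The Sticky Assignor enhancement mentioned above is switched on through consumer configuration. A minimal sketch (the broker address and group id are placeholders; the assignor class name is the one shipped with the kafka-clients library, available since Kafka 0.11):

```java
import java.util.Properties;

// Minimal consumer-configuration sketch enabling the sticky assignor,
// which preserves as many existing partition assignments as possible
// on a rebalance. Broker address and group id are hypothetical.
public class StickyAssignorConfig {
    public static Properties buildProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("group.id", "aggregator-group");         // hypothetical group name
        props.put("partition.assignment.strategy",
                  "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("partition.assignment.strategy"));
    }
}
```

Fewer moved partitions on rebalance means fewer caches to reload, which directly addresses the "time to load cache" concern on the considerations slide.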
  31. Topic Configuration • Partitions: Kafka writes messages to a predetermined number of partitions, and within a consumer group only one consumer can read from each partition at a time, so consider how many consumers you have • Replication: how durable do you need it to be? • Retention: how long do you want to keep messages for? • Compaction: how many updates to a specific piece of data do you need to keep?
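These four knobs map onto the standard Kafka CLI tools roughly as follows (an illustrative sketch; the topic name and values are hypothetical, not from the talk):

```shell
# Partitions cap consumer parallelism; replication-factor sets durability.
kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic sports-events --partitions 12 --replication-factor 3

# Retention: keep messages for 7 days.
kafka-configs.sh --alter --zookeeper localhost:2181 \
  --entity-type topics --entity-name sports-events \
  --add-config retention.ms=604800000

# Compaction: keep only the latest value per key
# (the setting used by recovery/state topics like those above).
kafka-configs.sh --alter --zookeeper localhost:2181 \
  --entity-type topics --entity-name sports-events \
  --add-config cleanup.policy=compact
```

Note that partitions can be added later but not removed, so the partition count deserves the most up-front thought.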
  32. Operational Management • Operationally, Kafka can fall between the cracks - DBAs & SysAdmin teams generally won’t want to get involved in the configuration • Kafka is highly configurable – this is great if you know what it all does • In the early days many of these configurable fields changed between versions. It made it difficult to tune Kafka to optimal performance. • Configuration is heavily dependent on use case. Many settings are also inter-dependent.
  33. Summary • Getting Kafka right is not one-size-fits-all; you must consider your use case, both developmentally and operationally • Building systems with Kafka can be done without a lot of prior expertise • You will need to refactor; it is a trial-and-error process • Don’t be afraid to get it wrong • Don’t assume that your use case has a well-established best practice • Remember to focus on the NFRs as well as the functional requirements
  34. Resources • Consult with Confluent • Kafka: The Definitive Guide • https://www.confluent.io/resources/kafka-the-definitive-guide/ • GitHub Examples • https://github.com/confluentinc/examples • https://github.com/confluentinc/kafka-streams-examples • Confluent Enterprise Reference Architecture • https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
  35. Questions
  36. Thank You