O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Architecting Microservices Applications with Instant Analytics

820 visualizações

Publicada em

View recording here: https://www.confluent.io/online-talks/architecting-microservices-applications-with-instant-analytics

The next generation architecture for exploring and visualizing event-driven data in real-time requires the right technology. Microservices deliver significant deployment and development agility, but raise questions of how data will move between services and how it will be analyzed. This online talk explores how Apache Druid and Apache Kafka® can turn a microservices ecosystem into a distributed real-time application with instant analytics. Apache Kafka and Druid form the backbone of an architecture that meet the demands imposed on the next generation applications you are building right now. Join industry experts Tim Berglund, Confluent, and Rachel Pedreschi, Imply, as they discuss architecting microservices apps with Druid and Apache Kafka.

Publicada em: Tecnologia
  • Seja o primeiro a comentar

Architecting Microservices Applications with Instant Analytics

  1. 1. 1 Architecting Microservices Applications with Instant Analytics Tim Berglund, Sr. Director, Developer Experience, Confluent Rachel Pedreschi, Sr.l Director, Global Field Engineering, Imply Data
  2. 2. 2 What the heck is Apache Kafka and Why Should I Care?
  3. 3. K V
  4. 4. K V
  5. 5. K V
  6. 6. K V
  7. 7. K V
  8. 8. K V
  9. 9. … … … partition 0 partition 1 partition 2 Partitioned Topic producer
  10. 10. consumer A … … … partition 0 partition 1 partition 2 Partitioned Topic
  11. 11. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic
  12. 12. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic
  13. 13. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A
  14. 14. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A consumer A
  15. 15. Stream Processingapps rdbms nosql dwh/ hadoop
  16. 16. Stream Processingapps rdbms nosql dwh/ hadoop
  17. 17. consumer A consumer B … … … partition 0 partition 1 partition 2 Partitioned Topic consumer A consumer A
  18. 18. consumer A consumer A consumer A
  19. 19. Streams Application Streams Application Streams Application
  20. 20. public static void main(String args[]) { Properties streamsConfiguration = getProperties(SCHEMA_REGISTRY_URL); final Map<String, String> serdeConfig = Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, SCHEMA_REGISTRY_URL); final SpecificAvroSerde<Movie> movieSerde = getMovieAvroSerde(serdeConfig); final SpecificAvroSerde<Rating> ratingSerde = getRatingAvroSerde(serdeConfig); final SpecificAvroSerde<RatedMovie> ratedMovieSerde = new SpecificAvroSerde<>(); ratingSerde.configure(serdeConfig, false); StreamsBuilder builder = new StreamsBuilder(); KTable<Long, Double> ratingAverage = getRatingAverageTable(builder); getRatedMoviesTable(builder, ratingAverage, movieSerde); Topology topology = builder.build(); KafkaStreams streams = new KafkaStreams(topology, streamsConfiguration); Runtime.getRuntime().addShutdownHook(new Thread(streams::close)); streams.start(); } private static SpecificAvroSerde<Rating> getRatingAvroSerde(Map<String, String> serdeConfig) { final SpecificAvroSerde<Rating> ratingSerde = new SpecificAvroSerde<>(); ratingSerde.configure(serdeConfig, false); return ratingSerde; } public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) {
  21. 21. } public static SpecificAvroSerde<Movie> getMovieAvroSerde(Map<String, String> serdeConfig) { final SpecificAvroSerde<Movie> movieSerde = new SpecificAvroSerde<>(); movieSerde.configure(serdeConfig, false); return movieSerde; } public static KTable<Long, String> getRatedMoviesTable(StreamsBuilder builder, KTable<Long, Double> ratingAverage, SpecificAvroSerde<Movie> movieSerde) { builder.stream("raw-movies", Consumed.with(Serdes.Long(), Serdes.String())) .mapValues(Parser::parseMovie) .map((key, movie) -> new KeyValue<>(movie.getMovieId(), movie)) .to("movies", Produced.with(Serdes.Long(), movieSerde)); KTable<Long, Movie> movies = builder.table("movies", Materialized .<Long, Movie, KeyValueStore<Bytes, byte[]>>as( "movies-store") .withValueSerde(movieSerde) .withKeySerde(Serdes.Long()) ); KTable<Long, String> ratedMovies = ratingAverage .join(movies, (avg, movie) -> movie.getTitle() + "=" + avg); ratedMovies.toStream().to("rated-movies", Produced.with(Serdes.Long(), Serdes.String())); return ratedMovies; } public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) { KStream<Long, String> rawRatings = builder.stream("raw-ratings", Consumed.with(Serdes.Long(), Serdes.String())); KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating)
  22. 22. return ratedMovies; } public static KTable<Long, Double> getRatingAverageTable(StreamsBuilder builder) { KStream<Long, String> rawRatings = builder.stream("raw-ratings", Consumed.with(Serdes.Long(), Serdes.String())); KStream<Long, Rating> ratings = rawRatings.mapValues(Parser::parseRating) .map((key, rating) -> new KeyValue<>(rating.getMovieId(), rating)); KStream<Long, Double> numericRatings = ratings.mapValues(Rating::getRating); KGroupedStream<Long, Double> ratingsById = numericRatings.groupByKey(); KTable<Long, Long> ratingCounts = ratingsById.count(); KTable<Long, Double> ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2); KTable<Long, Double> ratingAverage = ratingSums.join(ratingCounts, (sum, count) -> sum / count.doubleValue(), Materialized.as("average-ratings")); ratingAverage.toStream() /*.peek((key, value) -> { // debug only System.out.println("key = " + key + ", value = " + value); })*/ .to("average-ratings"); return ratingAverage; }
  23. 23. CREATE TABLE movie_ratings AS SELECT title, SUM(rating)/COUNT(rating) AS avg_rating, COUNT(rating) AS num_ratings FROM ratings LEFT OUTER JOIN movies ON ratings.movie_id = movies.movie_id GROUP BY title;
  24. 24. producer consumer KSQL Cluster KSQL Server KSQL Server
  25. 25. orders shipping users warehouse order web UI users web UI
  26. 26. orders shipping users warehouse order web UI users web UI druid product analytics
  27. 27. 27 What the heck is Apache Druid and Why Should I Care?
  28. 28. 28 TimRachel
  29. 29. 29
  30. 30. 30
  31. 31. 31
  32. 32. 32 Data Data Data Data Sources ETL Data Warehouse Some Code Usually an RDBMS Analytics Reporting Data mining Querying
  33. 33. 33
  34. 34. 34
  35. 35. 35 Data Data Data Map/reduce Reporting and Analytics ELT Data Warehouse ML/AI Engine Search system Data Lake HDFSRDBMS / NoSQL
  36. 36. 36
  37. 37. 37 Data Data Data Data Sources Message bus Data Lake Streaming OLAP
  38. 38. 38
  39. 39. 39
  40. 40. 40 Typical Big Data++ Challenges ● Scale: when data is large, we need a lot of servers ● Speed: aiming for sub-second response time ● Complexity: too much fine grain to precompute ● High dimensionality: 10s or 100s of dimensions ● Concurrency: many users and tenants ● Freshness: load from streams
  41. 41. 41 Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search ! Batch ingestion ! Efficient storage ! Fast analytic queries Timeseries database ! Optimized storage for time-based datasets ! Time-based functions
  42. 42. 42 ! Batch ingestion ! Efficient storage ! Fast analytic queries Search platform OLAP ! Real-time ingestion ! Flexible schema ! Full text search Timeseries database ! Optimized storage for time-based datasets ! Time-based functions high performance analytics database for event-driven data
  43. 43. 43 Druid Use Cases in the Wild 1. Digital Advertising - Publishers, Advertisers, Exchanges 2. User Event Analytics- Clickstream, QoS, Usage 3. Network Telemetry 4. Lots and Lots of Data- IoT, Product Analytics, Fraud
  44. 44. 44 Gratuitous Customer Quote “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.” Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts- its-hadoop-stuff.html
  45. 45. orders shipping users warehouse order web UI users web UI druid product analytics
  46. 46. 46
  47. 47. 47 You can Druid too! Druid community site: https://druid.apache.org/ Imply distribution: https://imply.io/get-started Contribute: https://github.com/apache/druid
  48. 48. 48

×