Druid realtime indexing

Overview of Druid realtime indexing

Published in: Data & Analytics

  1. Druid: Overview of real time indexing
     Azrael Seoeun Park, seoeun25@gmail.com
     Copyright © 2016 kt NexR. All rights reserved.
  2. Introduction
     • Indexing Service
     • Design Architecture
     • Tranquility
     • Task Spec
     • Firehose
     • Plumber
     • Tranquility Configs
     • Flow of Realtime Indexing
  3. Indexing Service
     • Indexing Service
       – Runs indexing tasks that create Druid segments
     • Indexing task types
       – index_realtime
       – index_hadoop: batch ingestion
       – index
     [Diagram labels: Data Source, Submit Task, Indexing Service, Segment]
  4. Design Architecture
     [Architecture diagram. Components shown: Kafka, Spark Streaming, Storm; Tranquility; Indexing Service (Overlord, MiddleManager, Peons); ZooKeeper; Coordinator; Broker; Historical nodes, each with a segment cache; Deep Storage. Edge labels: task, Task (realtime_index), segment, Segments.]
  5. Tranquility
     • Sends event streams to Druid in real time
     • Written in Scala
     • Samza, Spark, Storm, Kafka, Flink
     • Tranquility Kafka
       – Submits a realtime indexing task to the Overlord
         • Posts an HTTP request with the task spec
       – Pulls the data from Kafka
       – Pushes the data to the realtime indexing task
  6. Task Spec
     • "type" : "index_realtime",
     • "id" : "index_realtime_sip_2016-05-17T04:00:00.000Z_0_0",
     • "spec" :
       – "dataSchema" :
         • "dataSource" : "sip",
         • "parser" : {
           – "parseSpec"
             » "format" : "json",
             » "timestampSpec"
             » "dimensionsSpec"
         • "metricsSpec"
         • "granularitySpec"
           – "segmentGranularity" : "TEN_MINUTE"
           – "queryGranularity" : "MINUTE"
       – "ioConfig" :
       – "tuningConfig" :
     [Annotations on the slide: Data Ingestion, index, firehose, plumber, RealtimeIndexTask]
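The outline on slide 6, together with the "post HTTP request with task spec" step on slide 5, can be sketched in Python as below. This is only an illustration, not Tranquility's own code (which is written in Scala). The contents of timestampSpec, dimensionsSpec, and metricsSpec are assumptions borrowed from the sample events on slide 12, the reading of the trailing `_0_0` in the id as partition and replicant numbers is an assumption, and the Overlord address `http://localhost:8090` is a hypothetical host. The request is built but never sent.

```python
import json
import urllib.request


def build_realtime_task_spec(data_source, interval_start, partition=0, replicant=0):
    """Assemble the outline of an index_realtime task spec as a plain dict."""
    return {
        "type": "index_realtime",
        # Assumption: the two trailing numbers are partition and replicant.
        "id": f"index_realtime_{data_source}_{interval_start}_{partition}_{replicant}",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                "parser": {
                    "parseSpec": {
                        "format": "json",
                        # Assumed columns, taken from the slide 12 sample events.
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": ["sip"]},
                    }
                },
                # Assumed metric: sum of packet_total.
                "metricsSpec": [
                    {"type": "longSum", "name": "packet_total", "fieldName": "packet_total"}
                ],
                "granularitySpec": {
                    "segmentGranularity": "TEN_MINUTE",
                    "queryGranularity": "MINUTE",
                },
            },
            # Placeholders: slides 9 and 13 show what goes in these two sections.
            "ioConfig": {"type": "realtime"},
            "tuningConfig": {"type": "realtime"},
        },
    }


def make_submit_request(spec, overlord_url):
    """Build (but do not send) the POST that submits a task to the Overlord."""
    return urllib.request.Request(
        url=overlord_url.rstrip("/") + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


spec = build_realtime_task_spec("sip", "2016-05-17T04:00:00.000Z")
req = make_submit_request(spec, "http://localhost:8090")  # hypothetical Overlord address
print(req.get_method(), req.full_url)
print(spec["id"])
```

Passing `req` to `urllib.request.urlopen` would perform the actual submission against a running Overlord.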
  7. RealtimeIndexTask
     [Diagram labels: RealtimeIndexTask, Firehose, Plumber, Data Source, Segment, IncrementalIndex, IndexMerger]
  8. Firehose
     • Pipeline for reading data
     • Types
       – LocalFirehose
         • Connects to a local file
       – IngestSegmentFirehose
         • Existing Druid segment
       – CombiningFirehose
       – EventReceiverFirehose
         • Ingests events using an HTTP endpoint
       – TimedShutoffFirehose
         • Shuts down at a specified time
  9. Firehose – real time index example
     • Firehose
       – Shutoff time: segmentGranularity + windowPeriod + firehoseGracePeriod
       – Buffer size: firehoseBufferSize

     "ioConfig" : {
       "type" : "realtime",
       "firehose" : {
         "type" : "clipped",
         "delegate" : {
           "type" : "timed",
           "delegate" : {
             "type" : "receiver",
             "serviceName" : "firehose:druid:overlord:sip-00-0000-0000",
             "bufferSize" : 100000
           },
           "shutoffTime" : "2016-05-17T04:15:00.000Z"
         },
         "interval" : "2016-05-17T04:00:00.000Z/2016-05-17T04:10:00.000Z"
       }
     }

     [Diagram labels: Tranquility, push, EventReceiver (buffer), Timed (shutoffTime), Clip, nextRow, Plumber]
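The shutoff-time formula on slide 9 is plain date arithmetic, sketched below. The function name is hypothetical, and the zero windowPeriod in the example is an assumption chosen for illustration; the ten-minute segment and the PT5M grace period come from slides 6 and 15.

```python
from datetime import datetime, timedelta, timezone


def firehose_shutoff_time(segment_start, segment_granularity, window_period, grace_period):
    """Slide 9 formula: segment length + windowPeriod + firehoseGracePeriod,
    counted from the start of the segment interval."""
    return segment_start + segment_granularity + window_period + grace_period


segment_start = datetime(2016, 5, 17, 4, 0, tzinfo=timezone.utc)
shutoff = firehose_shutoff_time(
    segment_start,
    segment_granularity=timedelta(minutes=10),  # TEN_MINUTE
    window_period=timedelta(0),                 # assumed value, for illustration only
    grace_period=timedelta(minutes=5),          # PT5M
)
print(shutoff.isoformat())
```

A longer windowPeriod pushes the shutoff later by the same amount, which keeps the task (and its memory) alive longer after the segment interval has closed.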
  10. Plumber
      • Generates segments
        – Intermediate persist
          • While the indexing task is running, the index is saved as a segment
          • The segment is saved under the base directory on local disk
          • When a task is run again with the same id, the segments saved by intermediate persists can be recovered.
        – When the task finishes
          • When the indexing task ends, the whole index is saved as a segment
          • The segment is pushed to deep storage
      • Types
        – YeOldePlumber
          • This plumber creates single historical segments.
        – RealtimePlumber
          • This plumber creates real-time/mutable segments.
  11. How indexing works
      [Diagram labels: Firehose, row, Plumber, Sink, FireHydrant (Index), currentHydrant (Index)]
      • Intermediate Persist
        …/task/index_realtime_sip_2016-05-18T05:10:00.000Z_0_0/work/persist/sip/2016-05-18T05:10:00.000Z_2016-05-18T05:20:00.000Z/0/v8-tmp
      • Persist and Merge
        …/task/index_realtime_sip_2016-05-18T05:10:00.000Z_0_0/work/persist/sip/2016-05-18T05:10:00.000Z_2016-05-18T05:20:00.000Z/merged/v8-tmp
  12. Abstraction of index
      • Structure
        – TimeAndDims
        – Aggregators

      Input events:
      {"timestamp": "2016-05-18T04:31:39Z", "sip": "a", "packet_total": "10"}
      {"timestamp": "2016-05-18T04:31:39Z", "sip": "b", "packet_total": "3"}
      {"timestamp": "2016-05-18T04:33:42Z", "sip": "a", "packet_total": "10"}
      {"timestamp": "2016-05-18T04:33:42Z", "sip": "c", "packet_total": "5"}
      {"timestamp": "2016-05-18T04:37:55Z", "sip": "a", "packet_total": "10"}
      {"timestamp": "2016-05-18T04:45:11Z", "sip": "a", "packet_total": "7"}
      {"timestamp": "2016-05-18T04:45:11Z", "sip": "b", "packet_total": "8"}
      {"timestamp": "2016-05-18T04:45:22Z", "sip": "b", "packet_total": "8"}

      Resulting index:
      time      sip  sum
      04:30:00  a    10 + 10 + 10 = 30
      04:30:00  b    3
      04:30:00  c    5
      04:40:00  a    7
      04:40:00  b    8 + 8 = 16

      The time unit is queryGranularity: events that fall in the same queryGranularity bucket and share the same dimension values are rolled up into one row.
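The rollup on slide 12 can be reproduced with a short Python sketch. It is a simplified model of the idea (a time-and-dimensions key mapped to an aggregate), not Druid's IncrementalIndex. The ten-minute bucket is inferred from the 04:30:00 and 04:40:00 rows in the slide's table, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EVENTS = [
    {"timestamp": "2016-05-18T04:31:39Z", "sip": "a", "packet_total": "10"},
    {"timestamp": "2016-05-18T04:31:39Z", "sip": "b", "packet_total": "3"},
    {"timestamp": "2016-05-18T04:33:42Z", "sip": "a", "packet_total": "10"},
    {"timestamp": "2016-05-18T04:33:42Z", "sip": "c", "packet_total": "5"},
    {"timestamp": "2016-05-18T04:37:55Z", "sip": "a", "packet_total": "10"},
    {"timestamp": "2016-05-18T04:45:11Z", "sip": "a", "packet_total": "7"},
    {"timestamp": "2016-05-18T04:45:11Z", "sip": "b", "packet_total": "8"},
    {"timestamp": "2016-05-18T04:45:22Z", "sip": "b", "packet_total": "8"},
]


def truncate(ts, bucket):
    """Floor a datetime to the start of its bucket, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + (ts - midnight) // bucket * bucket


def rollup(events, bucket):
    """Sum packet_total per (time bucket, sip) key."""
    totals = defaultdict(int)
    for event in events:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        key = (truncate(ts, bucket).strftime("%H:%M:%S"), event["sip"])
        totals[key] += int(event["packet_total"])
    return dict(totals)


rows = rollup(EVENTS, timedelta(minutes=10))
for (time, sip), total in sorted(rows.items()):
    print(time, sip, total)
```

Eight input events collapse into five rows here; a coarser bucket would collapse them further, at the cost of time resolution at query time.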
  13. Plumber – real time index example
      • Plumber
        – maxRowsInMemory: maximum number of rows held in memory before a persist
        – intermediatePersistPeriod: interval between persists
        – maxPendingPersists: number of persists that may be pending; 0 means only one persist runs at a time
      • How to set the persist period?
        – A large maxRowsInMemory increases memory usage
        – A short intermediatePersistPeriod increases memory usage
          • A long period means more data is lost on recovery
          • Recovery for stream processing is supplemented by batch ingestion

      "tuningConfig" : {
        "type" : "realtime",
        "maxRowsInMemory" : 1000,
        "intermediatePersistPeriod" : "PT2M",
        "windowPeriod" : "PT1S",
        "basePersistDirectory" : "/Users/seoeun/libs/druid-0.9.0/var/tmp/1463447509447-0",
        "maxPendingPersists" : 0,
        …
      }
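How maxRowsInMemory and intermediatePersistPeriod interact can be pictured with a toy model: persist when either the row limit or the elapsed time is reached, whichever comes first. This is a deliberately simplified sketch with hypothetical names, not the RealtimePlumber implementation; it ignores maxPendingPersists, and timestamps are passed in as plain seconds so the behaviour is deterministic.

```python
class PersistPolicy:
    """Toy model: persist on row limit or elapsed period, whichever comes first."""

    def __init__(self, max_rows_in_memory, intermediate_persist_period_s):
        self.max_rows = max_rows_in_memory
        self.period_s = intermediate_persist_period_s
        self.rows_in_memory = 0
        self.last_persist = 0.0
        self.persist_count = 0

    def add_row(self, now):
        """Record one row arriving at time `now` (seconds); return True if it triggered a persist."""
        self.rows_in_memory += 1
        row_limit_hit = self.rows_in_memory >= self.max_rows
        period_elapsed = now - self.last_persist >= self.period_s
        if row_limit_hit or period_elapsed:
            self._persist(now)
            return True
        return False

    def _persist(self, now):
        # A real plumber would write the in-memory index to disk here.
        self.rows_in_memory = 0
        self.last_persist = now
        self.persist_count += 1


# Tiny limits for the demo; slide 13 uses 1000 rows and PT2M (120 s).
policy = PersistPolicy(max_rows_in_memory=3, intermediate_persist_period_s=120)
triggered = [policy.add_row(t) for t in (1, 2, 3, 50, 130)]
print(triggered, policy.persist_count)
```

In the demo the third row hits the row limit and the fifth row arrives after the period has elapsed, so both trigger a persist. Rows that arrived after the last persist are the ones at risk if the task dies, which is the recovery trade-off the slide describes.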
  14. Segment
      • Files that store the index, partitioned by time.
      • One is created for each time interval configured in "segmentGranularity"
      • File size: 300 MB to 700 MB
      • Row count: 5 million
      • Components
        – version.bin
        – meta.smoosh
        – XXXXX.smoosh
  15. Tranquility Kafka config – about Druid
      • druid.discovery.curator.path
        – Curator service discovery path
        – Value: /druid/discovery
      • druid.selectors.indexing.serviceName
        – Service name of the Overlord node
        – Value: druid/overlord
      • druidBeam.firehoseBufferSize
        – Size of buffer used by firehose to store events.
        – Value: 100,000
      • druidBeam.firehoseChunkSize
        – Maximum number of events to send to Druid in one HTTP request.
        – Value: 1,000
      • druidBeam.firehoseGracePeriod
        – Druid indexing tasks will shut down this long after the windowPeriod has elapsed.
        – Value: PT5M
      • task.partitions
        – Number of Druid partitions to create.
        – Value: 1
      • task.replicants
        – Number of instances of each Druid partition to create.
        – Value: 1
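The effect of druidBeam.firehoseChunkSize can be illustrated by splitting a list of events into request-sized chunks. This is a minimal sketch with a hypothetical helper name, not Tranquility's code; it only shows how many HTTP requests a given number of events would need at the value listed on slide 15.

```python
def chunk_events(events, chunk_size=1000):
    """Yield consecutive slices of at most chunk_size events, one per HTTP request."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(events), chunk_size):
        yield events[start:start + chunk_size]


# 2,500 made-up events, purely for illustration.
events = [{"sip": "a", "packet_total": i} for i in range(2500)]
sizes = [len(chunk) for chunk in chunk_events(events)]
print(sizes)
```

With the listed chunk size of 1,000, the 2,500 events go out as three requests, the last one partially filled.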
  16. Tranquility Kafka config – about Tranquility
      • tranquility.blockOnFull
        – Whether "send" will block (true) or throw an exception (false) when called while the outgoing queue is full.
        – Value: true
      • tranquility.lingerMillis
        – Wait this long for batches to collect more messages (up to maxBatchSize) before sending them.
        – Value: 0 (disable waiting)
      • tranquility.maxBatchSize
        – Maximum number of messages to send at once.
        – Value: 2,000
      • tranquility.maxPendingBatches
        – Maximum number of batches that can be pending.
        – Value: 5
  17. Tranquility Kafka config – about Kafka
      • kafka.group.id
        – Group ID for Kafka consumers. This must be the same!
        – Value: tranquility-kafka
      • consumer.numThreads
        – The number of threads that will be made available to the Kafka consumer for fetching messages.
        – Value: -1
        – Work is distributed according to the number of partitions divided by the number of consumers
      • commit.periodMillis
        – The frequency with which consumer offsets will be committed to ZooKeeper to track processed messages.
        – Value: 15000
      • kafka.*
        – Properties prefixed with kafka. are passed to the underlying Kafka consumer with the kafka. prefix removed.
        – kafka.fetch.message.max.bytes → fetch.message.max.bytes
          • The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request.
          • This is multiplied by the number of partitions, so watch memory usage
          • Set it larger than the broker's message.max.bytes
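The kafka.* pass-through rule and the memory note on slide 17 can be sketched as two small helpers. Both function names are hypothetical, and the 1048576-byte fetch size and the eight partitions are made-up illustration values, not settings from the deck.

```python
def kafka_consumer_properties(config, prefix="kafka."):
    """Keep only keys starting with the prefix and strip the prefix from them."""
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }


def worst_case_fetch_bytes(fetch_message_max_bytes, partitions):
    """Upper bound on fetch buffer size: one fetch per topic-partition."""
    return fetch_message_max_bytes * partitions


config = {
    "kafka.group.id": "tranquility-kafka",
    "kafka.fetch.message.max.bytes": "1048576",  # illustrative value
    "consumer.numThreads": "-1",
    "commit.periodMillis": "15000",
}
consumer_props = kafka_consumer_properties(config)
print(consumer_props)
print(worst_case_fetch_bytes(int(consumer_props["fetch.message.max.bytes"]), partitions=8))
```

Only the two kafka.-prefixed keys reach the consumer; the Tranquility-level settings stay behind. The second helper makes the slide's warning concrete: raising the fetch size to accommodate large broker messages scales memory use by the partition count.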
  18. Flow of real time indexing
      [Flow diagram. Components: Tranquility, Overlord, ZooKeeper, MiddleManager, RealtimeIndexTask (firehose, plumber), Segment, Deep Storage. Edge labels: submit task, forking task, push data, task, firehose, segment.]
  19. Q&A
