
Punch clock for debugging Apache Storm


Motivation:

To find out:

When did the batch enter/exit the Spout/Bolt?

Which batch is still in the Spout/Bolt, i.e. are any batches STUCK?

On which host are they stuck?

In which Spout/Bolt are they stuck?

Published in: Engineering

Punch clock for debugging Apache Storm

  1. Punch clock for Apache Storm <just an idea>
  6. Punch clock (a.k.a. time clock)
     ● You have a card per person.
     ● The person punches IN with the card when he/she enters the office.
     ● The person punches OUT with the card when he/she leaves the office.
     ● The punch clock records the time of entry/exit on the card.
  9. Motivation: to find out
     1. When did the person enter/exit the office?
     2. Who is still in the office?
  10. Change of context …
  11. “Apache Storm”: tuples going in and out of Spouts/Bolts
  12. Motivation: debugging Apache Storm* (* debugging Storm transactional topologies)
  15. Debugging Transactional Topologies
      1. The Spout emits a batch of data (tuples), which forms a transaction.
      2. Every Bolt in the topology processes that batch of data (tuples).
  19. Motivation: to find out
      1. When did the batch enter/exit the Spout/Bolt?
      2. Which batch is still in the Spout/Bolt, i.e. are any batches STUCK?
         a. On which host are they stuck?
         b. In which Spout/Bolt are they stuck?
  23. Possible Solution(s): Add a log statement before and after the critical section.
          log.info("Inserting data into database ...");  // ← entering
          datasource.insert(table, tuples);              // ← the real work
          log.info("Inserted data into database.");      // ← exiting
      Cons: The logs are spread over multiple hosts, so they have to be aggregated first. That needs a bit of work (Elasticsearch and Kibana?).
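
The logging approach can be sketched a little further so that the lines are easier to correlate once they are aggregated. This is only an illustration under assumptions: the class `BatchLogSketch`, the helpers `marker`, `localHost` and `runLogged`, and the log line layout are all hypothetical, and `java.util.logging` stands in for whatever logger the topology really uses.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Logger;

// Hypothetical helper: wraps a critical section with ENTER/EXIT log lines
// that carry the component name, the batch id and the host name.
class BatchLogSketch {
    private static final Logger LOG = Logger.getLogger("BatchLogSketch");

    // Builds one log line; the key=value layout is an assumption.
    static String marker(String phase, String component, String batchId, String host) {
        return String.format("%s component=%s batch=%s host=%s", phase, component, batchId, host);
    }

    // Best-effort host name lookup, so each line says where it was written.
    static String localHost() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    // Logs ENTER, runs the work, and logs EXIT even if the work throws.
    static void runLogged(String component, String batchId, Runnable work) {
        String host = localHost();
        LOG.info(marker("ENTER", component, batchId, host));
        try {
            work.run();
        } finally {
            LOG.info(marker("EXIT", component, batchId, host));
        }
    }
}
```

A batch with an ENTER line but no matching EXIT line is a candidate for being stuck. Finding it still means searching the logs of every host, which is the cost noted on the slide.
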
  24. Possible Solution(s): Use Riemann (http://riemann.io/index.html). This was suggested by my friend Angad. I have not looked at it yet, though.
  26. My Idea: A batch of tuples punches IN and punches OUT in a bolt/spout.
      Punch In: put into a hashmap (or any other suitable data structure).
      Punch Out: remove from the hashmap (or any other suitable data structure).
  27. My Idea: A batch of tuples punches IN and punches OUT in a spout.
      In the emitBatch method of the Transactional Spout:
          PunchClock.getInstance().punchIn(punchCardId);   // ← Punch In
          collector.emit(tuples);                          // ← Emit tuple(s)
          PunchClock.getInstance().punchOut(punchCardId);  // ← Punch Out
  28. My Idea: A batch of tuples punches IN and punches OUT in a bolt.
      In the prepare method of the Transactional Bolt:
          punchCardId = "Bolt__" + Thread.currentThread().getId() + "__" + System.currentTimeMillis();  // ← Create punch card for txn
      In the execute method of the Transactional Bolt:
          PunchClock.getInstance().punchIn(punchCardId);   // ← Punch In
      In the finishBatch method of the Transactional Bolt:
          PunchClock.getInstance().punchOut(punchCardId);  // ← Punch Out
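
The slides call `PunchClock.getInstance().punchIn(...)` and `punchOut(...)` but do not show the class itself. What follows is a minimal sketch of what such a per-JVM punch clock could look like, assuming a `ConcurrentHashMap` keyed by punch card id. It is not the code from the linked repository. The helpers `stillPunchedIn` and `punchedInLongerThan` are hypothetical additions for spotting batches that never punched out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-JVM punch clock (one singleton per worker JVM).
final class PunchClock {
    private static final PunchClock INSTANCE = new PunchClock();

    // punchCardId -> time (epoch millis) at which the batch punched in
    private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

    private PunchClock() {}

    static PunchClock getInstance() {
        return INSTANCE;
    }

    // Punch In: remember the card and the time of entry.
    void punchIn(String punchCardId) {
        punchedIn.put(punchCardId, System.currentTimeMillis());
    }

    // Punch Out: forget the card again.
    void punchOut(String punchCardId) {
        punchedIn.remove(punchCardId);
    }

    // Snapshot of the cards that are still punched in (assumed helper).
    Map<String, Long> stillPunchedIn() {
        return new TreeMap<>(punchedIn);
    }

    // Cards that punched in at least maxAgeMillis before nowMillis (assumed helper).
    Map<String, Long> punchedInLongerThan(long maxAgeMillis, long nowMillis) {
        Map<String, Long> stuck = new TreeMap<>();
        for (Map.Entry<String, Long> e : punchedIn.entrySet()) {
            if (nowMillis - e.getValue() >= maxAgeMillis) {
                stuck.put(e.getKey(), e.getValue());
            }
        }
        return stuck;
    }
}

class PunchClockDemo {
    public static void main(String[] args) {
        PunchClock clock = PunchClock.getInstance();
        clock.punchIn("Bolt__1__1000");
        clock.punchIn("Bolt__2__2000");
        clock.punchOut("Bolt__1__1000");
        // Only the card that never punched out is left in the snapshot.
        System.out.println(clock.stillPunchedIn().keySet());
    }
}
```

Storing the punch-in time as the map value is one way to answer both questions from the motivation: the keys show which batches are still inside, and the values show for how long.
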
  29. Is it intrusive? Yes, but it is only a put/remove call on a hashmap, which is cheaper than logging.
  34. Punch Clocks
      ● Spouts/Bolts are housed in a Storm worker JVM.
      ● One Punch Clock per JVM.
      ● Since we have multiple JVMs, we have multiple Punch Clocks.
      ● Batches move across Storm workers and we have multiple JVMs, so:
        ○ We need to aggregate the data across Punch Clocks.
        ○ Expose the Punch Clock via JMX.
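
Exposing a punch clock via JMX could look roughly like the sketch below. It uses the standard `javax.management` API (`StandardMBean`, the platform `MBeanServer`). The names `JmxPunchClockSketch`, `PunchClockView`, `register` and the object name `sketch.punchclock:type=PunchClock` are assumptions made for illustration and are not taken from the linked repository. Aggregation across workers is not shown.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class JmxPunchClockSketch {

    // Management interface (hypothetical name); JMX requires it to be public.
    public interface PunchClockView {
        int getPunchedInCount();
        Map<String, Long> getPunchedIn();
    }

    // Simplified punch clock that also implements the management view.
    static final class PunchClock implements PunchClockView {
        private final Map<String, Long> punchedIn = new ConcurrentHashMap<>();

        void punchIn(String id) { punchedIn.put(id, System.currentTimeMillis()); }
        void punchOut(String id) { punchedIn.remove(id); }

        @Override public int getPunchedInCount() { return punchedIn.size(); }
        @Override public Map<String, Long> getPunchedIn() { return new TreeMap<>(punchedIn); }
    }

    // Registers the clock with this JVM's platform MBean server under the given name.
    static ObjectName register(PunchClock clock, String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(new StandardMBean(clock, PunchClockView.class), objectName);
        return objectName;
    }

    public static void main(String[] args) throws Exception {
        PunchClock clock = new PunchClock();
        ObjectName name = register(clock, "sketch.punchclock:type=PunchClock");
        clock.punchIn("Bolt__1__1000");
        // Read the attribute back the way a JMX client would.
        Object count = ManagementFactory.getPlatformMBeanServer().getAttribute(name, "PunchedInCount");
        System.out.println("punched in: " + count);
    }
}
```

Each worker JVM would register its own clock. A collector could then read `PunchedIn` from every worker over JMX and merge the maps, which presumes that remote JMX access is enabled on the workers.
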
  35. Demo
  36. Thank you. jaihind213@gmail.com https://github.com/jaihind213/storm-punch-clock sweetweet213@twitter
