Building a REST Job Server for Interactive Spark as a Service

from Romain Rigaux and Erick Tryzelaar at Spark Summit EU 2015

Published in: Software

  1. 1. BUILDING A REST JOB SERVER
 FOR INTERACTIVE SPARK AS A SERVICE Romain Rigaux - Cloudera Erick Tryzelaar - Cloudera
  2. 2. WHY?
  3. 3. WHY SPARK AS A SERVICE?
 NOTEBOOKS
 EASY ACCESS FROM ANYWHERE
 SHARE SPARK CONTEXTS AND RDDs
 BUILD APPS
 SPARK MAGIC
 …
  4. 4. WHY SPARK IN HUE?
 MARRIED WITH FULL HADOOP ECOSYSTEM
  5. 5. HISTORY
 V1: OOZIE
 THE GOOD • It works • Code snippet
 THE BAD • Submit through Oozie • Shell action • Very Slow • Batch
 [diagram: workflow.xml, snippet.py, stdout]
  6. 6. HISTORY
 V2: SPARK IGNITER
 THE GOOD • It works better
 THE BAD • Compiler Jar • Batch only, no shell • No Python, R • Security • Single point of failure
 [diagram: Compile, Implement, Upload, json output, Batch, Scala jar, Ooyala]
  7. 7. HISTORY
 V3: NOTEBOOK
 THE GOOD • Like spark-submit / spark shells • Scala / Python / R shells • Jar / Python batch Jobs • Notebook UI • YARN
 THE BAD • Beta?
 [diagram: Livy, code snippet, batch]
  8. 8. GENERAL ARCHITECTURE [diagram: Spark, Spark, Spark, Livy, YARN]
  9. 9. GENERAL ARCHITECTURE [diagram: Spark, Spark, Spark, Livy, YARN, API]
  10. 10. LIVY SPARK SERVER
  11. 11. LIVY
 SPARK SERVER • REST Web server in Scala for Spark submissions • Interactive Shell Sessions or Batch Jobs • Backends: Scala, Java, Python, R • No dependency on Hue • Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java • Read about it: http://gethue.com/spark/
  12. 12. ARCHITECTURE • Standard web service: wrapper around spark-submit / Spark shells • YARN mode, Spark drivers run inside the cluster (supports crashes) • No need to inherit any interface or compile code • Extended to work with additional backends
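The "wrapper around spark-submit" idea on this slide can be illustrated with a minimal sketch, assuming the server simply assembles a command line and hands it to a process launcher. The function name `build_spark_submit` and the file names are hypothetical; only the `--master`, `--deploy-mode` and `--class` options are standard `spark-submit` flags. This is not Livy's actual code.

```python
import shlex


def build_spark_submit(app, master="yarn", deploy_mode="cluster",
                       main_class=None, app_args=()):
    """Return the argv list a server-side wrapper could pass to a process launcher."""
    argv = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if main_class:  # only needed for JVM applications packaged as a jar
        argv += ["--class", main_class]
    argv.append(app)
    argv.extend(app_args)
    return argv


# Hypothetical application and input file, for illustration only.
cmd = build_spark_submit("wordcount.py", app_args=["shakespeare.txt"])
print(shlex.join(cmd))
```

Because the user only supplies a script or jar, nothing has to inherit a server-specific interface, which matches the "no need to inherit any interface or compile code" bullet.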
  13. 13. LIVY WEB SERVER
 ARCHITECTURE LOCAL “DEV” MODE YARN MODE
  14. 14. LOCAL MODE [diagram: Spark Client, Livy Server, Scalatra, Session Manager, Session, Spark Interpreter, Spark Context]
  15. 15. LOCAL MODE [same diagram]
  16. 16. LOCAL MODE [same diagram, step 1 marked]
  17. 17. LOCAL MODE [same diagram, steps 1-2 marked]
  18. 18. LOCAL MODE [same diagram, steps 1-3 marked]
  19. 19. LOCAL MODE [same diagram, steps 1-4 marked]
  20. 20. LOCAL MODE [same diagram, steps 1-5 marked]
  21. 21. YARN-CLUSTER
 MODE PRODUCTION SCALABLE
  22. 22. YARN-CLUSTER MODE [diagram: Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, three YARN Nodes, Spark Interpreter, Spark Context, two Spark Workers]
  23. 23. YARN-CLUSTER MODE [same diagram, step 1 marked]
  24. 24. YARN-CLUSTER MODE [same diagram, steps 1-2 marked]
  25. 25. YARN-CLUSTER MODE [same diagram, steps 1-3 marked]
  26. 26. YARN-CLUSTER MODE [same diagram, steps 1-4 marked]
  27. 27. YARN-CLUSTER MODE [same diagram, steps 1-5 marked]
  28. 28. YARN-CLUSTER MODE [same diagram, steps 1-6 marked]
  29. 29. YARN-CLUSTER MODE [same diagram, steps 1-7 marked]
  30. 30. SESSION CREATION AND EXECUTION
 % curl -XPOST localhost:8998/sessions -d '{"kind": "spark"}'
 { "id": 0, "kind": "spark", "log": [...], "state": "idle" }
 % curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
 { "id": 0, "output": { "data": { "text/plain": "res0: Int = 2" }, "execution_count": 0, "status": "ok" }, "state": "available" }
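The two curl calls above could be issued from Python roughly as follows. This is a sketch using only the standard library; the helper names `livy_post` and `send` are hypothetical, and the `Content-Type` header is an assumption that does not appear on the slide. The requests are built separately from sending so the example can be inspected without a running server.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # host and port as shown on the slide


def livy_post(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for a Livy endpoint."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not on the slide
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


create_session = livy_post("/sessions", {"kind": "spark"})
run_statement = livy_post("/sessions/0/statements", {"code": "1+1"})
# With a server running: session = send(create_session); result = send(run_statement)
```

The session id `0` in the second path is taken from the slide's example reply; a real client would read it from the response to the first call.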
  31. 31. BATCH OR INTERACTIVE [diagram: Jar, Py (/batches); Scala, Python, R (/sessions); Livy; Spark, Spark, Spark; YARN]
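The split between `/batches` and `/sessions` can be sketched as a small routing function. Everything beyond the two endpoint names and the `"spark"` kind is an assumption made for illustration: the `route` helper, the `"file"` payload key, and the `"pyspark"` / `"sparkr"` kind names are not given on the slides.

```python
BATCH_SUFFIXES = (".jar", ".py")  # packaged applications go to /batches
# Kind names other than "spark" are assumptions for this sketch.
SESSION_KINDS = {"scala": "spark", "python": "pyspark", "r": "sparkr"}


def route(job):
    """Return (endpoint, payload) for a file path (batch) or a language name (interactive)."""
    if job.endswith(BATCH_SUFFIXES):
        return "/batches", {"file": job}  # "file" key is an assumed payload shape
    if job in SESSION_KINDS:
        return "/sessions", {"kind": SESSION_KINDS[job]}
    raise ValueError(f"unsupported job: {job!r}")


print(route("wordcount.jar"))
print(route("python"))
```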
  32. 32. SHELL OR BATCH? [diagram: Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, three YARN Nodes, Spark Interpreter, Spark Context, two Spark Workers]
  33. 33. SHELL [same diagram, with pyspark in place of Spark Interpreter]
  34. 34. BATCH [same diagram, with spark-submit in place of Spark Interpreter]
  35. 35. LIVY INTERPRETERS Scala, Python, R…
  36. 36. REMEMBER? [same diagram as slide 32]
  37. 37. INTERPRETERS • Pipe stdin/stdout to a running shell • Execute the code / send to Spark workers • Perform magic operations • One interpreter per language • “Swappable” with other kernels (python, spark..) [diagram: Interpreter, > println(1 + 1), 2]
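The "pipe stdin/stdout to a running shell" bullet can be illustrated with a toy example. In this sketch a tiny Python child process stands in for the Spark shell: it evaluates one expression per line and prints the result. The class name `PipedInterpreter` and the child script are hypothetical; a real shell needs prompt handling and multi-line input, which this deliberately leaves out.

```python
import subprocess
import sys

# Stand-in for a language shell: evaluate one expression per input line.
CHILD_SHELL = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(eval(line), flush=True)\n"
)


class PipedInterpreter:
    """Keep one child shell alive and talk to it over its stdin/stdout pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD_SHELL],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def execute(self, code):
        """Send one line of code and return the single line the shell prints back."""
        self.proc.stdin.write(code + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close the pipes so the child sees end-of-input and exits."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


shell = PipedInterpreter()
result = shell.execute("1 + 1")
shell.close()
print(result)
```

Because the child stays alive between calls, state defined by one statement would remain available to the next, which is what makes a session interactive rather than batch.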
  38. 38. INTERPRETER FLOW [diagram: Livy Server, Interpreter]
  39. 39. INTERPRETER FLOW [adds: > 1 + 1]
  40. 40. INTERPRETER FLOW [adds: {"code": "1+1"}]
  41. 41. INTERPRETER FLOW [adds: 1+1]
  42. 42. INTERPRETER FLOW [adds: Magic]
  43. 43. INTERPRETER FLOW [adds: 2]
  44. 44. INTERPRETER FLOW [adds: { "data": { "application/json": "2" } }]
  45. 45. INTERPRETER FLOW [adds: 2]
  46. 46. INTERPRETER FLOW CHART Receive lines • Split into chunks • Chunks left? No: send output to server. Yes: Magic chunk? • Magic chunk? Yes: Magic! No: Execute chunk • On success, check for remaining chunks again; otherwise send error to server • Example of parsing
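The flow chart can be approximated in a few lines. This is a sketch under stated assumptions: a line starting with `%` is treated as a magic chunk, consecutive other lines form one code chunk, and Python's `exec` plays the role of the language shell. The function names, the chunking rule and the result dictionary shape are all hypothetical, not Livy's implementation.

```python
def split_into_chunks(lines):
    """Group plain lines into code chunks; each line starting with '%' is a magic chunk."""
    chunks, code = [], []
    for line in lines:
        if line.startswith("%"):
            if code:
                chunks.append(("code", "\n".join(code)))
                code = []
            name, _, argument = line[1:].partition(" ")
            chunks.append(("magic", (name, argument.strip())))
        elif line.strip():
            code.append(line)
    if code:
        chunks.append(("code", "\n".join(code)))
    return chunks


def run(lines, execute, magics):
    """Run chunks in order; stop at the first failure and report it."""
    output = None
    for kind, payload in split_into_chunks(lines):
        try:
            if kind == "magic":
                name, argument = payload
                output = magics[name](argument)
            else:
                output = execute(payload)
        except Exception as error:
            return {"status": "error", "error": repr(error)}
    return {"status": "ok", "data": output}


# Toy stand-ins: exec acts as the shell, and %json looks up a variable by name.
namespace = {}


def execute(code):
    exec(code, namespace)
    return {"text/plain": ""}


magics = {"json": lambda name: {"application/json": namespace[name]}}

ok = run(["counts = [1, 2]", "%json counts"], execute, magics)
failed = run(["%json missing"], execute, magics)
print(ok)
print(failed["status"])
```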
  47. 47. INTERPRETER MAGIC • table • json • plotting • ...
  48. 48. NO MAGIC
 > 1 + 1
 Interpreter: sparkIMain.interpret("1+1")
 { "id": 0, "output": { "application/json": 2 } }
  49. 49. JSON MAGIC
 val lines = sc.textFile("shakespeare.txt");
 val counts = lines.
   flatMap(line => line.split(" ")).
   map(word => (word, 1)).
   reduceByKey(_ + _).
   sortBy(-_._2).
   map { case (w, c) => Map("word" -> w, "count" -> c) }
 %json counts
 > counts
 [('', 506610), ('the', 23407), ('I', 19540)... ]
 Interpreter: sparkIMain.valueOfTerm("counts").toJson()
  50. 50. JSON MAGIC (same code as slide 49)
 > counts
 Interpreter: sparkIMain.valueOfTerm("counts").toJson()
 { "id": 0, "output": { "application/json": [ { "count": 506610, "word": "" }, { "count": 23407, "word": "the" }, { "count": 19540, "word": "I" }, ... ] ... }
  51. 51. TABLE MAGIC (same code as slide 49, ending with %table counts)
 > counts
 [('', 506610), ('the', 23407), ('I', 19540)... ]
 Interpreter: sparkIMain.valueOfTerm("counts").guessHeaders().toList()
  52. 52. TABLE MAGIC (same code as slide 49, ending with %table counts)
 > counts
 Interpreter: sparkIMain.valueOfTerm("counts").guessHeaders().toList()
 "application/vnd.livy.table.v1+json": { "headers": [ { "name": "count", "type": "BIGINT_TYPE" }, { "name": "name", "type": "STRING_TYPE" } ], "data": [ [ 23407, "the" ], [ 19540, "I" ], [ 18358, "and" ], ... ] }
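The header-guessing step behind the table magic might look something like the sketch below. It assumes rows arrive as dictionaries sharing the same keys and uses a deliberately crude type rule (all integers become `BIGINT_TYPE`, everything else `STRING_TYPE`). The MIME type and the two type names come from the slide; the function names and the sorted column order are choices made for this illustration.

```python
TABLE_MIME = "application/vnd.livy.table.v1+json"  # MIME type shown on the slide


def guess_type(values):
    """Crude guess: all ints -> BIGINT_TYPE, anything else -> STRING_TYPE."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "BIGINT_TYPE"
    return "STRING_TYPE"


def to_table(rows):
    """Turn a list of dicts sharing the same keys into headers plus row lists."""
    names = sorted(rows[0]) if rows else []
    headers = [{"name": n, "type": guess_type([r[n] for r in rows])} for n in names]
    data = [[r[n] for n in names] for r in rows]
    return {TABLE_MIME: {"headers": headers, "data": data}}


table = to_table([
    {"word": "the", "count": 23407},
    {"word": "I", "count": 19540},
])
print(table)
```

Shipping the guessed types alongside the rows lets a notebook front end pick a sensible rendering (numeric axis versus label) without re-inspecting the data.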
  53. 53. PLOT MAGIC
 ... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))
 Interpreter: sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")
  54. 54. PLOT MAGIC (same as slide 53)
  55. 55. PLOT MAGIC (same as slide 53, adding)
 > png('/tmp/..')
 > barplot
 > dev.off()
  56. 56. PLOT MAGIC (same as slide 55, adding)
 File('/tmp/plot.png').read().toBase64()
  57. 57. PLOT MAGIC (same as slide 56, adding)
 { "data": { "image/png": "iVBORw0KGgoAAAANSUhEU ... } ... }
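The final step of the plot magic, reading the image file and returning it as base64 under the `image/png` key, can be sketched as below. The function name `image_output` is hypothetical, and the demo file contains only the 8-byte PNG signature rather than a real plot, which is enough to show why such payloads begin with `iVBORw0KGgo`.

```python
import base64
import os
import tempfile


def image_output(path):
    """Read an image file and wrap it as a base64 string under its MIME type."""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return {"data": {"image/png": encoded}}


# Stand-in file holding only the PNG signature bytes, not an actual plot.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")
result = image_output(tmp.name)
os.remove(tmp.name)
print(result)
```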
  58. 58. PLUGGABLE INTERPRETERS • Pluggable Backends • Livy's Spark Backends – Scala – pyspark – R • IPython / Jupyter support coming soon
  59. 59. JUPYTER BACKEND • Re-using it • Generic Framework for Interpreters • 51 Kernels

  60. 60. SPARK AS A SERVICE
  61. 61. REMEMBER AGAIN? [diagram: Spark Client, Livy Server, Scalatra, Session Manager, Session, YARN Master, three YARN Nodes, Spark Interpreter, Spark Context, two Spark Workers]
  62. 62. MULTI USERS [diagram: three Spark Clients, Livy Server, Scalatra, Session Manager, Session, three YARN Nodes, three Spark Contexts, three Spark Interpreters]
  63. 63. SHARED CONTEXTS? [diagram: three Spark Clients, Livy Server, Scalatra, Session Manager, Session, one YARN Node, Spark Context, Spark Interpreter]
  64. 64. SHARED RDD? [same diagram as slide 63, plus one RDD]
  65. 65. SHARED RDDS? [same diagram as slide 63, plus three RDDs]
  66. 66. SECURE IT? [same diagram as slide 65]
  67. 67. SECURE IT? [same diagram as slide 65]
  68. 68. SPARK AS SERVICE [diagram: three Spark Clients, Livy Server, Spark, Spark]
  69. 69. SHARING RDDS
  70. 70. [diagram: PySpark shell, RDD, Shell, Python Shell]
  71. 71. [same diagram]
  72. 72. [same diagram, adding]
 r = sc.parallelize([])
 srdd = ShareableRdd(r)
  73. 73. [same diagram, adding RDD contents] {'ak': 'Alaska'} {'ca': 'California'}
  74. 74. [same diagram, adding]
 curl -XPOST /sessions/0/statement { 'code': srdd.get('ak') }
  75. 75. [same diagram, adding]
 states = SharedRdd('host/sessions/0', 'srdd')
 states.get('ak')
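One way to picture the client side of the shared RDD is a thin handle that turns `get('ak')` into a statement posted to the owning session. The sketch below only builds the request URL and body; the method name `get_request` is hypothetical, the host and port are borrowed from slide 30, and the `/statements` path follows the endpoint shown there. The code linked from the demo slide may be organised differently.

```python
import json


class SharedRdd:
    """Client-side handle for an RDD wrapper living in another user's session (sketch)."""

    def __init__(self, session_url, name):
        self.session_url = session_url.rstrip("/")
        self.name = name  # variable name of the ShareableRdd in the remote shell

    def get_request(self, key):
        """Return the URL and JSON body that would ask the remote shell for one key."""
        code = f"{self.name}.get({key!r})"
        return self.session_url + "/statements", json.dumps({"code": code})


states = SharedRdd("http://localhost:8998/sessions/0", "srdd")
url, body = states.get_request("ak")
print(url)
print(body)
```

The remote shell keeps the RDD; only small lookups and their results cross the REST boundary, so other shells do not need their own Spark context to read from it.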
  76. 76. DEMO TIME
 https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
  77. 77. SECURITY • SSL Support • Persistent Sessions • Kerberos
  78. 78. SPARK MAGIC • From Microsoft • Python magics for working with remote Spark clusters • Open Source: https://github.com/jupyter-incubator/sparkmagic
  79. 79. FUTURE • Move to ext repo? • Security • IPython/Jupyter backends and file format • Shared named RDD / contexts? • Share data • Spark specific, language generic, both? • Leverage Hue 4 https://issues.cloudera.org/browse/HUE-2990
  80. 80. LIVY’S CHEAT SHEET • Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java • Read about it: http://gethue.com/spark/ • Scala, Java, Python, R • Type Introspection for Visualization • YARN-cluster or local modes • Code snippets / compiled • REST API • Pluggable backends • Magic keywords • Failure resilient • Security
  81. 81. BEDANKT! (THANK YOU!)
 TWITTER @gethue USER GROUP hue-user@ WEBSITE http://gethue.com LEARN http://learn.gethue.com
