Flux: Apache Storm Frictionless Topology Configuration & Deployment

Storm BoF - Hadoop Summit Brussels 2015


  1. Flux: Apache Storm Frictionless Topology Configuration & Deployment
      P. Taylor Goetz, Hortonworks @ptgoetz
      Storm BoF - Hadoop Summit Brussels 2015
  2. About me…
      • VP - Apache Storm
      • ASF Member
      • Member of Technical Staff, Hortonworks
  3. What is Flux?
      • An easier way to configure and deploy Apache Storm topologies
      • A YAML DSL for defining and configuring Storm topologies
      • And more…
  4. Why Flux?
  5. Because seeing duplication of effort makes me sad…
  6. What’s wrong here?

          public static void main(String[] args) throws Exception {
              String name = "myTopology";
              StormTopology topology = getTopology();

              // logic to determine if we're running locally or not…
              boolean runLocal = shouldRunLocal();

              // create necessary config options…
              Config conf = new Config();

              if (runLocal) {
                  LocalCluster cluster = new LocalCluster();
                  cluster.submitTopology(name, conf, topology);
              } else {
                  StormSubmitter.submitTopology(name, conf, topology);
              }
          }

  7. What’s wrong here?
      • Configuration tightly coupled with code.
      • Changes require recompilation & repackaging.
  8. Wouldn’t this be easier?

          storm jar mycomponents.jar org.apache.storm.flux.Flux --local config.yaml

      OR…

          storm jar mycomponents.jar org.apache.storm.flux.Flux --remote config.yaml

  9. Flux allows you to package all your Storm components once. Then wire, configure, and deploy topologies using a YAML definition.
  10. Flux Features
      • Easily configure and deploy Storm topologies (both Storm core and micro-batch API) without embedding configuration in your topology code
      • Support for existing topology code
      • Define Storm Core API (Spouts/Bolts) using a flexible YAML DSL
      • YAML DSL support for virtually any Storm component (storm-kafka, storm-hdfs, storm-hbase, etc.)
      • Convenient support for multi-lang components
      • External property substitution/filtering for easily switching between configurations/environments (similar to Maven-style ${variable.name} substitution)
  11. Flux YAML DSL
      A YAML definition consists of:
      • Topology Name (1)
      • Includes (0…*)
      • Config Map (0…1)
      • Components (0…*)
      • Spouts (1…*)
      • Bolts (1…*)
      • Streams (1…*)
  12. Flux YAML DSL
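
To show how the sections listed above fit together, here is a minimal sketch of a complete definition. It wires one spout to one bolt with a shuffle grouping, reusing the `FluxShellSpout` and `FluxShellBolt` wrappers and the script names from the Spouts and Bolts slides. The topology name and worker count are illustrative assumptions, not values from the talk.

```yaml
# Topology name (exactly one). "word-split-sketch" is a made-up name.
name: "word-split-sketch"

# Config map (optional): passed to Storm at submission time.
config:
  topology.workers: 1   # assumed value, for illustration only

# Spouts: the sources of the topology.
spouts:
  - id: "sentence-spout"
    className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
    constructorArgs:
      - ["node", "randomsentence.js"]   # command line
      - ["word"]                        # output fields
    parallelism: 1

# Bolts: the processing steps.
bolts:
  - id: "splitsentence"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      - ["python", "splitsentence.py"]  # command line
      - ["word"]                        # output fields
    parallelism: 1

# Streams: the edges connecting spouts and bolts by id.
streams:
  - from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE
```

Saved as `config.yaml`, a file like this is what the `storm jar … org.apache.storm.flux.Flux --local config.yaml` command from slide 8 would consume, assuming the two scripts are packaged as multi-lang resources in the jar.
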
  13. Config
      A Map-of-Maps (Objects) that will be passed to the topology at submission time (Storm config).

          # topology name
          name: "myTopology"

          # topology configuration
          config:
            topology.workers: 5
            topology.max.spout.pending: 1000
            # ...

  14. Components
      • Catalog (list/map) of Objects that can be used/referenced in other parts of the YAML configuration
      • Roughly analogous to Spring beans.
  15. Components
      Simple Java class with default constructor:

          # Components
          components:
            - id: "stringScheme"
              className: "storm.kafka.StringScheme"

  16. Components: Constructor Arguments
      Component classes can be instantiated with “constructorArgs” (a list of class constructor arguments):

          # Components
          components:
            - id: "zkHosts"
              className: "storm.kafka.ZkHosts"
              constructorArgs:
                - "localhost:2181"

  17. Components: References
      Components can be “referenced” throughout the YAML config and used as arguments:

          # Components
          components:
            - id: "stringScheme"
              className: "storm.kafka.StringScheme"

            - id: "stringMultiScheme"
              className: "backtype.storm.spout.SchemeAsMultiScheme"
              constructorArgs:
                - ref: "stringScheme"

  18. Components: Properties
      Components can be configured using JavaBean setter methods and public instance variables:

          - id: "spoutConfig"
            className: "storm.kafka.SpoutConfig"
            properties:
              - name: "forceFromStart"
                value: true
              - name: "scheme"
                ref: "stringMultiScheme"

  19. Components: Config Methods
      Call arbitrary methods to configure a component:

          - id: "recordFormat"
            className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
            configMethods:
              - name: "withFieldDelimiter"
                args: ["|"]

      References can be used here as well.
  20. Spouts
      A list of objects that implement the IRichSpout interface and an associated parallelism setting.

          # spout definitions
          spouts:
            - id: "sentence-spout"
              className: "org.apache.storm.flux.wrappers.spouts.FluxShellSpout"
              # shell spout constructor takes 2 arguments: String[], String[]
              constructorArgs:
                # command line
                - ["node", "randomsentence.js"]
                # output fields
                - ["word"]
              parallelism: 1
              # ...

  21. Bolts
      A list of objects that implement the IRichBolt or IBasicBolt interface with an associated parallelism setting.

          # bolt definitions
          bolts:
            - id: "splitsentence"
              className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
              constructorArgs:
                # command line
                - ["python", "splitsentence.py"]
                # output fields
                - ["word"]
              parallelism: 1
              # ...

            - id: "count"
              className: "backtype.storm.testing.TestWordCounter"
              parallelism: 1
              # ...

  22. Spout and Bolt definitions are just extensions of “Component” with a “parallelism” attribute, so all component features (references, constructor args, properties, config methods) can be used.
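
As an illustration of the point above, the following sketch defines a bolt whose `configMethods` take references to previously declared components. It assumes the storm-hdfs `HdfsBolt` with its `withFsUrl`, `withRecordFormat`, and `withSyncPolicy` builder methods and the `CountSyncPolicy` class; the file system URL and the sync count are made-up values. It is a fragment, not a complete topology: a working `HdfsBolt` would likely also need a file name format and a rotation policy.

```yaml
components:
  # Sync to HDFS after every 1000 tuples (illustrative value).
  - id: "syncPolicy"
    className: "org.apache.storm.hdfs.bolt.sync.CountSyncPolicy"
    constructorArgs:
      - 1000

  # Same record format component as on the Config Methods slide.
  - id: "recordFormat"
    className: "org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat"
    configMethods:
      - name: "withFieldDelimiter"
        args: ["|"]

bolts:
  - id: "hdfs-bolt"
    className: "org.apache.storm.hdfs.bolt.HdfsBolt"
    configMethods:
      - name: "withFsUrl"
        args: ["hdfs://localhost:8020"]   # assumed URL
      # A reference passes the component instance as the method argument.
      - name: "withRecordFormat"
        args:
          - ref: "recordFormat"
      - name: "withSyncPolicy"
        args:
          - ref: "syncPolicy"
    parallelism: 1
```
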
  23. Streams
      • Represent Spout-to-Bolt and Bolt-to-Bolt connections
      • In graph terms: “edges”
      • Also define Stream Groupings:
        • ALL, CUSTOM, DIRECT, SHUFFLE, LOCAL_OR_SHUFFLE, FIELDS, GLOBAL, or NONE.
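
For the built-in groupings, a stream definition could look like the sketch below. It connects the `sentence-spout`, `splitsentence`, and `count` components from the Spouts and Bolts slides; the assumption here is that a FIELDS grouping takes the list of field names to group on in an `args` entry, as described in the Flux documentation.

```yaml
streams:
  # Spout-to-bolt edge: tuples are distributed randomly across bolt tasks.
  - from: "sentence-spout"
    to: "splitsentence"
    grouping:
      type: SHUFFLE

  # Bolt-to-bolt edge: tuples with the same "word" value go to the same task,
  # which is what a word counter needs.
  - from: "splitsentence"
    to: "count"
    grouping:
      type: FIELDS
      args: ["word"]
```
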
  24. Streams
      Custom stream grouping:

          - from: "bolt-1"
            to: "bolt-2"
            grouping:
              type: CUSTOM
              customClass:
                className: "backtype.storm.testing.NGrouping"
                constructorArgs:
                  - 1

      Again, you can use references, properties, and config methods.
  25. Filtering/Variable Substitution
      Define properties in an external properties file, and reference them in YAML using ${} syntax:

          - id: "rotationAction"
            className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
            configMethods:
              - name: "toDestination"
                args: ["${hdfs.dest.dir}"]

      ${hdfs.dest.dir} is replaced with the value of the property prior to YAML parsing.
  26. Filtering/Variable Substitution
      Environment variables can be referenced in YAML using ${ENV-} syntax:

          - id: "rotationAction"
            className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
            configMethods:
              - name: "toDestination"
                args: ["${ENV-HDFS_DIR}"]

      ${ENV-HDFS_DIR} is replaced with the value of the $HDFS_DIR environment variable prior to YAML parsing.
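
The two substitution styles can be combined in one file. The sketch below assumes a hypothetical properties file and a hypothetical environment value, both described in the comments; neither comes from the talk. Because substitution happens on the raw text before YAML parsing, the placeholders should work anywhere in the file, including the `config` section.

```yaml
# Assumed contents of a hypothetical dev.properties, supplied via --filter:
#   hdfs.dest.dir=/tmp/flux/dest
#   workers=2
#
# Assumed environment, used only when --env-filter is given:
#   HDFS_DIR=/data/archive

config:
  # Would read "topology.workers: 2" after property substitution.
  topology.workers: ${workers}

components:
  - id: "moveToPropertyDir"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        # Would become "/tmp/flux/dest" after property substitution.
        args: ["${hdfs.dest.dir}"]

  - id: "moveToEnvDir"
    className: "org.apache.storm.hdfs.common.rotation.MoveFileAction"
    configMethods:
      - name: "toDestination"
        # Would become "/data/archive" after environment substitution.
        args: ["${ENV-HDFS_DIR}"]
```

Switching environments then means pointing `--filter` at a different properties file while the YAML definition and the jar stay unchanged.
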
  27. File Includes and Overrides
      Include files/classpath resources and optionally override values:

          name: "include-topology"

          includes:
            - resource: true
              file: "/configs/shell_test.yaml"
              override: false  # otherwise subsequent includes that define 'name' would override

  28. Existing Topologies & Trident Topologies
  29. Existing Topologies
      • Alternative to YAML Spout/Bolt/Stream DSL
      • Same syntax
      • Works with transactional/micro-batch (Trident) topologies
      • Tell Flux about the class that will produce your topology
      • Components, references, constructor args, properties, config methods, etc. can all be used
  30. Existing Topologies
      Provide a class with a public method that returns a StormTopology instance:

          /**
           * Marker interface for objects that can produce `StormTopology` objects.
           *
           * If a `topology-source` class implements the `getTopology()` method, Flux will
           * call that method. Otherwise, it will introspect the given class and look for a
           * similar method that produces a `StormTopology` instance.
           *
           * Note that it is not strictly necessary for a class to implement this interface.
           * If a class defines a method with a similar signature, Flux should be able to find
           * and invoke it.
           */
          public interface TopologySource {
              public StormTopology getTopology(Map<String, Object> config);
          }

      This can be a Spout/Bolt or Trident topology.
  31. Existing Topologies
      Define a topologySource to tell Flux how to configure the class that creates the topology:

          # configuration that uses an existing topology that does not implement TopologySource
          name: "existing-topology"
          topologySource:
            className: "org.apache.storm.flux.test.SimpleTopology"
            methodName: "getTopologyWithDifferentMethodName"
            constructorArgs:
              - "foo"
              - "bar"

      Components, references, constructor args, properties, config methods, etc. can all be used.
  32. Flux Usage
      • Add the Flux dependency to your project.
      • Use the Maven shade plugin to create a fat jar file.
      • Use the `storm` command to run (locally) or deploy (remotely) your topology:

          storm jar mycomponents.jar org.apache.storm.flux.Flux [options] <config file>

  33. Flux Usage: Command Line Options

          usage: storm jar <my_topology_uber_jar.jar> org.apache.storm.flux.Flux
                 [options] <topology-config.yaml>
           -d,--dry-run                 Do not run or deploy the topology. Just
                                        build, validate, and print information
                                        about the topology.
           -e,--env-filter              Perform environment variable substitution.
                                        Keys identified with `${ENV-[NAME]}` will
                                        be replaced with the corresponding `NAME`
                                        environment value.
           -f,--filter <file>           Perform property substitution. Use the
                                        specified file as a source of properties,
                                        and replace keys identified with
                                        ${[property name]} with the value defined
                                        in the properties file.
           -i,--inactive                Deploy the topology, but do not activate it.
           -l,--local                   Run the topology in local mode.
           -n,--no-splash               Suppress the printing of the splash screen.
           -q,--no-detail               Suppress the printing of topology details.
           -r,--remote                  Deploy the topology to a remote cluster.
           -R,--resource                Treat the supplied path as a class path
                                        resource instead of a file.
           -s,--sleep <ms>              When running locally, the amount of time to
                                        sleep (in ms.) before killing the topology
                                        and shutting down the local cluster.
           -z,--zookeeper <host:port>   When running in local mode, use the
                                        ZooKeeper at the specified <host>:<port>
                                        instead of the in-process ZooKeeper.
                                        (requires Storm 0.9.3 or later)

  34. With great power comes great responsibility. It’s up to you to avoid shooting yourself in the foot!
  35. Feedback/Contributions Welcome
      Flux on GitHub: https://github.com/ptgoetz/flux
  36. Thank you! AMA…
      P. Taylor Goetz, Hortonworks @ptgoetz
