PSUG #52
Dataflow and simplified reactive
programming with Akka-streams



Stéphane Manciot
26/02/2015
Plan
○ Dataflow main features
○ Reactive Streams
○ Akka-streams - Pipeline Dataflow
○ Akka-streams - working with graphs
Dataflow - main features
Dataflow Nodes and data
○ A node is a processing element that takes inputs, performs some operation and returns the results on its outputs
○ Data (referred to as tokens) flows between nodes
through arcs
○ The only way for a node to send and receive data is
through ports
○ A port is the connection point between an arc and a
node
Dataflow Arcs
○ Always connects one output port to one input port
○ Has a capacity: the maximum number of tokens that an arc can hold at any one time
○ Node activation requirements:
○ at least one token waiting on the input arc
○ space available on the output arc
○ Reading data from the input arc frees space on the latter
○ Arcs may contain zero or more tokens, as long as the count is less than the arc’s capacity
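A toy model of these arc semantics in plain Scala (Arc, put, take and canFire are illustrative names, not part of any dataflow library):

import scala.collection.mutable

// Toy model of an arc as a bounded FIFO of tokens (illustrative only)
final class Arc[T](val capacity: Int) {
  private val tokens = mutable.Queue.empty[T]
  def hasToken: Boolean = tokens.nonEmpty          // activation: a token is waiting
  def hasSpace: Boolean = tokens.size < capacity   // activation: space is available
  def put(t: T): Boolean = if (hasSpace) { tokens.enqueue(t); true } else false
  def take(): T = tokens.dequeue()                 // reading frees space on the arc
}

// A node may fire only when its input arc holds a token and its output arc has space
def canFire[A, B](in: Arc[A], out: Arc[B]): Boolean = in.hasToken && out.hasSpace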
Dataflow graph
○ Explicitly states the connections between nodes
○ Dataflow graph execution:
○ Node activation requirements are fulfilled
○ A token is consumed from the input arc
○ The node executes
○ If needed, a new token is pushed to the output
arc
Dataflow features - push or pull data
○ Push:
○ nodes send tokens to other nodes whenever they are available
○ the data producer is in control and initiates transmissions
○ Pull:
○ the demand flows upstream
○ allows a node to lazily produce an output only when needed
○ the data consumer is in control
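The same contrast in plain Scala terms, as a rough sketch (illustrative names only):

// Pull: the consumer is in control; the producer is evaluated lazily on demand
val pulled: Iterator[Int] = Iterator.from(1) // produces a value only when next() is called
println(pulled.next()) // the consumer asks, the producer answers: 1

// Push: the producer is in control; the consumer registers a callback and reacts
def producer(onToken: Int => Unit): Unit = (1 to 3).foreach(onToken)
producer(n => println(s"received $n")) // the producer decides when tokens are sent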
Dataflow features - mutable or immutable
data
○ splits require copying mutable data
○ Immutable data is preferred any time parallel
computations exist
Dataflow features - multiple inputs and/or
outputs
○ multiple inputs
○ node “firing rule”
○ ALL of its inputs have tokens (zip)
○ ANY of its inputs have tokens waiting (merge; both firing rules are sketched after this list)
○ activation preconditions
○ a firing rule for the node must match the
available input tokens
○ space for new token(s) is available on the
outputs
○ multiple outputs
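A sketch of both firing rules in the milestone-era FlowGraph DSL used throughout this deck (illustrative: exact signatures varied across Akka Streams milestones, and an implicit ActorFlowMaterializer is assumed in scope):

// ALL-inputs rule: Zip fires only when a token is available on both inputs
val zipped = FlowGraph { implicit builder =>
  import FlowGraphImplicits._
  val zip = Zip[Int, String]
  Source(List(1, 2, 3)) ~> zip.left
  Source(List("a", "b", "c")) ~> zip.right
  zip.out ~> Sink.foreach[(Int, String)](println) // (1,a), (2,b), (3,c)
}

// ANY-input rule: Merge fires as soon as any input has a token waiting
val merged = FlowGraph { implicit builder =>
  import FlowGraphImplicits._
  val merge = Merge[Int]
  Source(List(1, 2)) ~> merge
  Source(List(10, 20)) ~> merge
  merge ~> Sink.foreach[Int](println) // interleaving order is nondeterministic
}

zipped.run()
merged.run()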
Dataflow features - compound nodes
○ the ability to create new nodes by using other nodes
○ a smaller data flow program
○ the interior nodes
○ should not know of anything outside of the
compound node
○ must be able to connect to the ports of the parent
compound node
Dataflow features - arc joins and/or splits
○ Fan-in: merge, zip, concat
○ Fan-out: broadcast, balance, unzip (Broadcast is sketched below)
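A fan-out sketch with Broadcast in the same DSL (illustrative; Broadcast copies every element to all outputs, while Balance, shown later in bulkBalancedFlow, routes each element to exactly one output):

val fanOut = FlowGraph { implicit builder =>
  import FlowGraphImplicits._
  val broadcast = Broadcast[Int]
  Source(List(1, 2, 3)) ~> broadcast
  broadcast ~> Flow[Int].map(_ * 2) ~> Sink.foreach[Int](n => println(s"doubled: $n"))
  broadcast ~> Flow[Int].map(_ + 10) ~> Sink.foreach[Int](n => println(s"shifted: $n"))
}
fanOut.run()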
Reactive Streams
Reactive Streams - back-pressure
○ How to handle data across an asynchronous boundary?
○ Blocking calls: queue with a bounded size
Reactive Streams - back-pressure
○ How to handle data across an asynchronous boundary?
○ The Push way:
○ unbounded buffer: Out Of Memory error
○ bounded buffer with NACK
Reactive Streams - back-pressure
○ How to handle data across an asynchronous boundary?
○ The reactive way: non-blocking, non-dropping & bounded
○ data items flow downstream
○ demand flows upstream
○ data items flow only when there is demand
Reactive Streams - dynamic push / pull model
○ “push” behaviour when Subscriber is faster
○ “pull” behaviour when Publisher is faster
○ switches automatically between those behaviours at runtime
○ batching demand allows batching data
Reactive Streams - back-pressure
Reactive Streams - SPI
trait Publisher[T] {
  def subscribe(sub: Subscriber[T]): Unit
}

trait Subscription {
  def requestMore(n: Int): Unit
  def cancel(): Unit
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(elem: T): Unit
  def onError(thr: Throwable): Unit
  def onComplete(): Unit
}
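As a sketch of how batching demand works against these traits (illustrative only: BatchingSubscriber and batchSize are made-up names, and the final Reactive Streams API renamed requestMore to request(n: Long)):

class BatchingSubscriber[T](batchSize: Int) extends Subscriber[T] {
  private var subscription: Subscription = _
  private var outstanding = 0

  def onSubscribe(s: Subscription): Unit = {
    subscription = s
    outstanding = batchSize
    s.requestMore(batchSize) // signal demand for a whole batch up front
  }

  def onNext(elem: T): Unit = {
    println(s"received $elem")
    outstanding -= 1
    if (outstanding == 0) { // re-request only once the batch has been consumed
      outstanding = batchSize
      subscription.requestMore(batchSize)
    }
  }

  def onError(thr: Throwable): Unit = thr.printStackTrace()
  def onComplete(): Unit = println("done")
}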

Reactive Streams - asynchronous non-blocking protocol
Akka Streams - Pipeline Dataflow
Akka Streams - Core Concepts
○ a DSL implementation of Reactive Streams relying
on Akka actors
○ Stream: a process that involves moving and/or transforming data
○ Element: the processing unit of streams
○ Processing Stage: building blocks that build up a Flow or FlowGraph (map(), filter(), transform(), junctions …)
Akka Streams - Pipeline Dataflow
○ Source: a processing stage with exactly one output, emitting data elements when downstream processing stages are ready to receive them
○ Sink: a processing stage with exactly one input, requesting and accepting data elements
○ Flow: a processing stage with exactly one input and one output, which connects its up- and downstream by moving / transforming the data elements flowing through it
○ Runnable Flow: a Flow that has both ends “attached” to a Source and a Sink, respectively
○ Materialized Flow: a Runnable Flow that has been run
Akka Streams - Pipeline Dataflow
// imports assumed for the 1.0-milestone Akka Streams API used in this talk
import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl._
import scala.concurrent.Future

/**
 * Construct the ActorSystem we will use in our application
 */
implicit lazy val system = ActorSystem("DataFlow")

implicit val _ = ActorFlowMaterializer()

val source = Source(1 to 10)

val sink = Sink.fold[Int, Int](0)(_ + _)

val transform = Flow[Int].map(_ + 1).filter(_ % 2 == 0)

// connect the Source to the Sink, obtaining a runnable flow
val runnable: RunnableFlow = source.via(transform).to(sink)

// materialize the flow
val materialized: MaterializedMap = runnable.run()

implicit val dispatcher = system.dispatcher

// retrieve the result of the folding process over the stream
val result: Future[Int] = materialized.get(sink)
result.foreach(println)
Akka Streams - Pipeline Dataflow
○ Processing stages are immutable (connecting them
returns a new processing stage)
val source = Source(1 to 10)
source.map(_ => 0) // has no effect on source, since it’s immutable
source.runWith(Sink.fold(0)(_ + _)) // 55
○ A stream can be materialized multiple times
// connect the Source to the Sink, obtaining a RunnableFlow
val sink = Sink.fold[Int, Int](0)(_ + _)
val runnable: RunnableFlow = Source(1 to 10).to(sink)
// get the materialized value of the FoldSink
val sum1: Future[Int] = runnable.run().get(sink)
val sum2: Future[Int] = runnable.run().get(sink)
// sum1 and sum2 are different Futures!
for (a <- sum1; b <- sum2) yield println(a + b) // 110
○ Processing stages preserve input order of elements
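A small illustration of that guarantee, reusing the implicit materializer and dispatcher from the pipeline example above (a sketch; mapAsyncUnordered appears later in the bulk-export graph and is the deliberate exception):

// map preserves input order: always prints 2, 4, 6, 8, 10
Source(1 to 5).map(_ * 2).runWith(Sink.foreach[Int](println))

// mapAsyncUnordered emits results as their Futures complete,
// so the output order is NOT guaranteed
Source(1 to 5)
  .mapAsyncUnordered[Int](n => Future(n * 2))
  .runWith(Sink.foreach[Int](println))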
Akka Streams - working with graphs
Akka Streams - bulk export to ES
val bulkSize: Int = 100

val source: Source[String] = Source(types.toList)

val sum: Sink[Int] = FoldSink[Int, Int](0)(_ + _)

val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val walk: Flow[String, Companies2Export] = Flow[String]
    .map(fileTree(_).toList)
    .mapConcat[Companies2Export](identity)

  val parse: Flow[Companies2Export, List[Company]] = Flow[Companies2Export]
    .transform(() => new LoggingStage("ExportService"))
    .map[List[Company]](f => parseFile(f.source, index, f.`type`).toList)

  val flatten: Flow[List[Company], Company] = Flow[List[Company]].mapConcat[Company](identity)

  val group: Flow[Company, Seq[Company]] = Flow[Company].grouped(bulkSize)

  val toJson: Flow[Seq[Company], String] = Flow[Seq[Company]]
    .map(_.map(_.toBulkIndex(index.name)).mkString(crlf) + crlf)

  val bulkIndex: Flow[String, EsBulkResponse] = Flow[String]
    .mapAsyncUnordered[EsBulkResponse](esBulk(index.name, _))

  val count: Flow[EsBulkResponse, Int] = Flow[EsBulkResponse].map[Int] { b =>
    logger.debug(s"index ${b.items.size} companies within ${b.took} ms")
    b.items.size
  }

  // connect the graph
  source ~> walk ~> parse ~> flatten ~> group ~> toJson ~> bulkIndex ~> count ~> sum
}

val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.info(s"*** total companies indexed: $c"))

result.onComplete(_aliases(index.name, before))
Akka Streams - bulk export to ES
Akka Streams - frequent item set
Akka Streams - frequent item set
val g: FlowGraph = FlowGraph { implicit builder =>
  import FlowGraphImplicits._

  val source = Source.single(sorted)

  import com.sksamuel.elastic4s.ElasticDsl._

  val toJson = Flow[List[Seq[Long]]].map(_.map { seq =>
    val now = Calendar.getInstance().getTime
    val uuid = seq.mkString("-")
    update(uuid)
      .in(s"${esInputStore(store)}/CartCombination")
      .upsert(
        "uuid" -> uuid,
        "combinations" -> seq.map(_.toString),
        "counter" -> 1,
        "dateCreated" -> now,
        "lastUpdated" -> now)
      .script("ctx._source.counter += count;ctx._source.lastUpdated = now")
      .params("count" -> 1, "now" -> now)
  })

  val flatten = Flow[List[BulkCompatibleDefinition]].mapConcat[BulkCompatibleDefinition](identity)

  val count = Flow[BulkResponse].map[Int] { resp =>
    val nb = resp.getItems.length
    logger.debug(s"index $nb combinations within ${resp.getTookInMillis} ms")
    nb
  }

  import EsClient._

  source ~> combinationsFlow ~> toJson ~> flatten ~> bulkBalancedFlow() ~> count ~> sum
}

val start = new Date().getTime
val result: Future[Int] = g.runWith(sum)

result.foreach(c => logger.debug(s"*** $c combinations indexed within ${new Date().getTime - start} ms"))
Akka Streams - frequent item set
val combinationsFlow = Flow() { implicit b =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[Seq[Long]]
  val undefinedSink = UndefinedSink[List[Seq[Long]]]

  undefinedSource ~> Flow[Seq[Long]].map[List[Seq[Long]]] { s =>
    val combinations: ListBuffer[Seq[Long]] = ListBuffer.empty
    1 to s.length foreach { i => combinations ++= s.combinations(i).toList }
    combinations.toList
  } ~> undefinedSink

  (undefinedSource, undefinedSink)
}

def bulkBalancedFlow(bulkSize: Int = Settings.ElasticSearch.bulkSize, balanceSize: Int = 2) =
  Flow() { implicit b =>
    import FlowGraphImplicits._

    val in = UndefinedSource[BulkCompatibleDefinition]
    val group = Flow[BulkCompatibleDefinition].grouped(bulkSize)
    val bulkUpsert = Flow[Seq[BulkCompatibleDefinition]].map(bulk)
    val out = UndefinedSink[BulkResponse]

    if (balanceSize > 1) {
      val balance = Balance[Seq[BulkCompatibleDefinition]]
      val merge = Merge[BulkResponse]

      in ~> group ~> balance
      1 to balanceSize foreach { _ =>
        balance ~> bulkUpsert ~> merge
      }
      merge ~> out
    } else {
      in ~> group ~> bulkUpsert ~> out
    }

    (in, out)
  }
Akka Streams - frequent item set
def fis(store: String, productId: String, frequency: Double = 0.2): Future[(Seq[String], Seq[String])] = {
  implicit val _ = ActorFlowMaterializer()

  import com.mogobiz.run.learning._

  val source = Source.single((productId, frequency))

  val extract = Flow[(Option[CartCombination], Option[CartCombination])].map { x =>
    (x._1.map(_.combinations.toSeq).getOrElse(Seq.empty),
     x._2.map(_.combinations.toSeq).getOrElse(Seq.empty))
  }

  val exclude = Flow[(Seq[String], Seq[String])]
    .map(x => (x._1.filter(_ != productId), x._2.filter(_ != productId)))

  val head = Sink.head[(Seq[String], Seq[String])]

  val runnable: RunnableFlow = source
    .transform(() => new LoggingStage[(String, Double)]("Learning"))
    .via(frequentItemSets(store))
    .via(extract)
    .via(exclude)
    .to(head)

  runnable.run().get(head)
}
def frequentItemSets(store: String) = Flow() { implicit builder =>
  import FlowGraphImplicits._

  val undefinedSource = UndefinedSource[(String, Double)]
  val broadcast = Broadcast[Option[(String, Long)]]
  val zip = Zip[Option[CartCombination], Option[CartCombination]]
  val undefinedSink = UndefinedSink[(Option[CartCombination], Option[CartCombination])]

  undefinedSource ~> calculateSupportThreshold(store) ~> broadcast
  broadcast ~> loadMostFrequentItemSet(store) ~> zip.left
  broadcast ~> loadLargerFrequentItemSet(store) ~> zip.right
  zip.out ~> undefinedSink

  (undefinedSource, undefinedSink)
}
Akka Streams - frequent item set
def calculateSupportThreshold(store: String) = Flow[(String, Double)].map { tuple =>
  load[CartCombination](esInputStore(store), tuple._1).map { x =>
    Some(tuple._1, Math.ceil(x.counter * tuple._2).toLong)
  }.getOrElse(None)
}

def loadMostFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map(x =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by field "counter" order SortOrder.DESC
  }).headOption).getOrElse(None))

def loadLargerFrequentItemSet(store: String) = Flow[Option[(String, Long)]].map(_.map(x =>
  searchAll[CartCombination](cartCombinations(store, x) sort {
    by script "doc['combinations'].values.size()" order SortOrder.DESC
  }).headOption).getOrElse(None))
