Arvind Sujeeth, Scala Days 2012
1. Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown,
Hassan Chafi, Michael Wu, Victoria Popic, Kunle Olukotun
Stanford University
Pervasive Parallelism Laboratory (PPL)
Tiark Rompf, Aleksandar Prokopec, Vojin Jovanovic,
Philipp Haller, Martin Odersky
École Polytechnique Fédérale de Lausanne (EPFL)
Programming Methods Laboratory (LAMP)
4. [Figure: programming models (Pthreads, OpenMP, CUDA, OpenCL, Verilog, VHDL, MPI, PGAS) and hardware (Sun T2, Nvidia Fermi, Altera FPGA, Cray Jaguar)]
5. [Figure: the same programming models and hardware, with application areas added: Scientific Engineering, Virtual Worlds, Personal Robotics, Data Informatics]
6. [Figure: the same applications, programming models and hardware, with DSLs added between the applications and the programming models]
Too many different programming models
7.
- Tiark Rompf's talk yesterday
- In case you missed it:
  - Techniques for rewriting high-level programs to high-performance programs
  - Build an intermediate representation (IR) of Scala programs at runtime
  - The IR can be optimized, and code generated from it
8.
- Introduction to existing Delite DSLs
- Constructing your own Delite DSL
- Not covered – under the covers:
  - Implementation details about the Delite framework
  - See http://cgo2012.hyperdsls.org/
9.
- Syntax is legal Scala
- Staged to build an IR (metaprogramming)
  [Figure: expression tree for A*B + A*C: leaves A, B, A, C; two * nodes; a + root]
- Optimized at a high level
- Compiled to different low-level target architectures
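To make "staged to build an IR" concrete, here is a toy model in plain Scala. It is not LMS or Delite, and the names `Exp`, `Sym`, `Plus`, `Times` and `Staging` are hypothetical. Operators on `Exp` construct IR nodes instead of computing values, so running an ordinary-looking function such as `a * b + a * c` returns the expression tree from the figure.

```scala
// Toy model of staging: operators build IR nodes instead of computing values.
// All names are hypothetical and far simpler than LMS's Rep/Exp machinery.
sealed trait Exp {
  def +(that: Exp): Exp = Plus(this, that)
  def *(that: Exp): Exp = Times(this, that)
}
final case class Sym(name: String) extends Exp
final case class Plus(lhs: Exp, rhs: Exp) extends Exp
final case class Times(lhs: Exp, rhs: Exp) extends Exp

object Staging {
  // Legal Scala syntax; "running" it produces an IR tree, not a number.
  def program(a: Exp, b: Exp, c: Exp): Exp = a * b + a * c

  // Render the IR in prefix form so its structure is visible.
  def show(e: Exp): String = e match {
    case Sym(n)      => n
    case Plus(l, r)  => s"(+ ${show(l)} ${show(r)})"
    case Times(l, r) => s"(* ${show(l)} ${show(r)})"
  }
}
```

`Staging.show(Staging.program(Sym("A"), Sym("B"), Sym("C")))` returns `(+ (* A B) (* A C))`. A real staged DSL would then optimize this tree and generate code for a target architecture from it.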
11. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning, ICML 2011
- Provides a familiar (MATLAB-like) language and API for writing ML applications
  - Ex. val c = a * b (a, b are Matrix[Double])
- Implicitly parallel data structures
  - Base types: Vector[T], Matrix[T], Graph[V,E], Stream[T]
  - Subtypes: TrainingSet, IndexVector, Image, …
- Implicitly parallel control structures
  - sum{…}, (0::end) {…}, gradient { … }, untilconverged { … }
  - Arguments to control structures are anonymous functions with restricted semantics
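As a hedged illustration of what two of these control structures compute (not how OptiML implements or parallelizes them), here are sequential stand-ins. The names mirror the slide, but the signatures, the scalar state of `untilconverged` (OptiML applies it to matrices), the exclusive upper bound of `sum`, and the `maxIter` cap are assumptions.

```scala
// Hypothetical sequential stand-ins for two OptiML control structures.
// They only pin down what the constructs compute; OptiML runs them in parallel.
object ControlStructures {
  // sum(start, end){ i => ... }: adds f(i) for i from start (inclusive) to end (exclusive).
  def sum(start: Int, end: Int)(f: Int => Double): Double =
    (start until end).map(f).sum

  // untilconverged(x0, tol){ x => ... }: applies f until successive values differ
  // by less than tol, or until maxIter iterations have run.
  def untilconverged(x0: Double, tol: Double, maxIter: Int = 1000)(f: Double => Double): Double = {
    var x = x0
    var iter = 0
    var converged = false
    while (!converged && iter < maxIter) {
      val next = f(x)
      converged = math.abs(next - x) < tol
      x = next
      iter += 1
    }
    x
  }
}
```

For example, `ControlStructures.sum(0, 4)(i => i * i)` returns `14.0`, and iterating Newton's step `x => (x + 2 / x) / 2` with `untilconverged` approaches the square root of 2. Because the arguments are plain functions, a DSL compiler is free to run the iterations of `sum` in parallel, which is one reason their semantics are restricted.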
12. untilconverged(mu, tol){ mu =>
  // calculate distances to current centroids

  // move each cluster centroid to the
  // mean of the points assigned to it
}
13. untilconverged(mu, tol){ mu =>
  // calculate distances to current centroids
  val c = (0::m){i =>
    val allDistances = mu mapRows { centroid =>
      dist(x(i), centroid)
    }
    allDistances.minIndex
  }

  // move each cluster centroid to the
  // mean of the points assigned to it
}
14. untilconverged(mu, tol){ mu =>
  // calculate distances to current centroids
  val c = (0::m){i =>
    val allDistances = mu mapRows { centroid =>
      dist(x(i), centroid)
    }
    allDistances.minIndex   // fused
  }

  // move each cluster centroid to the
  // mean of the points assigned to it
  val newMu = (0::k,*){ i =>
    // sum over the points j that were assigned to centroid i
    val (weightedpoints, points) = sum(0,m) { j =>
      if (c(j) == i) (x(j),1)
    }
    val d = if (points == 0) 1 else points
    weightedpoints / d
  }
  newMu
}
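The k-means loop above can be mirrored sequentially in plain Scala, which helps check what each parallel construct computes. This is a sketch under stated assumptions, not the OptiML implementation: points and centroids are `Array[Array[Double]]` instead of `Matrix[Double]`, `dist` is taken to be squared Euclidean distance (the slide does not define it), and `KMeansStep`, `assign` and `update` are hypothetical names.

```scala
// Sequential plain-Scala sketch of one k-means iteration (not OptiML code).
// Assumptions: points and centroids are Array[Array[Double]], dist is squared Euclidean.
object KMeansStep {
  def dist(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(k => (a(k) - b(k)) * (a(k) - b(k))).sum

  // c(i): index of the centroid nearest to point i
  def assign(x: Array[Array[Double]], mu: Array[Array[Double]]): Array[Int] =
    x.map { p =>
      val allDistances = mu.map(centroid => dist(p, centroid))
      allDistances.indexOf(allDistances.min) // plays the role of minIndex
    }

  // newMu(i): mean of the points j with c(j) == i (divides by 1 if there are none)
  def update(x: Array[Array[Double]], c: Array[Int], mu: Array[Array[Double]]): Array[Array[Double]] =
    mu.indices.map { i =>
      val members = x.indices.filter(j => c(j) == i)
      val weightedpoints = Array.tabulate(mu(i).length)(col => members.map(j => x(j)(col)).sum)
      val d = if (members.isEmpty) 1 else members.size
      weightedpoints.map(_ / d)
    }.toArray
}
```

Calling `assign` and then `update` repeatedly until the centroids stop moving gives the full algorithm; the OptiML version expresses that outer loop with `untilconverged` and leaves parallelization and fusion to Delite.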
15.
- Data querying of in-memory collections
- Inspired by LINQ
- SQL-like declarative language
- Uses high-level semantic knowledge to implement a query optimizer
16. // lineItems: Iterable[LineItem]
// Similar to Q1 of the TPC-H benchmark
val q = lineItems
  .Where(_.l_shipdate <= Date("19981201"))   // hoisted
  .GroupBy(l => (l.l_returnflag, l.l_linestatus))
  .Select(g => new Result {
    val returnFlag = g.key._1
    val lineStatus = g.key._2
    val sumQty = g.Sum(_.l_quantity)
    val sumDiscountedPrice = g.Sum(r => r.l_extendedprice * (1.0 - r.l_discount))   // fused
    val avgPrice = g.Average(_.l_extendedprice)
    val countOrder = g.Count
  })
  .OrderBy(_.returnFlag)
  .ThenBy(_.lineStatus)
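For comparison, here is roughly the same query written against the standard Scala collections API, with no staging or query optimization. It is a sketch, not the DSL's implementation: the `LineItem` and `Result` case classes, the `Q1` object, and the encoding of dates as `yyyymmdd` integers are assumptions; field names follow the slide.

```scala
// Hypothetical non-staged version of the query using standard Scala collections.
// Dates are simplified to yyyymmdd Ints; field names follow the slide.
final case class LineItem(
  l_returnflag: Char, l_linestatus: Char, l_shipdate: Int,
  l_quantity: Double, l_extendedprice: Double, l_discount: Double)

final case class Result(
  returnFlag: Char, lineStatus: Char, sumQty: Double,
  sumDiscountedPrice: Double, avgPrice: Double, countOrder: Int)

object Q1 {
  def run(lineItems: Seq[LineItem]): Seq[Result] =
    lineItems
      .filter(_.l_shipdate <= 19981201)                 // Where
      .groupBy(l => (l.l_returnflag, l.l_linestatus))   // GroupBy
      .map { case ((rf, ls), g) =>                      // Select
        Result(
          returnFlag = rf,
          lineStatus = ls,
          sumQty = g.map(_.l_quantity).sum,
          sumDiscountedPrice = g.map(r => r.l_extendedprice * (1.0 - r.l_discount)).sum,
          avgPrice = g.map(_.l_extendedprice).sum / g.size,
          countOrder = g.size)
      }
      .toSeq
      .sortBy(r => (r.returnFlag, r.lineStatus))        // OrderBy, ThenBy
}
```

Here every aggregate is a separate pass over its group and the date constant is a plain literal; the slide's `fused` and `hoisted` annotations mark the kinds of rewrites a staged query compiler can apply automatically.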
17.
- A DSL for large-scale graph analysis based on Green-Marl
  - Green-Marl: A DSL for Easy and Efficient Graph Analysis (Hong et al.), ASPLOS ’12
- Directed and undirected graphs, nodes, edges
- Collections for node/edge storage
  - Set, sequence, order
- Deferred assignment and parallel reductions with bulk synchronous consistency
18. Implicitly parallel iteration
for(t <- G.Nodes) {
  val rank = ((1.0 - d)/N) + d * Sum(t.InNbrs){w => PR(w) / w.OutDegree}
  PR <= (t,rank)
  diff += Math.abs(rank - PR(t))
}
Deferred assignment and scalar reduction
Writes become visible after the loop completes
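The deferred-assignment semantics can be sketched sequentially: new ranks are written to a fresh array, so every read of `PR` inside the loop sees the previous values, and `diff` is a plain sum reduction. The adjacency-array representation and the names `PageRankStep` and `sweep` are assumptions, not the DSL's API.

```scala
// Sequential plain-Scala sketch of one PageRank sweep (not the DSL's implementation).
// Assumptions: nodes are 0 until n, inNbrs(t) lists the nodes with an edge into t,
// and outDegree(w) is w's number of outgoing edges.
object PageRankStep {
  def sweep(pr: Array[Double], inNbrs: Array[Array[Int]], outDegree: Array[Int],
            d: Double): (Array[Double], Double) = {
    val n = pr.length
    // Deferred assignment: new ranks go into a fresh array, so every read of pr
    // in this sweep sees the old values; the caller publishes next afterwards.
    val next = Array.tabulate(n) { t =>
      (1.0 - d) / n + d * inNbrs(t).map(w => pr(w) / outDegree(w)).sum
    }
    // Scalar reduction: diff accumulates |new - old| over all nodes.
    val diff = (0 until n).map(t => math.abs(next(t) - pr(t))).sum
    (next, diff)
  }
}
```

Because no iteration reads a value written by another iteration of the same sweep, the per-node computations are independent, which is what lets the `for` loop over `G.Nodes` run in parallel.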
19.
- A port of a subset of Scala collections to a staged Delite DSL
- Demonstrates the benefits of high-level optimization and code generation

val sourcedests = pagelinks flatMap { l =>
  val sd = l.split(":")
  val source = Long.parseLong(sd(0))
  val dests = sd(1).trim.split(" ")
  dests.map(d => (Integer.parseInt(d), source))   // Tuples encoded as longs in back-end
}
val inverted = sourcedests groupBy (x => x._1)

Reverse web-link benchmark in OptiCollections
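The slide notes that tuples are encoded as longs in the back end. One plausible encoding, assumed here for illustration (the slide does not say how Delite does it), packs two non-negative 32-bit ids into a single `Long`, so a collection of pairs can be a flat array of primitives instead of tuple objects. `PackedPairs` and its methods are hypothetical, and unlike the slide the source ids are assumed to fit in an `Int`.

```scala
// Hypothetical packed-pair encoding: two non-negative 32-bit ids in one Long.
object PackedPairs {
  def pack(high: Int, low: Int): Long = (high.toLong << 32) | (low.toLong & 0xffffffffL)
  def hi(p: Long): Int = (p >>> 32).toInt
  def lo(p: Long): Int = p.toInt

  // Reverse web-link inversion where each (dest, source) pair is stored as one Long.
  def invert(pagelinks: Seq[String]): Map[Int, Seq[Int]] = {
    val sourcedests: Seq[Long] = pagelinks.flatMap { l =>
      val sd = l.split(":")
      val source = sd(0).trim.toInt
      sd(1).trim.split(" ").toSeq.map(d => pack(d.toInt, source))
    }
    sourcedests.groupBy(p => hi(p)).map { case (dest, ps) => dest -> ps.map(p => lo(p)) }
  }
}
```

For input lines `"1: 2 3"` and `"2: 3"`, `invert` maps page 2 to the sources `Seq(1)` and page 3 to `Seq(1, 2)`.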
26.
1. Types
   - abstract, front-end
2. Operations
   - language operators and methods available on types; represented by IR nodes
3. Data Structures
   - platform-specific concrete implementation, back-end
4. Code Generators
   - Scala traits that define how to emit code as strings for various IR nodes and platforms
5. Analyses and Optimizations (Optional)
   - IR rewriting via pattern matching, traversals/transformations (e.g. fusion)
27. abstract class Vector[T] extends DeliteCollection[T]
abstract class Matrix[T] extends DeliteCollection[T]
abstract class Image[T] extends Matrix[T]

Placeholders for static type checking and method dispatch; not bound to any implementation
28. trait VectorOps extends Base {   // assumes LMS's Base, which declares the abstract type Rep[T]
  // add an infix + operator to Rep[Vector[A]]
  // (Vector is the same abstract Vector we defined earlier)
  def infix_+[A:Manifest:Arith](lhs: Rep[Vector[A]], rhs: Rep[Vector[A]]) = vector_plus(lhs, rhs)

  // abstract, applications cannot inspect what happens
  // when methods are called
  def vector_length[A:Manifest](lhs: Rep[Vector[A]]): Rep[Int]
  def vector_plus[A:Manifest:Arith](lhs: Rep[Vector[A]], rhs: Rep[Vector[A]]): Rep[Vector[A]]
}
29. trait VectorOpsExp extends VectorOps with Expressions {
  // a Delite parallel op IR node
  case class VectorPlus[A:Manifest:Arith](inA: Exp[Vector[A]], inB: Exp[Vector[A]])
    extends DeliteOpZipWith[Vector[A],Vector[A],Vector[A]] {
    // number of elements in the input collections
    def size = inA.length
    // the output collection
    def alloc = Vector[A](inA.length)
    // the ZipWith function
    def func = (a,b) => a + b
  }

  // construct IR nodes; the declared Exp return type lets the VectorPlus
  // node be lifted to an Exp (assuming LMS's implicit Def-to-Exp conversion)
  def vector_plus[A:Manifest:Arith](lhs: Exp[Vector[A]], rhs: Exp[Vector[A]]): Exp[Vector[A]] =
    VectorPlus(lhs, rhs)
}
30. // a concrete, back-end Scala data structure
// will be instantiated by generated code
class Vector[T:Manifest](__length: Int) {   // Manifest is needed to create an Array[T]
  var _length = __length
  var _data: Array[T] = new Array[T](_length)
}

// corresponding data structures for other back-ends
// (CUDA, OpenCL, etc.)
// ...
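To see how such a back-end structure is used, here is a hand-written sketch of the code a Scala back end might produce for the `VectorPlus` ZipWith: allocate the output (`alloc`), loop over `size`, and apply the zip function. It is an illustration, not Delite output (which is multi-threaded and may be fused with neighboring ops): `Backend`, `fromArray` and `vectorPlus` are hypothetical, and a `ClassTag` bound stands in for `Manifest`, since either lets `new Array[T]` compile.

```scala
import scala.reflect.ClassTag

// Hand-written illustration of code a Scala back end could emit for VectorPlus.
// The Vector class mirrors the slide; fromArray and vectorPlus are hypothetical.
object Backend {
  class Vector[T: ClassTag](__length: Int) {
    var _length = __length
    var _data: Array[T] = new Array[T](_length)
  }

  def fromArray(xs: Array[Double]): Vector[Double] = {
    val v = new Vector[Double](xs.length)
    Array.copy(xs, 0, v._data, 0, xs.length)
    v
  }

  // The ZipWith pattern: allocate the output (alloc), loop over size, apply func.
  def vectorPlus(inA: Vector[Double], inB: Vector[Double]): Vector[Double] = {
    val out = new Vector[Double](inA._length)
    var i = 0
    while (i < inA._length) {
      out._data(i) = inA._data(i) + inB._data(i)
      i += 1
    }
    out
  }
}
```

The kernel touches only the concrete fields `_length` and `_data`, which is why the code generator must emit exactly those field names.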
31. import java.io.PrintWriter

trait ScalaGenVectorOps extends ScalaGen {
  val IR: VectorOpsExp
  import IR._

  override def emitNode(sym: Sym[Any], rhs: Def[Any])(implicit stream: PrintWriter) =
    // generate code for particular IR nodes
    rhs match {
      case VectorNew(length) =>
        emitValDef(sym, "new " + remap("Vector") + "(" + quote(length) + ")")
      case VectorLength(x) =>
        // "_length" is the exact back-end field name we defined earlier
        emitValDef(sym, quote(x) + "._length")
      case _ => super.emitNode(sym, rhs)
    }
}
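Code generators like the one above are ordinary Scala traits that match on IR nodes, emit source text, and defer to `super` for nodes they do not handle. The toy below shows two back ends for the same nodes. The node classes, the trait names `Gen`, `ScalaGenToy` and `CGenToy`, and the emitted C-like syntax are all hypothetical, and nodes carry already-quoted operand names to keep it short.

```scala
// Toy code generators: one IR, two back ends, each emitting source text.
// Def is left unsealed because other DSL modules may add nodes; unknown
// nodes fall through to super, as in the slide.
trait Def
final case class VectorNew(length: String) extends Def
final case class VectorLength(x: String) extends Def

trait Gen {
  def emitNode(sym: String, rhs: Def): String =
    throw new IllegalArgumentException(s"don't know how to generate code for $rhs")
}

trait ScalaGenToy extends Gen {
  override def emitNode(sym: String, rhs: Def): String = rhs match {
    case VectorNew(len)  => s"val $sym = new Vector($len)"
    case VectorLength(x) => s"val $sym = $x._length"
    case _               => super.emitNode(sym, rhs)
  }
}

trait CGenToy extends Gen {
  override def emitNode(sym: String, rhs: Def): String = rhs match {
    case VectorNew(len)  => s"Vector *$sym = vector_new($len);"
    case VectorLength(x) => s"int $sym = $x->length;"
    case _               => super.emitNode(sym, rhs)
  }
}
```

`(new ScalaGenToy {}).emitNode("x1", VectorNew("n"))` yields `val x1 = new Vector(n)`, while the C-like generator turns `VectorLength("x1")` into `int x2 = x1->length;` for symbol `x2`. Adding a back end means adding a trait, without touching the IR.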
32. override def matrix_plus[A:Manifest:Arith](x: Exp[Matrix[A]], y: Exp[Matrix[A]]): Exp[Matrix[A]] =
  (x, y) match {
    // (AB + AD) == A(B + D)
    case (Def(MatrixTimes(a, b)), Def(MatrixTimes(c, d))) if (a == c) =>
      // return optimized version (the recursive call is why the result type is declared)
      matrix_times(a, matrix_plus(b, d))
    // other rewrites
    // case ...
    case _ => super.matrix_plus(x, y)
  }
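The rewrite above fires when the IR node for `+` is constructed. Here is a self-contained toy version of the same rule on a tiny matrix-expression tree. `MExp`, `M`, `MatrixTimes`, `MatrixPlus` and `Rewrites` are hypothetical stand-ins for the DSL's IR, and structural equality of case classes stands in for comparing IR symbols.

```scala
// Toy rewrite-on-construction: the "smart constructor" for + inspects its
// operands and returns A*(B + D) instead of A*B + A*D. Names are hypothetical.
trait MExp
final case class M(name: String) extends MExp
final case class MatrixTimes(a: MExp, b: MExp) extends MExp
final case class MatrixPlus(a: MExp, b: MExp) extends MExp

object Rewrites {
  def matrix_times(x: MExp, y: MExp): MExp = MatrixTimes(x, y)

  def matrix_plus(x: MExp, y: MExp): MExp = (x, y) match {
    // (AB + AD) == A(B + D): one matrix multiplication instead of two
    case (MatrixTimes(a, b), MatrixTimes(c, d)) if a == c =>
      matrix_times(a, matrix_plus(b, d))
    // no rule applies: build the plain node
    case _ => MatrixPlus(x, y)
  }
}
```

`Rewrites.matrix_plus(MatrixTimes(M("A"), M("B")), MatrixTimes(M("A"), M("D")))` returns `MatrixTimes(M("A"), MatrixPlus(M("B"), M("D")))`. Because the rule runs at construction time, application code never builds the unoptimized tree.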
33. trait OptiML extends OptiMLScalaOpsPkg
  with VectorOps with MatrixOps with ...

trait OptiMLExp extends OptiMLScalaOpsPkgExp
  with VectorOpsExp with MatrixOpsExp with ...

trait OptiMLCodeGenScala extends OptiMLScalaCodeGenPkg
  with ScalaGenVectorOps with ScalaGenMatrixOps with ...

trait OptiMLCodeGenCuda extends OptiMLCudaCodeGenPkg
  with CudaGenVectorOps with CudaGenMatrixOps with ...
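The trait composition above can be reproduced in miniature to show how the pieces fit: an interface trait that programs see, an IR trait that turns operations into nodes, and a code-generator trait tied to an IR instance. Everything here (`Base`, `ArithOps`, `ArithOpsExp`, `ScalaGenArithOps`, `AddOne`, `Compiled`, `CompiledGen`) is a hypothetical sketch, far smaller than OptiML's real packages.

```scala
// Miniature of the layering used to assemble a DSL (all names hypothetical).
// 1. The interface: programs see only the abstract type Rep[T].
trait Base { type Rep[T] }

trait ArithOps extends Base {
  def lit(d: Double): Rep[Double]
  def plus(a: Rep[Double], b: Rep[Double]): Rep[Double]
}

// 2. The IR: Rep[T] becomes an expression node, and operations build nodes.
trait ArithOpsExp extends ArithOps {
  trait Node
  case class Lit(d: Double) extends Node
  case class Plus(a: Node, b: Node) extends Node
  type Rep[T] = Node
  def lit(d: Double): Rep[Double] = Lit(d)
  def plus(a: Rep[Double], b: Rep[Double]): Rep[Double] = Plus(a, b)
}

// 3. The code generator: bound to one IR instance, prints that IR's nodes.
trait ScalaGenArithOps {
  val IR: ArithOpsExp
  import IR._
  def emit(n: Node): String = n match {
    case Lit(d)     => d.toString
    case Plus(a, b) => s"(${emit(a)} + ${emit(b)})"
  }
}

// An application written only against the interface trait.
trait AddOne extends ArithOps {
  def program(x: Rep[Double]): Rep[Double] = plus(x, lit(1.0))
}

// Mixing in the IR trait makes the application build nodes; the generator prints them.
object Compiled extends AddOne with ArithOpsExp
object CompiledGen extends ScalaGenArithOps { val IR: Compiled.type = Compiled }
```

`CompiledGen.emit(Compiled.program(Compiled.lit(2.0)))` returns `"(2.0 + 1.0)"`. Declaring `IR` with the singleton type `Compiled.type` is what lets the generator accept `Compiled`'s nodes; a CUDA generator would be a second trait bound to the same IR.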
34.
- Delite DSLs target high-performance architectures from Scala
- Open source – use them to accelerate your apps or build your own!
  - http://github.com/stanford-ppl/Delite
- Mailing list:
  - http://groups.google.com/group/delite-devel
- Thank you