SlideShare uma empresa Scribd logo
1 de 124
Baixar para ler offline
JAREKJAREK
Started on C64 (Basic, 6502), Amiga (68k, C), C++, Java, JS, Scala, Kotlin...
MOTIVATIONMOTIVATION
Mythical slow solution
IS THAT ALL TRUE?IS THAT ALL TRUE?
long time ago
tmp = $fb
Start:
ldx #10; !! decimal
clc
ldy #$0
lda #$1
fib: dex
beq end
sty tmp
tay
adc tmp
jmp fib
end:
tay
long time ago
branch not taken 2 cycles
tmp = $fb
Start:
ldx #10; !! decimal
clc
ldy #$0
lda #$1
fib: dex
beq end
sty tmp
tay
adc tmp
jmp fib
end:
tay
long time ago
branch not taken 2 cycles
branch taken 3(!!!) cycles
tmp = $fb
Start:
ldx #10; !! decimal
clc
ldy #$0
lda #$1
fib: dex
beq end
sty tmp
tay
adc tmp
jmp fib
end:
tay
Rewrite if statement to save a cycle
motorola 68040
branch taken �xed prediction strategy
PERFORMANCE MYTHSPERFORMANCE MYTHS
not precise,
valid only in certain contexts,
valid in the past,
not relevant
COST OF PERFORMANCE MYTHSCOST OF PERFORMANCE MYTHS
Code:
less readable
less �exible
less abstract
Lost focus.
PERFORMANCE IS A GOOD MARKETING TOOLPERFORMANCE IS A GOOD MARKETING TOOL
Marketing my shiny new library
SCALASCALA
Scala combines object-oriented and functional programming in one
concise, high-level language.
Scala is far from metal(s)
ARE WE IMMUNE?ARE WE IMMUNE?
SOME ARTICLESSOME ARTICLES
applicative-functors-vs-monad-composing-performance-in-scala
No More Transformers: High-Performance E�ects in Scalaz 8
ON PERFORMANCE OF THE NEW COLLECTIONS
Scala’s immutable collections can be slow as a snail
...
SCALA DOES NOT MAKE IT EASYSCALA DOES NOT MAKE IT EASY
A given piece of code may and will perform di�erently when
called/used in a di�erent context
WE WILL GO DEEPWE WILL GO DEEP
TOOLSTOOLS
All the benchmarks and numbers in this talk are presented only to
show the tools
DO NOT make any decisions based on those numbers
LET'S TALK ABOUT IMMUTABILITYLET'S TALK ABOUT IMMUTABILITY
object ImmutableCalc {
case class Calculator(total : Int = 0) {
def add(i: Int) :Calculator = this.copy(total = total + i)
def get() : Int = total
}
}
object MutableCalc {
class Calculator {
var total = 0
def add(i: Int) : Unit = total+=i
def get() : Int = total
}
}
INTO THE BYTECODEINTO THE BYTECODE
ls
'ImmutableCalc$Calculator.class'
'ImmutableCalc$Calculator$.class'
ImmutableCalc.class
'ImmutableCalc$.class'k
and ~ 20 other methods
//CODE decompiled using decompile Scala to Java
public final class ImmutableCalc {
public static class Calculator implements Product, Serializable {
private final int total;
public Iterator productElementNames() {
return Product.productElementNames$(this);
}
public int total() {
return this.total;
}
public ImmutableCalc.Calculator add(final int i) {
return this.copy(this.total() + i);
}
public int get() {
return this.total();
}
public ImmutableCalc.Calculator copy(final int total) {
If you do not like IDE...
javap -p -c 'ImmutableCalc$Calculator.class'
public calculator.immuable.ImmutableCalc$Calculator add(int);
Code:
0: aload_0
1: aload_0
2: invokevirtual #31 // Method total:()I
5: iload_1
6: iadd
7: invokevirtual #34 // Method copy:(I)Lcalculator/immuable/ImmutableCalc$Calcula
10: areturn
public calculator.immuable.ImmutableCalc$Calculator add(int);
Code:
0: aload_0 ; stack = [this]
1: aload_0 ; stack = [this,this]
2: invokevirtual #31 ; total() ; stack = [this,this.total]
5: iload_1 ; stack = [this, this.total, i]
6: iadd ; stack = [this, total + i]
7: invokevirtual #34 ; copy(...); stack = [newObject]
10: areturn ;return newObject
https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
INFOINFO
There is also lot of metadata in .class �les (not only bytecode)
No one needs to read bytecode - unless You work on a compiler
Run it
object CalcRun {
def main(args: Array[String]): Unit = {
var sum = 0L
for ( i <-0 to 100000) {
val calc = ImmutableCalc.Calculator()
sum+=(1 to 10000).foldLeft( calc)((c,i) => c.add(i)).get()
}
println(sum)
}
}
JITJIT
JITJIT
-XX:+PrintCompilation
0,1,2,3,4 - compilation Tier - higher ~ more optimized
% - on stack replacement
made not entrant - do not use this
! - exception handler
s - synchronized
298 485 2 runners.CalcRun$::$anonfun$main$2 (6 bytes)
298 486 2 calculator.immuable.ImmutableCalc$Calculator::add (11 bytes)
298 487 2 calculator.immuable.ImmutableCalc$Calculator::copy (9 bytes)
298 472 1 java.lang.Integer::intValue (5 bytes)
298 473 1 scala.collection.immutable.RangeIterator::hasNext (5 bytes)
298 475 2 java.lang.Integer::<init> (10 bytes)
298 474 1 calculator.immuable.ImmutableCalc$Calculator::total (5 bytes)
298 476 2 java.lang.Integer::valueOf (32 bytes)
...
335 501 % 4 scala.collection.IterableOnceOps::foldLeft @ 13 (43 bytes)
341 499 % 3 scala.collection.IterableOnceOps::foldLeft @ -2 (43 bytes) made not entrant
0,1,2,3,4 - compilation Tier - higher ~ more optimized
% - on stack replacement
made not entrant - do not use this
! - exception handler
s - synchronized
298 485 2 runners.CalcRun$::$anonfun$main$2 (6 bytes)
298 486 2 calculator.immuable.ImmutableCalc$Calculator::add (11 bytes)
298 487 2 calculator.immuable.ImmutableCalc$Calculator::copy (9 bytes)
298 472 1 java.lang.Integer::intValue (5 bytes)
298 473 1 scala.collection.immutable.RangeIterator::hasNext (5 bytes)
298 475 2 java.lang.Integer::<init> (10 bytes)
298 474 1 calculator.immuable.ImmutableCalc$Calculator::total (5 bytes)
298 476 2 java.lang.Integer::valueOf (32 bytes)
...
335 501 % 4 scala.collection.IterableOnceOps::foldLeft @ 13 (43 bytes)
341 499 % 3 scala.collection.IterableOnceOps::foldLeft @ -2 (43 bytes) made not entrant
https://gist.github.com/rednaxelafx/1165804#�le_notes.md
-XX:+PrintInlining
1275 579 4 runners.CalcRun$$$Lambda$3/1967205423::apply$mcVI$sp (9 bytes)
@ 5 runners.CalcRun$::$anonfun$main$1 (56 bytes) inline (hot)
@ 7 calculator.immuable.ImmutableCalc$Calculator$::apply$default
@ 10 calculator.immuable.ImmutableCalc$Calculator::<init> (
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 10 scala.Product::$init$ (1 bytes) inline (hot)
@ 26 scala.LowPriorityImplicits::intWrapper (2 bytes) inline (hot)
@ 32 scala.runtime.RichInt$::to$extension (15 bytes) inline (hot)
@ 11 scala.collection.immutable.Range$Inclusive::<init> (
@ 4 scala.collection.immutable.Range::<init> (215 bytes) inline
@ 16 scala.collection.immutable.AbstractSeq::<init> (
@ 1 scala.collection.AbstractSeq::<init> (5 bytes) inline (h
@ 1 scala.collection.AbstractIterable::<init> (5 bytes) in
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 78 scala.collection.immutable.Range::isEmpty (5 bytes) acces
@ 89 scala.collection.immutable.Range::longLength (25 bytes) i
@ 1 scala.collection.immutable.Range::gap (12 bytes) inline
@ 1 scala.collection.immutable.Range::end (5 bytes) access
@ 6 scala.collection.immutable.Range::start (5 bytes) acce
@ 5 scala.collection.immutable.Range::step (5 bytes) accesso
@ 11 scala.collection.immutable.Range::hasStub (18 bytes) in
@ 1 scala.collection.immutable.Range$Inclusive::isInclusive
WHY BOTHER WITH PRINTCOMPILATION,WHY BOTHER WITH PRINTCOMPILATION,
INLININGINLINING
1. check if critical code was in fact JITed
2. sometimes problems with bytecode can be seen (method never
compiled)
3. look for uncommon traps
MEASURINGMEASURING
JMH
jvm warmup
prevent dead code elimination
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.3.7")
VM version: JDK 11.0.6, OpenJDK 64-Bit Server VM, 11.0.6+9-jvmci-20.0-
b02
@Benchmark
def measureWrong: Unit =
Math.log(x)
@Benchmark
def measureRight: Double =
Math.log(x)
[info] JMHSample_08_DeadCode.measureRight avgt 3 9.891 ± 13.465 ns/op
[info] JMHSample_08_DeadCode.measureWrong avgt 3 0.383 ± 0.132 ns/op
@Benchmark
def measureWrong: Unit =
Math.log(x) // in fact same as = {}
Let's measure CalcCost
@Benchmark
def imperative: Int = {
val calc = new MutableCalc.Calculator()
(1 to numOps).foreach(calc.add)
calc.get()
}
@Benchmark
def immutable: Int = {
val calc = ImmutableCalc.Calculator()
(1 to numOps).foldLeft( calc)((c,i) => c.add(i)).get()
}
object ImmutableCalc {
case class Calculator(total : Int = 0) {
def add(i: Int) :Calculator = this.copy(total = total + i)
def get() : Int = total
}
}
object MutableCalc {
class Calculator {
var total = 0
def add(i: Int) : Unit = total+=i
def get() : Int = total
}
}
VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
[info] Benchmark Mode Cnt Score Error Units
[info] CalcCost.freeMonadScalaZ avgt 5 154761.225 ± 10652.856 ns/op
[info] CalcCost.immutable avgt 5 15393.204 ± 670.565 ns/op
[info] CalcCost.imperative avgt 5 284.435 ± 11.790 ns/op
[info] CalcCost.tfIO avgt 5 123841.734 ± 30096.835 ns/op
VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
[info] CalcCost.freeMonadScalaZ avgt 5 130559.311 ± 21298.291 ns/op
[info] CalcCost.immutable avgt 5 3682.014 ± 510.659 ns/op
[info] CalcCost.imperative avgt 5 263.753 ± 8.953 ns/op
[info] CalcCost.tfIO avgt 5 100277.673 ± 4185.455 ns/op
JVM 20.0.0.r11-grl
20.0.0.r11-grl
[info] CalcCost.freeMonadScalaZ avgt 15 147832.984 ± 15674.721 ns/op
[info] CalcCost.immutable avgt 15 601.389 ± 8.249 ns/op
[info] CalcCost.imperative avgt 15 420.588 ± 103.107 ns/op
[info] CalcCost.tfIO avgt 15 121616.405 ± 5716.283 ns/op
INSPECTING THE JVMINSPECTING THE JVM
JavaFlightRecorder - record "events"
Java Mission Control - browse results
The Java Flight Recorder (JFR) is a commercial feature. You can use it
for free on developer desktops/laptops, and for evaluation purposes in
test, development, and production environments. However, to enable
JFR on a production server, you require a commercial license. Using
JMC UI for other purposes on the JDK does not require a commercial
license.
Java Flight Recorder
low overhead,
detailed jvm events,
additional graphs and tools (for jfr) (intellij)
own events, (use for low overhead trace log)
only on some jvms
license - check your jvm version
-XX:+FlightRecorder
-XX:StartFlightRecording=duration=60s,filename=myrecording.j
JVM 8
JVM 8
Graal
Graal
JPSJPS
hint: jps -mlVv (detailed jvm args)
58661 Main
93654 Jps
57464 NailgunRunner
77659 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar
80924 Launcher
jstack 91221 (java 8)
"main" #1 prio=5 os_prio=0 tid=0x00007f33b400d800 nid=0x14812 runnable [0x00007f33bdf38000]
java.lang.Thread.State: RUNNABLE
at calculator.immuable.ImmutableCalc$Calculator.<init>(ImmutableCalc.scala:4)
at calculator.immuable.ImmutableCalc$Calculator.copy(ImmutableCalc.scala:4)
at calculator.immuable.ImmutableCalc$Calculator.add(ImmutableCalc.scala:6)
at runners.CalcRun$.$anonfun$main$2(CalcRun.scala:10)
at runners.CalcRun$.$anonfun$main$2$adapted(CalcRun.scala:10)
at runners.CalcRun$$$Lambda$4/801197928.apply(Unknown Source)
at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:636)
at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:632)
at scala.collection.AbstractIterable.foldLeft(Iterable.scala:921)
at runners.CalcRun$.$anonfun$main$1(CalcRun.scala:10)
jstack 94038 (graal)
"main" #1 prio=5 os_prio=0 cpu=26791.23ms elapsed=26.89s tid=0x00007fafb0028800 nid=0x13d86 runnable [
java.lang.Thread.State: RUNNABLE
at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:635)
at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:632)
at scala.collection.AbstractIterable.foldLeft(Iterable.scala:921)
at runners.CalcRun$.$anonfun$main$1(CalcRun.scala:10)
at runners.CalcRun$$$Lambda$17/0x00000008400d8040.apply$mcVI$sp(Unknown Source)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
at runners.CalcRun$.main(CalcRun.scala:8)
at runners.CalcRun.main(CalcRun.scala)
"Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.32ms elapsed=26.87s tid=0x00007fafb024c800 nid=
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.6/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@11.0.6/Reference.java:241
at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.6/Reference.java:213)
BEWARE OF SAFEPOINTSBEWARE OF SAFEPOINTS
INTO ASSEMBLYINTO ASSEMBLY
We need to go deeper
-XX:+PrintAssembly
[Constants]
# {method} {0x00007f93f0e2fc48} 'add' '(I)Lcalculator/immuable/ImmutableCalc$Calculator;' in
# this: rsi:rsi = 'calculator/immuable/ImmutableCalc$Calculator'
# parm0: rdx = int
# [sp+0x80] (sp of caller)
0x00007f93f1422680: mov 0x8(%rsi),%r10d
0x00007f93f1422684: shl $0x3,%r10
0x00007f93f1422688: cmp %rax,%r10
0x00007f93f142268b: jne 0x00007f93f1045ba0 ; {runtime_call}
0x00007f93f1422691: nopl 0x0(%rax,%rax,1)
0x00007f93f1422699: nopl 0x0(%rax)
[Verified Entry Point]
0x00007f93f14226a0: mov %eax,-0x14000(%rsp)
0x00007f93f14226a7: push %rbp
0x00007f93f14226a8: sub $0x70,%rsp
0x00007f93f14226ac: movabs $0x7f93f0e43db0,%rdi
0x00007f93f14226b6: mov 0x8(%rdi),%eax
0x00007f93f14226b9: add $0x8,%eax
0x00007f93f14226bc: mov %eax,0x8(%rdi)
0x00007f93f14226bf: movabs $0x7f93f0e2fc48,%rdi ; {metadata({method} {0x00007f93f0e2fc48} 'add' '(I
0x00007f93f14226c9: and $0x3ff8,%eax
0x00007f93f14226cf: cmp $0x0,%eax
-XX:PrintAssemblyOptions=intel
0x00007f11f540f880: mov r10d,DWORD PTR [rsi+0x8]
0x00007f11f540f884: shl r10,0x3
0x00007f11f540f888: cmp r10,rax
0x00007f11f540f88b: jne 0x00007f11f5045ba0 ; {runtime_call}
0x00007f11f540f891: nop DWORD PTR [rax+rax*1+0x0]
0x00007f11f540f899: nop DWORD PTR [rax+0x0]
[Verified Entry Point]
0x00007f11f540f8a0: mov DWORD PTR [rsp-0x14000],eax
0x00007f11f540f8a7: push rbp
0x00007f11f540f8a8: sub rsp,0x70
0x00007f11f540f8ac: movabs rdi,0x7f11e3e43db0
0x00007f11f540f8b6: mov eax,DWORD PTR [rdi+0x8]
0x00007f11f540f8b9: add eax,0x8
0x00007f11f540f8bc: mov DWORD PTR [rdi+0x8],eax
0x00007f11f540f8bf: movabs rdi,0x7f11e3e2fc48 ; {metadata({method} {0x00007f11e3e2fc48} 'add' '(I)L
0x00007f11f540f8c9: and eax,0x3ff8
0x00007f11f540f8cf: cmp eax,0x0
0x00007f11f540f8d2: je 0x00007f11f540f92c ;*aload_0
JIT WATCHJIT WATCH
CHECK JSTATCHECK JSTAT
jstat java 8
jstat -gccause 128409 1000 1000
S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT LGCC
0.00 6.25 0.00 1.10 93.80 80.76 59 0.137 0 0.000 - - 0.137
0.00 0.00 56.00 1.10 93.80 80.76 63 0.146 0 0.000 - - 0.146
0.00 6.25 6.00 1.10 93.80 80.76 67 0.155 0 0.000 - - 0.155
6.25 0.00 76.00 1.10 93.80 80.76 70 0.162 0 0.000 - - 0.162
0.00 0.00 78.00 1.10 93.80 80.76 74 0.172 0 0.000 - - 0.172
6.25 0.00 70.00 1.10 93.80 80.76 78 0.181 0 0.000 - - 0.181
6.25 0.00 44.00 1.10 93.80 80.76 82 0.193 0 0.000 - - 0.193
0.00 0.00 22.00 1.10 93.80 80.76 86 0.199 0 0.000 - - 0.199
6.25 0.00 20.00 1.10 93.80 80.76 90 0.205 0 0.000 - - 0.205
0.00 0.00 44.00 1.10 93.80 80.76 95 0.213 0 0.000 - - 0.213
0.00 6.25 58.00 1.10 93.80 80.76 101 0.226 0 0.000 - - 0.226
0.00 0.00 76.00 1.10 93.80 80.76 104 0.239 0 0.000 - - 0.239
0.00 6.25 42.00 1.10 93.80 80.76 107 0.249 0 0.000 - - 0.249
0.00 0.00 30.00 1.10 93.80 80.76 110 0.255 0 0.000 - - 0.255
jstat graal
jstat -gccause 129920 1000 1000
S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT LGCC
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
THERE ARE MORE TOOLSTHERE ARE MORE TOOLS
JPro�ler DynaTrace
IF WE HAVE SOME TIMEIF WE HAVE SOME TIME
LETS TALS ABOUT MONAD TRANSFORMERSLETS TALS ABOUT MONAD TRANSFORMERS
case class User(id: Long, name: String, parentId: Long)
object UserData {
val allUsers = List(
User(1, "Nikt", 8),
User(2, "Ktos", 7),
User(4, "A", 2),
User(7, "B", 4)
)
}
(scalaz used)
object Users {
val allUsers = UserData.allUsers
def findUser(id: Long): OptionT[Task, User] = OptionT.optionT {
Task.delay({
val found: Option[User] = allUsers.find(u => u.id == id)
found
}
)
}
}
val indices: Seq[Int] = (1 to 100).toList
def run = {
val users: Seq[OptionT[Task, User]] = indices.map(id => Users.findUser(id))
val parents = users.map(u => u.flatMap(z => Users.findUser(z.parentId)))
parents.map(u => u.run.unsafePerformSync).count(u => u.isDefined)
}
object StackedMonads {
val indices: Seq[Int] = (1 to 100).toList
def run = {
val users = indices.map ( id => Users.findUser(id))
val parents = users.map ((u: Task[Option[User]]) =>
u.flatMap((ou: Option[User]) => ou match {
case None => Task.delay({None})
case Some(x) => Users.findUser(x.parentId)
}
)
)
parents.map (u=>u.unsafePerformSync).count(u =>u.isDefined)
}
def findUser(id: Long): Task[Option[User]] = {
Task.delay {
val found: Option[User] = allUsers.find(u => u.id == id)
found
}
}
Monad transformer vs simply stacked (jvm 8)
Let's simply use null
object NOMonad {
val indices: Seq[Int] = (1 to 100).toList
def run = {
val users = indices.map(id => Users.findUser(id))
val parents: Seq[User] = users.map(u =>
if (u != null) {
Users.findUser(u.parentId)
} else {
null
}
)
parents.count((u: User) =>u != null)
}
object Users {
val allUsers = UserData.allUsers
def findUser(id: Long): User = {
val found= allUsers.find(u => u.id == id).orNull
found
}
}
wow, nulls are awesome! (jvm 8)
object SimpleDB {
val connection = DB.initializeDB
def run = {
val users = indices.map(id => Users.findUser(id))
val parents: Seq[User] = users.map(u =>
if (u != null) {
Users.findUser(u.parentId)
} else {
null
}
)
parents.count((u: User) => u != null)
}
object Users {
val userQuery = connection.prepareStatement("SELECT * FROM USERS WHERE id= ?")
def findUser(id: Long): User = {
userQuery.clearParameters()
userQuery.setLong(1, id)
val res = userQuery.executeQuery()
Time to DROP DATABASE
Monad Transformer combined with DB
object OptimizedDB {
val connection = DB.initializeDB
def run = {
val count = Users.countParents()
count
}
object Users {
val userQuery = connection.prepareStatement("SELECT " +
"count(*) FROM USERS u1, USERS u2 WHERE u1.id>1 and u1.id < 100 AND u1.parentId = u2.id"
def countParents(): Int = {
userQuery.clearParameters()
val res = userQuery.executeQuery()
try {
if (res.next()) {
res.getInt(1)
} else {
throw new IllegalStateException("no count from DB")
}
Optimized DB (java 8)
Same on graal
jvm 8
[info] MT.monadTR avgt 3 16210.589 ± 14114.784 ns/op
[info] MT.monadTRDB avgt 3 112424.770 ± 109957.046 ns/op
[info] MT.noMonad avgt 3 1910.297 ± 64.699 ns/op
[info] MT.optimizedDB avgt 3 685.030 ± 99.553 ns/op
[info] MT.simpleDB avgt 3 103198.262 ± 36190.463 ns/op
[info] MT.stackedMonads avgt 3 13040.031 ± 733.337 ns/op
jvm 14
[info] MT.monadTR avgt 3 13328.252 ± 9569.020 ns/op
[info] MT.monadTRDB avgt 3 123824.784 ± 148322.320 ns/op
[info] MT.noMonad avgt 3 1934.838 ± 1596.319 ns/op
[info] MT.optimizedDB avgt 3 767.592 ± 49.940 ns/op
[info] MT.simpleDB avgt 3 96992.954 ± 828.014 ns/op
[info] MT.stackedMonads avgt 3 15777.281 ± 7370.754 ns/op
graal 20
[info] MT.monadTR avgt 3 11057.304 ± 4885.176 ns/op
[info] MT.monadTRDB avgt 3 119223.204 ± 109133.489 ns/op
[info] MT.noMonad avgt 3 1529.025 ± 71.307 ns/op
[info] MT.optimizedDB avgt 3 905.659 ± 34.349 ns/op
[info] MT.simpleDB avgt 3 107627.684 ± 6618.677 ns/op
[info] MT.stackedMonads avgt 3 8534.377 ± 739.795 ns/op
DEEPER THAN ASSEMBLYDEEPER THAN ASSEMBLY
Just looping
simple for
val array = Array.ofDim[Long](sz + 1, sz + 1)
@Benchmark
def forBasedLoop: Long = {
var sum: Long = 0
for (i <- 0 to sz) {
for (j <- 0 to sz) {
sum += array(i)(j)
}
}
sum
}
while
@Benchmark
def whileLoopA: Long = {
var sum: Long = 0
var i = 0
while (i < sz) {
var j = 0
while (j < sz) {
sum += array(i)(j)
j = j + 1
}
i = i + 1
}
sum
}
while reversed (and transposed)
@Benchmark
def whileRevLoop: Long = {
var sum: Long = 0
var j = sz
while (j >= 0) {
var i = sz
while (i >= 0) {
sum += array(i)(j)
i = i - 1
}
j = j - 1
}
sum
}
PERFORMANCE COUNTERSPERFORMANCE COUNTERS
jmh:run -prof perf -w 5s -r 5s -wi 3 -i 3 -f 3
.*ArrayLooping.*
[info] 14,650.91 msec task-clock # 0.482 CPUs utilized
[info] 525 context-switches # 0.036 K/sec
[info] 65 cpu-migrations # 0.004 K/sec
[info] 185 page-faults # 0.013 K/sec
[info] 30,370,384,927 cycles # 2.073 GHz (
[info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle (
[info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle (
[info] 35,907,154,673 instructions # 1.18 insn per cycle
[info] # 0.55 stalled cycles per insn (
[info] 6,706,376,215 branches # 457.745 M/sec (
[info] 12,810,882 branch-misses # 0.19% of all branches (
[info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec (
[info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 175,788,792 L1-icache-loads # 11.998 M/sec (
[info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits (
[info] 5,302,762,430 dTLB-loads # 361.941 M/sec (
[info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits (
[info] 245,713 iTLB-loads # 0.017 M/sec (
[info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits (
[info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
while T+R
[info] 14,650.91 msec task-clock # 0.482 CPUs utilized
[info] 525 context-switches # 0.036 K/sec
[info] 65 cpu-migrations # 0.004 K/sec
[info] 185 page-faults # 0.013 K/sec
[info] 30,370,384,927 cycles # 2.073 GHz (
[info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle (
[info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle (
[info] 35,907,154,673 instructions # 1.18 insn per cycle
[info] # 0.55 stalled cycles per insn (
[info] 6,706,376,215 branches # 457.745 M/sec (
[info] 12,810,882 branch-misses # 0.19% of all branches (
[info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec (
[info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 175,788,792 L1-icache-loads # 11.998 M/sec (
[info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits (
[info] 5,302,762,430 dTLB-loads # 361.941 M/sec (
[info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits (
[info] 245,713 iTLB-loads # 0.017 M/sec (
[info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits (
[info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
while T+R
L1
[info] 14,650.91 msec task-clock # 0.482 CPUs utilized
[info] 525 context-switches # 0.036 K/sec
[info] 65 cpu-migrations # 0.004 K/sec
[info] 185 page-faults # 0.013 K/sec
[info] 30,370,384,927 cycles # 2.073 GHz (
[info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle (
[info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle (
[info] 35,907,154,673 instructions # 1.18 insn per cycle
[info] # 0.55 stalled cycles per insn (
[info] 6,706,376,215 branches # 457.745 M/sec (
[info] 12,810,882 branch-misses # 0.19% of all branches (
[info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec (
[info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 175,788,792 L1-icache-loads # 11.998 M/sec (
[info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits (
[info] 5,302,762,430 dTLB-loads # 361.941 M/sec (
[info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits (
[info] 245,713 iTLB-loads # 0.017 M/sec (
[info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits (
[info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
[info] 14,601.12 msec task-clock # 0.481 CPUs utilized
[info] 551 context-switches # 0.038 K/sec
[info] 79 cpu-migrations # 0.005 K/sec
[info] 142 page-faults # 0.010 K/sec
[info] 56,887,530,219 cycles # 3.896 GHz (
[info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle (
[info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle (
[info] 85,472,875,315 instructions # 1.50 insn per cycle
[info] # 0.46 stalled cycles per insn (
[info] 7,673,425,909 branches # 525.537 M/sec (
[info] 112,782,902 branch-misses # 1.47% of all branches (
[info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec (
[info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 355,107,740 L1-icache-loads # 24.321 M/sec (
[info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits (
[info] 113,201,590 dTLB-loads # 7.753 M/sec (
[info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits (
[info] 46,228 iTLB-loads # 0.003 M/sec (
[info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits (
[info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
normal while
[info] 14,601.12 msec task-clock # 0.481 CPUs utilized
[info] 551 context-switches # 0.038 K/sec
[info] 79 cpu-migrations # 0.005 K/sec
[info] 142 page-faults # 0.010 K/sec
[info] 56,887,530,219 cycles # 3.896 GHz (
[info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle (
[info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle (
[info] 85,472,875,315 instructions # 1.50 insn per cycle
[info] # 0.46 stalled cycles per insn (
[info] 7,673,425,909 branches # 525.537 M/sec (
[info] 112,782,902 branch-misses # 1.47% of all branches (
[info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec (
[info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 355,107,740 L1-icache-loads # 24.321 M/sec (
[info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits (
[info] 113,201,590 dTLB-loads # 7.753 M/sec (
[info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits (
[info] 46,228 iTLB-loads # 0.003 M/sec (
[info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits (
[info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
normal while
L1
[info] 14,601.12 msec task-clock # 0.481 CPUs utilized
[info] 551 context-switches # 0.038 K/sec
[info] 79 cpu-migrations # 0.005 K/sec
[info] 142 page-faults # 0.010 K/sec
[info] 56,887,530,219 cycles # 3.896 GHz (
[info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle (
[info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle (
[info] 85,472,875,315 instructions # 1.50 insn per cycle
[info] # 0.46 stalled cycles per insn (
[info] 7,673,425,909 branches # 525.537 M/sec (
[info] 112,782,902 branch-misses # 1.47% of all branches (
[info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec (
[info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits (
[info] <not supported> LLC-loads
[info] <not supported> LLC-load-misses
[info] 355,107,740 L1-icache-loads # 24.321 M/sec (
[info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits (
[info] 113,201,590 dTLB-loads # 7.753 M/sec (
[info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits (
[info] 46,228 iTLB-loads # 0.003 M/sec (
[info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits (
[info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
NOT COVEREDNOT COVERED
Garbage collectors
Ahead of time compilation
JVM options
BENCHMARKSBENCHMARKS
All benchmarks above were performed unprofessionaly
All benchmarks lie
ON A GOOD BENCHMARKSON A GOOD BENCHMARKS
Benchmark might be ok:
you test it for yourself
you read about the tools tested
you can �nd what causes di�erences
HINTSHINTS
Tools will not magically solve your problems
Tools let you feel more comfortable
Do not expect PrintAssembly or Performance counters will help you in
typical problems
REAL PROBLEMS I HAD ON PRODUCTIONREAL PROBLEMS I HAD ON PRODUCTION
hidden read of whole database in a dirty piece of code
static synchronized
hashCode bugs
jvm bugs (metaspace)
Lot of them could be spotted by using jstack
THANK YOUTHANK YOU

Mais conteúdo relacionado

Mais procurados

Reactive Programming for a demanding world: building event-driven and respons...
Reactive Programming for a demanding world: building event-driven and respons...Reactive Programming for a demanding world: building event-driven and respons...
Reactive Programming for a demanding world: building event-driven and respons...Mario Fusco
 
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015NAVER / MusicPlatform
 
ADG Poznań - Kotlin for Android developers
ADG Poznań - Kotlin for Android developersADG Poznań - Kotlin for Android developers
ADG Poznań - Kotlin for Android developersBartosz Kosarzycki
 
Finagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestFinagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestPavan Chitumalla
 
PVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error ExamplesPVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error ExamplesAndrey Karpov
 
Compact and safely: static DSL on Kotlin
Compact and safely: static DSL on KotlinCompact and safely: static DSL on Kotlin
Compact and safely: static DSL on KotlinDmitry Pranchuk
 
How Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerHow Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerAndrey Karpov
 
If You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongIf You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongMario Fusco
 
Introduction to julia
Introduction to juliaIntroduction to julia
Introduction to julia岳華 杜
 
The Ring programming language version 1.5.1 book - Part 12 of 180
The Ring programming language version 1.5.1 book - Part 12 of 180The Ring programming language version 1.5.1 book - Part 12 of 180
The Ring programming language version 1.5.1 book - Part 12 of 180Mahmoud Samir Fayed
 
Develop your next app with kotlin @ AndroidMakersFr 2017
Develop your next app with kotlin @ AndroidMakersFr 2017Develop your next app with kotlin @ AndroidMakersFr 2017
Develop your next app with kotlin @ AndroidMakersFr 2017Arnaud Giuliani
 
Kotlin hands on - MorningTech ekito 2017
Kotlin hands on - MorningTech ekito 2017Kotlin hands on - MorningTech ekito 2017
Kotlin hands on - MorningTech ekito 2017Arnaud Giuliani
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsBartosz Kosarzycki
 
The Ring programming language version 1.5.2 book - Part 13 of 181
The Ring programming language version 1.5.2 book - Part 13 of 181The Ring programming language version 1.5.2 book - Part 13 of 181
The Ring programming language version 1.5.2 book - Part 13 of 181Mahmoud Samir Fayed
 
Kotlin 101 for Java Developers
Kotlin 101 for Java DevelopersKotlin 101 for Java Developers
Kotlin 101 for Java DevelopersChristoph Pickl
 
FP in Java - Project Lambda and beyond
FP in Java - Project Lambda and beyondFP in Java - Project Lambda and beyond
FP in Java - Project Lambda and beyondMario Fusco
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia岳華 杜
 

Mais procurados (20)

Reactive Programming for a demanding world: building event-driven and respons...
Reactive Programming for a demanding world: building event-driven and respons...Reactive Programming for a demanding world: building event-driven and respons...
Reactive Programming for a demanding world: building event-driven and respons...
 
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
서버 개발자가 바라 본 Functional Reactive Programming with RxJava - SpringCamp2015
 
Sneaking inside Kotlin features
Sneaking inside Kotlin featuresSneaking inside Kotlin features
Sneaking inside Kotlin features
 
ADG Poznań - Kotlin for Android developers
ADG Poznań - Kotlin for Android developersADG Poznań - Kotlin for Android developers
ADG Poznań - Kotlin for Android developers
 
Finagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestFinagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at Pinterest
 
PVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error ExamplesPVS-Studio in 2021 - Error Examples
PVS-Studio in 2021 - Error Examples
 
Compact and safely: static DSL on Kotlin
Compact and safely: static DSL on KotlinCompact and safely: static DSL on Kotlin
Compact and safely: static DSL on Kotlin
 
Blazing Fast Windows 8 Apps using Visual C++
Blazing Fast Windows 8 Apps using Visual C++Blazing Fast Windows 8 Apps using Visual C++
Blazing Fast Windows 8 Apps using Visual C++
 
How Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzerHow Data Flow analysis works in a static code analyzer
How Data Flow analysis works in a static code analyzer
 
If You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are WrongIf You Think You Can Stay Away from Functional Programming, You Are Wrong
If You Think You Can Stay Away from Functional Programming, You Are Wrong
 
Introduction to julia
Introduction to juliaIntroduction to julia
Introduction to julia
 
The Ring programming language version 1.5.1 book - Part 12 of 180
The Ring programming language version 1.5.1 book - Part 12 of 180The Ring programming language version 1.5.1 book - Part 12 of 180
The Ring programming language version 1.5.1 book - Part 12 of 180
 
Develop your next app with kotlin @ AndroidMakersFr 2017
Develop your next app with kotlin @ AndroidMakersFr 2017Develop your next app with kotlin @ AndroidMakersFr 2017
Develop your next app with kotlin @ AndroidMakersFr 2017
 
Kotlin hands on - MorningTech ekito 2017
Kotlin hands on - MorningTech ekito 2017Kotlin hands on - MorningTech ekito 2017
Kotlin hands on - MorningTech ekito 2017
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projects
 
The Ring programming language version 1.5.2 book - Part 13 of 181
The Ring programming language version 1.5.2 book - Part 13 of 181The Ring programming language version 1.5.2 book - Part 13 of 181
The Ring programming language version 1.5.2 book - Part 13 of 181
 
Kotlin 101 for Java Developers
Kotlin 101 for Java DevelopersKotlin 101 for Java Developers
Kotlin 101 for Java Developers
 
FP in Java - Project Lambda and beyond
FP in Java - Project Lambda and beyondFP in Java - Project Lambda and beyond
FP in Java - Project Lambda and beyond
 
The Language for future-julia
The Language for future-juliaThe Language for future-julia
The Language for future-julia
 
Angular2 rxjs
Angular2 rxjsAngular2 rxjs
Angular2 rxjs
 

Semelhante a Scala to assembly

Java 5 6 Generics, Concurrency, Garbage Collection, Tuning
Java 5 6 Generics, Concurrency, Garbage Collection, TuningJava 5 6 Generics, Concurrency, Garbage Collection, Tuning
Java 5 6 Generics, Concurrency, Garbage Collection, TuningCarol McDonald
 
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2PVS-Studio
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuningAOE
 
Java Keeps Throttling Up!
Java Keeps Throttling Up!Java Keeps Throttling Up!
Java Keeps Throttling Up!José Paumard
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4Abed Bukhari
 
Deuce STM - CMP'09
Deuce STM - CMP'09Deuce STM - CMP'09
Deuce STM - CMP'09Guy Korland
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsHostedbyConfluent
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨flyinweb
 
Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Alexey Fyodorov
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?Doug Hawkins
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuningAOE
 
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...Juan Cruz Nores
 
Kotlin advanced - language reference for android developers
Kotlin advanced - language reference for android developersKotlin advanced - language reference for android developers
Kotlin advanced - language reference for android developersBartosz Kosarzycki
 
Kotlin Advanced - language reference for Android developers
Kotlin Advanced - language reference for Android developers Kotlin Advanced - language reference for Android developers
Kotlin Advanced - language reference for Android developers STX Next
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...David Beazley (Dabeaz LLC)
 
Applying Compiler Techniques to Iterate At Blazing Speed
Applying Compiler Techniques to Iterate At Blazing SpeedApplying Compiler Techniques to Iterate At Blazing Speed
Applying Compiler Techniques to Iterate At Blazing SpeedPascal-Louis Perez
 

Semelhante a Scala to assembly (20)

Lambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter LawreyLambdas puzzler - Peter Lawrey
Lambdas puzzler - Peter Lawrey
 
Java 5 6 Generics, Concurrency, Garbage Collection, Tuning
Java 5 6 Generics, Concurrency, Garbage Collection, TuningJava 5 6 Generics, Concurrency, Garbage Collection, Tuning
Java 5 6 Generics, Concurrency, Garbage Collection, Tuning
 
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
Analysis of Haiku Operating System (BeOS Family) by PVS-Studio. Part 2
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuning
 
Java Keeps Throttling Up!
Java Keeps Throttling Up!Java Keeps Throttling Up!
Java Keeps Throttling Up!
 
00_Introduction to Java.ppt
00_Introduction to Java.ppt00_Introduction to Java.ppt
00_Introduction to Java.ppt
 
Whats new in_csharp4
Whats new in_csharp4Whats new in_csharp4
Whats new in_csharp4
 
Deuce STM - CMP'09
Deuce STM - CMP'09Deuce STM - CMP'09
Deuce STM - CMP'09
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
 
Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)Counter Wars (JEEConf 2016)
Counter Wars (JEEConf 2016)
 
JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?JVM Mechanics: When Does the JVM JIT & Deoptimize?
JVM Mechanics: When Does the JVM JIT & Deoptimize?
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuning
 
Xgboost
XgboostXgboost
Xgboost
 
VLSI lab manual
VLSI lab manualVLSI lab manual
VLSI lab manual
 
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...
JavaOne 2016: Code Generation with JavaCompiler for Fun, Speed and Business P...
 
Kotlin advanced - language reference for android developers
Kotlin advanced - language reference for android developersKotlin advanced - language reference for android developers
Kotlin advanced - language reference for android developers
 
Kotlin Advanced - language reference for Android developers
Kotlin Advanced - language reference for Android developers Kotlin Advanced - language reference for Android developers
Kotlin Advanced - language reference for Android developers
 
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
An Embedded Error Recovery and Debugging Mechanism for Scripting Language Ext...
 
Applying Compiler Techniques to Iterate At Blazing Speed
Applying Compiler Techniques to Iterate At Blazing SpeedApplying Compiler Techniques to Iterate At Blazing Speed
Applying Compiler Techniques to Iterate At Blazing Speed
 

Mais de Jarek Ratajski

Pure Kotlin Devoxx PL 2021
Pure Kotlin Devoxx PL 2021Pure Kotlin Devoxx PL 2021
Pure Kotlin Devoxx PL 2021Jarek Ratajski
 
Transaction is a monad
Transaction is a  monadTransaction is a  monad
Transaction is a monadJarek Ratajski
 
Spring, CDI, Jakarta EE good parts
Spring, CDI, Jakarta EE good partsSpring, CDI, Jakarta EE good parts
Spring, CDI, Jakarta EE good partsJarek Ratajski
 
Eta lang Beauty And The Beast
Eta lang Beauty And The Beast Eta lang Beauty And The Beast
Eta lang Beauty And The Beast Jarek Ratajski
 
Another programming language - jeszcze jeden język
Another programming language - jeszcze jeden językAnother programming language - jeszcze jeden język
Another programming language - jeszcze jeden językJarek Ratajski
 
Beauty and the beast - Haskell on JVM
Beauty and the beast  - Haskell on JVMBeauty and the beast  - Haskell on JVM
Beauty and the beast - Haskell on JVMJarek Ratajski
 
Fighting null with memes
Fighting null with memesFighting null with memes
Fighting null with memesJarek Ratajski
 
Geecon walking in CODE
Geecon walking in CODEGeecon walking in CODE
Geecon walking in CODEJarek Ratajski
 
Scalaworld lambda core hardcore
Scalaworld lambda core hardcoreScalaworld lambda core hardcore
Scalaworld lambda core hardcoreJarek Ratajski
 
Jdd 2016 DROP DATABASE
Jdd 2016 DROP DATABASEJdd 2016 DROP DATABASE
Jdd 2016 DROP DATABASEJarek Ratajski
 

Mais de Jarek Ratajski (17)

respect-estimates.pdf
respect-estimates.pdfrespect-estimates.pdf
respect-estimates.pdf
 
Pure Kotlin Devoxx PL 2021
Pure Kotlin Devoxx PL 2021Pure Kotlin Devoxx PL 2021
Pure Kotlin Devoxx PL 2021
 
Lambda hardcore
Lambda hardcoreLambda hardcore
Lambda hardcore
 
Pure kotlin
Pure kotlinPure kotlin
Pure kotlin
 
Transaction is a monad
Transaction is a  monadTransaction is a  monad
Transaction is a monad
 
Spring, CDI, Jakarta EE good parts
Spring, CDI, Jakarta EE good partsSpring, CDI, Jakarta EE good parts
Spring, CDI, Jakarta EE good parts
 
Eta lang Beauty And The Beast
Eta lang Beauty And The Beast Eta lang Beauty And The Beast
Eta lang Beauty And The Beast
 
Another programming language - jeszcze jeden język
Another programming language - jeszcze jeden językAnother programming language - jeszcze jeden język
Another programming language - jeszcze jeden język
 
Beauty and the beast - Haskell on JVM
Beauty and the beast  - Haskell on JVMBeauty and the beast  - Haskell on JVM
Beauty and the beast - Haskell on JVM
 
Fighting null with memes
Fighting null with memesFighting null with memes
Fighting null with memes
 
Eta
EtaEta
Eta
 
Geecon walking in CODE
Geecon walking in CODEGeecon walking in CODE
Geecon walking in CODE
 
Scalaworld lambda core hardcore
Scalaworld lambda core hardcoreScalaworld lambda core hardcore
Scalaworld lambda core hardcore
 
Lambda core
Lambda coreLambda core
Lambda core
 
[4 dev] lagom
[4 dev] lagom[4 dev] lagom
[4 dev] lagom
 
Jdd 2016 DROP DATABASE
Jdd 2016 DROP DATABASEJdd 2016 DROP DATABASE
Jdd 2016 DROP DATABASE
 
DROPDB Galactic story
DROPDB Galactic storyDROPDB Galactic story
DROPDB Galactic story
 

Último

HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 

Último (20)

HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 

Scala to assembly

  • 1.
  • 2. JAREKJAREK Started on C64 (Basic, 6502), Amiga (68k, C), C++, Java, JS, Scala, Kotlin...
  • 5.
  • 6. IS THAT ALL TRUE?IS THAT ALL TRUE?
  • 7. long time ago tmp = $fb Start: ldx #10; !! decimal clc ldy #$0 lda #$1 fib: dex beq end sty tmp tay adc tmp jmp fib end: tay
  • 8. long time ago branch not taken 2 cycles tmp = $fb Start: ldx #10; !! decimal clc ldy #$0 lda #$1 fib: dex beq end sty tmp tay adc tmp jmp fib end: tay
  • 9. long time ago branch not taken 2 cycles branch taken 3(!!!) cycles tmp = $fb Start: ldx #10; !! decimal clc ldy #$0 lda #$1 fib: dex beq end sty tmp tay adc tmp jmp fib end: tay
  • 10. Rewrite if statement to save a cycle
  • 11. motorola 68040 branch taken �xed prediction strategy
  • 12.
  • 13.
  • 14. PERFORMANCE MYTHSPERFORMANCE MYTHS not precise, valid only in certain contexts, valid in the past, not relevant
  • 15. COST OF PERFORMANCE MYTHSCOST OF PERFORMANCE MYTHS Code: less readable less �exible less abstract Lost focus.
  • 16. PERFORMANCE IS A GOOD MARKETING TOOLPERFORMANCE IS A GOOD MARKETING TOOL
  • 17. Marketing my shiny new library
  • 18. SCALASCALA Scala combines object-oriented and functional programming in one concise, high-level language.
  • 19. Scala is far from metal(s)
  • 20. ARE WE IMMUNE?ARE WE IMMUNE?
  • 21. SOME ARTICLESSOME ARTICLES applicative-functors-vs-monad-composing-performance-in-scala No More Transformers: High-Performance E�ects in Scalaz 8 ON PERFORMANCE OF THE NEW COLLECTIONS Scala’s immutable collections can be slow as a snail ...
  • 22. SCALA DOES NOT MAKE IT EASYSCALA DOES NOT MAKE IT EASY
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. A given piece of code may and will perform di�erently when called/used in a di�erent context
  • 34. WE WILL GO DEEPWE WILL GO DEEP
  • 35. TOOLSTOOLS All the benchmarks and numbers in this talk are presented only to show the tools DO NOT make any decisions based on those numbers
  • 36. LET'S TALK ABOUT IMMUTABILITYLET'S TALK ABOUT IMMUTABILITY
  • 37. object ImmutableCalc { case class Calculator(total : Int = 0) { def add(i: Int) :Calculator = this.copy(total = total + i) def get() : Int = total } }
  • 38. object MutableCalc { class Calculator { var total = 0 def add(i: Int) : Unit = total+=i def get() : Int = total } }
  • 39. INTO THE BYTECODEINTO THE BYTECODE
  • 41. and ~ 20 other methods //CODE decompiled using decompile Scala to Java public final class ImmutableCalc { public static class Calculator implements Product, Serializable { private final int total; public Iterator productElementNames() { return Product.productElementNames$(this); } public int total() { return this.total; } public ImmutableCalc.Calculator add(final int i) { return this.copy(this.total() + i); } public int get() { return this.total(); } public ImmutableCalc.Calculator copy(final int total) {
  • 42. If you do not like IDE...
  • 43. javap -p -c 'ImmutableCalc$Calculator.class' public calculator.immuable.ImmutableCalc$Calculator add(int); Code: 0: aload_0 1: aload_0 2: invokevirtual #31 // Method total:()I 5: iload_1 6: iadd 7: invokevirtual #34 // Method copy:(I)Lcalculator/immuable/ImmutableCalc$Calcula 10: areturn
  • 44. public calculator.immuable.ImmutableCalc$Calculator add(int); Code: 0: aload_0 ; stack = [this] 1: aload_0 ; stack = [this,this] 2: invokevirtual #31 ; total() ; stack = [this,this.total] 5: iload_1 ; stack = [this, this.total, i] 6: iadd ; stack = [this, total + i] 7: invokevirtual #34 ; copy(...); stack = [newObject] 10: areturn ;return newObject https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
  • 45. INFOINFO There is also lot of metadata in .class �les (not only bytecode) No one needs to read bytecode - unless You work on a compiler
  • 46. Run it object CalcRun { def main(args: Array[String]): Unit = { var sum = 0L for ( i <-0 to 100000) { val calc = ImmutableCalc.Calculator() sum+=(1 to 10000).foldLeft( calc)((c,i) => c.add(i)).get() } println(sum) } }
  • 49. 0,1,2,3,4 - compilation Tier - higher ~ more optimized % - on stack replacement made not entrant - do not use this ! - exception handler s - synchronized 298 485 2 runners.CalcRun$::$anonfun$main$2 (6 bytes) 298 486 2 calculator.immuable.ImmutableCalc$Calculator::add (11 bytes) 298 487 2 calculator.immuable.ImmutableCalc$Calculator::copy (9 bytes) 298 472 1 java.lang.Integer::intValue (5 bytes) 298 473 1 scala.collection.immutable.RangeIterator::hasNext (5 bytes) 298 475 2 java.lang.Integer::<init> (10 bytes) 298 474 1 calculator.immuable.ImmutableCalc$Calculator::total (5 bytes) 298 476 2 java.lang.Integer::valueOf (32 bytes) ... 335 501 % 4 scala.collection.IterableOnceOps::foldLeft @ 13 (43 bytes) 341 499 % 3 scala.collection.IterableOnceOps::foldLeft @ -2 (43 bytes) made not entrant
  • 50. 0,1,2,3,4 - compilation Tier - higher ~ more optimized % - on stack replacement made not entrant - do not use this ! - exception handler s - synchronized 298 485 2 runners.CalcRun$::$anonfun$main$2 (6 bytes) 298 486 2 calculator.immuable.ImmutableCalc$Calculator::add (11 bytes) 298 487 2 calculator.immuable.ImmutableCalc$Calculator::copy (9 bytes) 298 472 1 java.lang.Integer::intValue (5 bytes) 298 473 1 scala.collection.immutable.RangeIterator::hasNext (5 bytes) 298 475 2 java.lang.Integer::<init> (10 bytes) 298 474 1 calculator.immuable.ImmutableCalc$Calculator::total (5 bytes) 298 476 2 java.lang.Integer::valueOf (32 bytes) ... 335 501 % 4 scala.collection.IterableOnceOps::foldLeft @ 13 (43 bytes) 341 499 % 3 scala.collection.IterableOnceOps::foldLeft @ -2 (43 bytes) made not entrant https://gist.github.com/rednaxelafx/1165804#�le_notes.md
  • 51. -XX:+PrintInlining 1275 579 4 runners.CalcRun$$$Lambda$3/1967205423::apply$mcVI$sp (9 bytes) @ 5 runners.CalcRun$::$anonfun$main$1 (56 bytes) inline (hot) @ 7 calculator.immuable.ImmutableCalc$Calculator$::apply$default @ 10 calculator.immuable.ImmutableCalc$Calculator::<init> ( @ 6 java.lang.Object::<init> (1 bytes) inline (hot) @ 10 scala.Product::$init$ (1 bytes) inline (hot) @ 26 scala.LowPriorityImplicits::intWrapper (2 bytes) inline (hot) @ 32 scala.runtime.RichInt$::to$extension (15 bytes) inline (hot) @ 11 scala.collection.immutable.Range$Inclusive::<init> ( @ 4 scala.collection.immutable.Range::<init> (215 bytes) inline @ 16 scala.collection.immutable.AbstractSeq::<init> ( @ 1 scala.collection.AbstractSeq::<init> (5 bytes) inline (h @ 1 scala.collection.AbstractIterable::<init> (5 bytes) in @ 1 java.lang.Object::<init> (1 bytes) inline (hot) @ 78 scala.collection.immutable.Range::isEmpty (5 bytes) acces @ 89 scala.collection.immutable.Range::longLength (25 bytes) i @ 1 scala.collection.immutable.Range::gap (12 bytes) inline @ 1 scala.collection.immutable.Range::end (5 bytes) access @ 6 scala.collection.immutable.Range::start (5 bytes) acce @ 5 scala.collection.immutable.Range::step (5 bytes) accesso @ 11 scala.collection.immutable.Range::hasStub (18 bytes) in @ 1 scala.collection.immutable.Range$Inclusive::isInclusive
  • 52. WHY BOTHER WITH PRINTCOMPILATION,WHY BOTHER WITH PRINTCOMPILATION, INLININGINLINING 1. check if critical code was in fact JITed 2. sometimes problems with bytecode can be seen (method never compiled) 3. look for uncommon traps
  • 53. MEASURINGMEASURING JMH jvm warmup prevent dead code elimination addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.3.7")
  • 54. VM version: JDK 11.0.6, OpenJDK 64-Bit Server VM, 11.0.6+9-jvmci-20.0- b02 @Benchmark def measureWrong: Unit = Math.log(x) @Benchmark def measureRight: Double = Math.log(x) [info] JMHSample_08_DeadCode.measureRight avgt 3 9.891 ± 13.465 ns/op [info] JMHSample_08_DeadCode.measureWrong avgt 3 0.383 ± 0.132 ns/op
  • 55. @Benchmark def measureWrong: Unit = Math.log(x) // in fact same as = {}
  • 57. @Benchmark def imperative: Int = { val calc = new MutableCalc.Calculator() (1 to numOps).foreach(calc.add) calc.get() } @Benchmark def immutable: Int = { val calc = ImmutableCalc.Calculator() (1 to numOps).foldLeft( calc)((c,i) => c.add(i)).get() }
  • 58. object ImmutableCalc { case class Calculator(total : Int = 0) { def add(i: Int) :Calculator = this.copy(total = total + i) def get() : Int = total } } object MutableCalc { class Calculator { var total = 0 def add(i: Int) : Unit = total+=i def get() : Int = total } }
  • 59. VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08
  • 60. VM version: JDK 1.8.0_242, OpenJDK 64-Bit Server VM, 25.242-b08 [info] Benchmark Mode Cnt Score Error Units [info] CalcCost.freeMonadScalaZ avgt 5 154761.225 ± 10652.856 ns/op [info] CalcCost.immutable avgt 5 15393.204 ± 670.565 ns/op [info] CalcCost.imperative avgt 5 284.435 ± 11.790 ns/op [info] CalcCost.tfIO avgt 5 123841.734 ± 30096.835 ns/op
  • 61. VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7
  • 62. VM version: JDK 14.0.1, OpenJDK 64-Bit Server VM, 14.0.1+7 [info] CalcCost.freeMonadScalaZ avgt 5 130559.311 ± 21298.291 ns/op [info] CalcCost.immutable avgt 5 3682.014 ± 510.659 ns/op [info] CalcCost.imperative avgt 5 263.753 ± 8.953 ns/op [info] CalcCost.tfIO avgt 5 100277.673 ± 4185.455 ns/op
  • 64. 20.0.0.r11-grl [info] CalcCost.freeMonadScalaZ avgt 15 147832.984 ± 15674.721 ns/op [info] CalcCost.immutable avgt 15 601.389 ± 8.249 ns/op [info] CalcCost.imperative avgt 15 420.588 ± 103.107 ns/op [info] CalcCost.tfIO avgt 15 121616.405 ± 5716.283 ns/op
  • 66. JavaFlightRecorder - record "events" Java Mission Control - browse results
  • 67. The Java Flight Recorder (JFR) is a commercial feature. You can use it for free on developer desktops/laptops, and for evaluation purposes in test, development, and production environments. However, to enable JFR on a production server, you require a commercial license. Using JMC UI for other purposes on the JDK does not require a commercial license.
  • 68. Java Flight Recorder low overhead, detailed jvm events, additional graphs and tools (for jfr) (intellij) own events, (use for low overhead trace log) only on some jvms license - check your jvm version -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=myrecording.j
  • 69. JVM 8
  • 70. JVM 8
  • 71. Graal
  • 72. Graal
  • 73. JPSJPS hint: jps -mlVv (detailed jvm args) 58661 Main 93654 Jps 57464 NailgunRunner 77659 org.eclipse.equinox.launcher_1.3.0.v20140415-2008.jar 80924 Launcher
  • 74. jstack 91221 (java 8) "main" #1 prio=5 os_prio=0 tid=0x00007f33b400d800 nid=0x14812 runnable [0x00007f33bdf38000] java.lang.Thread.State: RUNNABLE at calculator.immuable.ImmutableCalc$Calculator.<init>(ImmutableCalc.scala:4) at calculator.immuable.ImmutableCalc$Calculator.copy(ImmutableCalc.scala:4) at calculator.immuable.ImmutableCalc$Calculator.add(ImmutableCalc.scala:6) at runners.CalcRun$.$anonfun$main$2(CalcRun.scala:10) at runners.CalcRun$.$anonfun$main$2$adapted(CalcRun.scala:10) at runners.CalcRun$$$Lambda$4/801197928.apply(Unknown Source) at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:636) at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:632) at scala.collection.AbstractIterable.foldLeft(Iterable.scala:921) at runners.CalcRun$.$anonfun$main$1(CalcRun.scala:10)
  • 75. jstack 94038 (graal) "main" #1 prio=5 os_prio=0 cpu=26791.23ms elapsed=26.89s tid=0x00007fafb0028800 nid=0x13d86 runnable [ java.lang.Thread.State: RUNNABLE at scala.collection.IterableOnceOps.foldLeft(IterableOnce.scala:635) at scala.collection.IterableOnceOps.foldLeft$(IterableOnce.scala:632) at scala.collection.AbstractIterable.foldLeft(Iterable.scala:921) at runners.CalcRun$.$anonfun$main$1(CalcRun.scala:10) at runners.CalcRun$$$Lambda$17/0x00000008400d8040.apply$mcVI$sp(Unknown Source) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) at runners.CalcRun$.main(CalcRun.scala:8) at runners.CalcRun.main(CalcRun.scala) "Reference Handler" #2 daemon prio=10 os_prio=0 cpu=0.32ms elapsed=26.87s tid=0x00007fafb024c800 nid= java.lang.Thread.State: RUNNABLE at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.6/Native Method) at java.lang.ref.Reference.processPendingReferences(java.base@11.0.6/Reference.java:241 at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.6/Reference.java:213)
  • 76. BEWARE OF SAFEPOINTSBEWARE OF SAFEPOINTS
  • 77. INTO ASSEMBLYINTO ASSEMBLY We need to go deeper
  • 79. [Constants] # {method} {0x00007f93f0e2fc48} 'add' '(I)Lcalculator/immuable/ImmutableCalc$Calculator;' in # this: rsi:rsi = 'calculator/immuable/ImmutableCalc$Calculator' # parm0: rdx = int # [sp+0x80] (sp of caller) 0x00007f93f1422680: mov 0x8(%rsi),%r10d 0x00007f93f1422684: shl $0x3,%r10 0x00007f93f1422688: cmp %rax,%r10 0x00007f93f142268b: jne 0x00007f93f1045ba0 ; {runtime_call} 0x00007f93f1422691: nopl 0x0(%rax,%rax,1) 0x00007f93f1422699: nopl 0x0(%rax) [Verified Entry Point] 0x00007f93f14226a0: mov %eax,-0x14000(%rsp) 0x00007f93f14226a7: push %rbp 0x00007f93f14226a8: sub $0x70,%rsp 0x00007f93f14226ac: movabs $0x7f93f0e43db0,%rdi 0x00007f93f14226b6: mov 0x8(%rdi),%eax 0x00007f93f14226b9: add $0x8,%eax 0x00007f93f14226bc: mov %eax,0x8(%rdi) 0x00007f93f14226bf: movabs $0x7f93f0e2fc48,%rdi ; {metadata({method} {0x00007f93f0e2fc48} 'add' '(I 0x00007f93f14226c9: and $0x3ff8,%eax 0x00007f93f14226cf: cmp $0x0,%eax
  • 80. -XX:PrintAssemblyOptions=intel 0x00007f11f540f880: mov r10d,DWORD PTR [rsi+0x8] 0x00007f11f540f884: shl r10,0x3 0x00007f11f540f888: cmp r10,rax 0x00007f11f540f88b: jne 0x00007f11f5045ba0 ; {runtime_call} 0x00007f11f540f891: nop DWORD PTR [rax+rax*1+0x0] 0x00007f11f540f899: nop DWORD PTR [rax+0x0] [Verified Entry Point] 0x00007f11f540f8a0: mov DWORD PTR [rsp-0x14000],eax 0x00007f11f540f8a7: push rbp 0x00007f11f540f8a8: sub rsp,0x70 0x00007f11f540f8ac: movabs rdi,0x7f11e3e43db0 0x00007f11f540f8b6: mov eax,DWORD PTR [rdi+0x8] 0x00007f11f540f8b9: add eax,0x8 0x00007f11f540f8bc: mov DWORD PTR [rdi+0x8],eax 0x00007f11f540f8bf: movabs rdi,0x7f11e3e2fc48 ; {metadata({method} {0x00007f11e3e2fc48} 'add' '(I)L 0x00007f11f540f8c9: and eax,0x3ff8 0x00007f11f540f8cf: cmp eax,0x0 0x00007f11f540f8d2: je 0x00007f11f540f92c ;*aload_0
  • 82.
  • 83.
  • 84.
  • 86. jstat java 8 jstat -gccause 128409 1000 1000 S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT LGCC 0.00 6.25 0.00 1.10 93.80 80.76 59 0.137 0 0.000 - - 0.137 0.00 0.00 56.00 1.10 93.80 80.76 63 0.146 0 0.000 - - 0.146 0.00 6.25 6.00 1.10 93.80 80.76 67 0.155 0 0.000 - - 0.155 6.25 0.00 76.00 1.10 93.80 80.76 70 0.162 0 0.000 - - 0.162 0.00 0.00 78.00 1.10 93.80 80.76 74 0.172 0 0.000 - - 0.172 6.25 0.00 70.00 1.10 93.80 80.76 78 0.181 0 0.000 - - 0.181 6.25 0.00 44.00 1.10 93.80 80.76 82 0.193 0 0.000 - - 0.193 0.00 0.00 22.00 1.10 93.80 80.76 86 0.199 0 0.000 - - 0.199 6.25 0.00 20.00 1.10 93.80 80.76 90 0.205 0 0.000 - - 0.205 0.00 0.00 44.00 1.10 93.80 80.76 95 0.213 0 0.000 - - 0.213 0.00 6.25 58.00 1.10 93.80 80.76 101 0.226 0 0.000 - - 0.226 0.00 0.00 76.00 1.10 93.80 80.76 104 0.239 0 0.000 - - 0.239 0.00 6.25 42.00 1.10 93.80 80.76 107 0.249 0 0.000 - - 0.249 0.00 0.00 30.00 1.10 93.80 80.76 110 0.255 0 0.000 - - 0.255
  • 87. jstat graal jstat -gccause 129920 1000 1000 S0 S1 E O M CCS YGC YGCT FGC FGCT CGC CGCT GCT LGCC 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003 0.00 100.00 2.58 0.00 57.25 54.24 1 0.003 0 0.000 0 0.000 0.003
  • 88. THERE ARE MORE TOOLSTHERE ARE MORE TOOLS JPro�ler DynaTrace
  • 89. IF WE HAVE SOME TIMEIF WE HAVE SOME TIME
  • 90. LETS TALS ABOUT MONAD TRANSFORMERSLETS TALS ABOUT MONAD TRANSFORMERS
  • 91. case class User(id: Long, name: String, parentId: Long) object UserData { val allUsers = List( User(1, "Nikt", 8), User(2, "Ktos", 7), User(4, "A", 2), User(7, "B", 4) ) }
  • 92. (scalaz used) object Users { val allUsers = UserData.allUsers def findUser(id: Long): OptionT[Task, User] = OptionT.optionT { Task.delay({ val found: Option[User] = allUsers.find(u => u.id == id) found } ) } } val indices: Seq[Int] = (1 to 100).toList def run = { val users: Seq[OptionT[Task, User]] = indices.map(id => Users.findUser(id)) val parents = users.map(u => u.flatMap(z => Users.findUser(z.parentId))) parents.map(u => u.run.unsafePerformSync).count(u => u.isDefined) }
  • 93. object StackedMonads { val indices: Seq[Int] = (1 to 100).toList def run = { val users = indices.map ( id => Users.findUser(id)) val parents = users.map ((u: Task[Option[User]]) => u.flatMap((ou: Option[User]) => ou match { case None => Task.delay({None}) case Some(x) => Users.findUser(x.parentId) } ) ) parents.map (u=>u.unsafePerformSync).count(u =>u.isDefined) } def findUser(id: Long): Task[Option[User]] = { Task.delay { val found: Option[User] = allUsers.find(u => u.id == id) found } }
  • 94. Monad transformer vs simply stacked (jvm 8)
  • 95. Let's simply use null object NOMonad { val indices: Seq[Int] = (1 to 100).toList def run = { val users = indices.map(id => Users.findUser(id)) val parents: Seq[User] = users.map(u => if (u != null) { Users.findUser(u.parentId) } else { null } ) parents.count((u: User) =>u != null) } object Users { val allUsers = UserData.allUsers def findUser(id: Long): User = { val found= allUsers.find(u => u.id == id).orNull found } }
  • 96. wow, nulls are awesome! (jvm 8)
  • 97. object SimpleDB { val connection = DB.initializeDB def run = { val users = indices.map(id => Users.findUser(id)) val parents: Seq[User] = users.map(u => if (u != null) { Users.findUser(u.parentId) } else { null } ) parents.count((u: User) => u != null) } object Users { val userQuery = connection.prepareStatement("SELECT * FROM USERS WHERE id= ?") def findUser(id: Long): User = { userQuery.clearParameters() userQuery.setLong(1, id) val res = userQuery.executeQuery()
  • 98. Time to DROP DATABASE
  • 100. object OptimizedDB { val connection = DB.initializeDB def run = { val count = Users.countParents() count } object Users { val userQuery = connection.prepareStatement("SELECT " + "count(*) FROM USERS u1, USERS u2 WHERE u1.id>1 and u1.id < 100 AND u1.parentId = u2.id" def countParents(): Int = { userQuery.clearParameters() val res = userQuery.executeQuery() try { if (res.next()) { res.getInt(1) } else { throw new IllegalStateException("no count from DB") }
  • 103. jvm 8 [info] MT.monadTR avgt 3 16210.589 ± 14114.784 ns/op [info] MT.monadTRDB avgt 3 112424.770 ± 109957.046 ns/op [info] MT.noMonad avgt 3 1910.297 ± 64.699 ns/op [info] MT.optimizedDB avgt 3 685.030 ± 99.553 ns/op [info] MT.simpleDB avgt 3 103198.262 ± 36190.463 ns/op [info] MT.stackedMonads avgt 3 13040.031 ± 733.337 ns/op
  • 104. jvm 14 [info] MT.monadTR avgt 3 13328.252 ± 9569.020 ns/op [info] MT.monadTRDB avgt 3 123824.784 ± 148322.320 ns/op [info] MT.noMonad avgt 3 1934.838 ± 1596.319 ns/op [info] MT.optimizedDB avgt 3 767.592 ± 49.940 ns/op [info] MT.simpleDB avgt 3 96992.954 ± 828.014 ns/op [info] MT.stackedMonads avgt 3 15777.281 ± 7370.754 ns/op
  • 105. graal 20 [info] MT.monadTR avgt 3 11057.304 ± 4885.176 ns/op [info] MT.monadTRDB avgt 3 119223.204 ± 109133.489 ns/op [info] MT.noMonad avgt 3 1529.025 ± 71.307 ns/op [info] MT.optimizedDB avgt 3 905.659 ± 34.349 ns/op [info] MT.simpleDB avgt 3 107627.684 ± 6618.677 ns/op [info] MT.stackedMonads avgt 3 8534.377 ± 739.795 ns/op
  • 106. DEEPER THAN ASSEMBLYDEEPER THAN ASSEMBLY Just looping
  • 107. simple for val array = Array.ofDim[Long](sz + 1, sz + 1) @Benchmark def forBasedLoop: Long = { var sum: Long = 0 for (i <- 0 to sz) { for (j <- 0 to sz) { sum += array(i)(j) } } sum }
  • 108. while @Benchmark def whileLoopA: Long = { var sum: Long = 0 var i = 0 while (i < sz) { var j = 0 while (j < sz) { sum += array(i)(j) j = j + 1 } i = i + 1 } sum }
  • 109. while reversed (and transposed) @Benchmark def whileRevLoop: Long = { var sum: Long = 0 var j = sz while (j >= 0) { var i = sz while (i >= 0) { sum += array(i)(j) i = i - 1 } j = j - 1 } sum }
  • 110. PERFORMANCE COUNTERSPERFORMANCE COUNTERS jmh:run -prof perf -w 5s -r 5s -wi 3 -i 3 -f 3 .*ArrayLooping.*
  • 111.
  • 112.
  • 113. [info] 14,650.91 msec task-clock # 0.482 CPUs utilized [info] 525 context-switches # 0.036 K/sec [info] 65 cpu-migrations # 0.004 K/sec [info] 185 page-faults # 0.013 K/sec [info] 30,370,384,927 cycles # 2.073 GHz ( [info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle ( [info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle ( [info] 35,907,154,673 instructions # 1.18 insn per cycle [info] # 0.55 stalled cycles per insn ( [info] 6,706,376,215 branches # 457.745 M/sec ( [info] 12,810,882 branch-misses # 0.19% of all branches ( [info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec ( [info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 175,788,792 L1-icache-loads # 11.998 M/sec ( [info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits ( [info] 5,302,762,430 dTLB-loads # 361.941 M/sec ( [info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits ( [info] 245,713 iTLB-loads # 0.017 M/sec ( [info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits ( [info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
  • 114. while T+R [info] 14,650.91 msec task-clock # 0.482 CPUs utilized [info] 525 context-switches # 0.036 K/sec [info] 65 cpu-migrations # 0.004 K/sec [info] 185 page-faults # 0.013 K/sec [info] 30,370,384,927 cycles # 2.073 GHz ( [info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle ( [info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle ( [info] 35,907,154,673 instructions # 1.18 insn per cycle [info] # 0.55 stalled cycles per insn ( [info] 6,706,376,215 branches # 457.745 M/sec ( [info] 12,810,882 branch-misses # 0.19% of all branches ( [info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec ( [info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 175,788,792 L1-icache-loads # 11.998 M/sec ( [info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits ( [info] 5,302,762,430 dTLB-loads # 361.941 M/sec ( [info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits ( [info] 245,713 iTLB-loads # 0.017 M/sec ( [info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits ( [info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
  • 115. while T+R L1 [info] 14,650.91 msec task-clock # 0.482 CPUs utilized [info] 525 context-switches # 0.036 K/sec [info] 65 cpu-migrations # 0.004 K/sec [info] 185 page-faults # 0.013 K/sec [info] 30,370,384,927 cycles # 2.073 GHz ( [info] 285,624,461 stalled-cycles-frontend # 0.94% frontend cycles idle ( [info] 19,572,932,807 stalled-cycles-backend # 64.45% backend cycles idle ( [info] 35,907,154,673 instructions # 1.18 insn per cycle [info] # 0.55 stalled cycles per insn ( [info] 6,706,376,215 branches # 457.745 M/sec ( [info] 12,810,882 branch-misses # 0.19% of all branches ( [info] 16,163,472,375 L1-dcache-loads # 1103.240 M/sec ( [info] 10,975,494,751 L1-dcache-load-misses # 67.90% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 175,788,792 L1-icache-loads # 11.998 M/sec ( [info] 7,051,118 L1-icache-load-misses # 4.01% of all L1-icache hits ( [info] 5,302,762,430 dTLB-loads # 361.941 M/sec ( [info] 14,779,238 dTLB-load-misses # 0.28% of all dTLB cache hits ( [info] 245,713 iTLB-loads # 0.017 M/sec ( [info] 1,195,242 iTLB-load-misses # 486.44% of all iTLB cache hits ( [info] 289,031,594 L1-dcache-prefetches # 19.728 M/sec (
  • 116. [info] 14,601.12 msec task-clock # 0.481 CPUs utilized [info] 551 context-switches # 0.038 K/sec [info] 79 cpu-migrations # 0.005 K/sec [info] 142 page-faults # 0.010 K/sec [info] 56,887,530,219 cycles # 3.896 GHz ( [info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle ( [info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle ( [info] 85,472,875,315 instructions # 1.50 insn per cycle [info] # 0.46 stalled cycles per insn ( [info] 7,673,425,909 branches # 525.537 M/sec ( [info] 112,782,902 branch-misses # 1.47% of all branches ( [info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec ( [info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 355,107,740 L1-icache-loads # 24.321 M/sec ( [info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits ( [info] 113,201,590 dTLB-loads # 7.753 M/sec ( [info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits ( [info] 46,228 iTLB-loads # 0.003 M/sec ( [info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits ( [info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
  • 117. normal while [info] 14,601.12 msec task-clock # 0.481 CPUs utilized [info] 551 context-switches # 0.038 K/sec [info] 79 cpu-migrations # 0.005 K/sec [info] 142 page-faults # 0.010 K/sec [info] 56,887,530,219 cycles # 3.896 GHz ( [info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle ( [info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle ( [info] 85,472,875,315 instructions # 1.50 insn per cycle [info] # 0.46 stalled cycles per insn ( [info] 7,673,425,909 branches # 525.537 M/sec ( [info] 112,782,902 branch-misses # 1.47% of all branches ( [info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec ( [info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 355,107,740 L1-icache-loads # 24.321 M/sec ( [info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits ( [info] 113,201,590 dTLB-loads # 7.753 M/sec ( [info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits ( [info] 46,228 iTLB-loads # 0.003 M/sec ( [info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits ( [info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
  • 118. normal while L1 [info] 14,601.12 msec task-clock # 0.481 CPUs utilized [info] 551 context-switches # 0.038 K/sec [info] 79 cpu-migrations # 0.005 K/sec [info] 142 page-faults # 0.010 K/sec [info] 56,887,530,219 cycles # 3.896 GHz ( [info] 371,071,536 stalled-cycles-frontend # 0.65% frontend cycles idle ( [info] 39,685,104,564 stalled-cycles-backend # 69.76% backend cycles idle ( [info] 85,472,875,315 instructions # 1.50 insn per cycle [info] # 0.46 stalled cycles per insn ( [info] 7,673,425,909 branches # 525.537 M/sec ( [info] 112,782,902 branch-misses # 1.47% of all branches ( [info] 56,302,951,054 L1-dcache-loads # 3856.070 M/sec ( [info] 7,136,733,184 L1-dcache-load-misses # 12.68% of all L1-dcache hits ( [info] <not supported> LLC-loads [info] <not supported> LLC-load-misses [info] 355,107,740 L1-icache-loads # 24.321 M/sec ( [info] 10,933,836 L1-icache-load-misses # 3.08% of all L1-icache hits ( [info] 113,201,590 dTLB-loads # 7.753 M/sec ( [info] 8,907,743 dTLB-load-misses # 7.87% of all dTLB cache hits ( [info] 46,228 iTLB-loads # 0.003 M/sec ( [info] 939,260 iTLB-load-misses # 2031.80% of all iTLB cache hits ( [info] 7,001,888,370 L1-dcache-prefetches # 479.545 M/sec (
  • 119. NOT COVEREDNOT COVERED Garbage collectors Ahead of time compilation JVM options
  • 120. BENCHMARKSBENCHMARKS All benchmarks above were performed unprofessionaly All benchmarks lie
  • 121. ON A GOOD BENCHMARKSON A GOOD BENCHMARKS Benchmark might be ok: you test it for yourself you read about the tools tested you can �nd what causes di�erences
  • 122. HINTSHINTS Tools will not magically solve your problems Tools let you feel more comfortable Do not expect PrintAssembly or Performance counters will help you in typical problems
  • 123. REAL PROBLEMS I HAD ON PRODUCTIONREAL PROBLEMS I HAD ON PRODUCTION hidden read of whole database in a dirty piece of code static synchronized hashCode bugs jvm bugs (metaspace) Lot of them could be spotted by using jstack