1. Brought to you by
Why User-Mode Threads
Are Good for Performance
Ron Pressler
Architect, Java Platform Group, Oracle
2. Why?
• Why do anything?
• Why do this rather than that?
Copyright © 2022, Oracle and/or its affiliates
4. Concurrency and Parallelism
• Parallelism:
• Speed up a task by splitting it into cooperating subtasks scheduled onto multiple available computing resources.
• Performance measure: latency (time duration)
• Concurrency:
• Schedule available computing resources to multiple largely independent tasks that compete over them.
• Performance measure: throughput (tasks/time unit)
7. Little’s Law
In any stable system, with long-term averages:
λ — arrival rate = exit rate = throughput
W — duration inside
L — no. of items inside
[Diagram: items arrive at rate λ, spend duration W inside the system, and L items are inside at any time]
8. Little’s Law
In any stable system, with long-term averages: L = λW
λ — arrival rate = exit rate = throughput
W — duration inside
L — no. of items inside
9. Little’s Law
L̄ = λ̄W
L̄ — Capacity (long-term average)
λ̄ — throughput (long-term average)
10. Little’s Law
L̄ = λ̄W
L̄ — Capacity
λ (momentary) > λ̄ (long-term average)
11. Little’s Law
L̄ = λ̄W
L̄ — Capacity
λ (momentary) < λ̄ (long-term average)
12. Little’s Law — subsystems
L = λW for the whole system S; for any subsystem S′: L′ = λ′W′
13. Little’s Law — subsystems and queues
L = λW for the whole system S; for a subsystem S′ with a queue in front: L′ = λ′W′
n — average queue length
14. Little’s Law — subsystems and queues
n — average queue length
Enlarging the system to S′ by including the queue:
λ′ = λ
L′ = L + n
W′ = W + n(W/L)
15. Little’s Law — subsystems and queues
n — average queue length
Including the queue: λ′ = λ, L′ = L + n, W′ = W + n(W/L)
Little’s Law still holds for the enlarged system:
L′ = λ′W′
L + n = λ(W + n(W/L))
L(1 + n/L) = λW(1 + n/L)
L = λW
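The derivation above can be sanity-checked numerically; a minimal sketch, with illustrative numbers assumed (they are not from the talk):

```java
// Check that adding a queue of average length n in front of a system
// leaves Little's Law intact (illustrative numbers assumed).
public class LittleQueue {
    public static void main(String[] args) {
        double lambda = 1000;   // arrival rate: 1000 req/s
        double w = 0.010;       // 10 ms spent inside the system
        double l = lambda * w;  // items inside: L = λW
        double n = 5;           // average queue length

        // Enlarged system S′ that includes the queue:
        double lambdaPrime = lambda;      // λ′ = λ (the same flow passes through)
        double lPrime = l + n;            // L′ = L + n
        double wPrime = w + n * (w / l);  // W′ = W + n(W/L), since the wait is n/λ

        // Little's Law must hold for S′ as well: L′ = λ′W′
        System.out.println(lPrime + " vs " + lambdaPrime * wPrime);
    }
}
```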
16. Little’s Law — other applications
Caching:
P(cache hit) = p, P(cache miss) = 1 − p
L1 = pλW1, L2 = (1 − p)λW2
L = L1 + L2 = λ(pW1 + (1 − p)W2)
Interference:
x — interference coefficient
L = λ(1 + xL)W
L = λW / (1 − xλW) = λW(1 + xλW + (xλW)² + …)
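The two applications above, worked with assumed illustrative numbers (hit rate, service times, and interference coefficient are my own examples, not the talk's):

```java
// Worked examples of the caching and interference forms of Little's Law
// (all numbers are assumed for illustration).
public class Applications {
    public static void main(String[] args) {
        double lambda = 1000;  // req/s

        // Caching: a hit rate p blends a fast and a slow service time.
        double p = 0.9, wHit = 0.001, wMiss = 0.020;
        double lCache = lambda * (p * wHit + (1 - p) * wMiss);  // L = λ(pW1 + (1−p)W2)

        // Interference: each in-flight item slows the others by a factor x.
        double w = 0.010, x = 1e-5;
        double lInterf = lambda * w / (1 - x * lambda * w);     // L = λW / (1 − xλW)

        System.out.println(lCache + " items (caching), "
                + lInterf + " items (interference)");
    }
}
```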
17. Capacity and throughput
λ̄ = L̄ / W
L̄ — Capacity
λ̄ — Maximum throughput
W — Average latency
18. CPU capacity
λ̄ = L̄_CPU / W_CPU
L̄_CPU — #cores
λ̄ — Maximum throughput
W_CPU — Average CPU consumption per request
Example: W_CPU = 100μs = 1×10⁻⁴ s (> 1500 cache misses), L̄_CPU = 30 cores ⟹ λ̄ = 300,000 req/s
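The example is just the capacity formula as arithmetic; a minimal sketch:

```java
// Maximum throughput when the CPU is the bottleneck: λ̄ = L̄_CPU / W_CPU,
// using the 30-core, 100 μs-per-request example above.
public class CpuCapacity {
    public static void main(String[] args) {
        int cores = 30;                       // L̄_CPU: the CPU "capacity" is #cores
        double wCpu = 100e-6;                 // W_CPU: 100 μs of CPU work per request
        double maxThroughput = cores / wCpu;  // λ̄ = L̄_CPU / W_CPU
        System.out.printf("%.0f req/s%n", maxThroughput);  // prints "300000 req/s"
    }
}
```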
19. Capacity and throughput
λ̄ = L̄ / W
L̄ — Capacity
λ̄ — Maximum throughput
W — Average latency
[Diagram: a load balancer splits an incoming rate λ into λ/3 across three servers]
20. Thread-per-Request — Thread Capacity
Thread-per-Request: a request consumes a thread (either new or borrowed from a pool) for its duration, W
Request: the domain’s unit of concurrency
Thread: the software unit of concurrency
21. Thread-per-Request — Thread Capacity
#threads = L = λW
22. Thread-per-Request — Thread Capacity, with parallel fanout
c — average fanout, i.e. average number of threads consumed by a request
#threads = cL = cλ(W/c) = λW
⇒ For the purpose of estimating #threads, we can consider W to be the sum of all fanout latencies, even if they’re done in parallel.
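The fanout estimate above can be sketched as arithmetic, with assumed illustrative numbers (the rate, latency, and fanout are not from the talk):

```java
// Estimating thread demand for thread-per-request under parallel fanout:
// c threads per request, each leg taking W/c, still gives λW threads total.
public class ThreadDemand {
    public static void main(String[] args) {
        double lambda = 3000;  // req/s (assumed)
        double w = 0.050;      // 50 ms: sum of all fanout latencies (assumed)
        int c = 4;             // average fanout: threads consumed per request (assumed)

        double inFlight = lambda * (w / c);  // requests in flight if legs run in parallel
        double threads = c * inFlight;       // #threads = cL = cλ(W/c) = λW
        System.out.println(Math.round(threads));  // prints 150
    }
}
```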
23. CPU vs. Thread Capacity with thread-per-request
L̄ = λ̄W = (L̄_CPU / W_CPU)·W = L̄_CPU·(W / W_CPU)
W_CPU = 100μs (> 1500 cache misses), W = 10ms ⟹ 100 threads per core
W_CPU = 50μs (> 800 cache misses), W = 50ms ⟹ 1,000 threads per core
W_CPU = 10μs (> 150 cache misses), W = 100ms ⟹ 10,000 threads per core
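The threads-per-core figures above are simply W / W_CPU; a quick check:

```java
// Threads needed per core to saturate the CPU in the thread-per-request
// style: L̄ / L̄_CPU = W / W_CPU, for the three cases on the slide.
public class ThreadsPerCore {
    public static void main(String[] args) {
        double[][] cases = { {100e-6, 10e-3}, {50e-6, 50e-3}, {10e-6, 100e-3} };
        for (double[] c : cases) {
            double wCpu = c[0], w = c[1];  // CPU time vs. total latency per request
            System.out.printf("W_CPU=%.0fus, W=%.0fms -> %.0f threads/core%n",
                    wCpu * 1e6, w * 1e3, w / wCpu);
        }
    }
}
```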
24. Java Is Made of Threads
• Exceptions
• Thread Locals
• Debugger
• Profiler (JFR)
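Virtual threads (JEP 425) are what make thread-per-request affordable at the capacities computed above; a minimal sketch, assuming JDK 21+ (or JDK 19+ with preview features enabled):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Thread-per-request with virtual threads: one cheap thread per task
// instead of a scarce pooled OS thread.
public class VirtualThreads {
    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {       // far more tasks than cores
                int request = i;
                executor.submit(() -> handle(request));  // one virtual thread per request
            }
        }  // close() waits for all submitted tasks to complete
    }

    static void handle(int request) {
        // Blocking parks the virtual thread, freeing its carrier OS thread.
        try { Thread.sleep(10); } catch (InterruptedException e) { }
    }
}
```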
28. The Impact of Context Switching
W = n(μ + t), so the throughput ratio vs. free context switches is λ1/λ2 = W2/W1 = n(μ + t) / nμ = 1 + t/μ
where:
n — avg. number of operations (waits)
μ — avg. wait duration
t — avg. context-switch duration
Example: μ = 20μs (quite fast, for a wait), t = 1μs (quite slow, for a switch) ⟹ 5% impact
• Context-switching affects throughput by means of duration, not capacity
• Virtual threads have a faster context-switch than OS threads
• Structured concurrency allows waiting for a set of operations with one context-switch
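The 5% figure follows directly from the formula; a quick check:

```java
// Throughput impact of context switching: each wait of μ pays an extra
// switch cost t, so throughput drops by the factor 1 + t/μ.
public class SwitchImpact {
    public static void main(String[] args) {
        double mu = 20e-6;  // average wait duration (quite fast, as waits go)
        double t = 1e-6;    // average context-switch cost (quite slow, as switches go)
        double factor = 1 + t / mu;  // λ1/λ2 = W2/W1 = 1 + t/μ
        System.out.printf("%.0f%% impact%n", (factor - 1) * 100);  // prints "5% impact"
    }
}
```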
29. Not cooperative, but no time-sharing yet
• Non-cooperative scheduling is more composable…
• … but people overestimate the importance of time-sharing in servers
• Whether time-sharing can be effective when #threads is #cores×10⁵ requires study
30. Summary
• Virtual threads allow higher throughput for the thread-per-request style — the style harmonious with the platform — by drastically increasing the request capacity of the server.
• We can juggle more balls not by adding hands but by enlarging the arc.
• Context-switching costs could be important, but they aren’t the main reason for the throughput increase.
31. Brought to you by
Ron Pressler
ron.pressler@oracle.com
@pressron
• JEP 425: Virtual Threads (Preview)
• The Design of User Mode Threads in Java [video] (why not async/await)