COSMIC is middleware that allows for safe sharing of Xeon Phi coprocessors between jobs, improving utilization compared to the conservative exclusive allocation policy used in most clusters. It transparently schedules jobs to maximize sharing while preventing oversubscription of hardware threads and memory. This reduces wait times, improves throughput, and allows clusters to support more work using the same number of Xeon Phi devices. COSMIC is ready for beta customers and provides better performance and efficiency for clusters using Xeon Phi coprocessors.
COSMIC: Middleware for Xeon Phi Servers and Clusters
1. COSMIC: Middleware for Xeon Phi™ Servers and Clusters
Pre-commercialization, name subject to change
S Cadambi, G Coviello, C Li, K Rao, M Sankaradas, S Chakradhar
Computing Systems Architecture
NEC Laboratories America
Princeton, NJ
January 2014
www.nec-labs.com
2. The Xeon Phi™ Coprocessor (“MIC”)
• Launched by Intel at ISC 2012
• x86-based coprocessor with 60+ cores
• Supports OpenMP
• Runs Linux: allows multi-processing, memory management…
• Good for scientific applications
[Diagram: a multicore host connected over PCIe to a Xeon Phi™ with 60+ cores, 240+ threads, 512-bit vector units, and 8+GB memory (7120P)]
3. Xeon Phi™ Servers and Clusters
• Fast ramp-up: many hardware vendors
• Many clusters already commissioned
• Some very high performance ones too!
– Top500 #1: Tianhe-2
– Top500 #7: Stampede
• NEC also offers a Xeon Phi™ server: Express5800/HR120b-1, 1U form factor with 2 Xeon Phi™ coprocessors
4. Managing Xeon Phi™ Clusters
• Most clusters follow an “exclusive allocation” policy for the Xeon Phi™
– One Phi is dedicated to a single user until that user's job completes
[Diagram: a 4-node cluster, each node a host with one Xeon Phi™ (60 cores, 8GB). Amy and Charlie are the active users; one needs 1 Xeon Phi™ and the other needs 3. Bob needs 1 Xeon Phi™ and has to wait for a Phi to become available.]
6. What is Resource Oversubscription?
• Say Amy and Bob each want to run a program that uses a single Xeon Phi™ intermittently (coprocessor offload model)
• Do they each need a device, or can they share?
[Diagram: Amy's program and Bob's program each run from Begin to End, alternating between host processor portions and Xeon Phi™ coprocessor portions, with the Phi portions marked “SHARE?”]
7. What is Resource Oversubscription?
• First problem of sharing a Phi: the programs together oversubscribe hardware threads
• This can cause 2-3x slowdown!
[Diagram repeated: Amy's and Bob's programs alternating between host processor and Xeon Phi™ coprocessor portions, with the Phi portions marked “SHARE?”]
8. What is Resource Oversubscription?
• Second problem of sharing a Phi: the programs can oversubscribe physical device memory
• This causes random crashes
[Diagram repeated: Amy's and Bob's programs alternating between host processor and Xeon Phi™ coprocessor portions, with the Phi portions marked “SHARE?”]
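The two hazards on slides 7 and 8 can be written down as a simple admission check. The following is only a sketch: the `Job` and `Device` types, the `oversubscription` and `fits` helpers, and the per-job thread and memory figures are all hypothetical and are not COSMIC's actual interface. The device defaults (240 hardware threads, 8 GB) follow the figures quoted in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one offload job's peak demands."""
    name: str
    threads: int     # hardware threads the offload region requests
    memory_mb: int   # device memory the offload region needs


@dataclass(frozen=True)
class Device:
    """One Xeon Phi, sized from the slides: 60 cores x 4 threads, 8 GB."""
    hw_threads: int = 240
    memory_mb: int = 8 * 1024


def oversubscription(device, jobs):
    """Return (excess_threads, excess_memory_mb) if all jobs offload at once.

    A zero in either position means that resource is not oversubscribed.
    """
    threads = sum(job.threads for job in jobs)
    memory = sum(job.memory_mb for job in jobs)
    return (max(0, threads - device.hw_threads),
            max(0, memory - device.memory_mb))


def fits(device, jobs):
    """True when the jobs can share the device without oversubscribing it."""
    return oversubscription(device, jobs) == (0, 0)


# Illustrative (made-up) demands.
phi = Device()
amy = Job("amy", threads=240, memory_mb=3000)
bob = Job("bob", threads=120, memory_mb=6000)
small = Job("small", threads=100, memory_mb=2000)

print(oversubscription(phi, [amy, bob]))  # both threads and memory exceeded
print(fits(phi, [bob, small]))            # these two can share safely
```

In this sketch, Amy and Bob together exceed both limits, which corresponds to the slowdown and crash cases above, while Bob and the small job fit within one device.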
9. Why the Conservative Policy?
• Avoids resource oversubscription
• Safe: no crashes
• Easier management
• BUT…
10. Downsides of Conservative Policy
• Poorly utilized Xeon Phi™ coprocessors
• Dynamic utilization averages around 40%!
[Figure: utilization over time. Only 40% of cores are doing useful work on average due to intermittent use, conservative scheduling policy, …]
12. Downsides of Conservative Policy
• Long wait times if all Xeon Phis are “busy”
– Annoyed users: have to wait even if their jobs are short
– Cannot pre-empt running jobs
– Users must wait even though the Phis may be underutilized or intermittently used
[Diagram: Xeon Phi™ cluster in which running programs have occupied all Xeon Phis]
13. COSMIC
• Middleware that allows safe Xeon Phi™ sharing
• Transparently discovers resource requirements and schedules jobs to maximally share Xeon Phis
[Diagram: software stack spanning user and kernel space on the host processor and Xeon Phi™ coprocessor. Labels, in order: applications; COSMIC (invisible to apps, kernel); Linux; MPSS: modified Linux + drivers + …]
14. COSMIC lets users share the Phi
• Instead of making Amy's and Bob's programs wait for each other, COSMIC co-runs them by interspersing host and Phi portions
• Device sharing: users don’t wait, better utilization
[Diagram: Amy's program and Bob's program run concurrently from Begin to End, with their host and Xeon Phi™ portions interleaved]
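The effect of interspersing host and Phi portions can be illustrated with a toy timeline model. This is a minimal sketch under stated assumptions, not COSMIC's scheduler: programs are lists of hypothetical `("host", seconds)` and `("phi", seconds)` phases, host phases of different programs overlap freely, and the single Phi runs one offload at a time in request order. The deck does not say whether COSMIC also overlaps offloads when resources allow.

```python
def exclusive_makespan(programs):
    """Exclusive allocation on one Phi: each program holds the device from
    Begin to End, so the programs run back to back."""
    return sum(duration for phases in programs for _, duration in phases)


def shared_makespan(programs):
    """Co-run the programs on one Phi.

    Host phases overlap freely; Phi phases are served one at a time in the
    order they are requested (an assumption of this sketch).
    """
    clock = [0] * len(programs)       # when each program is ready for its next phase
    next_phase = [0] * len(programs)  # index of each program's next phase
    phi_free_at = 0
    while True:
        pending = [i for i, p in enumerate(programs) if next_phase[i] < len(p)]
        if not pending:
            return max(clock, default=0)
        # Advance the earliest-ready program so Phi requests are granted in time order.
        i = min(pending, key=lambda k: (clock[k], k))
        kind, duration = programs[i][next_phase[i]]
        if kind == "phi":
            start = max(clock[i], phi_free_at)  # wait if the Phi is busy
            phi_free_at = start + duration
            clock[i] = phi_free_at
        else:
            clock[i] += duration
        next_phase[i] += 1


# Made-up phase lengths: each program alternates host and Phi work.
amy = [("host", 2), ("phi", 4), ("host", 2), ("phi", 4), ("host", 2)]
bob = [("host", 2), ("phi", 4), ("host", 2), ("phi", 4), ("host", 2)]

print(exclusive_makespan([amy, bob]))  # Bob waits for Amy to finish
print(shared_makespan([amy, bob]))     # Bob's host work overlaps Amy's Phi work
```

With these made-up phase lengths the shared run finishes sooner because one program's host work fills time in which the other is using the Phi.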
15. COSMIC also resolves conflicting user directives
• WITHOUT COSMIC: user-specified core affinity may conflict during sharing
• WITH COSMIC: COSMIC transparently resolves conflicts and “spreads” load across cores
[Diagram: User 1’s and User 2’s Xeon Phi™ portions mapped onto Xeon Phi cores, without and with COSMIC]
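One way to picture the affinity conflict is two users who both pin their work to the same cores, leaving the rest of the device idle. The sketch below is an illustration only: the `spread` and `core_load` helpers, the job names, and the "least-loaded core first" rule are assumptions made for this example and are not taken from COSMIC.

```python
from collections import Counter


def core_load(assignments, num_cores):
    """Number of jobs pinned to each physical core."""
    load = Counter(core for cores in assignments.values() for core in cores)
    return [load[c] for c in range(num_cores)]


def spread(requests, num_cores):
    """Remap each job's requested cores onto the least-loaded physical cores.

    Each job keeps the number of cores it asked for; only the placement changes.
    """
    load = [0] * num_cores
    remapped = {}
    for job, cores in requests.items():
        count = min(len(cores), num_cores)
        # sorted() is stable, so ties go to the lowest-numbered core.
        chosen = sorted(range(num_cores), key=lambda c: load[c])[:count]
        for c in chosen:
            load[c] += 1
        remapped[job] = sorted(chosen)
    return remapped


# Both users ask for cores 0-29 of a 60-core device (made-up directives).
requests = {"user1": list(range(0, 30)), "user2": list(range(0, 30))}
remapped = spread(requests, num_cores=60)

print(max(core_load(requests, 60)))   # cores 0-29 carry two jobs each
print(max(core_load(remapped, 60)))   # after remapping, no core is shared
```

Here the second user's work lands on cores 30-59, so all 60 cores are used and none is doubly loaded.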
16. Utilization: 1-device server
[Figure: average utilization (%), on a 0 to 100 scale, plotted over time. With COSMIC (black): average utilization 70.6%. Without COSMIC (blue): average utilization 41.7%.]
17. Performance: 2-device server
64 jobs, randomly arriving
Metric                      Without COSMIC   With COSMIC
Average Latency (s)         1099             119
Makespan (s)                3144             1238
Average Core Utilization    19.9%            56.9%
Major improvements through device sharing, load balancing
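The size of each improvement follows directly from the slide-17 figures. The short calculation below uses only the numbers reported on the slide; the dictionary layout and the `improvement` helper are just an illustrative way to organize them.

```python
# (without COSMIC, with COSMIC, higher_is_better), values taken from slide 17.
results = {
    "average latency (s)": (1099, 119, False),
    "makespan (s)": (3144, 1238, False),
    "average core utilization (%)": (19.9, 56.9, True),
}


def improvement(without, with_cosmic, higher_is_better):
    """Factor by which the metric improves when COSMIC is used."""
    if higher_is_better:
        return with_cosmic / without
    return without / with_cosmic


for metric, values in results.items():
    print(f"{metric}: {improvement(*values):.2f}x better with COSMIC")
```

For this 64-job workload that works out to roughly a 9.2x lower average latency, a 2.5x shorter makespan, and about 2.9x higher average core utilization.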
19. Easy to Use on Clusters
• Easy to interface with third party software
• Optional COSMIC cluster component for even better utilization
• Up to 50% footprint reduction by Phi sharing!
[Diagram: third party cluster management software and the COSMIC cluster component sit above four nodes, each a host running COSMIC with a Xeon Phi™ (60 cores, 8GB)]
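To see how sharing could shrink the number of devices a workload needs, consider a first-fit placement of jobs onto Phis. This is a hypothetical sketch, not the COSMIC cluster component: the `first_fit` function, the job demands, and the first-fit rule itself are assumptions chosen for illustration, with device capacity taken from the deck's 60-core (240-thread), 8GB figures.

```python
def first_fit(jobs, threads_per_device=240, memory_mb_per_device=8192):
    """Place each (threads, memory_mb) job on the first device with room.

    Returns the number of devices used. A new device is opened only when the
    job fits on none of the devices already in use.
    """
    devices = []  # each entry is [free_threads, free_memory_mb]
    for threads, memory_mb in jobs:
        if threads > threads_per_device or memory_mb > memory_mb_per_device:
            raise ValueError("job does not fit on an empty device")
        for free in devices:
            if threads <= free[0] and memory_mb <= free[1]:
                free[0] -= threads
                free[1] -= memory_mb
                break
        else:
            devices.append([threads_per_device - threads,
                            memory_mb_per_device - memory_mb])
    return len(devices)


# Four made-up jobs, each needing half the threads and under half the memory.
jobs = [(120, 3000)] * 4
exclusive_devices = len(jobs)      # exclusive allocation: one Phi per job
shared_devices = first_fit(jobs)   # sharing: two jobs per Phi

print(exclusive_devices, shared_devices)
```

In this example four jobs fit on two devices instead of four. Jobs that each need a whole device gain nothing from sharing, which is consistent with the slide's "up to" wording.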
20. COSMIC Summary
• We are ready to engage with beta customers
• Do you manage Xeon Phi™ servers or clusters?
• Do you use off-the-shelf cluster management software with exclusive allocation policies?
• If so, you likely will benefit from COSMIC
– Improves Xeon Phi™ utilization by sharing
– Transparent to users
– Transparent to underlying system software
– Easy to add on to third-party cluster tools
21. How to Get More Info
• Contact us:
– NEC Japan: Y Hirotani, y-hirotani@aj.jp.nec.com
– NEC Labs America: S Cadambi, cadambi@nec-labs.com
• We make onsite presentations / demos
• If interested in evaluating COSMIC, just ask us
• See our demo online:
http://www.nec-labs.com/research/system/systems_arch-website/cosmic.php