BL .VL BRN R S Q E5 S Q31 9 VWS
S Q E5 6RWQ GVMRQ r
1 AWPPMV . u
H YI I I IP
© 2019 IBM Corporation
. L
OpenPOWER Foundation
OpenPOWER Summit
• CAPI/OpenCAPI Accelerated PostgreSQL
• Open ISA Experiments
OpenCAPI
• SNAP
• POWER9 Final Addition
• OpenCAPI
OpenPOWER Foundation
OpenPOWER Foundation Mission
“Through the growing open ecosystem of the POWER Architecture and its associated technologies, the
OpenPOWER Foundation facilitates its Members to share expertise, investment and intellectual property
to serve the evolving needs of all end users.”
IT consumption models are expanding: Artificial Intelligence, Custom Hyperscale Data Centers, Hybrid Cloud, Open Solutions
Full system stack innovation required
[Chart: Price/Performance (lower is better), 2000 to 2020. Curves: Moore's Law (Technology and Processors) and Full Stack Acceleration (Firmware / OS, Accelerators, Software, Storage, Network)]
IT innovation can no longer come from just the processor
OpenPOWER Members include
Software
Implementation / HPC / Research
Chip / SOC
I/O / Storage / Acceleration
Boards / Systems
System / Integration
350+ Members
35 Countries
80+ ISVs
A Revolution Looks Like © 2018 OpenPOWER Foundation
OpenPOWER Ecosystem - breadth of solutions
From sub-USD $1,500 developer systems, through TCO-competitive server solutions, right up to the world's fastest
supercomputers: all are part of the OpenPOWER Ecosystem, and all run open software stacks from firmware to apps.
OpenPOWER Coming Events
May 20th Japan Mini Summit
OpenPOWER Announcements at North American Summit - II
● IBM releases proof of concept POWER ISA Compliant FPGA Soft Core
○ Allows anyone to experiment with the POWER ISA - researchers to hobbyists, chip
manufacturers to hardware accelerator vendors
○ Micropython port also announced
○ Within just two weeks already additional FPGAs supported
○ Zephyr IoT kernel support under development
○ Linux Kernel support expected by year end
Refer to:
https://openpowerfoundation.org/the-next-step-in-the-openpower-foundation-journey/
POWER Instruction Set Architecture (ISA)
OpenPOWER Summit
1F 218 4 CC I
U S D3 @ RROX t c d
• B@ /VINOX IX V 3B 0 WOS WW 1NOSG 3SMOS VOSM s) g y
• t 8GUGS w
s U S D3 @ RROX B@ t 9OS 4T SJGXOTS
• % , ( .@GS 2O MT
e
• 1/ % U S1/ /II Q VGX J TWXMV @ 9
− NXXUW-%%WXGXOI WIN J ITR%NTWX JFLOQ W%TU SUT VSG ,%)J% TWXMV @ 9 ( TS ( 1/ V M ( ( @ ( ( , UJL
• U S @/ 3 U VOR SXW
− NXXUW-%%WXGXOI WIN J ITR%NTWX JFLOQ W%TU SUT VSG ,% %0QGSINGVJF ( UJL
© 2018 IBM Corporation | IBM Confidential 3
PostgreSQL Accelerated with Regular Expression Matching
• Two user interfaces:
• UDF (User Defined Function)
• PostgreSQL Hooks/Plugins (Standard SQL)
SELECT psql_regex_capi(table, pattern, attr_id);
SELECT * FROM table WHERE pkt ~ pattern;
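As a CPU-side point of reference, the `pkt ~ pattern` predicate above is an unanchored, case-sensitive regular-expression match applied row by row. The sketch below mimics that filter in plain Python. The table contents and the `regex_filter` helper are hypothetical, and Python's `re` syntax only approximates PostgreSQL's regex dialect; it says nothing about how the CAPI engines are implemented.

```python
import re

def regex_filter(rows, pattern):
    """Keep rows whose packet column matches the pattern anywhere in the string."""
    compiled = re.compile(pattern)
    return [(row_id, pkt) for row_id, pkt in rows if compiled.search(pkt)]

# Hypothetical two-column table (ID, packet); the real test tables are not shown in the slides.
table = [
    (1, "GET /index.html HTTP/1.1"),
    (2, "POST /login HTTP/1.1"),
    (3, "random payload abc123"),
]

matches = regex_filter(table, r"HTTP/1\.[01]")
print(matches)
```

Every row has to be scanned by the matcher, which is the kind of per-row work the slides move onto the FPGA regex engines.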
2. 1F 2. 9I 2EHI /
Overall Architecture: Multi-Threading and Multi-Engine
[Block diagram. Labels shown: Storage; Relations (Relation 0 … Relation N) made of Pages; Host Memory (User Space); Buffer Cache; Thread 0 … Thread N copying Tuples out of a Page (memcpy); Packet Buffer 0 … Packet Buffer N; Job Queue holding Packet Buffer Pointer 0 … N; Job Manager; Result Buffer with Result 0 … Result N; PostgreSQL Query Results; Query results to DB clients; FPGA; AXI/PSL Bridge or AXI/TLX Bridge; AXI Interconnect; Local Configuration Bus; engines RegEx 0 … RegEx N, General Query, Others]
Packet transfer via CAPI/OpenCAPI with Virtual Address Pointers
Legend: FPGA Modules under construction; FPGA Engines / User space buffers for CAPI; PostgreSQL internal data structures
Performance Evaluation: Environment Setups
Host Server: Romulus, 2-socket, POWER9, 22 cores, 512 GB memory, standard SATA hard disk
FPGA Card: Xilinx VU9P
CAPI: CAPI 2.0 + SNAP 2.0
CAPI-Regex: 8 16X1 engines, 16 packet pipelines per engine; 2 64X1 engines, 64 pipelines per engine. All running at 225 MHz
PostgreSQL:
  Version: 11.2, compiled from source
  shared_buffers (the amount of memory the database server uses for shared memory buffers): 4GB, 1GB
  max_worker_processes (the maximum number of background processes that the system can support): 176 (the max value supported by Romulus)
  max_parallel_workers (the maximum number of workers that the system can support for parallel queries): 176 (the max value supported by Romulus)
Queries: CAPI-Regex in UDF mode; SELECT * FROM table WHERE pkt ~ pattern
Test Table:
  Table Type: Synthetic tables
  Table Schema: each row contains 2 columns (ID, packet); packet is a 1024-byte random string
  Table Size: number of rows varies between tables
Performance Comparison with CPU
✓ The CPU version runs as is; the CAPI version runs with 8 threads on 8 16X1 regex engines, with the optimal number of jobs per thread
✓ CAPI-regex can be ~5x to ~10x faster than the best PostgreSQL built-in functions (CPU multi-threading enabled)
✓ At most 4 threads are enabled for CPU multi-threading when the table size is larger than 128000
✓ Buffer cache (BC) size affects the CPU version, but has little effect on the CAPI version
Query Time Comparison Between CAPI-regex and CPU (relative query time; table size in number of 1024-byte lines)
Table size | 512k | 256k | 128k | 64k | 32k | 16k | 8k | 4k
regex_capi BC 1GB | 0.23 | 0.25 | 0.06 | 0.06 | 0.07 | 0.07 | 0.09 | 0.15
regex_capi BC 4GB | 0.20 | 0.25 | 0.06 | 0.05 | 0.06 | 0.07 | 0.09 | 0.14
CPU BC 1GB | 1.03 | 1.02 | 1.00 | 0.92 | 1.00 | 1.05 | 1.08 | 1.07
CPU BC 4GB | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
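The relative query times in the chart data can be turned into speedup factors by dividing the CPU time by the CAPI time at each table size. The sketch below does that for the 4GB buffer cache series; the numbers are copied from the slide, while the `speedup` helper and variable names are only illustrative.

```python
# Relative query times from the slide's data table (CPU with 4GB buffer cache = 1.00).
table_sizes = ["512k", "256k", "128k", "64k", "32k", "16k", "8k", "4k"]
relative_time = {
    "regex_capi BC 1GB": [0.23, 0.25, 0.06, 0.06, 0.07, 0.07, 0.09, 0.15],
    "regex_capi BC 4GB": [0.20, 0.25, 0.06, 0.05, 0.06, 0.07, 0.09, 0.14],
    "CPU BC 1GB": [1.03, 1.02, 1.00, 0.92, 1.00, 1.05, 1.08, 1.07],
    "CPU BC 4GB": [1.00] * 8,
}

def speedup(cpu_series, capi_series):
    """Speedup of CAPI over CPU at each table size: cpu_time / capi_time."""
    return [cpu / capi for cpu, capi in zip(cpu_series, capi_series)]

speedups_4gb = speedup(relative_time["CPU BC 4GB"], relative_time["regex_capi BC 4GB"])
for size, factor in zip(table_sizes, speedups_4gb):
    print(f"{size:>5}: {factor:.1f}x")
```

With these figures, the two largest tables give the smallest ratios, which lines up with the note that CPU multi-threading is only enabled above 128000 rows.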
1F . LF C IH
Microwatt
• Tiny POWER core
• OpenPOWER ISA scalar fixed point subset
• Written in VHDL 2008
− ghdl open source simulation tools
− Xilinx Vivado for FPGA synthesis
• https://github.com/antonblanchard/microwatt
MicroPython
• Small embedded Python interpreter
− https://micropython.org/
• Michael Neuling ported it to the Microwatt core
− No modifications to generic code, mostly platform specific code: application startup, console, etc.
CAPI/OpenCAPI
SNAP Framework built on Power™ CAPI technology, 2017, IBM Corporation
An application without CAPI
[Diagram labels: CPU cores; App; Memory Subsystem (virtual addresses); Device Driver (DD) with its Storage Area; External Device I/F; Variables, Input Data and Output Data drawn in three places]
• An application calls a device driver to use an accelerator or any device outside the chip
• The device driver performs a memory mapping operation
3 versions of the data (not coherent).
1000s of instructions in the device driver.
An application with CAPI
[Diagram labels: POWER8 cores; App; Memory Subsystem (virtual addresses); FPGA attached over PCIe through the PSL; a single set of Variables, Input Data and Output Data]
1 coherent version of the data.
No device driver call/instructions.
• The CPU is offloaded: there is no device driver, and the accelerator does the application work
• The FPGA shares memory with the cores
Effect of CAPI hardware vs. PCI-E Device Driver
Typical I/O Model Flow:
DD Call → Copy or Pin Source Data → MMIO Notify Accelerator → Acceleration → Poll / Interrupt Completion → Copy or Unpin Result Data → Ret. From DD Completion
Acceleration: application dependent, but equal to below
Instruction counts shown: 300; 10,000; 3,000; 1,000; 1,000
Time: 7.9µs + 4.9µs, total ~13µs for data prep
Flow with a CAPI Model:
Shared Mem. Notify Accelerator → Acceleration → Shared Memory Completion
Acceleration: application dependent, but equal to above
Instruction counts shown: 400; 100
Time: 0.3µs + 0.06µs, total 0.36µs
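The overhead totals on this slide can be checked with simple arithmetic. The sketch below sums the instruction counts and latencies for each flow and prints the ratios. Grouping the first five instruction counts under the typical I/O model and the last two under the CAPI model is an assumption based on the order in which they appear; the dictionary layout is purely illustrative.

```python
# Figures as they appear on the slide; the per-model grouping is an assumption.
io_model = {"instructions": [300, 10_000, 3_000, 1_000, 1_000], "latency_us": [7.9, 4.9]}
capi_model = {"instructions": [400, 100], "latency_us": [0.3, 0.06]}

def totals(model):
    """Return (total instructions, total data-prep latency in microseconds)."""
    return sum(model["instructions"]), sum(model["latency_us"])

io_instr, io_us = totals(io_model)
capi_instr, capi_us = totals(capi_model)

print(f"instructions: {io_instr} vs {capi_instr} ({io_instr / capi_instr:.1f}x)")
print(f"data prep:    {io_us:.1f} us vs {capi_us:.2f} us ({io_us / capi_us:.1f}x)")
```

The latency sums reproduce the slide's "~13µs" and "0.36µs" totals; the acceleration time itself is the same in both flows and is left out.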
Recall CAPI technology connections
Proprietary hardware and designs to enable coherent acceleration
Operating system enablement
• little endian Linux
• kernel driver (cxl)
• user library (libcxl)
Customer application and accelerator
[Diagram labels: POWER8 Processor (Core, CAPP, coherent Memory); OS; App; libcxl; cxl; PCIe; FPGA (PSL, AFU)]
▪ PSLSE models the red outlined area
▪ Re-implements libcxl API calls
▪ Models memory access
▪ Provides hardware ports to the AFU
▪ Enables co-simulation of AFU and App
▪ Publicly available on GitHub
OpenCAPI technology connections
Proprietary hardware and reference designs to enable coherent acceleration
Operating system enablement
• little endian Linux
• reference kernel driver (ocxl)
• reference user library (libocxl)
Customer application and accelerator
[Diagram labels: POWER9 Processor (Core, NPU w/ CAPP fcn, TL, DL, 25G phy, coherent Memory); OS; App; libocxl; ocxl; FPGA (25G phy, DLx, TLx, AFU); PSL]
▪ OCSE models the red outlined area
▪ OCSE enables AFU and App co-simulation if the reference libocxl and reference TLx/DLx are used
▪ OCSE dependencies/assumptions
– Fixed reference TLx/AFU interface
– Fixed reference libocxl user API
▪ Will be available to consortium members
TERMS:
OpenCAPI Simulation Environment (OCSE)
OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL)
cited by https://www.kernel.org/doc/html/latest/userspace-api/accelerators/ocxl.html
Comparison of IBM CAPI Implementations
Feature | CAPI 1.0 | CAPI 2.0 | OpenCAPI 3.0 | OpenCAPI 4.0
Processor Generation | POWER8 | POWER9 | POWER9 | Future
CAPI Logic Placement | FPGA/ASIC | FPGA/ASIC | NA; DL/TL on Host, DLx/TLx on endpoint FPGA/ASIC | NA; DL/TL on Host, DLx/TLx on endpoint FPGA/ASIC
Interface | PCIe Gen3 | PCIe Gen4 | Direct 25G | Direct 25G+
Lanes per Instance | x8/x16 | 2 x (Dual x8) | x8 | x4, x8, x16, x32
Lane bit rate | 8 Gb/s | 16 Gb/s | 25 Gb/s | 25+ Gb/s
Address Translation on CPU | No | Yes | Yes | Yes
Native DMA from Endpoint Accelerator | No | Yes | Yes | Yes
Home Agent Memory on OpenCAPI Endpoint with Load/Store Access | No | No | Yes | Yes
Native Atomic Ops to Host Processor Memory from Accelerator | No | Yes | Yes | Yes
Accelerator -> HW Thread Wake-up | No | Yes | Yes | Yes
Low-latency small message push, 128B Writes to Accelerator | MMIO 4/8B only | MMIO 4/8B only | MMIO 4/8B only | Yes
Host Memory Caching Function on Accelerator | Real Address Cache in PSL | Real Address Cache in PSL | No | Effective Address Cache in Accelerator
Callout (OpenCAPI columns): Remove PCIe layers to reduce latency significantly
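The "Lanes per Instance" and "Lane bit rate" rows can be combined into a rough per-direction link bandwidth. The sketch below multiplies lanes by bit rate and converts to GB/s. It ignores line encoding and protocol overhead, and it reads the CAPI 2.0 entry "2 x (Dual x8)" as 16 lanes in total, which is an assumption; the `raw_gbytes_per_s` helper is illustrative only.

```python
def raw_gbytes_per_s(lanes, gbit_per_lane):
    """Raw one-direction signaling bandwidth in GB/s, before any encoding or protocol overhead."""
    return lanes * gbit_per_lane / 8

# (lanes, Gb/s per lane) taken from the comparison table.
links = {
    "CAPI 1.0, PCIe Gen3 x8": (8, 8),
    "CAPI 1.0, PCIe Gen3 x16": (16, 8),
    "CAPI 2.0, PCIe Gen4, 16 lanes (assumed)": (16, 16),
    "OpenCAPI 3.0, 25G x8": (8, 25),
}

raw = {name: raw_gbytes_per_s(*spec) for name, spec in links.items()}
for name, value in raw.items():
    print(f"{name}: {value:.0f} GB/s raw per direction")
```

On these raw numbers an x8 OpenCAPI 3.0 link sits between a Gen3 x16 and a 16-lane Gen4 attachment; the latency benefit from removing the PCIe layers is a separate effect that this arithmetic does not capture.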
October 25th 2018 Power™ Coherent Acceleration Processor Interface (CAPI) 22
SNAP framework
[Diagram labels. Application on Host: Process A, Process B, Process C (each a Slave Context) with Software Program, SNAP library, Job Queue, libcxl, cxl. Acceleration on FPGA: PSL/AXI bridge; Host DMA; Control MMIO; Job Manager; Job Queue; Hardware Action (C/C++ or RTL); AXI to on-card DRAM, NVMe and Network (TBD). HDK: CAPI PSL or CAPI BSP]
Quick and easy development
Use a High Level Synthesis tool to convert C/C++ to RTL, or use RTL directly
Programming based on the SNAP library and AXI interface
AXI is an industry standard for on-chip interconnection (https://www.arm.com/products/system-ip/amba-specifications)
Scatter gather memory access
Results: (POWER9, CAPI 2.0, 2.154 GHz, 512 MB RAM) (FPGA card: FW609 + S241: VU9P Gen3x16) (SNAP)
N = 1024 blocks, block size = 2 kBytes
All times in µs. Traditional way = SW gather + DMA (Sum); CAPI way = Verilog or HLS.
How scattered | SW gather | DMA | Sum | Verilog | HLS
-RK1 | 309.3 | 183.5 | 492.8 | 171.65 | 173.3
-RK4 | 319.05 | 186.05 | 505.1 | 180.9 | 180.9
-RK16 | 305.1 | 185.7 | 490.8 | 184.6 | 186.95
-RK64 | 320.6 | 186.85 | 507.45 | 186.3 | 187.5
-RK256 | 318.3 | 185.65 | 503.95 | 218.55 | 215.35
-RK1024 | 333 | 189.15 | 522.15 | 236.85 | 224.95
-RK4096 | 324.4 | 189.35 | 513.75 | 241.15 | 225.55
-RK16384 | 307.4 | 185.75 | 493.15 | 240.9 | 224.9
[Chart: time (µs, 0 to 600) for Verilog, HLS and Sum, from -RK1 (contiguous) to -RK16384 (more scattered)]
- The CAPI way saves the time for "SW gather", with a relatively small penalty when K grows
- Once tuned (using pragmas), HLS can compete with Verilog coding
190 µs to transfer 2 MiB: speed = 11.04 GB/s
R = random
K is the dispersion factor of the blocks
Allocate 2MB in a K * 2MB memory area
→ K=1 : all blocks contiguous
→ K=2: 2MB allocated amongst 4MB
→ K=4: 2MB allocated amongst 8MB
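To make the dispersion factor concrete, the sketch below places N = 1024 blocks of 2 KiB at random positions inside a K * 2 MiB area, and recomputes the slide's transfer rate from "190 µs for 2 MiB". Block-aligned, uniformly random placement is an assumption of this sketch, as are the helper names; the slides do not show how the benchmark actually chose addresses.

```python
import random

BLOCK_SIZE = 2 * 1024   # slide: block size = 2 kBytes, read here as 2048 bytes
NUM_BLOCKS = 1024       # slide: N = 1024 blocks, 2 MiB of payload in total

def scattered_offsets(k, seed=0):
    """Return NUM_BLOCKS distinct, block-aligned byte offsets inside a K * 2 MiB area."""
    slots = k * NUM_BLOCKS                  # number of 2 KiB slots the area holds
    rng = random.Random(seed)               # seeded so that runs are repeatable
    return [slot * BLOCK_SIZE for slot in rng.sample(range(slots), NUM_BLOCKS)]

def throughput_gb_per_s(num_bytes, microseconds):
    """Bytes moved per second, expressed in units of 10**9 bytes."""
    return num_bytes / (microseconds * 1e-6) / 1e9

offsets_k1 = scattered_offsets(1)   # K=1: the blocks fill the whole 2 MiB area
offsets_k4 = scattered_offsets(4)   # K=4: 2 MiB spread over an 8 MiB area
rate = throughput_gb_per_s(NUM_BLOCKS * BLOCK_SIZE, 190)
print(f"{rate:.2f} GB/s")
```

With K=1 every slot is used, so the blocks are contiguous once sorted; larger K leaves gaps between blocks, which is what the "more scattered" end of the chart measures.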
c MYK%
2 EFEH 218 4 2 E HHE E E M 9 . 1 4E9 C9F
Proposed POWER Processor Technology and I/O Roadmap
Statement of Direction, Subject to Change
Year | Processor | Architecture | Cores | Process | Changes | Sustained Memory Bandwidth | Standard I/O Interconnect | Advanced I/O Signaling | Advanced I/O Architecture
2010 | POWER7 | POWER7 | 8 | 45nm | New Micro-Architecture, New Process Technology | Up To 65 GB/s | PCIe Gen2 | N/A | N/A
2012 | POWER7+ | POWER7 | 8 | 32nm | Enhanced Micro-Architecture, New Process Technology | Up To 65 GB/s | PCIe Gen2 | N/A | N/A
2014 | POWER8 | POWER8 | 12 | 22nm | New Micro-Architecture, New Process Technology | Up To 210 GB/s | PCIe Gen3 | N/A | CAPI 1.0
2016 | POWER8 w/ NVLink | POWER8 | 12 | 22nm | Enhanced Micro-Architecture With NVLink | Up To 210 GB/s | PCIe Gen3 | 20 GT/s, 160GB/s | CAPI 1.0, NVLink
2017 | P9 SO | POWER9 | 12/24 | 14nm | New Micro-Architecture, Direct attach memory, New Process Technology | Up To 150 GB/s | PCIe Gen4 x48 | 25 GT/s, 300GB/s | CAPI 2.0, OpenCAPI3.0, NVLink
2018 | P9 SU | POWER9 | 12/24 | 14nm | Enhanced Micro-Architecture, Buffered Memory | Up To 210 GB/s | PCIe Gen4 x48 | 25 GT/s, 300GB/s | CAPI 2.0, OpenCAPI3.0, NVLink
2020 | P9 AIO | POWER9 | 12/24 | 14nm | Enhanced Micro-Architecture, New Memory Subsystem | Up To 650 GB/s | PCIe Gen4 x48 | 25 GT/s, 300GB/s | CAPI 2.0, OpenCAPI4.0, NVLink
2021 | P10 | POWER10 | TBA | (not given) | New Micro-Architecture, New Process Technology | Up To 800 GB/s | PCIe Gen5 | 32 & 50 GT/s | TBA
Focus of today’s talk
9 I E IE I 218 4 2 E HHE 9C M
[Die diagram labels: Memory Signaling (8x8 OMI), shown twice; PowerAXON (x48), shown twice; PCIe Gen4 Signaling (x48); Local SMP Signaling (3x30); SMP and Accelerator Interconnect; pairs of cores sharing an L2; L3 regions of 10 MB]
Processor Chip Details
• 728 mm² (25.3 x 28.8 mm)
• 8 Billion Transistors
• Up to 24 SMT4 Cores
• Up to 120 MB eDRAM L3 cache
Semiconductor Technology
• 14nm finFET
• Improved device performance
• Reduced energy
• eDRAM
• 17 layer metal stack
High Bandwidth Signaling
• 25 GT/s low energy differential: PowerAXON, OMI memory
• 16 GT/s low energy differential: Local SMP
• 16 GT/s PCIe Gen4
Open Memory Interface (OMI)
• 16 channels x8 at 25 GT/s
• 650 GB/s peak 1:1 r/w bandwidth
• Technology Agnostic
• Offered w/ Microchip DDR4 buffer (410 GB/s peak bandwidth)
PowerAXON 25 GT/s Attach
• Up to 16 socket glue-less SMP (4x24 SMP added to 3x30 local)
• Up to x48 NVIDIA NVLINK GPU attach
• Up to x48 OpenCAPI 4.0 coherent accelerator / memory attach
Industry Standard I/O Attach
• x48 PCIe Gen 4 at 16 GT/s
• Up to x16 CAPI 2.0 coherent accelerator / storage attach
Final Addition to the POWER9 Processor Family
2 TB/s Raw Signaling Bandwidth
Shared by 6 Attach Protocols
The Bandwidth Beast
Advanced I/O (AIO)
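The "2 TB/s Raw Signaling Bandwidth" headline can be roughly reproduced from the lane counts and rates on this slide. The sketch below adds up every interface shown on the die diagram. It assumes one bit per transfer per lane, counts both directions, and takes two instances each of the 8x8 OMI and x48 PowerAXON blocks because those labels appear twice in the diagram; all of that is this sketch's reading of the slide, not a statement from IBM.

```python
# (instances, lanes per instance, GT/s per lane), read off the slide.
interfaces = {
    "OMI memory signaling (8x8)": (2, 64, 25),
    "PowerAXON (x48)": (2, 48, 25),
    "PCIe Gen4 (x48)": (1, 48, 16),
    "Local SMP (3x30)": (1, 90, 16),
}

def raw_gbytes_per_s(instances, lanes, gt_per_s, directions=2):
    """Raw signaling bandwidth in GB/s, assuming 1 bit per transfer and both directions counted."""
    return instances * lanes * gt_per_s * directions / 8

per_interface = {name: raw_gbytes_per_s(*spec) for name, spec in interfaces.items()}
total = sum(per_interface.values())
print(f"total raw signaling: {total:.0f} GB/s")
```

Under these assumptions the sum comes to 1952 GB/s, close to the 2 TB/s on the slide. The OMI share alone is 800 GB/s raw, so the quoted 650 GB/s peak would be what remains after protocol overhead, if the same counting applies.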
218 4 MHI CH 0 CE M I 9I M
POWER Systems Memory Strategy
Connect all memory technologies to Power systems through OpenCAPI and OMI
Why?
– It is a high speed interface that allows flexibility to attach any new and emerging memory technology, including persistent memories like storage class memory (SCM)
[Diagram "P9 – OpenCAPI": P9 CPU, DDR, DDR, Switch, NVME, OpenCAPI]
[Diagram "P9’ Axone, P10 – OpenCAPI and OMI": P9’/P10 CPU, OMI, OMI, Switch, NVME, OpenCAPI]
-M: 0 CE M :HMHI C -0
Hybrid Memory Subsystem - HMS
• Hybrid Memory Subsystem using Low Latency NAND and DRAM
– Exclusive partnership for low latency NAND media, and with Bittware for design of accelerator card 250-HMS
– Low Latency NAND for capacity and persistence, with DRAM used for caching to lower average latency
• Capabilities
– SCM on OpenCAPI using Load/Store memory semantics
– Competitive latency and bandwidth at reduced cost for systems with high capacity memory requirements
• Target Applications
– Primary: cost reduction on in-memory applications and databases with predominantly Sequential and mostly Read-Only processing
1F 2. HMCC I 1F 9IE II9
OpenCAPI 4.0: Asymmetric Open Accelerator Attach
Roadmap of Capabilities and Host Silicon Delivery
Accelerator Protocol | CAPI 1.0 | CAPI 2.0 | OpenCAPI 3.0 | OpenCAPI 4.0 | OpenCAPI 5.0
First Host Silicon | POWER8 (GA 2014) | POWER9 SO (GA 2017) | POWER9 SO (GA 2017) | POWER9 AIO (GA 2020) | POWER10 (GA 2021)
Functional Partitioning | Asymmetric | Asymmetric | Asymmetric | Asymmetric | Asymmetric
Host Architecture | POWER | POWER | Any | Any | Any
Cache Line Size Supported | 128B | 128B | 64/128/256B | 64/128/256B | 64/128/256B
Attach Vehicle | PCIe Gen 3, Tunneled | PCIe Gen 4, Tunneled | 25 G (open), Native DL/TL | 25 G (open), Native DL/TL | 32/50 G (open), Native DL/TL
Address Translation | On Accelerator | Host | Host (secure) | Host (secure) | Host (secure)
Native DMA to Host Mem | No | Yes | Yes | Yes | Yes
Atomics to Host Mem | No | Yes | Yes | Yes | Yes
Host Thread Wake-up | No | Yes | Yes | Yes | Yes
Host Memory Attach Agent | No | No | Yes | Yes | Yes
Low Latency Short Msg | 4B/8B MMIO | 4B/8B MMIO | 4B/8B MMIO | 128B push | 128B push
Posted Writes to Host Mem | No | No | No | Yes | Yes
Caching of Host Mem | RA Cache | RA Cache | No | VA Cache | VA Cache
:9 A F
4 AU MW C YMS
Intel
CPU
92 u D:MQN u7 Cs9QV O
a au7 C
IBM
POWER
CPU
150GB/s 150GB/s
150GB/s
GPU
150GB/s
32GB/s
CPU-GPU
4.7
39
92 RX T A UV P 13. X OO
• M MRRo a p F 6 3 9AE DM RI F( u
• 350))p F=PT )% nta5AE 9AE a TYMR c cp) ,
GPU GPU GPU
*0
a adAWPPMVb ce
13. B h
zc p c c c cfC SSPYd e a
) (/ - p c c c c D A, mh u krklb
1. 200
u
a ar19 t
ozp
8 3 19upxt ly
F=PT )% m5AE 9AE dA5 M 9MT* 0%, e
/- ) p c
- ns u a
/- 5A 6m )%0 p yc u b
/- 5OIPTMW *%. a5INNM *%/ p yc m
u b PTMYPKIsA G7B/ / b
* https://www-03.ibm.com/press/jp/ja/pressrelease/53461.wss
POWER9	
14nm a
DOMQN 3 C 7 C
RX T A UV P 13.
(
l / -(( 7B8b c 7BF c
l 1 )E x
l 5AE1 A G7B0 C D2( ) *) rl )
7B8 IRT , 78 BWTHR ( ( 78
7B8 IRT ) 78 BWTHR ( 78
7BF -IRT ( 78 BWTHR ( ) 78
7BF IRT - 78 BWTHR ( 78
l 1 -72 ),-94 ,()94 ( ) 94 )-72 66B )---9: (-
l 1 * 94 c ” b(. 94 ) e
l A5 M
(- 9MT =U AWUNPRM1 ) 53A )% e
/ 9MT =U AWUNPRM1 ( 53A )% e
9MT =U AWUNPRM1 (
l C88 )%, C3D3 w ) :661 D4 rl CC61 .%-/D4
l 7 C/ D9491 B UOG D 7B8/ ) 7BF/ ) i
( 72
(-94 ”
l 1 ) ) F
l C1 85:, :5 CHWQVW - )
WISTRON “MiHawk”
24 x NVMe = 96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0
Image Source: Wistron
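The equivalence "96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0" can be checked by multiplying lanes by the per-lane bit rate. The sketch below uses 8, 16 and 25 Gb/s per lane, the rates listed for PCIe Gen3, PCIe Gen4 and OpenCAPI 3.0 in the CAPI comparison table earlier in the deck; the four lanes per NVMe drive follow from 96 lanes / 24 drives. Encoding and protocol overhead are ignored, and the variable names are illustrative.

```python
NVME_DRIVES = 24
LANES_PER_NVME = 4  # implied by the slide: 96 Gen3 lanes for 24 drives

# (lanes, Gb/s per lane) for each way of carrying the same NVMe traffic.
configs = {
    "PCIe Gen3": (NVME_DRIVES * LANES_PER_NVME, 8),
    "PCIe Gen4": (48, 16),
    "OpenCAPI 3.0": (32, 25),
}

raw_gbit = {name: lanes * rate for name, (lanes, rate) in configs.items()}
for name, value in raw_gbit.items():
    print(f"{name}: {value} Gb/s raw per direction")
```

The two PCIe options carry the same raw rate, and the 32 OpenCAPI lanes slightly more, which is presumably why the slide treats the three lane counts as interchangeable.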
1F 2. E 0 9 A
OpenCAPI !
© 2019 IBM Corporation
44

Mais conteúdo relacionado

Mais procurados

New optical cesium
New optical cesiumNew optical cesium
New optical cesiumADVA
 
Are You Ready for Embracing 100G Ethernet?
Are You Ready for Embracing 100G Ethernet?Are You Ready for Embracing 100G Ethernet?
Are You Ready for Embracing 100G Ethernet?Angelina Li
 
OpenShift Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud
OpenShift  Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud OpenShift  Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud
OpenShift Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud Hidetsugu Sugiyama
 
Photonic integrated circuits for data center interconnects
Photonic integrated circuits for data center interconnectsPhotonic integrated circuits for data center interconnects
Photonic integrated circuits for data center interconnectsADVA
 
Beyond 100GE
Beyond 100GEBeyond 100GE
Beyond 100GEAPNIC
 
Accelerating 5G enterprise networks with edge computing and latency assurance
Accelerating 5G enterprise networks with edge computing and latency assuranceAccelerating 5G enterprise networks with edge computing and latency assurance
Accelerating 5G enterprise networks with edge computing and latency assuranceADVA
 
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...CBO Connecting Technology
 
Low latency for DCI and mobile applications
Low latency for DCI and mobile applicationsLow latency for DCI and mobile applications
Low latency for DCI and mobile applicationsADVA
 
Inter-DCI and co-packaged optics
Inter-DCI and co-packaged opticsInter-DCI and co-packaged optics
Inter-DCI and co-packaged opticsADVA
 
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance ConnectivityCBO Connecting Technology
 
Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Qualcomm Research
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...NTT DATA Technology & Innovation
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyMartin Hamilton
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...Martin Hamilton
 
Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Miya Kohno
 
OIF on 400G for Next Gen Optical Networks Conference
OIF on 400G for Next Gen Optical Networks ConferenceOIF on 400G for Next Gen Optical Networks Conference
OIF on 400G for Next Gen Optical Networks ConferenceDeborah Porchivina
 

Mais procurados (20)

NFV evolution towards 5G
NFV evolution towards 5GNFV evolution towards 5G
NFV evolution towards 5G
 
New optical cesium
New optical cesiumNew optical cesium
New optical cesium
 
Are You Ready for Embracing 100G Ethernet?
Are You Ready for Embracing 100G Ethernet?Are You Ready for Embracing 100G Ethernet?
Are You Ready for Embracing 100G Ethernet?
 
10G Ethernet Overview & Use Cases
10G Ethernet Overview & Use Cases10G Ethernet Overview & Use Cases
10G Ethernet Overview & Use Cases
 
OpenShift Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud
OpenShift  Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud OpenShift  Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud
OpenShift Kubernetes Native Infrastructure for 5GC and Telco Edge Cloud
 
Photonic integrated circuits for data center interconnects
Photonic integrated circuits for data center interconnectsPhotonic integrated circuits for data center interconnects
Photonic integrated circuits for data center interconnects
 
Beyond 100GE
Beyond 100GEBeyond 100GE
Beyond 100GE
 
Accelerating 5G enterprise networks with edge computing and latency assurance
Accelerating 5G enterprise networks with edge computing and latency assuranceAccelerating 5G enterprise networks with edge computing and latency assurance
Accelerating 5G enterprise networks with edge computing and latency assurance
 
10G Ethernet Outlook for HPC
10G Ethernet Outlook for HPC10G Ethernet Outlook for HPC
10G Ethernet Outlook for HPC
 
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...
100 g dwdm qsfp28 the enabler of 100g end-toend long distance connectivity-co...
 
Low latency for DCI and mobile applications
Low latency for DCI and mobile applicationsLow latency for DCI and mobile applications
Low latency for DCI and mobile applications
 
Inter-DCI and co-packaged optics
Inter-DCI and co-packaged opticsInter-DCI and co-packaged optics
Inter-DCI and co-packaged optics
 
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity
100 g Dwdm Qsfp28 the Enabler of 100g End-Toend Llong Distance Connectivity
 
Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16Propelling 5G forward: a closer look at 3GPP Release-16
Propelling 5G forward: a closer look at 3GPP Release-16
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
 
HPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case StudyHPC Midlands - E.ON Supercomputing Case Study
HPC Midlands - E.ON Supercomputing Case Study
 
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
HPC Midlands - Supercomputing for Research and Industry (Hartree Centre prese...
 
Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Declarative Programming and a form of SDN
Declarative Programming and a form of SDN
 
JETSON : AI at the EDGE
JETSON : AI at the EDGEJETSON : AI at the EDGE
JETSON : AI at the EDGE
 
OIF on 400G for Next Gen Optical Networks Conference
OIF on 400G for Next Gen Optical Networks ConferenceOIF on 400G for Next Gen Optical Networks Conference
OIF on 400G for Next Gen Optical Networks Conference
 

Semelhante a BL .VL BRN R S Q E5 S Q31 9 VWS

Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIBM Switzerland
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer FugakuRCCSRENKEI
 
Deploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsDeploying Massive Scale Graphs for Realtime Insights
Deploying Massive Scale Graphs for Realtime InsightsNeo4j
 
How Big Data is Transforming the Data Center
How Big Data is Transforming the Data CenterHow Big Data is Transforming the Data Center
How Big Data is Transforming the Data CenterHelpSystems
 
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...Filipe Miranda
 
IBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsIBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsDavid Spurway
 
[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annonces[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annoncesGroupe D.FI
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture explorationDeepak Shankar
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsCisco Canada
 
Hortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIHortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIDataWorks Summit
 
Scaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayScaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayKarthik Ramasamy
 
Scaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 Keynote
Scaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 KeynoteScaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 Keynote
Scaling Apache Pulsar to 10 Petabytes/Day - Pulsar Summit NA 2021 KeynoteStreamNative
 
IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17David Spurway
 

BL .VL BRN R S Q E5 S Q31 9 VWS

  • 1. BL .VL BRN R S Q E5 S Q31 9 VWS S Q E5 6RWQ GVMRQ r 1 AWPPMV . u H YI I I IP
  • 2. © 2019 IBM Corporation
      OpenPOWER Foundation
      OpenPOWER Summit
        • CAPI/OpenCAPI Accelerated PostgreSQL
        • Open ISA Experiments
      OpenCAPI
        • SNAP
        • POWER9 Final Addition
        • OpenCAPI
  • 3. © 2019 IBM Corporation 3 VMTA G7B 8U TLIYPUT
  • 4. OpenPOWER Foundation Mission
      “Through the growing open ecosystem of the POWER Architecture and its associated technologies, the OpenPOWER Foundation facilitates its Members to share expertise, investment and intellectual property to serve the evolving needs of all end users.”
      IT consumption models are expanding: Artificial Intelligence, Custom Hyperscale Data Centers, Hybrid Cloud, Open Solutions.
      [Chart: Price/Performance (lower is better), 2000 to 2020. Moore’s Law: Technology and Processors. Full Stack Acceleration: Firmware / OS, Accelerators, Software, Storage, Network.]
      Full system stack innovation required: IT innovation can no longer come from just the processor.
  • 6. Member categories: Software; Implementation / HPC / Research; Chip / SOC; I/O / Storage / Acceleration; Boards / Systems; System / Integration
      350+ Members, 35 Countries, 80+ ISVs
      A Revolution Looks Like © 2018 OpenPOWER Foundation
  • 7. OpenPOWER Ecosystem - breadth of solutions Sub USD$1,500 developer systems to TCO competitive server solutions right up to world’s fastest supercomputers. All part of the OpenPOWER Ecosystem, all run open software stacks firmware to apps.
  • 8. May 20th Japan Mini Summit
  • 10. OpenPOWER Announcements at North American Summit - II: POWER Instruction Set Architecture (ISA)
      ● IBM releases proof of concept POWER ISA Compliant FPGA Soft Core
        ○ Allows anyone to experiment with the POWER ISA, from researchers to hobbyists and from chip manufacturers to hardware accelerator vendors
        ○ MicroPython port also announced
        ○ Additional FPGAs already supported within just two weeks
        ○ Zephyr IoT kernel support under development
        ○ Linux Kernel support expected by year end
      Refer to: https://openpowerfoundation.org/the-next-step-in-the-openpower-foundation-journey/
  • 11. © 2019 IBM Corporation 11 VMTA G7B C SSPY it
  • 13. OpenPOWER Summit
      • US: Architecture; EU: Business; China: Engineering
      • Japan … OpenPOWER Summit US … Linux Foundation
      • … San Diego
      • CAPI/OpenCAPI Accelerated PostgreSQL
        − https://static.sched.com/hosted_files/…
      • Open ISA Experiments
        − https://static.sched.com/hosted_files/…
  • 14. PostgreSQL Accelerated with Regular Expression Matching (© 2018 IBM Corporation | IBM Confidential)
      Two ways for user interface:
      § UDF (User Defined Function): SELECT psql_regex_capi(table, pattern, attr_id);
      § PostgreSQL Hooks/Plugins (Standard SQL): SELECT * FROM table WHERE pkt ~ pattern;
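A minimal CPU-side sketch of what the standard-SQL form of the query computes, written in Python for illustration only. The sample rows and the `regex_filter` helper are hypothetical, and Python's `re` syntax is not identical to PostgreSQL's regular expression syntax; the point is only that `pkt ~ pattern` keeps rows whose `pkt` column contains a match anywhere in the string.

```python
import re

# Hypothetical rows standing in for a table with (id, pkt) columns.
rows = [
    (1, "GET /index.html HTTP/1.1"),
    (2, "POST /login HTTP/1.1"),
    (3, "random payload 12345"),
]

def regex_filter(rows, pattern):
    """Return the rows whose pkt column contains a match for pattern."""
    compiled = re.compile(pattern)
    # re.search is unanchored, like the SQL `~` operator.
    return [row for row in rows if compiled.search(row[1])]

matches = regex_filter(rows, r"HTTP/1\.[01]")
print([row[0] for row in matches])
```

In the accelerated setup described on the slide, this per-row matching is the work that would be handed to the FPGA regex engines instead of running on the CPU.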
  • 15. Overall Architecture: Multi-Threading and Multi-Engine
      [Diagram. Host memory (user space): PostgreSQL threads 0..N read tuples from pages of relations 0..N (storage, buffer cache) and memcpy them into packet buffers 0..N; result buffers 0..N feed the PostgreSQL query results returned to DB clients. FPGA: job manager with a job queue of packet buffer pointers 0..N and a result buffer; RegEx engines 0..N, General Query, and Others attached to an AXI interconnect and a local configuration bus; AXI/PSL bridge or AXI/TLX bridge. Packet transfer via CAPI/OpenCAPI with virtual address pointers. Legend: FPGA modules under construction; FPGA engines / user space buffers for CAPI; PostgreSQL internal data structures.]
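The multi-thread, multi-engine flow in the diagram can be loosely sketched in software. The following is a hypothetical Python stand-in, not the actual SNAP or PostgreSQL code: `split_into_jobs`, `run_job`, and `scan` are invented names, a thread pool plays the role of the regex engines, and each job is a small group of packets comparable to one packet buffer.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def split_into_jobs(packets, job_size):
    """Group packets into fixed-size jobs (one packet buffer per job)."""
    return [packets[i:i + job_size] for i in range(0, len(packets), job_size)]

def run_job(job, pattern):
    """Stand-in for one regex engine: return a match flag per packet."""
    compiled = re.compile(pattern)
    return [bool(compiled.search(pkt)) for pkt in job]

def scan(packets, pattern, threads=4, job_size=2):
    """Dispatch jobs to worker threads and gather results in input order."""
    jobs = split_into_jobs(packets, job_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves job order, so flags line up with the input packets.
        results = list(pool.map(run_job, jobs, [pattern] * len(jobs)))
    return [flag for job_flags in results for flag in job_flags]

flags = scan(["abc", "xyz", "abd", "q", "ab"], r"ab")
print(flags)
```

The sketch keeps only the queueing idea: work is cut into jobs, jobs run concurrently, and results are merged back in order. The memory sharing over CAPI/OpenCAPI has no equivalent here.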
  • 16. Performance Evaluation: Environment Setups
      Host Server: Romulus, 2-socket, POWER9 22 cores, 512GB memory, normal SATA hard disk
      FPGA Card: Xilinx VU9P
      CAPI: CAPI 2.0 + SNAP 2.0
      CAPI-Regex: 8 16X1 engines, 16 packet pipelines per engine; 2 64X1 engines, 64 pipelines per engine; all running at 225 MHz
      PostgreSQL: Version 11.2, compiled from source
        − shared_buffers (the amount of memory the database server uses for shared memory buffers): 4GB, 1GB
        − max_worker_processes (the maximum number of background processes that the system can support): 176 (the max value supported by Romulus)
        − max_parallel_workers (the maximum number of workers that the system can support for parallel queries): 176 (the max value supported by Romulus)
      Queries: CAPI-Regex in UDF mode; SELECT * FROM table WHERE pkt ~ pattern
      Test Table
        − Table Type: Synthetic Tables
        − Table Schema: each row contains 2 columns (ID, packet); packet is a 1024-byte random string
        − Table Size: number of rows varies between tables
  • 17. Performance Comparison with CPU
      • The CPU version runs as is; the CAPI version runs with 8 threads on 8 16X1 regex engines with the optimal number of jobs per thread
      • CAPI-regex can be ~5x to ~10x faster than the best PostgreSQL built-in functions (CPU multi-threads enabled)
      • At most 4 threads are enabled for CPU multi-threading when the table size is larger than 128000
      • Buffer cache size can impact the CPU version, but has little effect on the CAPI version
      Query Time Comparison Between CAPI-regex and CPU (relative query time by table size, in number of 1024-byte lines; BC = buffer cache):
      Series | 512k | 256k | 128k | 64k | 32k | 16k | 8k | 4k
      regex_capi BC 1GB | 0.23 | 0.25 | 0.06 | 0.06 | 0.07 | 0.07 | 0.09 | 0.15
      regex_capi BC 4GB | 0.20 | 0.25 | 0.06 | 0.05 | 0.06 | 0.07 | 0.09 | 0.14
      CPU BC 1GB | 1.03 | 1.02 | 1.00 | 0.92 | 1.00 | 1.05 | 1.08 | 1.07
      CPU BC 4GB | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
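As a rough reading aid, the relative query times can be turned into speedup ratios against the "CPU BC 4GB" series, which is normalized to 1.00. The sketch below assumes the column order read from the slide (512k down to 4k) and uses the "regex_capi BC 4GB" values; it is plain arithmetic on the reported numbers, not a new measurement.

```python
# Table sizes and relative query times as read from the slide.
sizes = ["512k", "256k", "128k", "64k", "32k", "16k", "8k", "4k"]
capi_4gb = [0.20, 0.25, 0.06, 0.05, 0.06, 0.07, 0.09, 0.14]

# The CPU BC 4GB series is the baseline (1.00), so speedup = 1 / relative time.
speedup = {size: round(1.0 / t, 1) for size, t in zip(sizes, capi_4gb)}

for size in sizes:
    print(f"{size:>5}: {speedup[size]}x")
```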
  • 18. Microwatt
      • Tiny POWER core
      • OpenPOWER ISA scalar fixed point subset
      • Written in VHDL 2008
        − ghdl Open Source simulation tools
        − Xilinx Vivado for FPGA synthesis
      • https://github.com/antonblanchard/microwatt
  • 19. MicroPython
      • Small embedded Python interpreter
        − https://micropython.org/
      • Michael Neuling ported it to the microwatt core
        − No modifications to generic code; mostly platform specific code (application startup, console, etc.)
  • 20. © 2019 IBM Corporation 20 53A VMT53A
  • 22. An application without CAPI (SNAP Framework built on Power™ CAPI technology, 2017, IBM Corporation)
      • An application calls a device driver to utilize an accelerator or any device outside the chip
      • The device driver performs a memory mapping operation
      • 3 versions of the data (not coherent); 1000s of instructions in the device driver
      [Diagram: CPU cores, App, memory subsystem (Virt Addr), device driver (DD), storage area, external device I/F; variables, input data, and output data appear in each copy.]
  • 23. An application with CAPI
      • 1 coherent version of the data; no device driver call/instructions
      • The CPU is unloaded, since there is no device driver and the accelerator does the application work
      • The FPGA shares memory with the cores
      [Diagram: POWER8 cores, App, memory subsystem (Virt Addr) holding variables, input data, and output data; FPGA attached through PCIe and PSL.]
  • 24. Effect of CAPI hardware vs. PCI-E Device Driver
      Typical I/O Model Flow: DD Call → Copy or Pin Source Data → MMIO Notify Accelerator → Acceleration → Poll / Interrupt Completion → Copy or Unpin Result Data → Ret. From DD Completion
        − Instruction counts shown: 300; 10,000; 3,000; 1,000; 1,000
        − 7.9µs + 4.9µs; total ~13µs for data prep
      Flow with a CAPI Model: Shared Mem. Notify Accelerator → Acceleration → Shared Memory Completion
        − Instruction counts shown: 400; 100
        − 0.3µs + 0.06µs; total 0.36µs
      The Acceleration step is application dependent, but equal in both flows.
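The totals on this slide follow from simple addition, and the ratio between the two flows gives a feel for the software overhead that is removed. A small Python check, using only the microsecond figures quoted above (the ratio is derived here, not stated on the slide):

```python
# Data-preparation overheads quoted on the slide, in microseconds.
typical_us = [7.9, 4.9]   # before and after the acceleration step
capi_us = [0.3, 0.06]     # shared-memory notify and completion

typical_total = sum(typical_us)   # about 12.8, shown as "~13" on the slide
capi_total = sum(capi_us)         # about 0.36

# Overhead ratio; the acceleration time itself is equal in both flows.
ratio = typical_total / capi_total

print(f"typical: {typical_total:.1f} us, CAPI: {capi_total:.2f} us, ratio: {ratio:.1f}x")
```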
  • 25. Recall CAPI technology connections
      • Proprietary hardware and designs to enable coherent acceleration
      • Operating system enablement: little endian Linux, kernel driver (cxl), user library (libcxl)
      • Customer application and accelerator
      [Diagram: POWER8 processor (POWER8 core, OS, App, coherent memory, CAPP, cxl, libcxl) connected over PCIe to the FPGA (PSL, AFU).]
      PSLSE:
      ▪ Models the red outlined area
      ▪ Re-implements libcxl API calls
      ▪ Models memory access
      ▪ Provides hardware ports to the AFU
      ▪ Enables co-simulation of AFU and App
      ▪ Publicly available on GitHub
  • 26. OpenCAPI technology connections
      • Proprietary hardware and reference designs to enable coherent acceleration
      • Operating system enablement: little endian Linux, reference kernel driver (ocxl), reference user library (libocxl)
      • Customer application and accelerator
      [Diagram: POWER9 processor (POWER9 core, OS, App, coherent memory, NPU (w/CAPP fcn), TL, DL, 25G phy, ocxl, libocxl) connected to the FPGA (25G phy, DLx, TLx, AFU).]
      OpenCAPI Simulation Environment (OCSE):
      ▪ OCSE models the red outlined area
      ▪ OCSE enables AFU and App co-simulation if the reference libocxl and reference TLx/DLx are used
      ▪ OCSE dependencies/assumptions
        – Fixed reference TLx/AFU interface
        – Fixed reference libocxl user API
      ▪ Will be available to consortium members
      Terms: OpenCAPI defines a Data Link Layer (DL) and Transaction Layer (TL) (cited from https://www.kernel.org/doc/html/latest/userspace-api/accelerators/ocxl.html)
  • 27. Comparison of IBM CAPI Implementations
      Feature | CAPI 1.0 | CAPI 2.0 | OpenCAPI 3.0 | OpenCAPI 4.0
      Processor generation | POWER8 | POWER9 | POWER9 | Future
      CAPI logic placement | FPGA/ASIC | FPGA/ASIC | NA; DL/TL on host, DLx/TLx on endpoint FPGA/ASIC | NA; DL/TL on host, DLx/TLx on endpoint FPGA/ASIC
      Interface | PCIe Gen3 | PCIe Gen4 | Direct 25G | Direct 25G+
      Lanes per instance | x8/x16 | 2 x (Dual x8) | x8 | x4, x8, x16, x32
      Lane bit rate | 8 Gb/s | 16 Gb/s | 25 Gb/s | 25+ Gb/s
      Address translation on CPU | No | Yes | Yes | Yes
      Native DMA from endpoint accelerator | No | Yes | Yes | Yes
      Home agent memory on OpenCAPI endpoint with load/store access | No | No | Yes | Yes
      Native atomic ops to host processor memory from accelerator | No | Yes | Yes | Yes
      Accelerator -> HW thread wake-up | No | Yes | Yes | Yes
      Low-latency small message push, 128B writes to accelerator | MMIO 4/8B only | MMIO 4/8B only | MMIO 4/8B only | Yes
      Host memory caching function on accelerator | Real address cache in PSL | Real address cache in PSL | No | Effective address cache in accelerator
      OpenCAPI removes the PCIe layers to reduce latency significantly.
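The lane counts and lane bit rates in the table translate into raw per-direction link bandwidth by simple multiplication. The Python sketch below assumes raw signaling rates only, ignoring line encoding and protocol overhead, so the results are upper bounds rather than achievable throughput; the `raw_gbytes_per_s` helper and the chosen lane widths are illustrative.

```python
def raw_gbytes_per_s(lanes, gbit_per_lane):
    """Raw one-direction bandwidth in GB/s for a link (no encoding overhead)."""
    return lanes * gbit_per_lane / 8.0

# Illustrative widths using the lane bit rates from the table.
links = {
    "PCIe Gen3 x8 (8 Gb/s per lane)": raw_gbytes_per_s(8, 8),
    "PCIe Gen3 x16 (8 Gb/s per lane)": raw_gbytes_per_s(16, 8),
    "PCIe Gen4 x8 (16 Gb/s per lane)": raw_gbytes_per_s(8, 16),
    "Direct 25G x8 (25 Gb/s per lane)": raw_gbytes_per_s(8, 25),
}

for name, gbytes in links.items():
    print(f"{name}: {gbytes} GB/s raw per direction")
```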
  • 28. SNAP framework (Power™ Coherent Acceleration Processor Interface (CAPI), October 25th 2018)
      Quick and easy development:
      • Use a High Level Synthesis tool to convert C/C++ to RTL, or directly use RTL
      • Programming based on the SNAP library and AXI interface
      • AXI is an industry standard for on-chip interconnection (https://www.arm.com/products/system-ip/amba-specifications)
      [Diagram. Application on host: processes A, B, and C, each with a software program, slave context, job queue, SNAP library, libcxl, and cxl. Acceleration on FPGA: CAPI PSL (HDK: CAPI PSL or BSP), PSL/AXI bridge, host DMA, control MMIO, job manager, job queue, hardware action (C/C++ or RTL), and AXI connections to on-card DRAM, NVMe, and network (TBD).]
  • 29. Scatter gather memory access
      Results: Power9, CAPI 2.0, 2.154GHz, 512MB RAM; FPGA card: FW609 + S241: VU9P Gen3x16; SNAP
      N = 1024 blocks, block size = 2 kBytes. Times in µs.
      How scattered | Traditional: SW gather | Traditional: DMA | Traditional: Sum | CAPI: Verilog | CAPI: HLS
      -RK1 | 309.3 | 183.5 | 492.8 | 171.65 | 173.3
      -RK4 | 319.05 | 186.05 | 505.1 | 180.9 | 180.9
      -RK16 | 305.1 | 185.7 | 490.8 | 184.6 | 186.95
      -RK64 | 320.6 | 186.85 | 507.45 | 186.3 | 187.5
      -RK256 | 318.3 | 185.65 | 503.95 | 218.55 | 215.35
      -RK1024 | 333 | 189.15 | 522.15 | 236.85 | 224.95
      -RK4096 | 324.4 | 189.35 | 513.75 | 241.15 | 225.55
      -RK16384 | 307.4 | 185.75 | 493.15 | 240.9 | 224.9
      [Chart: time (µs) for Verilog, HLS, and Sum, from contiguous (-RK1) to more scattered (-RK16384).]
      • The CAPI way saves the time for “SW gather”, with a relatively small penalty when K grows
      • Once tuned (using pragmas), HLS can compete with Verilog coding
      • 190µs to transfer 2MiB: speed = 11.04GB/s
      R = random; K is the dispersion factor of the blocks. Allocate 2MB in a K * 2MB memory area:
        → K=1: all blocks contiguous
        → K=2: 2MB allocated amongst 4MB
        → K=4: 2MB allocated amongst 8MB
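The quoted transfer speed and the gain over the traditional path can be re-derived from the table. The Python sketch below assumes 2 kBytes means 2048 bytes (so 1024 blocks are 2 MiB) and compares the traditional "Sum" column with the CAPI HLS column for the most contiguous and most scattered cases; the ratios are computed here and are not stated on the slide.

```python
# 1024 blocks of 2 KiB each, as described on the slide (assumed 2048 bytes).
bytes_moved = 1024 * 2 * 1024

def gb_per_s(nbytes, micros):
    """Throughput in GB/s (10^9 bytes per second) for a transfer time in µs."""
    return nbytes / (micros * 1e-6) / 1e9

bandwidth = round(gb_per_s(bytes_moved, 190), 2)

# Traditional (SW gather + DMA) time divided by the CAPI HLS time, in µs.
ratio_contiguous = round(492.8 / 173.3, 2)    # -RK1
ratio_scattered = round(493.15 / 224.9, 2)    # -RK16384

print(bandwidth, ratio_contiguous, ratio_scattered)
```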
  • 30. © 2019 IBM Corporation 30 c MYK%
  • 32. Proposed POWER Processor Technology and I/O Roadmap (Statement of Direction, Subject to Change)
      POWER7 Architecture
        − 2010, POWER7: 8 cores, 45nm; New Micro-Architecture, New Process Technology. Sustained memory bandwidth up to 65 GB/s; standard I/O interconnect PCIe Gen2; advanced I/O signaling N/A; advanced I/O architecture N/A
        − 2012, POWER7+: 8 cores, 32nm; Enhanced Micro-Architecture, New Process Technology. Up to 65 GB/s; PCIe Gen2; N/A; N/A
      POWER8 Architecture
        − 2014, POWER8: 12 cores, 22nm; New Micro-Architecture, New Process Technology. Up to 210 GB/s; PCIe Gen3; N/A; CAPI 1.0
        − 2016, POWER8 w/ NVLink: 12 cores, 22nm; Enhanced Micro-Architecture With NVLink. Up to 210 GB/s; PCIe Gen3; 20 GT/s, 160GB/s; CAPI 1.0, NVLink
      POWER9 Architecture
        − 2017, P9 SO: 12/24 cores, 14nm; New Micro-Architecture, Direct attach memory, New Process Technology. Up to 150 GB/s; PCIe Gen4 x48; 25 GT/s, 300GB/s; CAPI 2.0, OpenCAPI3.0, NVLink
        − 2018, P9 SU: 12/24 cores, 14nm; Enhanced Micro-Architecture, Buffered Memory. Up to 210 GB/s; PCIe Gen4 x48; 25 GT/s, 300GB/s; CAPI 2.0, OpenCAPI3.0, NVLink
        − 2020, P9 AIO: 12/24 cores, 14nm; Enhanced Micro-Architecture, New Memory Subsystem. Up to 650 GB/s; PCIe Gen4 x48; 25 GT/s, 300GB/s; CAPI 2.0, OpenCAPI4.0, NVLink
      POWER10
        − 2021, P10: TBA cores; New Micro-Architecture, New Process Technology. Up to 800 GB/s; PCIe Gen5; 32 & 50 GT/s; TBA
      [Callout on the roadmap: Focus of today’s talk]
  • 33. Advanced I/O (AIO): Final Addition to the POWER9 Processor Family. "The Bandwidth Beast": 2 TB/s Raw Signaling Bandwidth Shared by 6 Attach Protocols
    Die diagram labels: Memory Signaling (8x8 OMI) twice, PowerAXON (x48) twice, PCIe Gen4 Signaling (x48), Local SMP Signaling (3x30), SMP and Accelerator Interconnect, core pairs sharing an L2, 10 MB L3 regions.
    Processor Chip Details
      • 728 mm2 (25.3 x 28.8 mm)
      • 8 Billion Transistors
      • Up to 24 SMT4 Cores
      • Up to 120 MB eDRAM L3 cache
    Semiconductor Technology
      • 14nm finFET
      • Improved device performance
      • Reduced energy
      • eDRAM
      • 17 layer metal stack
    High Bandwidth Signaling
      • 25 GT/s low energy differential: PowerAXON, OMI memory
      • 16 GT/s low energy differential: Local SMP
      • 16 GT/s PCIe Gen4
    Open Memory Interface (OMI)
      • 16 channels x8 at 25 GT/s
      • 650 GB/s peak 1:1 r/w bandwidth
      • Technology Agnostic
      • Offered w/ Microchip DDR4 buffer (410 GB/s peak bandwidth)
    PowerAXON 25 GT/s Attach
      • Up to 16 socket glue-less SMP (4x24 SMP added to 3x30 local)
      • Up to x48 NVIDIA NVLINK GPU attach
      • Up to x48 OpenCAPI 4.0 coherent accelerator / memory attach
    Industry Standard I/O Attach
      • x48 PCIe Gen 4 at 16 GT/s
      • Up to x16 CAPI 2.0 coherent accelerator / storage attach
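One way to see where the "2 TB/s Raw Signaling Bandwidth" headline may come from is to add up the lane counts and transfer rates quoted on the slide. This is a rough sketch, not IBM's own accounting: it assumes every lane count is per direction, that the two PowerAXON (x48) blocks in the die diagram are both counted, that one transfer carries one bit per lane, and that encoding and protocol overhead are ignored. All names are invented for this example.

```python
# (lanes per direction, GT/s per lane), read off the slide text.
# Counting two x48 PowerAXON blocks is an assumption based on the die diagram labels.
LINKS = {
    "OMI memory (16 channels x8)": (16 * 8, 25),
    "PowerAXON (2 blocks of x48)": (2 * 48, 25),
    "Local SMP (3x30)": (3 * 30, 16),
    "PCIe Gen4 (x48)": (48, 16),
}


def raw_gbytes_per_s(lanes, gt_per_s, bidirectional=True):
    """Raw signaling bandwidth in GB/s, assuming 1 bit per transfer per lane."""
    per_direction = lanes * gt_per_s / 8  # Gbit/s -> GByte/s
    return per_direction * (2 if bidirectional else 1)


def total_raw_gbytes_per_s(links=LINKS):
    """Sum of bidirectional raw bandwidth over all listed interfaces."""
    return sum(raw_gbytes_per_s(lanes, rate) for lanes, rate in links.values())


if __name__ == "__main__":
    for name, (lanes, rate) in LINKS.items():
        print(f"{name}: {raw_gbytes_per_s(lanes, rate):.0f} GB/s raw, both directions")
    print(f"Total: {total_raw_gbytes_per_s():.0f} GB/s")
```

Under these assumptions the total is 1952 GB/s, close to the quoted 2 TB/s. The same arithmetic gives 800 GB/s raw for OMI against the 650 GB/s peak quoted on the slide; the gap is presumably protocol overhead, which this sketch does not model.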
  • 34. POWER Systems Memory Strategy
    Connect all memory technologies to Power systems through OpenCAPI and OMI.
    Why? It is a high-speed interface that allows flexibility to attach any new and emerging memory technology, including persistent memories like storage class memory (SCM).
    Diagram labels, P9 (OpenCAPI): CPU P9, Switch, DDR, DDR, NVMe, OpenCAPI.
    Diagram labels, P9' Axone and P10 (OpenCAPI and OMI): CPU P9'/P10, Switch, OMI, OMI, NVMe, OpenCAPI.
  • 35. Hybrid Memory Subsystem - HMS
    • Hybrid Memory Subsystem using Low Latency NAND and DRAM
      – Exclusive partnership for low latency NAND media, and with Bittware for design of accelerator card 250-HMS
      – Low Latency NAND for capacity and persistence, with DRAM used for caching to lower average latency
    • Capabilities
      – SCM on OpenCAPI using Load/Store memory semantics
      – Competitive latency and bandwidth at reduced cost for systems with high capacity memory requirements
    • Target Applications
      – Primary: cost reduction on in-memory applications and databases with predominantly Sequential and mostly Read-Only processing
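The idea of using DRAM as a cache in front of low latency NAND "to lower average latency" can be illustrated with the usual weighted-average formula. The sketch below is a generic model, not a description of the 250-HMS card: the function name is invented, and the latency values in the usage lines are hypothetical placeholders, not measurements from the slides.

```python
def average_latency_ns(hit_rate, dram_ns, nand_ns):
    """Expected access latency when a fraction hit_rate of accesses is served
    from the DRAM cache and the rest goes to the NAND media."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * nand_ns


if __name__ == "__main__":
    # Hypothetical latencies chosen only to show the shape of the trade-off.
    dram_ns, nand_ns = 100.0, 10_000.0
    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        avg = average_latency_ns(hit_rate, dram_ns, nand_ns)
        print(f"hit rate {hit_rate:.2f}: average latency {avg:.0f} ns")
```

The model shows why the stated target workloads (predominantly sequential, mostly read-only) fit: they tend to give a cache high hit rates, which is where the average approaches DRAM latency.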
  • 36. OpenCAPI 4.0: Asymmetric Open Accelerator Attach. Roadmap of Capabilities and Host Silicon Delivery
    Accelerator Protocol: CAPI 1.0 | CAPI 2.0 | OpenCAPI 3.0 | OpenCAPI 4.0 | OpenCAPI 5.0
    First Host Silicon: POWER8 (GA 2014) | POWER9 SO (GA 2017) | POWER9 SO (GA 2017) | POWER9 AIO (GA 2020) | POWER10 (GA 2021)
    Functional Partitioning: Asymmetric | Asymmetric | Asymmetric | Asymmetric | Asymmetric
    Host Architecture: POWER | POWER | Any | Any | Any
    Cache Line Size Supported: 128B | 128B | 64/128/256B | 64/128/256B | 64/128/256B
    Attach Vehicle: PCIe Gen 3 Tunneled | PCIe Gen 4 Tunneled | 25 G (open) Native DL/TL | 25 G (open) Native DL/TL | 32/50 G (open) Native DL/TL
    Address Translation: On Accelerator | Host | Host (secure) | Host (secure) | Host (secure)
    Native DMA to Host Mem: No | Yes | Yes | Yes | Yes
    Atomics to Host Mem: No | Yes | Yes | Yes | Yes
    Host Thread Wake-up: No | Yes | Yes | Yes | Yes
    Host Memory Attach Agent: No | No | Yes | Yes | Yes
    Low Latency Short Msg: 4B/8B MMIO | 4B/8B MMIO | 4B/8B MMIO | 128B push | 128B push
    Posted Writes to Host Mem: No | No | No | Yes | Yes
    Caching of Host Mem: RA Cache | RA Cache | No | VA Cache | VA Cache
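The Yes/No rows of the capability roadmap on this slide lend themselves to a small lookup table. The sketch below simply encodes those rows as transcribed from the slide; the names `PROTOCOLS`, `CAPABILITIES` and `protocols_supporting` are invented for illustration and are not part of any CAPI or OpenCAPI software interface.

```python
# Column order follows the slide, left to right.
PROTOCOLS = ["CAPI 1.0", "CAPI 2.0", "OpenCAPI 3.0", "OpenCAPI 4.0", "OpenCAPI 5.0"]

# Yes/No rows from the slide, encoded as booleans in the same column order.
CAPABILITIES = {
    "Native DMA to Host Mem": [False, True, True, True, True],
    "Atomics to Host Mem": [False, True, True, True, True],
    "Host Thread Wake-up": [False, True, True, True, True],
    "Host Memory Attach Agent": [False, False, True, True, True],
    "Posted Writes to Host Mem": [False, False, False, True, True],
}


def protocols_supporting(*features):
    """Return the protocols for which every named feature is marked Yes."""
    return [
        name
        for index, name in enumerate(PROTOCOLS)
        if all(CAPABILITIES[feature][index] for feature in features)
    ]


if __name__ == "__main__":
    print(protocols_supporting("Posted Writes to Host Mem"))
    print(protocols_supporting("Native DMA to Host Mem", "Host Memory Attach Agent"))
```

A design that needs posted writes to host memory, for example, would be limited to the OpenCAPI 4.0 and 5.0 columns according to this table.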
  • 37. :9 A F
  • 38. 4 AU MW C YMS
  • 39. CPU-GPU bandwidth comparison: Intel CPU 32 GB/s, IBM POWER CPU 150 GB/s (4.7x). [Other slide text unreadable in the extraction.]
  • 40. a adAWPPMVb ce 13. B h zc p c c c cfC SSPYd e a ) (/ - p c c c c D A, mh u krklb 1. 200 u a ar19 t ozp 8 3 19upxt ly F=PT )% m5AE 9AE dA5 M 9MT* 0%, e /- ) p c - ns u a /- 5A 6m )%0 p yc u b /- 5OIPTMW *%. a5INNM *%/ p yc m u b PTMYPKIsA G7B/ / b * https://www-03.ibm.com/press/jp/ja/pressrelease/53461.wss POWER9 14nm a DOMQN 3 C 7 C
  • 41. RX T A UV P 13. ( l / -(( 7B8b c 7BF c l 1 )E x l 5AE1 A G7B0 C D2( ) *) rl ) 7B8 IRT , 78 BWTHR ( ( 78 7B8 IRT ) 78 BWTHR ( 78 7BF -IRT ( 78 BWTHR ( ) 78 7BF IRT - 78 BWTHR ( 78 l 1 -72 ),-94 ,()94 ( ) 94 )-72 66B )---9: (- l 1 * 94 c ” b(. 94 ) e l A5 M (- 9MT =U AWUNPRM1 ) 53A )% e / 9MT =U AWUNPRM1 ( 53A )% e 9MT =U AWUNPRM1 ( l C88 )%, C3D3 w ) :661 D4 rl CC61 .%-/D4 l 7 C/ D9491 B UOG D 7B8/ ) 7BF/ ) i ( 72 (-94 ” l 1 ) ) F l C1 85:, :5 CHWQVW - )
  • 42. WISTRON "MiHawk": 24 x NVMe = 96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0. Image Source: Wistron
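The lane equivalence on this slide can be checked by multiplying lane counts by per-lane transfer rates. The sketch below assumes 4 PCIe lanes per NVMe drive (consistent with 24 drives giving 96 lanes), 8 GT/s for PCIe Gen3, 16 GT/s for PCIe Gen4 and 25 GT/s for OpenCAPI 3.0, and it ignores encoding overhead; the names are invented for this example.

```python
NVME_DRIVES = 24
LANES_PER_DRIVE = 4  # assumption: each NVMe drive uses a x4 link

# (lanes, GT/s per lane) for the three equivalent attachments named on the slide.
ATTACH_OPTIONS = {
    "PCIe Gen3": (NVME_DRIVES * LANES_PER_DRIVE, 8),
    "PCIe Gen4": (48, 16),
    "OpenCAPI 3.0": (32, 25),
}


def raw_gbit_per_s(lanes, gt_per_s):
    """Raw one-direction signaling rate in Gbit/s, ignoring encoding overhead."""
    return lanes * gt_per_s


if __name__ == "__main__":
    for name, (lanes, rate) in ATTACH_OPTIONS.items():
        print(f"{name}: {lanes} lanes x {rate} GT/s = {raw_gbit_per_s(lanes, rate)} Gbit/s")
```

Under these assumptions, 96 Gen3 lanes and 48 Gen4 lanes both give 768 Gbit/s, and 32 OpenCAPI 3.0 lanes give 800 Gbit/s, so the 32 OpenCAPI lanes cover the same raw bandwidth with a third of the Gen3 lane count.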
  • 43. WISTRON "MiHawk": 24 x NVMe = 96 lanes Gen3 PCIe = 48 lanes Gen4 PCIe = 32 lanes OpenCAPI 3.0. Callout: OpenCAPI! Image Source: Wistron