SlideShare a Scribd company logo
1 of 29
OpenPOWER Acceleration of HPCC Systems
2015 LexisNexis® Risk Solutions HPCC Engineering Summit – Community Day
Presenters: JT Kellington & Allan Cantle
September 29, 2015
Delray Beach, FL
Accelerating the
Pace of Innovation
Overview/Agenda
• The OpenPOWER Story
• Porting HPCC Systems to ppc64el
• POWER8 at a Glance
• Power 8 Data Centric Architectures with FPGA Acceleration
• Summary
2
The OpenPOWER Story
OpenPOWER, a catalyst for Open Innovation
4
• Moore’s law no longer satisfies
performance gain
• Growing workload demands
• Numerous IT consumption models
• Mature Open software ecosystem
Performance of POWER architecture
amplified capability
Open Development
open software, open hardware
Collaboration of thought leaders
simultaneous innovation, multiple disciplines
The OpenPOWER Foundation is an open development community,
using the POWER Architecture to serve the evolving needs of customers.
• Rich software ecosystem
• Spectrum of power servers
• Multiple hardware options
• Derivative POWER chips
Market Shifts New Open Innovation
 145+ members investing on Power architecture
 Geographic, sector, and expertise diversity
 1,500 ISVs serving Linux on POWER, now with easier
porting from x86!
 Open from the chip through software
 9 workgroups chartered, 15 hardware innovations
revealed at March 2015 summit
 Partner announced plans, offerings, deployments
www.openpowerfoundation.org
Porting HPCC Systems to ppc64el
HPCC Systems and POWER8 Designed For Easy Porting
HPCC Systems Framework
• ECL is Platform Agnostic
• Hardware and architecture
changes transparent to end user
• CMake, gcc, binutils, Node.js, etc.
• Open Source!
• Quick and easy collaboration
POWER8 Designed for Open Innovation
• Little Endian OS
• More compatible with industry
• IBM Software Development Kit for
Linux on Power
• Advance Toolchain
• Post-Link Optimization
• Linux performance analysis tools
• Full support in Ubuntu, RHEL, and
SLES
6
Porting Details
Initial port took less than a week!
• Added ARCH defines for PPC64EL (system/include/platform.h)
• Added stack description for POWER8 (ecl/hql/hqlstack.hpp)
• Added binutils support for powerpc (ecl/hqlcpp/hqlres.cpp)
7
Deployment and testing took a bit longer
• Couple of bugs found:
• Stricter alignment for atomic operations
• New compiler did not allow compare of “this” to NULL
• Updates to Init system with latest Ubuntu
• Monkey on the keyboard!
• Basic setup and running was easy, but harder to go to the “next level”
• Storage configuration, memory usage, thread support, etc.
POWER8 at a Glance
POWER8 Is Designed For Big Data!
9
Technology
• 22 nm SOI, eDRAM, 15 ML 650 mm2
Caches
• 512 KB SRAM L2 / core
• 96 MB eDRAM shared L3
Memory
• Up to 230 GB/s
sustained bandwidth
Bus Interfaces
• Durable open memory attach
interface
• Up to 48x 48x Integrated PCI
gen3/socket
• SMP interconnect
• CAPI
Cores
• 12 cores (SMT8)
• 8 dispatch, 10 issue,
16 execution pipes
• 2x internal data
flows/queues
• Enhanced prefetching
• 64 KB data cache,
32 KB instruction cache
Accelerators
• Crypto and memory
expansion
• Transactional memory
• VMM assist
• Data move/VM mobility
POWER8 Scale-Out Dual Chip Module
Chip Interconnect
CoreCoreCore
L2L2L2
L3
Bank
L3
Bank
L3
Bank
L3
Bank
L3
Bank
L3
Bank
L2L2L2
Core Core Core
Chip Interconnect
Core Core Core
L2 L2 L2
L2 L2 L2
L3
Bank
L3
Bank
L3
Bank
L3
Bank
L3
Bank
L3
Bank
Core Core Core
MemoryBus
MemoryBus
SMPInterconnect
SMPInterconnect
SMPSMPCAPIPCIe
SMPCAPIPCIeSMP
POWER8 Introduces CAPI Technology
10
CAPP PCIe
POWER8 Processor
FPGA
Accelerated Functional Unit
(AFU)
CAPI
IBM Supplied POWER Service
Layer
Typical I/O Model Flow
Flow with a Coherent Model
Shared Mem.
Notify Accelerator
Acceleration
Shared Memory
Completion
DD Call
Copy or Pin
Source Data
MMIO Notify
Accelerator
Acceleration
Poll / Int
Completion
Copy or Unpin
Result Data
Ret. From DD
Completion
Advantages of Coherent Attachment Over I/O Attachment
 Virtual Addressing & Data Caching (significant latency reduction)
 Easier, Natural Programming Model (avoid application restructuring)
 Enables Apps Not Possible on I/O (Pointer chasing, shared mem semaphores, …)
strategy ( )
CAPI Attached Flash Optimization
• Attach IBM FlashSystem to POWER8 via CAPI
• Read/write commands issued via APIs from applications to eliminate 97% of code path length
• Saves 20-30 cores per 1M IOPS
Pin buffers, Translate,
Map DMA, Start I/O
Application
Read/Write Syscall
Interrupt, unmap,
unpin,Iodone scheduling
20K
instructions
reduced to
<2000
Disk and Adapter DD
strategy ( ) iodone ( )
FileSystem
Application
User Library
Posix Async
I/O Style API
Shared Memory
Work Queue
aio_read()
aio_write()
iodone ( )
LVM
CAPI Unlocks the Next Level of Performance for Flash
Identical hardware with 2 different
paths to data
FlashSystem
Conventional
I/O (FC) CAPI
0
20,000
40,000
60,000
80,000
100,000
120,000
Conventional CAPI
IOPS per Hardware Thread
0
50
100
150
200
250
300
350
400
450
500
Conventional CAPI
Latency (microseconds)
IBM POWER S822L >5x better IOPS
per HW thread
>2x lower latency
POWER 8 Data Centric Architectures
with FPGA Acceleration
• Allan Cantle – President & Founder, Nallatech
Nallatech at a glance
Server qualified accelerator cards featuring FPGAs, network I/O and an open
architecture software/firmware framework
» Energy-efficient High Performance
Heterogeneous Computing
» Real-time, low latency network
and I/O processing
» Design Services for Application Porting
» IBM Open Power partner
» Altera OpenCL partner
» Xilinx Alliance partner
» 20 Year History of FPGA Acceleration including
IBM System Z qualified & 1000+ Node deployments
14
Heterogeneous Computing – Motivation
• Efficient Evolutional Compute Performance
• 1998 - DARPA Polymorphic Computing
• 2003 – Power Wall
GPGPUProcessors
FPGA Processors Multicore Microprocessors
Polymorphic Processor
A processor that is equally efficient at processing
all different data types
Polymorphic
Application Data Types
SymbolicStreaming – SIMD - Vector
(DSP)
Bit Level
SWEPTEfficiency
Size,Weight,Energy,Performance,Time
Heterogeneous Computing – Motivation
• Efficient Evolutional Compute Performance
• 1998 - DARPA Polymorphic Computing
• 2003 – Power Wall
GPGPUProcessors
FPGA Processors Multicore Microprocessors
Application Data Types
SymbolicStreaming – SIMD - Vector
(DSP)
Bit Level
SWEPTEfficiency
Size,Weight,Energy,Performance,Time
FPGA’s for “Software Accessible” Compute Acceleration
• FPGAs are best processor for
• “In the pipe” Stream Computing Operations
• Highly parallel Bit & Byte level computation
• FPGA are NOW programmable by software engineers with OpenCL!
• Industry Standard Language managed by Khronos
• Heterogeneous language of choice for all processor types
• Low Level Explicitly Parallel language
• Heterogeneous Support is Evolving
17
Typical Thor & Roxie Cluster Node Configurations
• Large Thor Scale Out Cluster = 400 Nodes
• Large Roxie Scale Out Cluster = 100 Nodes
Xeon
E5-2670 V3
12 Cores
Memory
16GB
SAS
RAID5
Controller
PCIe
L3Cache–30MB
L2/Core–256KB
L1/Core–64KB
1.2TB
HDD
Network
Controller
~60GB/s
PCIe
10GbE
200MB/s
1.2TB
HDD
1.2TB
HDD
140 IOPs
200MB/s
140 IOPs
200MB/s
140 IOPs
Thor
Xeon
E5-2670 V3
12 Cores
L3Cache–30MB
L2/Core–256KB
L1/core–64KB
SATA
RAID5
Controller
1TB
SSD
560MB/s
1TB
SSD
1TB
SSD
100 KIOPs
560MB/s
100 KIOPs
560MB/s
100 KIOPs
Roxie
PCIe
~60GB/s
~60GB/s 2x QPI Links
Memory
16GB
400MB/s
140 IOPs
1.1 GB/s
100 KIOPs
= I/O Bus
= Memory Bus
Server
Node 1
Server
Node 400
Server
Node 2
Server
Node 399
400 Node
THOR Cluster
Hierarchical
400 Port
Network
Switch
Disc
Disc
Disc
Disc
400MB/s
140 IOPs
400MB/s
140 IOPs
400MB/s
140 IOPs
400MB/s
140 IOPs
1GB/s
1GB/s
1GB/s
1GB/s
1GB/s
Server
Node 1
Server
Node 100
Server
Node 2
Server
Node 99
100 Node
ROXIE Cluster
Hierarchical
100 Port
Network
Switch
SSD
SSD
SSD
SSD
1.1GB/s
100 KIOPs
1.1GB/s
100 KIOPs
1.1GB/s
100 KIOPs
1.1GB/s
100 KIOPs
1GB/s
1GB/s
1GB/s
1GB/s
1GB/s
• CPU intensive operations on
fragmented storage data
• Average 1GB/s to storage
• Mapper type parallel operations
• Aggregate 160GBytes/s
Storage bandwidth
• Aggregate 56K IOPs
• CPU intensive operations on
fragmented storage data
• Average 1GB/s to storage
• Pre-indexed eases limitation
• Mapper type parallel operations
• Aggregate 110GBytes/s
Storage bandwidth
• Aggregate 10M IOPs
Typical Scale Out Thor & Roxie Cluster Configurations
Memory Coherency & Data Movement Conflict
• Mapper Type – Needle in a Haystack Operations
• Little to no compute on lots of data
• But hey, it’s not a problem for the Xeon so why worry?
• Creates unnecessary traffic congestion across IO interfaces
• I/O Management Consumes CPU Threads
• Excessive Cache thrashing consumes memory bandwidth
• E.g. only 8x 4K IOPs will fit in a 32K L1 Data Cache
• Power & efficiency issues with all the unnecessary data movement
1x
Network
Controller
1x
10GbE
Storage uP
L3
L2
L1
Memory
1x
100x
Data Movement example of
CPU operating on large data
files with mapper type
functions
The burden of
using CPU centric
architectures for
Data Centric
Problems
= I/O Bus
= Memory Bus
1x
Network
Controller
1x
10GbE
Storage uP
L3
L2
L1
Memory
1x
100x
Example of CPU moving
stored data to network
for another CPU
Memory Coherency & Data Movement Conflict
• Processor Intensive Operations on fragmented data across cluster
• CPU burdened by passing lots of fragmented data from storage to the
network
• Similarly CPU has to wait for fragmented data to arrive
• Network Congestion & latency become big issues
• Less CPU cycles & cache memory available for compute intensive
operations
The burden of
using CPU centric
architectures for
Data Centric
Problems
= I/O Bus
= Memory Bus
Homogeneous Commodity Clusters still win out…right?
• Maybe, but storage is getting a LOT faster with NVMe!
Potential
72GB/s
Network
Controller
PCIe
10GbE
24 NVMe
Drives
uP
L3
L2
L1
Memory
60GB/s
Example of CPU moving
stored data to network
for another CPU
Increased Storage IO Bandwidth would
saturate CPU’s available memory
bandwidth & CPU centric architecture
breaks down completely
The burden of
using CPU centric
architectures for
Data Centric
Problems
= I/O Bus
= Memory Bus
IBM initiated Data Centric Transformation with CAPI
• CAPI = Coherent Accelerator Processor Interface
• Transport over PCIe bus
• IBM’s CAPI Flash Product
• Make Flash memory appear as Block Memory, BM
• FPGA provides “translation” BM to Flash commands
• CPU threads for storage management reduced from 40-50 down to 2
• CAPI attached Network Adaptor
• 3 Symmetrical MultiProcessing, SMP, Interconnects for Scale Up architectures
Network
Controller
PCIe
10GbE
56TB
Storage
Power 8
L3
L2
L1
Memory
230GB/s
SMP x3
FPGA
4GB/s
CAPI
CAPI
4GB/s
IBM
Standard Flash Drawer
vs CAPI Flash Drawer
= I/O Bus
= Memory Bus
Introducing In-Storage Acceleration over CAPI
• Extend CAPI Flash Product Innovations
• Increase CAPI Flash Bandwidth
• FPGA Acceleration
• Leverage NVMe IOPS & Sequential data rates
• Data Compression, Encryption, Filtering
• Bit/Byte Level Stream Computing
• Very power efficient architecture
10GbE
Power 8
L3
L2
L1
FPGA
Acceleration
4GB/s
CAPI
CAPI
Memory
230GB/s
56TB
Storage
Network
Controller
16GB/s
96GB/s
FPGA
SMP x3
= Memory Bus
P8 Node with CAPI Flash – Available Today
P8
12 Core
Socket
56TB
4GB/s
FPGA
4GB/s
1 MIOPS
P8
12 Core
Socket
56TB
4GB/s
FPGA
4GB/s
76.8GB/s
1TB Memory
230GB/s
1TB Memory
230GB/s
56TB
4GB/s
FPGA
4GB/s
56TB
4GB/s
FPGA
4GB/s
40/100GbE
NIC
40/100GbE
NIC
1 MIOPS
1 MIOPS
1 MIOPS
Maximum Configuration Shown
• Possible Configuration for Scale Out Roxie & Thor Clusters
Scale Out P8 Node with FPGA accelerated CAPI Flash
P8
12 Core
Socket
56TB
4GB/s
FPGA
1 MIOPS
P8
12 Core
Socket
56TB
4GB/s
FPGA
76.8GB/s
1TB Memory
230GB/s
1TB Memory
230GB/s
56TB
4GB/s
FPGA
56TB
4GB/s
FPGA
40/100GbE
NIC
40/100GbE
NIC
1 MIOPS
1 MIOPS
1 MIOPS
Maximum Configuration Shown
60GB/s
14.4 MIOPs
60GB/s
14.4 MIOPs
60GB/s
14.4 MIOPs
60GB/s
14.4 MIOPs
• Possible Configuration for Scale Out Roxie & Thor Clusters
Scale Up IBM E870/E880 with Scale out FPGA Acceleration
P8
Socket 56TB
4GB/s
FPGA
60GB/s
14.4 MIOPs
P8
Socket 56TB
4GB/s
FPGA
60GB/s
14.4 MIOPs
P8
Socket56TB
4GB/s
FPGA
60GB/s
14.4 MIOPs
P8
Socket56TB
4GB/s
FPGA
60GB/s
14.4 MIOPs
76.8GB/s
76.8GB/s
76.8GB/s 76.8GB/s
76.8GB/s
76.8GB/s
1TB Memory
230GB/s
1TB Memory
230GB/s
1TB Memory
230GB/s
1TB Memory
230GB/s
• Balanced Scale Up & Scale Out Solution
• Scale Up Processors ideal for Compute intensive operations
• Scale Out FPGA Accelerated Storage, ideal for Parallel Mapper type operations
P8
Socket 56TB
FPGA
P8
Socket 56TB
FPGA
P8
Socket56TB
FPGA
P8
Socket56TB
FPGA 1TB Memory 1TB Memory
1TB Memory 1TB Memory
Scale Up IBM E880 with Scale out FPGA Acceleration
• 16 Way, 192 Core, Scale Up Compute with aggregate
• 16TB Memory @ 3.68TBytes/s
• 896TB Storage @ 960 GBytes/s
In Summary: HPCC Systems and OpenPOWER Are Ready For Business!
• IBM Power8’s Data Centric Architecture is ideal for Big Data Problems
• FPGAs provide an ideal streaming, near storage, accelerator
• Opportunity to re-architect HPCC’s ECL for Power 8 with FPGA Acceleration
• Will provide a step change in performance at lower power & footprint
• A community effort will help to accelerate the transition
29
Contact information: JT Kellington | Allan Cantle
Phone: 512-286-6948 | 805 377 4993
Email: jwkellin@us.ibm.com | a.cantle@nallatech.com
Web: www.ibm.com | www.nallatech.com

More Related Content

What's hot

What's hot (20)

High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
Omp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacOmp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdac
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology InnovationsLenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Infrastructure et serveurs HP
Infrastructure et serveurs HPInfrastructure et serveurs HP
Infrastructure et serveurs HP
 
AMD It's Time to ROC
AMD It's Time to ROCAMD It's Time to ROC
AMD It's Time to ROC
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
High Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & RankingsHigh Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & Rankings
 
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
Open CAPI, A New Standard for High Performance Attachment of Memory, Accelera...
 
GEN-Z: An Overview and Use Cases
GEN-Z: An Overview and Use CasesGEN-Z: An Overview and Use Cases
GEN-Z: An Overview and Use Cases
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Best Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing ClustersBest Practices and Performance Studies for High-Performance Computing Clusters
Best Practices and Performance Studies for High-Performance Computing Clusters
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 

Viewers also liked

Leveraging SMEs’ Strength for INSPIRE
Leveraging SMEs’ Strength for INSPIRELeveraging SMEs’ Strength for INSPIRE
Leveraging SMEs’ Strength for INSPIRE
smespire
 
Presentation supervision
Presentation supervisionPresentation supervision
Presentation supervision
Maxikar90
 
asian pharma press -published
asian pharma press -publishedasian pharma press -published
asian pharma press -published
Sachin Sangle
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels
kwb_eensgezind
 
Bridge the gap alerts
Bridge the gap   alertsBridge the gap   alerts
Bridge the gap alerts
aoconno2
 
Android Training in Bangalore
Android Training in BangaloreAndroid Training in Bangalore
Android Training in Bangalore
CMS Computer
 
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidad
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidadManual para-realizar-estudios-de-prefactibilidad-y-factibilidad
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidad
Alicia Quispe
 

Viewers also liked (20)

ORGANIZATIONS CHART
ORGANIZATIONS CHARTORGANIZATIONS CHART
ORGANIZATIONS CHART
 
Leveraging SMEs’ Strength for INSPIRE
Leveraging SMEs’ Strength for INSPIRELeveraging SMEs’ Strength for INSPIRE
Leveraging SMEs’ Strength for INSPIRE
 
Corporate etiquette
Corporate etiquetteCorporate etiquette
Corporate etiquette
 
100+pet ideas
100+pet ideas100+pet ideas
100+pet ideas
 
Presentation supervision
Presentation supervisionPresentation supervision
Presentation supervision
 
asian pharma press -published
asian pharma press -publishedasian pharma press -published
asian pharma press -published
 
Маркетинговый план LR
Маркетинговый план LRМаркетинговый план LR
Маркетинговый план LR
 
20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels20130314 het abc van sociale media sint pauwels
20130314 het abc van sociale media sint pauwels
 
LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013LEVICK Weekly - Mar 8 2013
LEVICK Weekly - Mar 8 2013
 
LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013LEVICK Weekly - Jan 11 2013
LEVICK Weekly - Jan 11 2013
 
Presentacion ciclopaseo ingles
Presentacion ciclopaseo inglesPresentacion ciclopaseo ingles
Presentacion ciclopaseo ingles
 
JB Marks Education Trust presentation: Bursary Recipients
JB Marks Education Trust presentation: Bursary RecipientsJB Marks Education Trust presentation: Bursary Recipients
JB Marks Education Trust presentation: Bursary Recipients
 
IRJP Published
IRJP PublishedIRJP Published
IRJP Published
 
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
The importance of Exchange 2013 CAS in Exchange 2013 coexistence | Part 1/2 |...
 
Bridge the gap alerts
Bridge the gap   alertsBridge the gap   alerts
Bridge the gap alerts
 
Android Training in Bangalore
Android Training in BangaloreAndroid Training in Bangalore
Android Training in Bangalore
 
Genre theory
Genre theoryGenre theory
Genre theory
 
Air Compressor Manufacturers, Air Compressor Manufacturers Gujarat
Air Compressor Manufacturers, Air Compressor Manufacturers GujaratAir Compressor Manufacturers, Air Compressor Manufacturers Gujarat
Air Compressor Manufacturers, Air Compressor Manufacturers Gujarat
 
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidad
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidadManual para-realizar-estudios-de-prefactibilidad-y-factibilidad
Manual para-realizar-estudios-de-prefactibilidad-y-factibilidad
 
El costo real de los ataques ¿Cómo te impacta en ROI?
El costo real de los ataques ¿Cómo te impacta en ROI?El costo real de los ataques ¿Cómo te impacta en ROI?
El costo real de los ataques ¿Cómo te impacta en ROI?
 

Similar to OpenPOWER Acceleration of HPCC Systems

GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
Achronix
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorial
madhuinturi
 

Similar to OpenPOWER Acceleration of HPCC Systems (20)

Demystify OpenPOWER
Demystify OpenPOWERDemystify OpenPOWER
Demystify OpenPOWER
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
Scaling Redis Cluster Deployments for Genome Analysis (featuring LSU) - Terry...
 
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
New Generation of IBM Power Systems Delivering value with Red Hat Enterprise ...
 
OpenCAPI Technology Ecosystem
OpenCAPI Technology EcosystemOpenCAPI Technology Ecosystem
OpenCAPI Technology Ecosystem
 
Sparc t4 systems customer presentation
Sparc t4 systems customer presentationSparc t4 systems customer presentation
Sparc t4 systems customer presentation
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Introduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AIIntroduction to HPC & Supercomputing in AI
Introduction to HPC & Supercomputing in AI
 
Maxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorialMaxwell siuc hpc_description_tutorial
Maxwell siuc hpc_description_tutorial
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017
 
ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014ODP Presentation LinuxCon NA 2014
ODP Presentation LinuxCon NA 2014
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
Challenges in Embedded Computing
Challenges in Embedded ComputingChallenges in Embedded Computing
Challenges in Embedded Computing
 
Hyper threading
Hyper threadingHyper threading
Hyper threading
 
@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 

More from HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 

OpenPOWER Acceleration of HPCC Systems

  • 1. OpenPOWER Acceleration of HPCC Systems 2015 LexisNexis® Risk Solutions HPCC Engineering Summit – Community Day Presenters: JT Kellington & Allan Cantle September 29, 2015 Delray Beach, FL Accelerating the Pace of Innovation
  • 2. Overview/Agenda • The OpenPOWER Story • Porting HPCC Systems to ppc64el • POWER8 at a Glance • Power 8 Data Centric Architectures with FPGA Acceleration • Summary 2
  • 4. OpenPOWER, a catalyst for Open Innovation 4 • Moore’s law no longer satisfies performance gain • Growing workload demands • Numerous IT consumption models • Mature Open software ecosystem Performance of POWER architecture amplified capability Open Development open software, open hardware Collaboration of thought leaders simultaneous innovation, multiple disciplines The OpenPOWER Foundation is an open development community, using the POWER Architecture to serve the evolving needs of customers. • Rich software ecosystem • Spectrum of power servers • Multiple hardware options • Derivative POWER chips Market Shifts New Open Innovation  145+ members investing on Power architecture  Geographic, sector, and expertise diversity  1,500 ISVs serving Linux on POWER, now with easier porting from x86!  Open from the chip through software  9 workgroups chartered, 15 hardware innovations revealed at March 2015 summit  Partner announced plans, offerings, deployments www.openpowerfoundation.org
  • 5. Porting HPCC Systems to ppc64el
  • 6. HPCC Systems and POWER8 Designed For Easy Porting HPCC Systems Framework • ECL is Platform Agnostic • Hardware and architecture changes transparent to end user • CMake, gcc, binutils, Node.js, etc. • Open Source! • Quick and easy collaboration POWER8 Designed for Open Innovation • Little Endian OS • More compatible with industry • IBM Software Development Kit for Linux on Power • Advance Toolchain • Post-Link Optimization • Linux performance analysis tools • Full support in Ubuntu, RHEL, and SLES 6
  • 7. Porting Details Initial port took less than a week! • Added ARCH defines for PPC64EL (system/include/platform.h) • Added stack description for POWER8 (ecl/hql/hqlstack.hpp) • Added binutils support for powerpc (ecl/hqlcpp/hqlres.cpp) 7 Deployment and testing took a bit longer • Couple of bugs found: • Stricter alignment for atomic operations • New compiler did not allow compare of “this” to NULL • Updates to Init system with latest Ubuntu • Monkey on the keyboard! • Basic setup and running was easy, but harder to go to the “next level” • Storage configuration, memory usage, thread support, etc.
  • 8. POWER8 at a Glance
  • 9. POWER8 Is Designed For Big Data! 9 Technology • 22 nm SOI, eDRAM, 15 ML 650 mm2 Caches • 512 KB SRAM L2 / core • 96 MB eDRAM shared L3 Memory • Up to 230 GB/s sustained bandwidth Bus Interfaces • Durable open memory attach interface • Up to 48x 48x Integrated PCI gen3/socket • SMP interconnect • CAPI Cores • 12 cores (SMT8) • 8 dispatch, 10 issue, 16 execution pipes • 2x internal data flows/queues • Enhanced prefetching • 64 KB data cache, 32 KB instruction cache Accelerators • Crypto and memory expansion • Transactional memory • VMM assist • Data move/VM mobility POWER8 Scale-Out Dual Chip Module Chip Interconnect CoreCoreCore L2L2L2 L3 Bank L3 Bank L3 Bank L3 Bank L3 Bank L3 Bank L2L2L2 Core Core Core Chip Interconnect Core Core Core L2 L2 L2 L2 L2 L2 L3 Bank L3 Bank L3 Bank L3 Bank L3 Bank L3 Bank Core Core Core MemoryBus MemoryBus SMPInterconnect SMPInterconnect SMPSMPCAPIPCIe SMPCAPIPCIeSMP
  • 10. POWER8 Introduces CAPI Technology 10 CAPP PCIe POWER8 Processor FPGA Accelerated Functional Unit (AFU) CAPI IBM Supplied POWER Service Layer Typical I/O Model Flow Flow with a Coherent Model Shared Mem. Notify Accelerator Acceleration Shared Memory Completion DD Call Copy or Pin Source Data MMIO Notify Accelerator Acceleration Poll / Int Completion Copy or Unpin Result Data Ret. From DD Completion Advantages of Coherent Attachment Over I/O Attachment  Virtual Addressing & Data Caching (significant latency reduction)  Easier, Natural Programming Model (avoid application restructuring)  Enables Apps Not Possible on I/O (Pointer chasing, shared mem semaphores, …)
  • 11. strategy ( ) CAPI Attached Flash Optimization • Attach IBM FlashSystem to POWER8 via CAPI • Read/write commands issued via APIs from applications to eliminate 97% of code path length • Saves 20-30 cores per 1M IOPS Pin buffers, Translate, Map DMA, Start I/O Application Read/Write Syscall Interrupt, unmap, unpin,Iodone scheduling 20K instructions reduced to <2000 Disk and Adapter DD strategy ( ) iodone ( ) FileSystem Application User Library Posix Async I/O Style API Shared Memory Work Queue aio_read() aio_write() iodone ( ) LVM
  • 12. CAPI Unlocks the Next Level of Performance for Flash Identical hardware with 2 different paths to data FlashSystem Conventional I/O (FC) CAPI 0 20,000 40,000 60,000 80,000 100,000 120,000 Conventional CAPI IOPS per Hardware Thread 0 50 100 150 200 250 300 350 400 450 500 Conventional CAPI Latency (microseconds) IBM POWER S822L >5x better IOPS per HW thread >2x lower latency
  • 13. POWER 8 Data Centric Architectures with FPGA Acceleration • Allan Cantle – President & Founder, Nallatech
  • 14. Nallatech at a glance Server qualified accelerator cards featuring FPGAs, network I/O and an open architecture software/firmware framework » Energy-efficient High Performance Heterogeneous Computing » Real-time, low latency network and I/O processing » Design Services for Application Porting » IBM Open Power partner » Altera OpenCL partner » Xilinx Alliance partner » 20 Year History of FPGA Acceleration including IBM System Z qualified & 1000+ Node deployments 14
  • 15. Heterogeneous Computing – Motivation • Efficient Evolutional Compute Performance • 1998 - DARPA Polymorphic Computing • 2003 – Power Wall GPGPUProcessors FPGA Processors Multicore Microprocessors Polymorphic Processor A processor that is equally efficient at processing all different data types Polymorphic Application Data Types SymbolicStreaming – SIMD - Vector (DSP) Bit Level SWEPTEfficiency Size,Weight,Energy,Performance,Time
  • 16. Heterogeneous Computing – Motivation • Efficient Evolutional Compute Performance • 1998 - DARPA Polymorphic Computing • 2003 – Power Wall GPGPUProcessors FPGA Processors Multicore Microprocessors Application Data Types SymbolicStreaming – SIMD - Vector (DSP) Bit Level SWEPTEfficiency Size,Weight,Energy,Performance,Time
  • 17. FPGA’s for “Software Accessible” Compute Acceleration • FPGAs are best processor for • “In the pipe” Stream Computing Operations • Highly parallel Bit & Byte level computation • FPGA are NOW programmable by software engineers with OpenCL! • Industry Standard Language managed by Khronos • Heterogeneous language of choice for all processor types • Low Level Explicitly Parallel language • Heterogeneous Support is Evolving 17
  • 18. Typical Thor & Roxie Cluster Node Configurations • Large Thor Scale Out Cluster = 400 Nodes • Large Roxie Scale Out Cluster = 100 Nodes Xeon E5-2670 V3 12 Cores Memory 16GB SAS RAID5 Controller PCIe L3Cache–30MB L2/Core–256KB L1/Core–64KB 1.2TB HDD Network Controller ~60GB/s PCIe 10GbE 200MB/s 1.2TB HDD 1.2TB HDD 140 IOPs 200MB/s 140 IOPs 200MB/s 140 IOPs Thor Xeon E5-2670 V3 12 Cores L3Cache–30MB L2/Core–256KB L1/core–64KB SATA RAID5 Controller 1TB SSD 560MB/s 1TB SSD 1TB SSD 100 KIOPs 560MB/s 100 KIOPs 560MB/s 100 KIOPs Roxie PCIe ~60GB/s ~60GB/s 2x QPI Links Memory 16GB 400MB/s 140 IOPs 1.1 GB/s 100 KIOPs = I/O Bus = Memory Bus
  • 19. Server Node 1 Server Node 400 Server Node 2 Server Node 399 400 Node THOR Cluster Hierarchical 400 Port Network Switch Disc Disc Disc Disc 400MB/s 140 IOPs 400MB/s 140 IOPs 400MB/s 140 IOPs 400MB/s 140 IOPs 1GB/s 1GB/s 1GB/s 1GB/s 1GB/s Server Node 1 Server Node 100 Server Node 2 Server Node 99 100 Node ROXIE Cluster Hierarchical 100 Port Network Switch SSD SSD SSD SSD 1.1GB/s 100 KIOPs 1.1GB/s 100 KIOPs 1.1GB/s 100 KIOPs 1.1GB/s 100 KIOPs 1GB/s 1GB/s 1GB/s 1GB/s 1GB/s • CPU intensive operations on fragmented storage data • Average 1GB/s to storage • Mapper type parallel operations • Aggregate 160GBytes/s Storage bandwidth • Aggregate 56K IOPs • CPU intensive operations on fragmented storage data • Average 1GB/s to storage • Pre-indexed eases limitation • Mapper type parallel operations • Aggregate 110GBytes/s Storage bandwidth • Aggregate 10M IOPs Typical Scale Out Thor & Roxie Cluster Configurations
  • 20. Memory Coherency & Data Movement Conflict • Mapper Type – Needle in a Haystack Operations • Little to no compute on lots of data • But hey, it’s not a problem for the Xeon so why worry? • Creates unnecessary traffic congestion across IO interfaces • I/O Management Consumes CPU Threads • Excessive Cache thrashing consumes memory bandwidth • E.g. only 8x 4K IOPs will fit in a 32K L1 Data Cache • Power & efficiency issues with all the unnecessary data movement 1x Network Controller 1x 10GbE Storage uP L3 L2 L1 Memory 1x 100x Data Movement example of CPU operating on large data files with mapper type functions The burden of using CPU centric architectures for Data Centric Problems = I/O Bus = Memory Bus
  • 21. 1x Network Controller 1x 10GbE Storage uP L3 L2 L1 Memory 1x 100x Example of CPU moving stored data to network for another CPU Memory Coherency & Data Movement Conflict • Processor Intensive Operations on fragmented data across cluster • CPU burdened by passing lots of fragmented data from storage to the network • Similarly CPU has to wait for fragmented data to arrive • Network Congestion & latency become big issues • Less CPU cycles & cache memory available for compute intensive operations The burden of using CPU centric architectures for Data Centric Problems = I/O Bus = Memory Bus
  • 22. Homogeneous Commodity Clusters still win out…right? • Maybe, but storage is getting a LOT faster with NVMe! Potential 72GB/s Network Controller PCIe 10GbE 24 NVMe Drives uP L3 L2 L1 Memory 60GB/s Example of CPU moving stored data to network for another CPU Increased Storage IO Bandwidth would saturate CPU’s available memory bandwidth & CPU centric architecture breaks down completely The burden of using CPU centric architectures for Data Centric Problems = I/O Bus = Memory Bus
  • 23. IBM initiated Data Centric Transformation with CAPI • CAPI = Coherent Accelerator Processor Interface • Transport over PCIe bus • IBM’s CAPI Flash Product • Make Flash memory appear as Block Memory, BM • FPGA provides “translation” BM to Flash commands • CPU threads for storage management reduced from 40-50 down to 2 • CAPI attached Network Adaptor • 3 Symmetrical MultiProcessing, SMP, Interconnects for Scale Up architectures Network Controller PCIe 10GbE 56TB Storage Power 8 L3 L2 L1 Memory 230GB/s SMP x3 FPGA 4GB/s CAPI CAPI 4GB/s IBM Standard Flash Drawer vs CAPI Flash Drawer = I/O Bus = Memory Bus
  • 24. Introducing In-Storage Acceleration over CAPI • Extend CAPI Flash Product Innovations • Increase CAPI Flash Bandwidth • FPGA Acceleration • Leverage NVMe IOPS & Sequential data rates • Data Compression, Encryption, Filtering • Bit/Byte Level Stream Computing • Very power efficient architecture 10GbE Power 8 L3 L2 L1 FPGA Acceleration 4GB/s CAPI CAPI Memory 230GB/s 56TB Storage Network Controller 16GB/s 96GB/s FPGA SMP x3 = Memory Bus
  • 25. P8 Node with CAPI Flash – Available Today P8 12 Core Socket 56TB 4GB/s FPGA 4GB/s 1 MIOPS P8 12 Core Socket 56TB 4GB/s FPGA 4GB/s 76.8GB/s 1TB Memory 230GB/s 1TB Memory 230GB/s 56TB 4GB/s FPGA 4GB/s 56TB 4GB/s FPGA 4GB/s 40/100GbE NIC 40/100GbE NIC 1 MIOPS 1 MIOPS 1 MIOPS Maximum Configuration Shown • Possible Configuration for Scale Out Roxie & Thor Clusters
  • 26. Scale Out P8 Node with FPGA accelerated CAPI Flash P8 12 Core Socket 56TB 4GB/s FPGA 1 MIOPS P8 12 Core Socket 56TB 4GB/s FPGA 76.8GB/s 1TB Memory 230GB/s 1TB Memory 230GB/s 56TB 4GB/s FPGA 56TB 4GB/s FPGA 40/100GbE NIC 40/100GbE NIC 1 MIOPS 1 MIOPS 1 MIOPS Maximum Configuration Shown 60GB/s 14.4 MIOPs 60GB/s 14.4 MIOPs 60GB/s 14.4 MIOPs 60GB/s 14.4 MIOPs • Possible Configuration for Scale Out Roxie & Thor Clusters
  • 27. Scale Up IBM E870/E880 with Scale out FPGA Acceleration P8 Socket 56TB 4GB/s FPGA 60GB/s 14.4 MIOPs P8 Socket 56TB 4GB/s FPGA 60GB/s 14.4 MIOPs P8 Socket56TB 4GB/s FPGA 60GB/s 14.4 MIOPs P8 Socket56TB 4GB/s FPGA 60GB/s 14.4 MIOPs 76.8GB/s 76.8GB/s 76.8GB/s 76.8GB/s 76.8GB/s 76.8GB/s 1TB Memory 230GB/s 1TB Memory 230GB/s 1TB Memory 230GB/s 1TB Memory 230GB/s • Balanced Scale Up & Scale Out Solution • Scale Up Processors ideal for Compute intensive operations • Scale Out FPGA Accelerated Storage, ideal for Parallel Mapper type operations
  • 28. P8 Socket 56TB FPGA P8 Socket 56TB FPGA P8 Socket56TB FPGA P8 Socket56TB FPGA 1TB Memory 1TB Memory 1TB Memory 1TB Memory Scale Up IBM E880 with Scale out FPGA Acceleration • 16 Way, 192 Core, Scale Up Compute with aggregate • 16TB Memory @ 3.68TBytes/s • 896TB Storage @ 960 GBytes/s
  • 29. In Summary: HPCC Systems and OpenPOWER Are Ready For Business! • IBM Power8’s Data Centric Architecture is ideal for Big Data Problems • FPGAs provide an ideal streaming, near storage, accelerator • Opportunity to re-architect HPCC’s ECL for Power 8 with FPGA Acceleration • Will provide a step change in performance at lower power & footprint • A community effort will help to accelerate the transition 29 Contact information: JT Kellington | Allan Cantle Phone: 512-286-6948 | 805 377 4993 Email: jwkellin@us.ibm.com | a.cantle@nallatech.com Web: www.ibm.com | www.nallatech.com