SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
The energy-efficient, high performance 64bit
DOME ”server - an industry inclination point
Ronald P. Luijten – Data Motion Architect
lui@zurich.ibm.com
IBM Research - Zurich
12 June 2015
DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.
COMPUTE is FREE – DATA is NOT
Ronald P. Luijten – Data Motion Architect
lui@zurich.ibm.com
IBM Research - Zurich
12 June 2015
DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.
DOME:
‱ ppp Astron, IBM, Dutch gvt
‱ 20MEur gvt funding over 5 years
‱ Started feb 2012
Ronald P. Luijten – Analyst Meeting – 12 June 2015 3
© 2012 IBM Corporation
SKA (Square Kilometer Array) to measure Big Bang
Picture source: NZZ march 2014
Big
Bang Inflation
Protons
created
Start of
nucleosynthesis
through fusion
End of nucleo-
synthesis Modern Universe
0 10-32s 10-6s 0.01s 3min 380’000 years 13.8 Billion years
Ronald P. Luijten – Analyst Meeting – 12 June 2015 4
DOME ”Server Motivation & Objectives
‱Create the worlds highest density 64 bit ”-server drawer
–Useful to evaluate both SKA radio-astronomy and IBM future business
–Platform for Business Analytics appliance pre-product research
–High energy efficiency / very low cost
–Commodity components, HW + SW standards based
–Leverage ‘free computing’ paradigm
–Enhance with ‘Value Add’: packaging, system integration, 

–Density and speed of light
‱Most efficient cooling using IBM technology
(ref: SuperMUC June 2012 TOP500 machine)
‱Must be true 64 bit to enable business applications
‱Must run server class OS (SLES11 or RHEL6, or equivalent)
–Precluded ARM (64-bit Silicon was not available)
–PPC64 is available in SoC from FSL since 2011
–(no $$$ to build a new SoC
)
‱This is the DOME project capability demonstrator – not a product
Ronald P. Luijten – Analyst Meeting – 12 June 2015 5
Definition
”Server:
The integration of an entire server node motherboard*
into a single microchip except DRAM, Nor-boot flash
and power conversion logic.
305mm
245mm
139mmx55mm
* no graphics
Ronald P. Luijten – Analyst Meeting – 12 June 2015 6
Definition
”Server:
The integration of an entire server node motherboard*
into a single microchip except DRAM, Nor-boot flash
and power conversion logic.
305mm
245mm
139mmx55mm
This does NOT imply low performance!
* no graphics
Ronald P. Luijten – Analyst Meeting – 12 June 2015 7
T4240 Chip Overview
12 core – fully dual threaded
1.8 GHz ppc64 (e6500)
12 DP-FPU; 12 128b Altivec
3 DDR3 channels at 1.86GT/s
3x 0.5MB L3 cache
4x 10GbE + 2x SATA
PCIe 3.0
HW packet acceleration
RegEx Pattern Match acc.
Crypto acceleration
28nm TSMC Bulk CMOS
239mm2 - ~1.7B transistors
111Mbit SRAM, 6M FF
7 Power states (2 power gating)
Ronald P. Luijten – Analyst Meeting – 12 June 2015 8
DOME compute node board diagram
T4240
16GB
DRAM
72bit
16GB
DRAM
72bit
PSoC
1Gbit SPI
flash
Power
converter
USB
JTAG
Serial
I2C
4 x
10 GbE
PCIe x8 2 x SATA
16GB
DRAM
72bit
1866 MT/s 1866 MT/s
1866 MT/s
1V / 40A
12V / 2.5A
Ronald P. Luijten – Analyst Meeting – 12 June 2015 9
DOME compute node board diagram
T4240
DRAM DRAM
PSoC
SPI
flash
Power
converter
USB
JTAG
Serial
I2C
4 x
10 GbE
PCIe x8 2 x SATA
DRAM
12V / 2.5A
PSOC collapses 6 functions into a small
chip to save Area, Power and Cost
1. On/Off and Power up sequencing
2. Provide uServer boot configuration
3. JTAG debug access
4. Serial port access (Linux)
5. Temperature monitoring and protection
‱ and current measurement
6. Management interface and control
Ronald P. Luijten – Analyst Meeting – 12 June 2015 10
133 mm
30 mm
Standard 240 pin DDR3
memory DIMM board
133 mm
55 mm
139 mm
P5020 SoC
P5020/P5040
(Generation 1)
T4240
Generation 2
(Lid Removed)
139 mm
DOME Compute node board form factor
55mm
FRONT
BACK
Decoupling
Capacitors
area
(lid removed)
T4240 SoC
Ronald P. Luijten – Analyst Meeting – 12 June 2015 11
Planned System: 2U rack unit
19” 2U Chassis w/ Combined Cooling & Power
128 compute node boards
1536 cores / 3072 Threads
6 TB DRAM
1.28Tbps Ethernet (@40Gbps)
Datacenter-in-a-box
‱ Expected 2U unit total power: ~ 6kW
‱ Integrated mains power converter to 12V distribution: 12V / 500A
‱ Each compute node has own 12V / 40W converter
‱ Common Power Converter boards for all other supplies
‱ High radix 10GbE / 40GbE switch boards (under construction)
‱ Connects to Mains, Rack level Water, 32x 40Gbps Ethernet
‱ Hot-water cooled for efficiency and density
Ronald P. Luijten – Analyst Meeting – 12 June 2015 12
Cooling variant
Inlet
water
[C]
Junction
temp Tj
[C]
Measured
thermal Res.
[K/W]
Maximum cooling
capacity
[W]
OF R240 Cu, no heat pipe 45 85 1.11 36
OF R240 Cu with heat pipe 45 85 0.85 47
OF R240 Cu with heat pipe 45 75 0.85 36
Electrical +
Thermal
Interface
Water In
Water In
Compute Nodes
3 layer
Laminated
Copper Plate
FR4 Carrier
SoC
Node Cooling Design & Validation
Power converter boards
Storage boards
CeBIT Demo, april 15
Ronald P. Luijten – Analyst Meeting – 12 June 2015 13
Performance Measurement Results
CPU Freescale T4240
12 cores; 24 thr.
28nm Bulk
Intel Xeon E3-1230L v3
4 cores; 8 threads
22nm FinFet
CPU2006 Benchmark
Test Environment
System: T4240RDB-PB
1.666 GHz core clock,
1.866 GT/s 6GB DRAM, 3 channels
Fedora 20, Kernel 3.12.19
GCC 4.7.2
gcc options: -O3 -mcpu=powerpc64
System: Supermicro X10SAE
1.8 GHz core clock; Turbo disabled
1.666 GT/s 8 GB DRAM, 2 channels
Fedora 19, Kernel 3.13.9
GCC 4.8.2
gcc options: -O3 -march=native -mtune=native
CINT-base – 1 thread
6.86 20.7
CINT-base – all threads 109.34 (24 threads) 77.6 (8 threads)
Coremark - all threads 188K (24 threads) 65K (8 threads)
Ronald P. Luijten – Analyst Meeting – 12 June 2015 14
Performance Measurement Results
CPU Freescale T4240
12 cores; 24 thr.
28nm Bulk
Intel Xeon E3-1230L v3
4 cores; 8 threads
22nm FinFet
CPU2006 Benchmark
Test Environment
System: T4240RDB-PB
1.666 GHz core clock,
1.866 GT/s 6GB DRAM, 3 channels
Fedora 20, Kernel 3.12.19
GCC 4.7.2
gcc options: -O3 -mcpu=powerpc64
System: Supermicro X10SAE
1.8 GHz core clock; Turbo disabled
1.666 GT/s 8 GB DRAM, 2 channels
Fedora 19, Kernel 3.13.9
GCC 4.8.2
gcc options: -O3 -march=native -mtune=native
CINT-base – 1 thread
6.86 20.7
CINT-base – all threads 109.34 (24 threads) 77.6 (8 threads)
Coremark - all threads 188K (24 threads) 65K (8 threads)
40% more performance @ 70% of node level energy
consumption 2x more operations per Watt
Ronald P. Luijten – Analyst Meeting – 12 June 2015 15
Power Measurement Results
Power measurement on rev 1 board #5, on 7 + 8 april 2015; PSoC firmware 2-mar-15
current measurements at 12V input of power converters, T4240 temp < 65C
voltage domain
current measured @ 12V input
condition mA W mA W A W W
PSOC only power 3.4 0.0408 74 0.888 0.0008 0.0096 0.9384
T4240 power on, kept in reset 75 0.9 152 1.824 0.32 3.84 6.564
u-boot prompt (idle) 77.6 0.9312 350 4.2 1.48 17.76 22.8912
Linux prompt, idle system 77.6 0.9312 315 3.78 1.58 18.96 23.6712
BW_MEM512M, 24 thr 77.3 0.9276 450 5.4 1.65 19.8 26.1276
stream, 24 thread 77.3 0.9276 470 5.64 1.65 19.8 26.3676
BW_MEM512, 24 thr 77.7 0.9324 320 3.84 2.53 30.36 35.1324
idle at XCFE desktop 77.7 0.9324 320 3.84 1.6 19.2 23.9724
SpecInt PerlBench, 24 thr 77.8 0.9336 400 4.8 2.63 31.56 37.2936
SpecInt PerlBench, 12 thr 78 0.936 355 4.26 2.2 26.4 31.596
SpecInt gcc, 12 thr 78 0.936 416 4.992 1.7 20.4 26.328
total
node
1V0 coreDRAM1V8 I/O
Ronald P. Luijten – Analyst Meeting – 12 June 2015 16
Remarks
New Big-Data Metric: Memory BW density
use raw memory BW available at SoC or CPU
divide by volume of entire enclosure, incl. HDD, PCI slots
DOME 128node 2U rack unit: 159GB/s/Liter (peak)
P8 server S822L (dual socket): 13.9GB/s/Liter (peak)
‱ New era – perfect storm and Innovators Dilemma
‱ ”Server is all about SoC and packaging
‱ This is a serendipitous data point
Ronald P. Luijten – Analyst Meeting – 12 June 2015 17
Status and Plans
Until YE 2015
2016: a new compute node
Beyond 2016
Ronald P. Luijten – Analyst Meeting – 12 June 2015 18
LIVE DEMO
We demonstrate a
single node running:
‱ Fedora 20
‱ XFCE Desktop
‱ Stream
‱ CPMD
‱ And
 live 1V domain
current measurement
We show a revision-1 board T4240ZMS compute server:
‱ Larger than DOME form factor, same netlist
‱ All components on top side (save bring-up time and expense)
‱ Air-cooled for single node operation
compute node mini BaseBoard
Ronald P. Luijten – Analyst Meeting – 12 June 2015 19
T4240ZMS rev 1 board
Single node baseboard
GbE
SATA (mSata slot, SATA data connector)
Various power supplies
USB
uSD card
mSATAPHY
POWER
SUPPLIES
DEMO SETUP
Ronald P. Luijten – Analyst Meeting – 12 June 2015 20
R.Luijten, 8 Jan 2015
88E1111
PHY
192.168.1.152 NFS server 192.168.1.1 DHCP / TFTP server
T4240
24 HW thread
1625MHz
DRAM
8GB
1500MT
DRAM
8GB
1500MT
PSoC
SPI
flash
Power
converter
USB
JTAG
Serial
I2C
1GbE
DRAM
8GB
1500MT
T4240ZMS node:
-Revision 1 board
-slower speeds
-less memory
DIMM connector
A192.168.1.240
Management console
(current on 1V supply)
Serial console
VNC client into ZMS
Showing virtual desktop
Browser with DB2 / WMD
DEMO SETUP
SKA: http://www.skatelescope.org
DOME: http://www.dome-exascale.nl
”Server: http://www.zurich.ibm.com/microserver
T4240 system: http://swissdutch.ch:6999
Wikipedia: https://en.wikipedia.org/wiki/Microserver
Twitter: https://twitter.com/ronaldgadget
Videos:
Impossible ”Server: http://t.co/4vEkEVEazO
Innovators Dilemma: http://youtu.be/imweQe8NgnI
DOME T4240 Fedora: http://youtu.be/D6da5DqcyQk
4.4: Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC
for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density
© 2015 IEEE
International Solid-State Circuits Conference 22 of 15
Links
Ronald P. Luijten – Analyst Meeting – 12 June 2015 22
Acknowledgements
This work is the results of many people
‱ Peter v. Ackeren, FSL
‱ Ed Swarthout, FSL Austin
‱ Dac Pham, FSL Austin
‱ Yvonne Chan, IBM Toronto
‱ Andreas Doering, IBM ZRL
‱ Alessandro Curioni, IBM ZRL
‱ Stephan Paredes, IBM ZRL
‱ Matteo Cossale, IBM ZRL
‱ James Nigel, FSL
‱ Boris Bialek, IBM Toronto
‱ Marco de Vos, Astron NL
‱ Vipin Patel, IBM Fishkill
‱ And many more remain unnamed
.
Companies: FSL Austin, Belgium & Germany; IBM worldwide; Transfer - NL
Ronald P. Luijten – Analyst Meeting – 12 June 2015 23
“Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm
Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth
System Density”, R.Luijten et al., ISSCC15, San Francisco, Feb 2015
“The DOME embedded 64 bit microserver demonstrator”, R. Luijten and A. Doering,
ICICDT 2013, Pavia, Italy, May 2013
“Quantitative Analysis of the Berkeley Dwarfs' Parallelism and Data Movement
Properties”, Victoria Caparros Cabezas, Phillip Stanley-Marbell, ACM CF 2011, May
2011
“Performance, Power, and Thermal Analysis of Low-Power Processors for Scale-
Out Systems”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, IEEE HPPAC 2011,
May 2011
“Pinned to the Walls—Impact of Packaging and Application Properties on the
Memory and Power Walls”, Phillip Stanley-Marbell, Victoria Caparros Cabezas,
Ronald P. Luijten, IEEE ISLPED 2011, Aug 2011.
4.4: Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC
for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density
© 2015 IEEE
International Solid-State Circuits Conference 24 of 15
Literature
Ronald P. Luijten – Analyst Meeting – 12 June 2015 24
Questions???
PS. I like lightweight things
”Server website: www.swissdutch.ch
Ronald P. Luijten – Analyst Meeting – 12 June 2015 25

Mais conteĂșdo relacionado

Mais procurados

SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
Volodymyr Saviak
 

Mais procurados (20)

A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
 
Bullx HPC eXtreme computing cluster references
Bullx HPC eXtreme computing cluster referencesBullx HPC eXtreme computing cluster references
Bullx HPC eXtreme computing cluster references
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
Deep Learning on the SaturnV Cluster
Deep Learning on the SaturnV ClusterDeep Learning on the SaturnV Cluster
Deep Learning on the SaturnV Cluster
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Stig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputerStig Telfer - OpenStack and the Software-Defined SuperComputer
Stig Telfer - OpenStack and the Software-Defined SuperComputer
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
HPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPCHPC Cloud: Clouds on supercomputers for HPC
HPC Cloud: Clouds on supercomputers for HPC
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
Sierra overview
Sierra overviewSierra overview
Sierra overview
 
OpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software StackOpenHPC: A Comprehensive System Software Stack
OpenHPC: A Comprehensive System Software Stack
 
IEEE CloudCom 2014ć‚ćŠ ć ±ć‘Š
IEEE CloudCom 2014ć‚ćŠ ć ±ć‘ŠIEEE CloudCom 2014ć‚ćŠ ć ±ć‘Š
IEEE CloudCom 2014ć‚ćŠ ć ±ć‘Š
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
 
A Fresh Look at HPC from Huawei Enterprise
A Fresh Look at HPC from Huawei EnterpriseA Fresh Look at HPC from Huawei Enterprise
A Fresh Look at HPC from Huawei Enterprise
 

Destaque (8)

Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
Presentation dell - into the cloud with dell
Presentation   dell - into the cloud with dellPresentation   dell - into the cloud with dell
Presentation dell - into the cloud with dell
 
Citrix reference architecture for xen mobile 8 5_july2013
Citrix reference architecture for xen mobile 8 5_july2013Citrix reference architecture for xen mobile 8 5_july2013
Citrix reference architecture for xen mobile 8 5_july2013
 
HP Micro Server remote access card user manual
HP Micro Server remote access card user manualHP Micro Server remote access card user manual
HP Micro Server remote access card user manual
 
Optimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for HadoopOptimizing Dell PowerEdge Configurations for Hadoop
Optimizing Dell PowerEdge Configurations for Hadoop
 
Micro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectMicro Server Design - Open Compute Project
Micro Server Design - Open Compute Project
 
Power Edge Portfolio
Power Edge PortfolioPower Edge Portfolio
Power Edge Portfolio
 
Wireless Microserver
Wireless MicroserverWireless Microserver
Wireless Microserver
 

Semelhante a IBM/ASTRON DOME 64-bit Hot Water Cooled Microserver

Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
jade_22
 

Semelhante a IBM/ASTRON DOME 64-bit Hot Water Cooled Microserver (20)

DOME 64-bit ÎŒDataCenter
DOME 64-bit ÎŒDataCenterDOME 64-bit ÎŒDataCenter
DOME 64-bit ÎŒDataCenter
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
Introducing the CrossLink Programmable ASSP
Introducing the CrossLink Programmable ASSPIntroducing the CrossLink Programmable ASSP
Introducing the CrossLink Programmable ASSP
 
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology InnovationsLenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
Lenovo HPC: Energy Efficiency and Water-Cool-Technology Innovations
 
Poc exadata pres_doag_2015
Poc exadata pres_doag_2015Poc exadata pres_doag_2015
Poc exadata pres_doag_2015
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
 
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
DPDK summit 2015: It's kind of fun  to do the impossible with DPDKDPDK summit 2015: It's kind of fun  to do the impossible with DPDK
DPDK summit 2015: It's kind of fun to do the impossible with DPDK
 
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro NakajimaDPDK Summit 2015 - NTT - Yoshihiro Nakajima
DPDK Summit 2015 - NTT - Yoshihiro Nakajima
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
Precision Timing for KPI Measurements
Precision Timing for KPI MeasurementsPrecision Timing for KPI Measurements
Precision Timing for KPI Measurements
 
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with UnivaNVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
NVIDIA GPUs Power HPC & AI Workloads in Cloud with Univa
 
Latest HPC News from NVIDIA
Latest HPC News from NVIDIALatest HPC News from NVIDIA
Latest HPC News from NVIDIA
 
Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01Kettleetltool 090522005630-phpapp01
Kettleetltool 090522005630-phpapp01
 
Lagopus presentation on 14th Annual ON*VECTOR International Photonics Workshop
Lagopus presentation on 14th Annual ON*VECTOR International Photonics WorkshopLagopus presentation on 14th Annual ON*VECTOR International Photonics Workshop
Lagopus presentation on 14th Annual ON*VECTOR International Photonics Workshop
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
 
Christo Kutrovsky - Maximize Data Warehouse Performance with Parallel Queries
Christo Kutrovsky - Maximize Data Warehouse Performance with Parallel QueriesChristo Kutrovsky - Maximize Data Warehouse Performance with Parallel Queries
Christo Kutrovsky - Maximize Data Warehouse Performance with Parallel Queries
 
Scaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million CoresScaling Green Instrumentation to more than 10 Million Cores
Scaling Green Instrumentation to more than 10 Million Cores
 
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
 
Flash memory summit enterprise udate 2019
Flash memory summit enterprise udate 2019Flash memory summit enterprise udate 2019
Flash memory summit enterprise udate 2019
 

Mais de IBM Research

The New Era of Cognitive Computing
The New Era of Cognitive ComputingThe New Era of Cognitive Computing
The New Era of Cognitive Computing
IBM Research
 

Mais de IBM Research (10)

IBM Research - Zurich Celebrates 60 Years of Science and Innovation
IBM Research - Zurich Celebrates 60 Years of Science and InnovationIBM Research - Zurich Celebrates 60 Years of Science and Innovation
IBM Research - Zurich Celebrates 60 Years of Science and Innovation
 
The Dilemmas of Innovation Management
The Dilemmas of Innovation ManagementThe Dilemmas of Innovation Management
The Dilemmas of Innovation Management
 
A Prototype Storage Subsystem based on Phase Change Memory
A Prototype Storage Subsystem based on Phase Change MemoryA Prototype Storage Subsystem based on Phase Change Memory
A Prototype Storage Subsystem based on Phase Change Memory
 
Big Data and the Future of Storage
Big Data and the Future of StorageBig Data and the Future of Storage
Big Data and the Future of Storage
 
The New Era of Cognitive Computing
The New Era of Cognitive ComputingThe New Era of Cognitive Computing
The New Era of Cognitive Computing
 
Das IBM Forschungslabor als Arbeitgeber
Das IBM Forschungslabor als ArbeitgeberDas IBM Forschungslabor als Arbeitgeber
Das IBM Forschungslabor als Arbeitgeber
 
Nano, SuperMUC and Photovoltaics:A Day in the Life of IBM Research - Zurich
Nano, SuperMUC and Photovoltaics:A Day in the Life of IBM Research - ZurichNano, SuperMUC and Photovoltaics:A Day in the Life of IBM Research - Zurich
Nano, SuperMUC and Photovoltaics:A Day in the Life of IBM Research - Zurich
 
Meet IBM Research
Meet IBM ResearchMeet IBM Research
Meet IBM Research
 
Dechema Conference: Istanbul
Dechema Conference: IstanbulDechema Conference: Istanbul
Dechema Conference: Istanbul
 
IBM Research: IBM 2010 Investor Briefing
IBM Research: IBM 2010 Investor BriefingIBM Research: IBM 2010 Investor Briefing
IBM Research: IBM 2010 Investor Briefing
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

IBM/ASTRON DOME 64-bit Hot Water Cooled Microserver

  • 1. The energy-efficient, high performance 64bit DOME ”server - an industry inclination point Ronald P. Luijten – Data Motion Architect lui@zurich.ibm.com IBM Research - Zurich 12 June 2015 DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.
  • 2. COMPUTE is FREE – DATA is NOT Ronald P. Luijten – Data Motion Architect lui@zurich.ibm.com IBM Research - Zurich 12 June 2015 DISCLAIMER: This presentation is entirely Ronald’s view and not necessarily that of IBM.
  • 3. DOME: ‱ ppp Astron, IBM, Dutch gvt ‱ 20MEur gvt funding over 5 years ‱ Started feb 2012 Ronald P. Luijten – Analyst Meeting – 12 June 2015 3
  • 4. © 2012 IBM Corporation SKA (Square Kilometer Array) to measure Big Bang Picture source: NZZ march 2014 Big Bang Inflation Protons created Start of nucleosynthesis through fusion End of nucleo- synthesis Modern Universe 0 10-32s 10-6s 0.01s 3min 380’000 years 13.8 Billion years Ronald P. Luijten – Analyst Meeting – 12 June 2015 4
  • 5. DOME ”Server Motivation & Objectives ‱Create the worlds highest density 64 bit ”-server drawer –Useful to evaluate both SKA radio-astronomy and IBM future business –Platform for Business Analytics appliance pre-product research –High energy efficiency / very low cost –Commodity components, HW + SW standards based –Leverage ‘free computing’ paradigm –Enhance with ‘Value Add’: packaging, system integration, 
 –Density and speed of light ‱Most efficient cooling using IBM technology (ref: SuperMUC June 2012 TOP500 machine) ‱Must be true 64 bit to enable business applications ‱Must run server class OS (SLES11 or RHEL6, or equivalent) –Precluded ARM (64-bit Silicon was not available) –PPC64 is available in SoC from FSL since 2011 –(no $$$ to build a new SoC
) ‱This is the DOME project capability demonstrator – not a product Ronald P. Luijten – Analyst Meeting – 12 June 2015 5
  • 6. Definition ”Server: The integration of an entire server node motherboard* into a single microchip except DRAM, Nor-boot flash and power conversion logic. 305mm 245mm 139mmx55mm * no graphics Ronald P. Luijten – Analyst Meeting – 12 June 2015 6
  • 7. Definition ”Server: The integration of an entire server node motherboard* into a single microchip except DRAM, Nor-boot flash and power conversion logic. 305mm 245mm 139mmx55mm This does NOT imply low performance! * no graphics Ronald P. Luijten – Analyst Meeting – 12 June 2015 7
  • 8. T4240 Chip Overview 12 core – fully dual threaded 1.8 GHz ppc64 (e6500) 12 DP-FPU; 12 128b Altivec 3 DDR3 channels at 1.86GT/s 3x 0.5MB L3 cache 4x 10GbE + 2x SATA PCIe 3.0 HW packet acceleration RegEx Pattern Match acc. Crypto acceleration 28nm TSMC Bulk CMOS 239mm2 - ~1.7B transistors 111Mbit SRAM, 6M FF 7 Power states (2 power gating) Ronald P. Luijten – Analyst Meeting – 12 June 2015 8
  • 9. DOME compute node board diagram T4240 16GB DRAM 72bit 16GB DRAM 72bit PSoC 1Gbit SPI flash Power converter USB JTAG Serial I2C 4 x 10 GbE PCIe x8 2 x SATA 16GB DRAM 72bit 1866 MT/s 1866 MT/s 1866 MT/s 1V / 40A 12V / 2.5A Ronald P. Luijten – Analyst Meeting – 12 June 2015 9
  • 10. DOME compute node board diagram T4240 DRAM DRAM PSoC SPI flash Power converter USB JTAG Serial I2C 4 x 10 GbE PCIe x8 2 x SATA DRAM 12V / 2.5A PSOC collapses 6 functions into a small chip to save Area, Power and Cost 1. On/Off and Power up sequencing 2. Provide uServer boot configuration 3. JTAG debug access 4. Serial port access (Linux) 5. Temperature monitoring and protection ‱ and current measurement 6. Management interface and control Ronald P. Luijten – Analyst Meeting – 12 June 2015 10
  • 11. 133 mm 30 mm Standard 240 pin DDR3 memory DIMM board 133 mm 55 mm 139 mm P5020 SoC P5020/P5040 (Generation 1) T4240 Generation 2 (Lid Removed) 139 mm DOME Compute node board form factor 55mm FRONT BACK Decoupling Capacitors area (lid removed) T4240 SoC Ronald P. Luijten – Analyst Meeting – 12 June 2015 11
  • 12. Planned System: 2U rack unit 19” 2U Chassis w/ Combined Cooling & Power 128 compute node boards 1536 cores / 3072 Threads 6 TB DRAM 1.28Tbps Ethernet (@40Gbps) Datacenter-in-a-box ‱ Expected 2U unit total power: ~ 6kW ‱ Integrated mains power converter to 12V distribution: 12V / 500A ‱ Each compute node has own 12V / 40W converter ‱ Common Power Converter boards for all other supplies ‱ High radix 10GbE / 40GbE switch boards (under construction) ‱ Connects to Mains, Rack level Water, 32x 40Gbps Ethernet ‱ Hot-water cooled for efficiency and density Ronald P. Luijten – Analyst Meeting – 12 June 2015 12
  • 13. Cooling variant Inlet water [C] Junction temp Tj [C] Measured thermal Res. [K/W] Maximum cooling capacity [W] OF R240 Cu, no heat pipe 45 85 1.11 36 OF R240 Cu with heat pipe 45 85 0.85 47 OF R240 Cu with heat pipe 45 75 0.85 36 Electrical + Thermal Interface Water In Water In Compute Nodes 3 layer Laminated Copper Plate FR4 Carrier SoC Node Cooling Design & Validation Power converter boards Storage boards CeBIT Demo, april 15 Ronald P. Luijten – Analyst Meeting – 12 June 2015 13
  • 14. Performance Measurement Results CPU Freescale T4240 12 cores; 24 thr. 28nm Bulk Intel Xeon E3-1230L v3 4 cores; 8 threads 22nm FinFet CPU2006 Benchmark Test Environment System: T4240RDB-PB 1.666 GHz core clock, 1.866 GT/s 6GB DRAM, 3 channels Fedora 20, Kernel 3.12.19 GCC 4.7.2 gcc options: -O3 -mcpu=powerpc64 System: Supermicro X10SAE 1.8 GHz core clock; Turbo disabled 1.666 GT/s 8 GB DRAM, 2 channels Fedora 19, Kernel 3.13.9 GCC 4.8.2 gcc options: -O3 -march=native -mtune=native CINT-base – 1 thread 6.86 20.7 CINT-base – all threads 109.34 (24 threads) 77.6 (8 threads) Coremark - all threads 188K (24 threads) 65K (8 threads) Ronald P. Luijten – Analyst Meeting – 12 June 2015 14
  • 15. Performance Measurement Results CPU Freescale T4240 12 cores; 24 thr. 28nm Bulk Intel Xeon E3-1230L v3 4 cores; 8 threads 22nm FinFet CPU2006 Benchmark Test Environment System: T4240RDB-PB 1.666 GHz core clock, 1.866 GT/s 6GB DRAM, 3 channels Fedora 20, Kernel 3.12.19 GCC 4.7.2 gcc options: -O3 -mcpu=powerpc64 System: Supermicro X10SAE 1.8 GHz core clock; Turbo disabled 1.666 GT/s 8 GB DRAM, 2 channels Fedora 19, Kernel 3.13.9 GCC 4.8.2 gcc options: -O3 -march=native -mtune=native CINT-base – 1 thread 6.86 20.7 CINT-base – all threads 109.34 (24 threads) 77.6 (8 threads) Coremark - all threads 188K (24 threads) 65K (8 threads) 40% more performance @ 70% of node level energy consumption 2x more operations per Watt Ronald P. Luijten – Analyst Meeting – 12 June 2015 15
  • 16. Power Measurement Results Power measurement on rev 1 board #5, on 7 + 8 april 2015; PSoC firmware 2-mar-15 current measurements at 12V input of power converters, T4240 temp < 65C voltage domain current measured @ 12V input condition mA W mA W A W W PSOC only power 3.4 0.0408 74 0.888 0.0008 0.0096 0.9384 T4240 power on, kept in reset 75 0.9 152 1.824 0.32 3.84 6.564 u-boot prompt (idle) 77.6 0.9312 350 4.2 1.48 17.76 22.8912 Linux prompt, idle system 77.6 0.9312 315 3.78 1.58 18.96 23.6712 BW_MEM512M, 24 thr 77.3 0.9276 450 5.4 1.65 19.8 26.1276 stream, 24 thread 77.3 0.9276 470 5.64 1.65 19.8 26.3676 BW_MEM512, 24 thr 77.7 0.9324 320 3.84 2.53 30.36 35.1324 idle at XCFE desktop 77.7 0.9324 320 3.84 1.6 19.2 23.9724 SpecInt PerlBench, 24 thr 77.8 0.9336 400 4.8 2.63 31.56 37.2936 SpecInt PerlBench, 12 thr 78 0.936 355 4.26 2.2 26.4 31.596 SpecInt gcc, 12 thr 78 0.936 416 4.992 1.7 20.4 26.328 total node 1V0 coreDRAM1V8 I/O Ronald P. Luijten – Analyst Meeting – 12 June 2015 16
  • 17. Remarks New Big-Data Metric: Memory BW density use raw memory BW available at SoC or CPU divide by volume of entire enclosure, incl. HDD, PCI slots DOME 128node 2U rack unit: 159GB/s/Liter (peak) P8 server S822L (dual socket): 13.9GB/s/Liter (peak) ‱ New era – perfect storm and Innovators Dilemma ‱ ”Server is all about SoC and packaging ‱ This is a serendipitous data point Ronald P. Luijten – Analyst Meeting – 12 June 2015 17
  • 18. Status and Plans Until YE 2015 2016: a new compute node Beyond 2016 Ronald P. Luijten – Analyst Meeting – 12 June 2015 18
  • 19. LIVE DEMO We demonstrate a single node running: ‱ Fedora 20 ‱ XFCE Desktop ‱ Stream ‱ CPMD ‱ And
 live 1V domain current measurement We show a revision-1 board T4240ZMS compute server: ‱ Larger than DOME form factor, same netlist ‱ All components on top side (save bring-up time and expense) ‱ Air-cooled for single node operation compute node mini BaseBoard Ronald P. Luijten – Analyst Meeting – 12 June 2015 19
  • 20. T4240ZMS rev 1 board Single node baseboard GbE SATA (mSata slot, SATA data connector) Various power supplies USB uSD card mSATAPHY POWER SUPPLIES DEMO SETUP Ronald P. Luijten – Analyst Meeting – 12 June 2015 20
  • 21. R.Luijten, 8 Jan 2015 88E1111 PHY 192.168.1.152 NFS server 192.168.1.1 DHCP / TFTP server T4240 24 HW thread 1625MHz DRAM 8GB 1500MT DRAM 8GB 1500MT PSoC SPI flash Power converter USB JTAG Serial I2C 1GbE DRAM 8GB 1500MT T4240ZMS node: -Revision 1 board -slower speeds -less memory DIMM connector A192.168.1.240 Management console (current on 1V supply) Serial console VNC client into ZMS Showing virtual desktop Browser with DB2 / WMD DEMO SETUP
  • 22. SKA: http://www.skatelescope.org DOME: http://www.dome-exascale.nl ”Server: http://www.zurich.ibm.com/microserver T4240 system: http://swissdutch.ch:6999 Wikipedia: https://en.wikipedia.org/wiki/Microserver Twitter: https://twitter.com/ronaldgadget Videos: Impossible ”Server: http://t.co/4vEkEVEazO Innovators Dilemma: http://youtu.be/imweQe8NgnI DOME T4240 Fedora: http://youtu.be/D6da5DqcyQk 4.4: Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density © 2015 IEEE International Solid-State Circuits Conference 22 of 15 Links Ronald P. Luijten – Analyst Meeting – 12 June 2015 22
  • 23. Acknowledgements This work is the results of many people ‱ Peter v. Ackeren, FSL ‱ Ed Swarthout, FSL Austin ‱ Dac Pham, FSL Austin ‱ Yvonne Chan, IBM Toronto ‱ Andreas Doering, IBM ZRL ‱ Alessandro Curioni, IBM ZRL ‱ Stephan Paredes, IBM ZRL ‱ Matteo Cossale, IBM ZRL ‱ James Nigel, FSL ‱ Boris Bialek, IBM Toronto ‱ Marco de Vos, Astron NL ‱ Vipin Patel, IBM Fishkill ‱ And many more remain unnamed
. Companies: FSL Austin, Belgium & Germany; IBM worldwide; Transfer - NL Ronald P. Luijten – Analyst Meeting – 12 June 2015 23
  • 24. “Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density”, R.Luijten et al., ISSCC15, San Francisco, Feb 2015 “The DOME embedded 64 bit microserver demonstrator”, R. Luijten and A. Doering, ICICDT 2013, Pavia, Italy, May 2013 “Quantitative Analysis of the Berkeley Dwarfs' Parallelism and Data Movement Properties”, Victoria Caparros Cabezas, Phillip Stanley-Marbell, ACM CF 2011, May 2011 “Performance, Power, and Thermal Analysis of Low-Power Processors for Scale- Out Systems”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, IEEE HPPAC 2011, May 2011 “Pinned to the Walls—Impact of Packaging and Application Properties on the Memory and Power Walls”, Phillip Stanley-Marbell, Victoria Caparros Cabezas, Ronald P. Luijten, IEEE ISLPED 2011, Aug 2011. 4.4: Energy-Efficient Microserver Based on a 12-Core 1.8GHz 188K-CoreMark 28nm Bulk CMOS 64b SoC for Big-Data Applications with 159GB/s/L Memory Bandwidth System Density © 2015 IEEE International Solid-State Circuits Conference 24 of 15 Literature Ronald P. Luijten – Analyst Meeting – 12 June 2015 24
  • 25. Questions??? PS. I like lightweight things ”Server website: www.swissdutch.ch Ronald P. Luijten – Analyst Meeting – 12 June 2015 25