Intel® Xeon Phi™ coprocessor
(codename Knights Corner)


     George Chrysos
     Senior Principal Engineer
     Hot Chips, August 28, 2012




                                  More on Twitter @IntelITS
Legal Disclaimers
Copyright © 2012 Intel Corporation. All rights reserved.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF
INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU
SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND
REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL
OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall
have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Intel, the Intel logo, Xeon, Intel Core and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and
functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products.
For more complete information about performance and benchmark results, visit Performance Test Disclosure
This document contains information on products in the design phase of development.
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides
for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
WARNING: Altering clock frequency and/or voltage may: (i) reduce system stability and useful life of the system and processor; (ii) cause the processor and other system components to fail; (iii) cause reductions in system performance; (iv) cause additional heat or other
damage; and (v) affect system data integrity. Intel has not tested, and does not warranty, the operation of the processor beyond its specifications. Intel assumes no responsibility that the processor, including if used with altered clock frequencies and/or voltages, will be fit
for any particular purpose. For more information, visit Overclocking Intel Processors
Warning: Altering PC memory frequency and/or voltage may (i) reduce system stability and useful life of the system, memory and processor; (ii) cause the processor and other system components to fail; (iii) cause reductions in system performance; (iv) cause additional
heat or other damage; and (v) affect system data integrity. Intel assumes no responsibility that the memory, including if used with altered clock frequencies and/or voltages, will be fit for any particular purpose. Check with memory manufacturer for warranty and additional
details.
Available on select Intel® Core™, Intel® Xeon® and Intel® Xeon Phi™ processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information
including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.
Requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit
http://www.intel.com/info/em64t
Requires a system with Intel® Turbo Boost Technology. Intel Turbo Boost Technology and Intel Turbo Boost Technology 2.0 are only available on select Intel® processors. Consult your PC manufacturer. Performance varies depending on hardware, software, and
system configuration. For more information, visit http://www.intel.com/go/turbo
ENERGY STAR is a system-level energy specification, defined by the Environmental Protection Agency, that relies on all system components (such as processor, chipset, power supply, etc.). For more information, visit http://www.intel.com/technology/epa/index.html
Intel® Many Integrated Core (Intel MIC) Architecture

Targeted at highly parallel HPC workloads
  • Physics, Chemistry, Biology, Financial Services

Power efficient cores, support for parallelism
  • Cores: less speculation, threads, wider SIMD
  • Scalability: high-bandwidth on-die interconnect and memory

General Purpose Programming Environment
  • Runs Linux (full service, open source OS)
  • Runs applications written in Fortran, C, C++, …
  • Supports x86 memory model, IEEE 754
  • x86 collateral (libraries, compilers, Intel® VTune™, debuggers, etc.)

3   Visual and Parallel Computing Group                         Copyright © 2012 Intel Corporation. All rights reserved.
Knights Corner Coprocessor

[Diagram: an Intel® Xeon® processor with system memory connects over PCIe x16 to multiple KNC cards (link also labeled TCP/IP). Each KNC card holds >50 cores running a Linux OS, ringed by GDDR5 channels.]

>= 8GB GDDR5 memory



Knights Corner – Power Efficient
Performance per Watt of a prototype Knights Corner Cluster compared to the 2 Top Graphics Accelerated Clusters (MFLOPS/Watt, higher is better)

Site                               Accelerator          MFLOPS/Watt   Top500 rank   Power
Intel Corp                         Knights Corner       1381          #150          72.9 kW
Nagasaki Univ.                     ATI Radeon           1380          #456          47 kW
Barcelona Supercomputing Center    Nvidia Tesla 2090    1266          #177          81.5 kW

Source: www.green500.org
Knights Corner Micro-architecture


[Diagram: cores, each with its own L2 cache, attached to the on-die interconnect together with distributed tag directories (TD), GDDR memory controllers (GDDR MC), and the PCIe client logic.]




Knights Corner Core
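[Diagram: Knights Corner core block diagram]

Pipeline stages: PPF, PF, D0, D1, D2, E, WB
  • 4 threads, in-order, one instruction pointer per thread (T0 IP, T1 IP, T2 IP, T3 IP)
  • L1 TLB and 32KB code cache; 16B/cycle (2 IPC)
  • Decode and uCode, feeding Pipe 0 and Pipe 1
  • Register files: VPU RF, X87 RF, Scalar RF
  • Execution units: VPU (512b SIMD), X87, ALU 0, ALU 1
  • L1 TLB and 32KB data cache
  • L2 TLB and TLB miss handler
  • 512KB L2 cache with L2 control (L2 Ctl) and hardware prefetcher (HWP), connected to the on-die interconnect

X86 specific logic < 2% of core + L2 area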

Vector Processing Unit
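Core pipeline: PPF, PF, D0, D1, D2, E, WB
Vector pipeline: D2, E, VC1, VC2, V1-V4, WB

[Diagram: VPU datapath across stages D2, E, VC1, VC2, V1, V2, V3, V4]
  • DEC (decode)
  • VPU RF: 3R, 1W
  • Mask RF
  • LD, ST, EMU, Scatter/Gather
  • Vector ALUs: 16 wide x 32 bit, 8 wide x 64 bit
  • Fused Multiply Add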
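
The lane counts on this slide follow from the 512-bit vector width: 512 / 32 = 16 lanes and 512 / 64 = 8 lanes. The sketch below is a plain scalar model of one masked multiply-add across 16 single-precision lanes, included only to illustrate how a mask register selects lanes. The names `masked_fma_ps` and `peak_flops_per_cycle` are hypothetical, not Knights Corner intrinsics, and the model rounds the multiply and the add separately unless the compiler contracts them, whereas a hardware fused multiply-add rounds once.

```c
#include <assert.h>
#include <stdint.h>

#define VPU_BITS 512
#define LANES_32 (VPU_BITS / 32)  /* 16 lanes of 32-bit elements */
#define LANES_64 (VPU_BITS / 64)  /* 8 lanes of 64-bit elements */

/* Hypothetical scalar model of one masked multiply-add:
 * dst[i] = a[i] * b[i] + c[i] for every lane whose mask bit is set.
 * Lanes whose mask bit is clear keep their previous dst value. */
static void masked_fma_ps(float dst[LANES_32], const float a[LANES_32],
                          const float b[LANES_32], const float c[LANES_32],
                          uint16_t mask)
{
    for (int i = 0; i < LANES_32; i++) {
        if (mask & (1u << i)) {
            dst[i] = a[i] * b[i] + c[i];
        }
    }
}

/* Peak floating-point operations per cycle for one vector unit,
 * counting each multiply-add as two operations. */
static int peak_flops_per_cycle(int lanes)
{
    assert(lanes > 0);
    return lanes * 2;
}
```

Under this counting convention, `peak_flops_per_cycle(LANES_32)` gives 32 single-precision operations per cycle and `peak_flops_per_cycle(LANES_64)` gives 16 double-precision operations per cycle for a single core.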




Interconnect

[Diagram: cores with L2 caches and tag directories (TD) attached to the on-die ring. Each direction of the ring carries three rings:]
  • BL (64 Bytes): Data
  • AD: Command and Address
  • AK: Coherence and Credits




Distributed Tag Directories


[Diagram: cores with L2 caches on the interconnect, with tag directories (TD) distributed among them]

Tag directory entry fields: TAG, Core Valid Mask, State

Tag Directories track cache-lines in all L2s
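
As a rough illustration of the entry layout named on this slide (TAG, Core Valid Mask, State), the sketch below models one directory entry and a possible way to pick the directory responsible for an address. Everything beyond the three field names is an assumption: the state names, the 64-bit mask width, and the modulo mapping in `home_td` are hypothetical, since the slide does not give the actual hash function or coherence states.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u  /* assumed cache-line size, matching the 64-byte BL data ring */

/* Assumed coherence states; the slide only names a "State" field. */
enum td_state { TD_INVALID, TD_SHARED, TD_EXCLUSIVE, TD_MODIFIED };

/* One tag-directory entry: tag, per-core valid mask, and state. */
struct td_entry {
    uint64_t tag;              /* line address this entry tracks */
    uint64_t core_valid_mask;  /* bit n set means core n's L2 holds the line */
    enum td_state state;
};

/* Hypothetical address-to-directory mapping: modulo on the line address.
 * The real hash function is not given in the source. */
static unsigned home_td(uint64_t addr, unsigned num_tds)
{
    assert(num_tds > 0);
    return (unsigned)((addr / LINE_BYTES) % num_tds);
}

/* Record that `core` now holds the line. For simplicity this model
 * always marks the line shared, even for a first and only holder. */
static void td_add_sharer(struct td_entry *e, unsigned core)
{
    assert(core < 64);
    e->core_valid_mask |= (uint64_t)1 << core;
    e->state = TD_SHARED;
}

/* Count the L2 caches that currently hold the line. */
static unsigned td_sharer_count(const struct td_entry *e)
{
    unsigned n = 0;
    for (uint64_t m = e->core_valid_mask; m != 0; m &= m - 1) {
        n++;  /* each pass clears the lowest set bit */
    }
    return n;
}
```

On an L2 miss, a core in this model would consult the entry at `home_td(addr, num_tds)` and use the valid mask to find which other L2, if any, can supply the line.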




Interleaved Memory Access
[Diagram: cores with their L2 caches and tag directories (TD) arranged around the ring, with the GDDR memory controllers (GDDR MC) interleaved among them rather than grouped in one place.]




Interconnect: 2X AD/AK

[Diagram: the ring interconnect with BL (64 Bytes), AD, and AK rings in each direction; the AD and AK rings are doubled (2x).]




Multi-threaded Triad – Saturation for 1 AD/AK Ring



[Chart: Triad performance (y-axis) vs. Cores Running (x-axis, 0 to 50)]
Simulation Data indicates saturation for a single AD/AK ring

Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance




Multi-threaded Triad – Benefit of Doubling AD/AK



[Chart: Triad performance (y-axis) vs. Cores Running (x-axis, 0 to 50), two curves]
Silicon Data for 2 AD + AK rings: > 40%
Simulation Data indicates saturation for a single AD/AK ring

Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance




Streaming Stores

Streams Triad
    for (i=0; i<HUGE; i++)
        A[i] = k*B[i] + C[i];

Without Streaming Stores
    Read A, B, C, Write A
    256 Bytes transferred to/from memory per iteration (4 x 64-Byte cache lines)

With Streaming Stores
    Read B, C, Write A
    192 Bytes transferred to/from memory per iteration (3 x 64-Byte cache lines)
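
The byte counts above can be reproduced with a little arithmetic. The sketch below assumes that "per iteration" means one 64-byte cache line of each array, which is an interpretation rather than something the slide states. The helper name `bytes_per_line` is hypothetical, and the code does not show how streaming stores are actually requested from the compiler or hardware.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64  /* assumed cache-line size */

/* STREAM triad as written on the slide: A[i] = k*B[i] + C[i]. */
static void triad(double *a, const double *b, const double *c,
                  double k, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        a[i] = k * b[i] + c[i];
    }
}

/* Bytes moved to or from memory for each cache line of A produced.
 * Without streaming stores the line of A is read before it is
 * overwritten: read A, B, C and write A, so 4 line transfers.
 * With streaming stores the read of A is skipped, so 3 transfers. */
static int bytes_per_line(int streaming_stores)
{
    int transfers = streaming_stores ? 3 : 4;
    return transfers * LINE_BYTES;
}
```

Here `bytes_per_line(0)` is 256 and `bytes_per_line(1)` is 192, so skipping the read of A removes one quarter of the memory traffic for this loop.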




Multi-threaded Triad – with Streaming Stores

[Chart: Triad performance (y-axis) vs. Cores Running (x-axis, 0 to 50)]
Silicon Data, Streaming Stores: > 30%

Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance




Cache Hierarchy Micro-architecture Choices

        L2 TLB
            64 entry, holds PTEs and PDEs vs. no L2 TLB

        Dcache Capability
            Simultaneous 512b load and 512b store vs. 1 load or store per cycle

        L2 Cache
            512 KB vs. 256 KB

        Hardware Prefetcher
            16 stream detectors, prefetch into the L2 vs. no HWP (rely only on software prefetching)




Per-Core ST Performance Improvement (per cycle)
[Chart: Spec FP 2006, performance impact of KNC core uArch improvements; y-axis from 0.0 to 3.0]

>1.8x Average Performance/Cycle Improvement – 1 Core, 1 Thread

Results measured in development labs at Intel on Knights Corner and Knights Ferry prototype hardware and systems. For more information go to http://www.intel.com/performance



Caches – For or Against?
[Chart: Relative BW and Relative BW/Watt (y-axis, 0 to 50) for Memory BW, L2 Cache BW, and L1 Cache BW]

Caches:
  • high data BW
  • low energy per byte of data supplied
  • programmer friendly (coherence just works)

Coherent Caches are a key MIC Architecture Advantage

Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance.

Example: Stencils
spatial time-step simulation of a physical system

[Diagram: stencil grid with a block labeled "L2$ Sized"]

Cache blocking promotes much higher performance and performance/watt vs. memory streaming
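
A minimal sketch of the cache-blocking idea, under stated assumptions: a 5-point stencil on a square grid of floats, visited in square tiles chosen to fit the L2. The stencil shape, the 0.2 weights, and the helper names `stencil_blocked` and `tile_for_cache` are all hypothetical and not taken from the slide. Only a single sweep is shown for brevity; the larger gains the slide refers to would come from reusing a cache-resident block across several time steps.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep of a 5-point stencil on an n x n grid, visited in
 * tile x tile blocks so each block's working set can stay in cache.
 * Boundary cells of `out` are left untouched. */
static void stencil_blocked(const float *in, float *out, size_t n, size_t tile)
{
    assert(n >= 3 && tile > 0);
    for (size_t ii = 1; ii < n - 1; ii += tile) {
        for (size_t jj = 1; jj < n - 1; jj += tile) {
            /* Clamp the block so it never reaches the boundary row/column. */
            size_t i_end = (ii + tile < n - 1) ? ii + tile : n - 1;
            size_t j_end = (jj + tile < n - 1) ? jj + tile : n - 1;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    out[i * n + j] = 0.2f * (in[i * n + j] +
                                             in[(i - 1) * n + j] +
                                             in[(i + 1) * n + j] +
                                             in[i * n + j - 1] +
                                             in[i * n + j + 1]);
                }
            }
        }
    }
}

/* Largest tile edge such that one input tile plus one output tile of
 * floats fits in cache_bytes (halo cells are ignored in this estimate). */
static size_t tile_for_cache(size_t cache_bytes)
{
    size_t t = 1;
    while (2 * (t + 1) * (t + 1) * sizeof(float) <= cache_bytes) {
        t++;
    }
    return t;
}
```

A caller might pass `tile_for_cache(512 * 1024)` as the tile size to target the 512 KB L2 described earlier in the deck. The blocked sweep computes exactly the same values as an unblocked one, since only the visiting order changes.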

Power Management: All On and Running
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]




Core C1: Clock Gate Core
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]

When all 4T on a core have halted, core clock gates itself


Core C6: Power Gate Core
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]

C1 time-out, power gate core, save leakage, requires core re-init


Package Auto C3
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]

Timeout when all cores have been in C6, clock gate the L2 and interconnect

Package C6
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]

Host Driver can initiate Package C6 – Uncore Voltage Off, requires partial restart
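
The progression described on the power management slides (core C1 once all four threads halt, core C6 after a C1 time-out, Package Auto C3 after a time-out with all cores in C6, Package C6 on host driver request) can be sketched as a small decision function. This is an illustrative model only: the time-out values, the tick unit, the function names, and the assumption that Package C6 is entered only when every core is already in C6 are hypothetical and not taken from the slides.

```python
from enum import Enum


class CoreState(Enum):
    C0 = "running"
    C1 = "clock gated"
    C6 = "power gated"


# Hypothetical time-outs in arbitrary ticks; the slides give no values.
C1_TIMEOUT = 5
AUTO_C3_TIMEOUT = 10
THREADS_PER_CORE = 4


def core_state(halted_threads, ticks_all_halted):
    """State of one core from its halted-thread count and idle time."""
    if halted_threads < THREADS_PER_CORE:
        return CoreState.C0          # at least one thread still running
    if ticks_all_halted < C1_TIMEOUT:
        return CoreState.C1          # all 4T halted: core clock gates itself
    return CoreState.C6              # C1 time-out: power gate the core


def package_state(core_states, ticks_all_c6, host_requests_c6=False):
    """Package-level state; assumes package states need every core in C6."""
    if any(s is not CoreState.C6 for s in core_states):
        return "Active"
    if host_requests_c6:
        return "Package C6"          # host driver initiated, uncore voltage off
    if ticks_all_c6 >= AUTO_C3_TIMEOUT:
        return "Package Auto C3"     # clock gate the L2 and interconnect
    return "Active"
```

For example, `package_state([CoreState.C6] * 4, ticks_all_c6=10)` yields `"Package Auto C3"` under these assumed time-outs.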

Summary

Intel® Xeon Phi™ coprocessor provides:

   • Performance and Performance/Watt for highly parallel HPC
        – with cores, threads, wide-SIMD, caches, memory BW

   • Intel Architecture
        – general purpose programming environment
        – advanced power management technology

KNC delivers programmability and performance/watt for highly parallel HPC


Thank You

                                           Knights Corner brought to you by:
                                            IAG (Intel Architecture Group)
                                               • DCSG (Data Center and Systems Group)
                                               • VPG (Visual and Parallel Group) MIC
                                                    – HW Architecture
                                                    – HW Design
                                                    – SW
                                            SSG (Software and Services Group) MIC
                                            IL PCL (Intel Labs – Parallel Computing Lab)


Vector Processor: 512b SIMD Width

[Diagram: 16 SP lanes (SP0 to SP15) and 8 DP lanes (DP0 to DP7) laid out in four groups above register files RF3, RF2, RF1, RF0; label: Shared Multiplier Circuit for SP/DP]

16 wide SP SIMD, 8 wide DP SIMD
2:1 Ratio good for circuit optimization
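
The lane counts follow directly from the 512-bit vector width. The short sketch below works through the arithmetic; the mapping of SP lanes to DP lanes and to register-file groups is read off the layout of the diagram (for example SP15 and SP14 beside DP7, SP15 to SP12 above RF3) and should be treated as an assumption, as should the function names.

```python
VECTOR_BITS = 512   # SIMD width from the slide title
SP_BITS = 32        # single-precision element
DP_BITS = 64        # double-precision element

SP_LANES = VECTOR_BITS // SP_BITS   # 512 / 32 = 16
DP_LANES = VECTOR_BITS // DP_BITS   # 512 / 64 = 8


def dp_lane_for_sp(sp_lane):
    """DP lane assumed to share a multiplier with this SP lane (2:1 ratio)."""
    return sp_lane // (SP_LANES // DP_LANES)


def rf_group_for_sp(sp_lane):
    """Register-file group (RF0 to RF3) assumed to serve this SP lane."""
    return sp_lane // 4
```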

Gather/Scatter Address Machinery
Gather Instruction Loop
       gather-prime
 loop: gather-step; jump-mask-not-zero loop

[Diagram labels: Scalar Register (Base Address); Vector Register (Index0 to Index7); adders (+) producing Addr0 to Addr7; Mask Register (1 1 1 1 1 1 1 1); Find First; Access Address, To TLB/DCACHE; comparators (=); Clear]

Gather/Scatter machine takes advantage of cache-line locality
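
A software model can make the gather loop and its use of cache-line locality concrete. The sketch below is an interpretation of the diagram, not a description of the actual hardware: each iteration picks the first lane whose mask bit is still set, accesses that lane's cache line, fills every pending lane whose address falls in the same line, and clears those mask bits. The 64-byte line size, the `scale` parameter, and the dictionary used as memory are assumptions for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed line size for this model


def gather(memory, base, indices, scale=4):
    """Model of the gather-step loop.

    memory  : dict mapping byte address -> value (stand-in for the DCACHE)
    base    : base address from the scalar register
    indices : per-lane indices from the vector register
    scale   : bytes per element (assumed; not shown on the slide)
    Returns (gathered values, number of gather-step iterations).
    """
    addrs = [base + i * scale for i in indices]
    mask = [True] * len(indices)          # all lanes pending
    result = [None] * len(indices)
    steps = 0
    while any(mask):                      # jump-mask-not-zero
        first = mask.index(True)          # find first pending lane
        line = addrs[first] // CACHE_LINE_BYTES
        for lane, addr in enumerate(addrs):
            # Serve every pending lane that hits the accessed cache line.
            if mask[lane] and addr // CACHE_LINE_BYTES == line:
                result[lane] = memory[addr]
                mask[lane] = False        # clear the lane's mask bit
        steps += 1
    return result, steps
```

In this model, eight indices that fall in one cache line complete in a single step, while eight indices in eight different lines take eight steps.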

Package Deep C3
[Die diagram: Cores with L2 caches and TDs, PCIe Client Logic, PCIe IO, GDDR MCs, GDDR IO, and GDDR5 devices]

Host Driver Initiated – L2/Ring/TDs dropped to retention V, memory in self refresh


Mais conteúdo relacionado

Mais procurados

The Role of a Network Software Developer in Network Transformation
The Role of a Network Software Developer in Network TransformationThe Role of a Network Software Developer in Network Transformation
The Role of a Network Software Developer in Network TransformationMichelle Holley
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersMichelle Holley
 
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureDPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureJim St. Leger
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hwvideos
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...Edge AI and Vision Alliance
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RSimon Huang
 
Quieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyQuieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyMichelle Holley
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Eric Van Hensbergen
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingMichelle Holley
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Haidee McMahon
 
The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503Linaro
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchJim St. Leger
 
Netsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvNetsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvIntel
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKMaxime Coquelin
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 

Mais procurados (20)

The Role of a Network Software Developer in Network Transformation
The Role of a Network Software Developer in Network TransformationThe Role of a Network Software Developer in Network Transformation
The Role of a Network Software Developer in Network Transformation
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear Containers
 
Intel dpdk Tutorial
Intel dpdk TutorialIntel dpdk Tutorial
Intel dpdk Tutorial
 
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel ArchitectureDPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
DPDK Summit - 08 Sept 2014 - Intel - Networking Workloads on Intel Architecture
 
1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw1 intro to_dpdk_and_hw
1 intro to_dpdk_and_hw
 
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3R
 
Quieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyQuieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director Technology
 
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
Balance, Flexibility, and Partnership: An ARM Approach to Future HPC Node Arc...
 
DPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet ProcessingDPDK & Layer 4 Packet Processing
DPDK & Layer 4 Packet Processing
 
Intel tools to optimize HPC systems
Intel tools to optimize HPC systemsIntel tools to optimize HPC systems
Intel tools to optimize HPC systems
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503The HPE Machine and Gen-Z - BUD17-503
The HPE Machine and Gen-Z - BUD17-503
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
 
Netsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvNetsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfv
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
VirtFS
VirtFSVirtFS
VirtFS
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDK
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Paravirtualized File Systems
Paravirtualized File SystemsParavirtualized File Systems
Paravirtualized File Systems
 

Semelhante a Under the Armor of Knights Corner: Intel MIC Architecture at Hotchips 2012

Intel Xeon Phi Hotchips Architecture Presentation
Intel Xeon Phi Hotchips Architecture PresentationIntel Xeon Phi Hotchips Architecture Presentation
Intel Xeon Phi Hotchips Architecture PresentationChris O'Neal
 
Embedded Chief River Design-In Presentation_30442998.pdf
Embedded Chief River Design-In Presentation_30442998.pdfEmbedded Chief River Design-In Presentation_30442998.pdf
Embedded Chief River Design-In Presentation_30442998.pdfOemTest
 
Increasing Throughput per Node for Content Delivery Networks
Under the Armor of Knights Corner: Intel MIC Architecture at Hotchips 2012

  • 1. Intel® Xeon Phi™ coprocessor (codename Knights Corner) George Chrysos Senior Principal Engineer Hot Chips, August 28, 2012 More on Twitter @IntelITS
  • 3. Intel® Many Integrated Core (Intel MIC) Architecture
    Targeted at highly parallel HPC workloads
      • Physics, Chemistry, Biology, Financial Services
    Power-efficient cores, support for parallelism
      • Cores: less speculation, threads, wider SIMD
      • Scalability: high-BW on-die interconnect and memory
    General-purpose programming environment
      • Runs Linux (full-service, open-source OS)
      • Runs applications written in Fortran, C, C++, …
      • Supports x86 memory model, IEEE 754
      • x86 collateral (libraries, compilers, Intel® VTune™, debuggers, etc.)
    Visual and Parallel Computing Group Copyright © 2012 Intel Corporation. All rights reserved.
  • 4. Knights Corner Coprocessor
    [Block diagram: an Intel® Xeon® processor with system memory connects over PCIe x16 to each KNC card. Each KNC card has > 50 cores, runs a Linux OS, and attaches to multiple GDDR5 channels. Other label: TCP/IP.]
    >= 8GB GDDR5 memory
  • 5. Knights Corner – Power Efficient
    Performance per Watt of a prototype Knights Corner cluster compared to the 2 top graphics-accelerated clusters (MFLOPS/Watt, higher is better):
      • Intel Corp (Knights Corner): 1381 MFLOPS/Watt, Top500 #150, 72.9 kW
      • Nagasaki Univ. (ATI Radeon): 1380 MFLOPS/Watt, Top500 #456, 47 kW
      • Barcelona Supercomputing Center (Nvidia Tesla 2090): 1266 MFLOPS/Watt, Top500 #177, 81.5 kW
    Source: www.green500.org
  • 6. Knights Corner Micro-architecture
    [Block diagram: cores, each with an L2 cache, together with tag directories (TD), GDDR memory controllers (GDDR MC), and PCIe client logic.]
  • 7. Knights Corner Core
    [Core block diagram; recoverable labels:]
      • Pipeline stages: PPF, PF, D0, D1, D2, E, WB
      • 4 threads, in-order; per-thread instruction pointers T0 IP to T3 IP
      • L1 TLB and 32KB code cache (code cache miss, TLB miss)
      • Decode: 16B/cycle (2 IPC); uCode
      • Issue pipes: Pipe 0, Pipe 1
      • Register files: VPU RF, X87 RF, Scalar RF
      • Execution units: VPU (512b SIMD), X87, ALU 0, ALU 1
      • L1 TLB and 32KB data cache (DCache miss, TLB miss)
      • 512KB L2 cache with L2 Ctl, L2 TLB, HWP, and TLB miss handler; connects to the on-die interconnect
    Core X86 specific logic < 2% of core + L2 area
  • 8. Vector Processing Unit
    [VPU block diagram; recoverable labels:]
      • Core pipeline: PPF, PF, D0, D1, D2, E, WB
      • Vector pipeline: D2, E, VC1, VC2, V1-V4 (V1, V2, V3, V4), WB
      • Blocks: DEC, VPU RF (3R, 1W), Mask RF, LD, ST, EMU, Scatter/Gather
      • Vector ALUs: 16 wide x 32 bit, 8 wide x 64 bit, fused multiply-add
  • 9. Interconnect
    [Ring diagram: cores with L2 caches and tag directories (TD) attached to the rings listed below.]
      • BL: 64 Bytes, data
      • AD: command and address
      • AK: coherence and credits
  • 10. Distributed Tag Directories
    [Ring diagram: cores with L2 caches and distributed tag directories (TD).]
    Tag directory entry fields: TAG, Core Valid Mask, State
    Tag Directories track cache-lines in all L2s
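The entry fields named on this slide (TAG, Core Valid Mask, State) can be pictured as a small C record. This is only an illustrative sketch: the field widths, the 64-bit mask, the MESI-style state names, and the helpers `td_add_sharer` and `td_is_cached_by` are assumptions made for the example, not details given in the deck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical line states; the slide only names a "State" field. */
enum td_state { TD_INVALID, TD_SHARED, TD_EXCLUSIVE, TD_MODIFIED };

/* One tag-directory entry: which cache line it describes, which cores'
   L2 caches currently hold that line, and the line's coherence state. */
struct td_entry {
    uint64_t tag;             /* cache-line address tag (width assumed) */
    uint64_t core_valid_mask; /* bit c set => core c's L2 holds the line */
    enum td_state state;
};

/* Record that `core` now holds the line in its L2 (hypothetical helper). */
static void td_add_sharer(struct td_entry *e, unsigned core)
{
    assert(core < 64);
    e->core_valid_mask |= (uint64_t)1 << core;
    if (e->state == TD_INVALID)
        e->state = TD_SHARED;
}

/* Ask the directory whether `core` holds the line (hypothetical helper). */
static bool td_is_cached_by(const struct td_entry *e, unsigned core)
{
    assert(core < 64);
    return ((e->core_valid_mask >> core) & 1u) != 0;
}
```

With a record like this, a miss in one core's L2 can be resolved by consulting the directory entry for the line instead of probing every other L2.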
  • 11. Interleaved Memory Access
    [Ring diagram: cores with L2 caches and tag directories (TD), with GDDR memory controllers (GDDR MC) distributed among them.]
  • 12. Interconnect: 2X AD/AK
    [Ring diagram: cores with L2 caches and tag directories (TD). Labels: BL – 64 Bytes; AD and AK marked 2x.]
  • 13. Multi-threaded Triad – Saturation for 1 AD/AK Ring
    [Chart: performance vs. cores running (0 to 50).]
    Simulation data indicates saturation for a single AD/AK ring
    Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance
  • 14. Multi-threaded Triad – Benefit of Doubling AD/AK
    [Chart: performance vs. cores running (0 to 50).]
      • Silicon data for 2 AD + AK rings: > 40% performance
      • Simulation data indicates saturation for a single AD/AK ring
    Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance
  • 15. Streaming Stores
    Streams Triad:
        for (i=0; i<HUGE; i++)
            A[i] = k*B[i] + C[i];
    Without Streaming Stores: Read A, B, C, Write A; 256 Bytes (4 x 64-byte cache lines) transferred to/from memory per iteration
    With Streaming Stores: Read B, C, Write A; 192 Bytes (3 x 64-byte cache lines) transferred to/from memory per iteration
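The 256-byte and 192-byte figures follow from counting cache-line transfers for each line of A that the loop produces. The sketch below reproduces that arithmetic; the function name `triad_bytes_per_line` is hypothetical, and the 64-byte line size is an assumption taken from the 64-byte BL data width shown on the Interconnect slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes moved to/from memory to produce one cache line of A in
   A[i] = k*B[i] + C[i]. `line_bytes` is the cache-line size (assumed 64). */
static size_t triad_bytes_per_line(size_t line_bytes, bool streaming_stores)
{
    assert(line_bytes > 0);
    size_t reads = 2;        /* B and C are always read */
    if (!streaming_stores)
        reads += 1;          /* A is read into the cache before it is overwritten */
    size_t writes = 1;       /* A is written back to memory */
    return (reads + writes) * line_bytes;
}
```

For 64-byte lines this gives 256 bytes without streaming stores and 192 bytes with them, a 25% cut in traffic. If the loop were purely bandwidth-bound, that would bound the gain at roughly 256/192, about 1.33x.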
  • 16. Multi-threaded Triad – with Streaming Stores
    [Chart: performance vs. cores running (0 to 50); silicon data.]
    Streaming Stores: > 30% performance
    Results measured in development labs at Intel on Knights Corner prototype hardware and systems. For more information go to http://www.intel.com/performance
  • 17. Cache Hierarchy Micro-architecture Choices
      • L2 TLB: 64 entry, holds PTEs and PDEs vs. no L2 TLB
      • Dcache capability: simultaneous 512b load and 512b store vs. 1 load or store per cycle
      • L2 cache: 512 KB vs. 256 KB
      • Hardware prefetcher: 16 stream detectors, prefetch into the L2 vs. no HWP (rely only on software prefetching)
  • 18. Per-Core ST Performance Improvement (per cycle)
    [Bar chart: Spec FP 2006, performance impact of KNC core uArch improvements; vertical axis 0.0 to 3.0.]
    >1.8x Average Performance/Cycle Improvement – 1 Core, 1 Thread
    Results measured in development labs at Intel on Knights Corner and Knights Ferry prototype hardware and systems. For more information go to http://www.intel.com/performance
  • 19. Caches – For or Against?
    [Bar chart: Relative BW and Relative BW/Watt for Memory BW, L2 Cache BW, and L1 Cache BW; vertical axis 0 to 50.]
    Caches:
      • high data BW
      • low energy per byte of data supplied
      • programmer friendly (coherence just works)
    Coherent Caches are a key MIC Architecture Advantage
    Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator or model. Any difference in system hardware or software design or configuration may affect actual performance.
  • 20. Example: Stencils
    [Figure: spatial time-step simulation of a physical system.]
    L2$-sized cache blocking promotes much higher performance and performance/watt vs. memory streaming
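As a rough illustration of the cache blocking this slide refers to, the sketch below applies one time step of a 5-point stencil while visiting the grid in square tiles. The stencil, the 0.2 weights, the function name `stencil_step_blocked`, and the `tile` parameter are all assumptions for the example; the deck does not show the kernel it measured. In practice the tile would be chosen so that its working set fits in the 512 KB L2.

```c
#include <assert.h>
#include <stddef.h>

/* One time step of a 5-point stencil on an n x n row-major grid.
   Only interior points are updated; the grid is visited in tile x tile
   blocks so that each block's data can stay resident in cache. */
static void stencil_step_blocked(const float *in, float *out, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 1; ii + 1 < n; ii += tile) {
        for (size_t jj = 1; jj + 1 < n; jj += tile) {
            /* Clamp the block to the last interior row/column. */
            size_t i_end = (ii + tile < n - 1) ? ii + tile : n - 1;
            size_t j_end = (jj + tile < n - 1) ? jj + tile : n - 1;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    out[i * n + j] = 0.2f * (in[i * n + j]
                                   + in[(i - 1) * n + j] + in[(i + 1) * n + j]
                                   + in[i * n + j - 1] + in[i * n + j + 1]);
                }
            }
        }
    }
}
```

Calling it with `tile` equal to `n` degenerates to the unblocked sweep, which makes it easy to check that blocking changes only the traversal order and not the result.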
  • 21. Power Management: All On and Running
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
  • 22. Core C1: Clock Gate Core
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
    When all 4T on a core have halted, core clock gates itself
  • 23. Core C6: Power Gate Core
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
    C1 time-out, power gate core, save leakage, requires core re-init
  • 24. Package Auto C3
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
    Timeout when all cores have been in C6, clock gate the L2 and interconnect
  • 25. Package C6
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
    Host Driver can initiate Package C6 – Uncore Voltage Off, requires partial restart
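The core-level transitions described on the power management slides (all four threads halted leads to C1; a C1 time-out leads to C6) can be summarized as a small state machine. This is a simplified sketch: the names `core_next_state` and `package_auto_c3_eligible`, the boolean inputs, and the single-step wake-up are assumptions, and the real hardware involves timers and a re-initialization sequence that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum core_state { CORE_C0, CORE_C1, CORE_C6 };

/* Next state of one core, given whether all of its threads have halted
   and whether the C1 time-out has expired. */
static enum core_state core_next_state(enum core_state s,
                                       bool all_threads_halted,
                                       bool c1_timed_out)
{
    if (!all_threads_halted)
        return CORE_C0;   /* work to do: run (waking from C6 needs core re-init) */
    if (s == CORE_C0)
        return CORE_C1;   /* all threads halted: core clock gates itself */
    if (s == CORE_C1 && c1_timed_out)
        return CORE_C6;   /* C1 time-out: power gate the core to save leakage */
    return s;             /* otherwise stay in the current idle state */
}

/* The package can head toward Auto C3 (clock gating the L2 and interconnect)
   only once every core is in C6; the additional time-out is not modeled. */
static bool package_auto_c3_eligible(const enum core_state *cores, size_t n)
{
    assert(cores != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        if (cores[i] != CORE_C6)
            return false;
    return true;
}
```

Package C6 is described in the deck as initiated by the host driver rather than by a hardware time-out, so it is left out of this sketch.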
  • 26. Summary
    Intel® Xeon Phi™ coprocessor provides:
      • Performance and Performance/Watt for highly parallel HPC with cores, threads, wide-SIMD, caches, memory BW
      • Intel Architecture general purpose programming environment
      • Advanced power management technology
    KNC delivers programmability and performance/watt for highly parallel HPC
  • 27. Thank You
    Knights Corner brought to you by:
      • IAG (Intel Architecture Group)
          • DCSG (Data Center and Systems Group)
          • VPG (Visual and Parallel Group) MIC – HW Architecture – HW Design – SW
      • SSG (Software and Services Group) MIC
      • IL PCL (Intel Labs – Parallel Computing Lab)
  • 28.
  • 29. Vector Processor: 512b SIMD Width
    [Lane diagram: 16 SP lanes (SP0 to SP15) and 8 DP lanes (DP0 to DP7) across register file slices RF0 to RF3, with a shared multiplier circuit for SP/DP.]
    16 wide SP SIMD, 8 wide DP SIMD
    2:1 Ratio good for circuit optimization
  • 30. Gather/Scatter Address Machinery
    Gather instruction loop:
        gather-prime
        loop: gather-step;
              jump-mask-not-zero loop
    [Diagram: a vector register holds Index0 to Index7; each index is added to the base address from a scalar register to form Addr0 to Addr7. A mask register (initially all 1s) marks pending elements. "Find First" selects the access address sent to the TLB/DCACHE, and comparators (=) clear the mask bits of the elements that were serviced.]
    Gather/Scatter machine takes advantage of cache-line locality
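The gather loop on this slide can be emulated in scalar C to show why cache-line locality matters: each step services the first pending element plus every other pending element that falls in the same cache line, then clears their mask bits. This is a behavioral sketch, not the hardware algorithm in detail. The names `gather_step` and `gather`, the 8-lane width (matching the 8 indices drawn on the slide), non-negative element indices, and a 64-byte-aligned base are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 8, LINE_BYTES = 64 };

/* Cache line (relative to an assumed 64-byte-aligned base) of element idx. */
static uint32_t line_of(uint32_t idx)
{
    return (uint32_t)((idx * sizeof(float)) / LINE_BYTES);
}

/* One "gather-step": find the first pending lane, load every pending lane
   whose address lies in the same cache line, and clear those mask bits.
   Returns the updated mask. */
static uint8_t gather_step(float *dst, const float *base,
                           const uint32_t *index, uint8_t mask)
{
    assert(mask != 0);
    int first = 0;
    while (((mask >> first) & 1u) == 0)
        first++;                                   /* "Find First" */
    uint32_t line = line_of(index[first]);         /* access address */
    for (int lane = first; lane < LANES; lane++) {
        if (((mask >> lane) & 1u) && line_of(index[lane]) == line) {
            dst[lane] = base[index[lane]];         /* same line: serviced now */
            mask &= (uint8_t)~(1u << lane);        /* "Clear" */
        }
    }
    return mask;
}

/* The full loop: step until the mask is zero. Returns the number of steps,
   which equals the number of distinct cache lines touched. */
static int gather(float *dst, const float *base,
                  const uint32_t *index, uint8_t mask)
{
    int steps = 0;
    while (mask != 0) {                            /* jump-mask-not-zero */
        mask = gather_step(dst, base, index, mask);
        steps++;
    }
    return steps;
}
```

With indices that cluster in a few cache lines the loop finishes in a few steps; with eight indices in eight different lines it takes eight.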
  • 31. Package Deep C3
    [Die diagram: cores with L2 caches, tag directories (TD), GDDR MC, GDDR IO, GDDR5 devices, PCIe client logic, and PCIe IO.]
    Host Driver Initiated – L2/Ring/TDs dropped to retention V, memory in self refresh