Power Optimization Through
Many-Core Multiprocessing
    Delivering High Performance in a Low Power World

                          ChipEx2012
                            Haydn Povey
            Marketing Director – Implementation & Security
                       ARM Processor Division




                              May 2, 2012
Billions of Connected Devices
Performance expectations continue to increase exponentially, but power
efficiency and scalability are becoming formidable challenges.

Form Factor                 TAM (m), 2015
Mobile Phones                       1,750
Media players                         300
Mobile Computers                      750
Desktop PCs                           150
Digital TV/STB                        500
Automotive Infotainment               100
Other*                                450
Total                           4 billion
*Includes PND, photo-frames, etc.

ABI Research, IDC, Gartner and ARM forecasts




Historic Technology Drivers




Era            Platform            Driver
Up to 1980s    Mainframes/mini     Functionality
1990s          The PC              Functionality / $
2000s          Notebooks           Functionality / (Power × $)
2010s          Mobile Computing    Functionality / (Energy × $)

Low Power Positioned for the Future
 Going forward, low power is necessary for everything from
  microcontrollers to servers

 Low power is a design philosophy
      Mindset, style, culture and working practice
      Not something you change or acquire easily
 Low power is a design reality
      ARM is an efficient architecture
      None of the legacy or CISC complexity

 Low cost is a design & manufacturing partnership
      Time to volume, not time to niche markets
      Speed-binning is not good enough for the mass market

 (Figure: the 2010s Mobile Computing driver, Functionality / (Energy × $))

Limitations with Multiprocessing
 Cost of offering the peak single-thread performance on each CPU
  quickly exceeds chassis thermal limits

 System and software bottlenecks limit overall scalability

 Single-die integration offered some roadmap

Evolution to Many-Core
 Base theorem
      Simpler and smaller processor designs require exponentially less
       energy to accomplish the same amount of compute as a more complex
       and larger processor design

 “Approximate rule of thumb”
      To increase performance by 50%, you double the power and area cost
       of the processor design
      Quickly reaches the point of diminishing returns
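
The rule of thumb can be turned into a quick calculation. The sketch below assumes the rule compounds, so that each further 50% performance step doubles power again; the slide does not state this, and the function names are illustrative only.

```python
import math

# Assumption: the rule of thumb compounds, so every 1.5x step in
# single-thread performance costs another 2x in power (and area).
PERF_STEP = 1.5
POWER_STEP = 2.0


def relative_power(relative_perf: float) -> float:
    """Power needed for a given performance, both relative to a baseline core."""
    steps = math.log(relative_perf) / math.log(PERF_STEP)
    return POWER_STEP ** steps


def perf_per_watt(relative_perf: float) -> float:
    """Efficiency of the faster core relative to the baseline core."""
    return relative_perf / relative_power(relative_perf)


if __name__ == "__main__":
    for perf in (1.0, 1.5, 2.25, 3.375):
        print(f"perf x{perf:<6} power x{relative_power(perf):.2f} "
              f"perf/W x{perf_per_watt(perf):.2f}")
```

Under this assumption a core that is 2.25 times faster needs about four times the power, so its performance per watt is a little over half that of the baseline, which is the diminishing return the slide refers to.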




Challenge of Many-Core
 Many-core definition
      Use ‘lots’ of smaller, more efficient processors to achieve a higher
       aggregate performance than can be reached through multiprocessing

 Smaller processors cannot execute the same single thread in the same
    time as a higher-performance processor, so they can’t execute
    existing applications effectively

 Many threads cannot easily be decomposed into simpler, smaller tasks
    that would benefit from multiprocessing on the smaller processors

 Software development challenge


Software Data Decomposition
     Each data item is independent

     Split a large quantity of DATA into smaller chunks that can be
     operated on in parallel

     (Diagram: a single TASK and four CPUs, redrawn as four TASK/CPU pairs)
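
A minimal sketch of data decomposition, assuming Python's standard `concurrent.futures` module. The helper names (`chunked`, `task`, `parallel_sum_squares`) and the sum-of-squares workload are illustrative and not from the slides. Threads keep the sketch portable; for CPU-bound work in CPython, `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def task(chunk):
    """The same TASK runs on every chunk; here it sums squares."""
    return sum(x * x for x in chunk)


def parallel_sum_squares(data, n_workers=4):
    """Run one copy of the task per chunk, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(task, chunked(data, n_workers)))


if __name__ == "__main__":
    print(parallel_sum_squares(list(range(1000))))
```

The decomposition only works because no chunk needs a result from any other chunk; the single combining step at the end is the only serial part.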




Software Task Decomposition
     Each task item is functionally independent

     Functionally independent tasks can be executed concurrently

     (Diagram: a row of TASKs and four CPUs, redrawn as three TASKs per CPU)
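
Task decomposition can be sketched in the same way, again assuming `concurrent.futures`. The three tasks below are made-up stand-ins chosen only because they share no data; none of them comes from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_audio(samples):
    """Stand-in task 1: scale a list of samples."""
    return [s * 2 for s in samples]


def count_words(text):
    """Stand-in task 2: count whitespace-separated words."""
    return len(text.split())


def checksum(payload):
    """Stand-in task 3: simple additive checksum over bytes."""
    return sum(payload) % 256


def run_independent_tasks(n_workers=4):
    """Submit unrelated tasks; any free worker may pick up any task."""
    tasks = {
        "audio": (decode_audio, [1, 2, 3]),
        "words": (count_words, "low power is a design philosophy"),
        "checksum": (checksum, b"big.LITTLE"),
    }
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {name: pool.submit(fn, arg) for name, (fn, arg) in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    print(run_independent_tasks())
```

Unlike data decomposition, each worker here runs different code, so the achievable speed-up is bounded by the number of independent tasks and by the longest one.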




Functional Block Partitioning
 Functional blocks are serially dependent
        But temporally independent

 Distribute different functional blocks across available processors
        Split into defined functional threads
        Uses passing of data blocks between threads to allocate work
 Requires code changes and fine tuning

 Example: Real Time Video Encoding
        (Simplified MPEG encoding functional block diagram; the blocks
         are spread across CPU0 to CPU3 along a TIME axis)
        Analogue Video Sampling -> Remove Inter-Frame Redundancy ->
         Remove Intra-Frame Redundancy -> Quantise Samples ->
         Run-Length Compress -> Buffer Store
        Plus a Motion Compensation block
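
The passing of data blocks between functional threads can be sketched as a pipeline of queues. This is a minimal illustration using Python's standard `queue` and `threading` modules; the stage functions are toy placeholders that only borrow names from the slide's encoder blocks and do not implement MPEG.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(fn, inbox, outbox):
    """Run fn on every block from inbox and pass the result downstream."""
    while True:
        block = inbox.get()
        if block is STOP:
            outbox.put(STOP)
            return
        outbox.put(fn(block))


def run_pipeline(blocks, stage_fns):
    """One thread per functional block, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stage_fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Toy placeholder stages (assumptions, not real encoder blocks).
def remove_redundancy(frame):
    return [v for i, v in enumerate(frame) if i == 0 or v != frame[i - 1]]


def quantise(frame):
    return [v // 4 for v in frame]


def run_length(frame):
    out = []
    for v in frame:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out


if __name__ == "__main__":
    frames = [[8, 8, 9, 16, 16], [0, 0, 0]]
    print(run_pipeline(frames, [remove_redundancy, quantise, run_length]))
```

Each stage depends on the output of the previous one for a given frame, but while one stage works on frame N the previous stage can already work on frame N+1, which is the sense in which the blocks are independent in time.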


Strategy Focus: The Thermal Wall
 SoC sustained power is limited in mobile devices by thermals:
         1.5W to 2W with low-cost PoP and stacked memories
         3W without stacked memories
 Responsiveness is a must
 Complex active management is needed

 (Figure: Power against Time, with these labels)
         Burst for responsiveness (e.g. Browsing)
         “Opportunistic Residency”
         T >= Tjmax, Tskin
         Managed Sustained Power (Tj >= Tmax, Tj < Tmax)
         Un-managed Max Power (@Tjmax)
         Sustained performance (e.g. HD Video Record, Gaming)
         Power Optimised Low End (e.g. e-Mail, Voice, MP3)
Applying Nominal Use Case
 Typical Day for Smartphone User
         90 min voice calling
         60 min email / social networking
         30 min reading web
         50 min Angry Birds / other gaming
         90 min jogging while listening to music and
          logging GPS co-ordinates
         10 min video recording
         7 hrs sleep with music alarm clock
         OS typically executing ~28 active processes
             Apps syncing in background



Use Case Measurements




Use Case Conclusion
     Profiled CPU States    Minutes    % of CPU Active
     Deep Sleep                1186          n/a
     200 MHz                    154          60%
     500 MHz                     69          27%
     800 MHz                     18           7%
     1000 MHz                     4           2%
     1200 MHz                    10           4%

     If the phone was ARM big.LITTLE™ enabled...

     Active CPU time
         big       12%
         LITTLE    88%
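
The percentages in the table follow directly from the minutes column. The sketch below recomputes them and then estimates a big/LITTLE split. The 500 MHz boundary is an assumption made for illustration; the slide does not say where the boundary between the two cores was drawn.

```python
# Minutes per profiled CPU state, copied from the table above.
DEEP_SLEEP_MIN = 1186
ACTIVE_MIN_BY_MHZ = {200: 154, 500: 69, 800: 18, 1000: 4, 1200: 10}

# Assumption: LITTLE serves every state at or below this frequency.
LITTLE_MAX_MHZ = 500


def active_share():
    """Fraction of active (non-sleep) time spent in each frequency state."""
    total = sum(ACTIVE_MIN_BY_MHZ.values())
    return {mhz: minutes / total for mhz, minutes in ACTIVE_MIN_BY_MHZ.items()}


def cluster_split(little_max_mhz=LITTLE_MAX_MHZ):
    """Share of active time on each cluster for a given frequency boundary."""
    total = sum(ACTIVE_MIN_BY_MHZ.values())
    little = sum(m for mhz, m in ACTIVE_MIN_BY_MHZ.items() if mhz <= little_max_mhz)
    return {"LITTLE": little / total, "big": (total - little) / total}


if __name__ == "__main__":
    for mhz, share in active_share().items():
        print(f"{mhz:>5} MHz  {share:6.1%}")
    for cluster, share in cluster_split().items():
        print(f"{cluster:>6}  {share:6.1%}")
```

With this boundary the split comes out at roughly 87% LITTLE and 13% big, close to the 88% and 12% quoted on the slide; the small difference suggests the original analysis used a slightly different mapping or rounding.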


big.LITTLE Processing




     Multiprocessing Capable                 Many core Benefits


“big” Processor – Cortex-A15
 ARM Cortex™-A15 Processor
       3.5+ DMIPS/MHz
       1-4 core MPCore™ configurable
 Advanced Capabilities
       Full ARMv7-A architecture
          Thumb®-2, TrustZone®, VFP, NEON™
          Virtualization, large address extensions
       AMBA® 4 ACE™ coherency
 High Performance
       Targeting 1.5GHz mobile implementation on 28nm
       Hard Macro Quad-core Implementation @ 2GHz on 28HPM process

“LITTLE” Processor – Cortex-A7
 ARM Cortex-A7 Processor
       “LITTLE” to Cortex-A15 “big”
       1-4 core MPCore configurable
 Same Architectural Capabilities
       Full ARMv7-A architecture
           Thumb-2, TrustZone, VFP, NEON
           Virtualization, large address extensions
       AMBA 4 ACE Coherency
       ISA identical to Cortex-A15 processor
 High Performance
       Up to 1.2GHz for mobile implementation on 28nm

Comparison of big.LITTLE Pipelines




Performance Comparison




Power Efficiency Comparison




Software Use Models
 big.LITTLE Task Migration – One CPU active
        Migrate between Cortex-A15 and Cortex-A7 depending on
         performance requirements

 big.LITTLE MP – Both CPUs can be active
        Allocate threads that need high performance to Cortex-A15
        Allocate threads that don’t require high performance to Cortex-A7 for
         best energy efficiency
        AMBA 4 hardware coherency between Cortex-A15 and Cortex-A7
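
The difference between the two use models can be sketched in a few lines. Everything here is hypothetical: the class and function names, the utilisation thresholds and the per-thread demand figures are invented for illustration and do not describe ARM's or any operating system's actual implementation.

```python
from dataclasses import dataclass

# Assumption: illustrative thresholds only, not values from the slides.
UP_THRESHOLD = 0.85    # migrate to big when utilisation rises above this
DOWN_THRESHOLD = 0.30  # migrate back to LITTLE when it falls below this


@dataclass
class MigrationGovernor:
    """Task-migration model: exactly one cluster is active at a time."""
    active: str = "LITTLE"

    def update(self, utilisation: float) -> str:
        # The gap between the two thresholds gives hysteresis, so the
        # workload does not bounce between clusters on small changes.
        if self.active == "LITTLE" and utilisation > UP_THRESHOLD:
            self.active = "big"
        elif self.active == "big" and utilisation < DOWN_THRESHOLD:
            self.active = "LITTLE"
        return self.active


def place_threads(thread_demand: dict, heavy_cutoff: float = 0.5) -> dict:
    """MP model: both clusters are active and each thread is placed by demand."""
    return {
        name: "big" if demand >= heavy_cutoff else "LITTLE"
        for name, demand in thread_demand.items()
    }


if __name__ == "__main__":
    gov = MigrationGovernor()
    print([gov.update(u) for u in (0.2, 0.9, 0.5, 0.1)])
    print(place_threads({"ui_render": 0.8, "mail_sync": 0.1}))
```

In the migration model the decision is made once for the whole workload, whereas in the MP model it is made per thread, which is why the MP model relies on hardware coherency between the two clusters.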




Task Migration Mechanics




CCI-400 Cache Coherent Interconnect
AMBA 4 compliant, 128-bit single layer at up to ½ Cortex-A15 frequency

(Block diagram, top to bottom)
 Masters feeding the CCI-400 slave interfaces
      Quad Cortex-A15: ACE, ADB-400, 128b ACE
      Quad Cortex-A7: ACE, ADB-400, 128b ACE
      Mali-T604 Graphics: ACE-Lite, ADB-400, MMU-400, 128b ACE-Lite + DVM
      Coherent I/O device: ADB-400, MMU-400, 128b ACE-Lite + DVM
      DMA and LCD: NIC-400 (configurable AXI 4/AXI 3/AHB), AXI 4,
       MMU-400, 128b ACE-Lite + DVM
      GIC-400 shown alongside the CPU clusters
 CoreLink™ CCI-400 Cache Coherent Interconnect
      128 bit @ up to 0.5 Cortex-A15 frequency
 CCI-400 master interfaces
      Two 128b ACE-Lite ports to the DMC-400, each with a PHY to
       DDR3/2 or LPDDR2/3
      One 128b ACE-Lite port, via AXI 4, to a NIC-400 (configurable
       AXI 4/AXI 3/AHB/APB) serving other slaves

 CCI-400 2+3 (x3)
      2 full AMBA 4 ACE slave interfaces
      +3 ACE-Lite I/O coherent slave interfaces
      x3 master interfaces
 CCI interfaces
      AMBA 4 ACE and ACE-Lite manage all coherency, shareability
       and barriers
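
The headline figures give a rough upper bound on per-interface bandwidth. The calculation below assumes one 128-bit data beat per interconnect clock cycle in one direction, ignores all protocol overhead, and takes the 1.5 GHz mobile target from the Cortex-A15 slide as the CPU frequency; none of these assumptions is stated on this slide, so treat the result as an illustration and not a specification.

```python
def peak_bandwidth_gb_s(bus_bits: int, clock_hz: float) -> float:
    """Peak GB/s for one interface, one direction, one beat per cycle."""
    return (bus_bits / 8) * clock_hz / 1e9


# Assumption: CPU clock taken from the Cortex-A15 mobile target (1.5 GHz).
CPU_HZ = 1.5e9
# "Up to 1/2 Cortex-A15 frequency" from the slide.
CCI_HZ = CPU_HZ / 2

if __name__ == "__main__":
    per_port = peak_bandwidth_gb_s(128, CCI_HZ)
    print(f"Interconnect clock: {CCI_HZ / 1e6:.0f} MHz")
    print(f"Peak per 128-bit interface: {per_port:.1f} GB/s")
```

Under these assumptions each 128-bit interface peaks at 12 GB/s; sustained figures on real silicon would be lower and depend on the memory system behind the interconnect.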



Summary
 Multiprocessing enables today’s applications to scale while
  maintaining single-thread performance
     Nicely addresses the multi-tasking of stacked usage scenarios
 Many-core brings the energy advantages of simpler and
  smaller processors, but with the challenge of software
  complexity and a lack of backwards compatibility with respect
  to single-thread performance

 big.LITTLE processing, as delivered by the ARM Cortex-A15
  and Cortex-A7, offers both the performance and
  compatibility advantages of multiprocessing along with the
  power efficiency and scalability advantages of many-core
  processing


More Related Content

What's hot

AMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press PresentationAMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press PresentationAMD
 
D610 Spec Sheet
D610 Spec SheetD610 Spec Sheet
D610 Spec Sheetxnk12x
 
Intel Cloud Summit: Intel Platform Update
Intel Cloud Summit: Intel Platform UpdateIntel Cloud Summit: Intel Platform Update
Intel Cloud Summit: Intel Platform UpdateIntelAPAC
 
Novell Support Revealed! An Insider's Peek and Feedback Opportunity
Novell Support Revealed! An Insider's Peek and Feedback OpportunityNovell Support Revealed! An Insider's Peek and Feedback Opportunity
Novell Support Revealed! An Insider's Peek and Feedback OpportunityNovell
 
Six-Core AMD Opteron EE Processor
Six-Core AMD Opteron EE ProcessorSix-Core AMD Opteron EE Processor
Six-Core AMD Opteron EE ProcessorAMD
 
zEnterprise Reduces Cost Per Workload
zEnterprise Reduces Cost Per WorkloadzEnterprise Reduces Cost Per Workload
zEnterprise Reduces Cost Per Workloaddkang
 
IBM Virtual Desktop Virtualization
IBM Virtual Desktop VirtualizationIBM Virtual Desktop Virtualization
IBM Virtual Desktop VirtualizationIBM Sverige
 
Presentation from physical to virtual to cloud emc
Presentation   from physical to virtual to cloud emcPresentation   from physical to virtual to cloud emc
Presentation from physical to virtual to cloud emcxKinAnx
 
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...Vincent Kwon
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iCOMMON Europe
 
Hp All In 1
Hp All In 1Hp All In 1
Hp All In 1RBratton
 
Dme presentation-feb2013v2-1
Dme presentation-feb2013v2-1Dme presentation-feb2013v2-1
Dme presentation-feb2013v2-1Bengt Edlund
 
Gentek Introduce(en)
Gentek Introduce(en)Gentek Introduce(en)
Gentek Introduce(en)cloudmmog
 
Road to superior investment protection for mission critical
Road to superior investment protection for mission criticalRoad to superior investment protection for mission critical
Road to superior investment protection for mission criticalHP ESSN Philippines
 
Transforming Your Business Through Cloud Computing
Transforming Your Business Through Cloud ComputingTransforming Your Business Through Cloud Computing
Transforming Your Business Through Cloud ComputingAMD
 
Infoboom future-storage-aug2011-v3
Infoboom future-storage-aug2011-v3Infoboom future-storage-aug2011-v3
Infoboom future-storage-aug2011-v3Tony Pearson
 
9sept2009 concept electronics
9sept2009 concept electronics9sept2009 concept electronics
9sept2009 concept electronicsAgora Group
 

What's hot (20)

AMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press PresentationAMD Opteron 6000 Series Platform Press Presentation
AMD Opteron 6000 Series Platform Press Presentation
 
D610 Spec Sheet
D610 Spec SheetD610 Spec Sheet
D610 Spec Sheet
 
Intel Cloud Summit: Intel Platform Update
Intel Cloud Summit: Intel Platform UpdateIntel Cloud Summit: Intel Platform Update
Intel Cloud Summit: Intel Platform Update
 
Novell Support Revealed! An Insider's Peek and Feedback Opportunity
Novell Support Revealed! An Insider's Peek and Feedback OpportunityNovell Support Revealed! An Insider's Peek and Feedback Opportunity
Novell Support Revealed! An Insider's Peek and Feedback Opportunity
 
Six-Core AMD Opteron EE Processor
Six-Core AMD Opteron EE ProcessorSix-Core AMD Opteron EE Processor
Six-Core AMD Opteron EE Processor
 
IBM System Blue Gene/P Data Sheet
IBM System Blue Gene/P Data SheetIBM System Blue Gene/P Data Sheet
IBM System Blue Gene/P Data Sheet
 
zEnterprise Reduces Cost Per Workload
zEnterprise Reduces Cost Per WorkloadzEnterprise Reduces Cost Per Workload
zEnterprise Reduces Cost Per Workload
 
IBM Virtual Desktop Virtualization
IBM Virtual Desktop VirtualizationIBM Virtual Desktop Virtualization
IBM Virtual Desktop Virtualization
 
Presentation from physical to virtual to cloud emc
Presentation   from physical to virtual to cloud emcPresentation   from physical to virtual to cloud emc
Presentation from physical to virtual to cloud emc
 
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...
IBM Smart Business Desktop Cloud - How to optimise the ROI from your desktop ...
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM i
 
Dukane 8937
Dukane 8937Dukane 8937
Dukane 8937
 
Hp All In 1
Hp All In 1Hp All In 1
Hp All In 1
 
Dme presentation-feb2013v2-1
Dme presentation-feb2013v2-1Dme presentation-feb2013v2-1
Dme presentation-feb2013v2-1
 
Gentek Introduce(en)
Gentek Introduce(en)Gentek Introduce(en)
Gentek Introduce(en)
 
Road to superior investment protection for mission critical
Road to superior investment protection for mission criticalRoad to superior investment protection for mission critical
Road to superior investment protection for mission critical
 
Smarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT EconomicsSmarter Computing and Breakthrough IT Economics
Smarter Computing and Breakthrough IT Economics
 
Transforming Your Business Through Cloud Computing
Transforming Your Business Through Cloud ComputingTransforming Your Business Through Cloud Computing
Transforming Your Business Through Cloud Computing
 
Infoboom future-storage-aug2011-v3
Infoboom future-storage-aug2011-v3Infoboom future-storage-aug2011-v3
Infoboom future-storage-aug2011-v3
 
9sept2009 concept electronics
9sept2009 concept electronics9sept2009 concept electronics
9sept2009 concept electronics
 

Viewers also liked

Energy consumption in smart phones huda
Energy consumption in smart phones hudaEnergy consumption in smart phones huda
Energy consumption in smart phones hudaNoor Huda
 
Learn about energy consumption and battery life on Android devices
Learn about energy consumption and battery life on Android devicesLearn about energy consumption and battery life on Android devices
Learn about energy consumption and battery life on Android devicesMarakana Inc.
 
Power optimization for Android apps
Power optimization for Android appsPower optimization for Android apps
Power optimization for Android appsXavier Hallade
 
Battery Optimization for Android Apps - Devoxx14
Battery Optimization for Android Apps - Devoxx14Battery Optimization for Android Apps - Devoxx14
Battery Optimization for Android Apps - Devoxx14Murat Aydın
 
Project presentation (Loginradius SDK for Android)
Project presentation (Loginradius SDK for Android)Project presentation (Loginradius SDK for Android)
Project presentation (Loginradius SDK for Android)shwetarathi Rathi
 

Viewers also liked (6)

Energy consumption in smart phones huda
Energy consumption in smart phones hudaEnergy consumption in smart phones huda
Energy consumption in smart phones huda
 
Learn about energy consumption and battery life on Android devices
Learn about energy consumption and battery life on Android devicesLearn about energy consumption and battery life on Android devices
Learn about energy consumption and battery life on Android devices
 
Power optimization for Android apps
Power optimization for Android appsPower optimization for Android apps
Power optimization for Android apps
 
Battery Optimization for Android Apps - Devoxx14
Battery Optimization for Android Apps - Devoxx14Battery Optimization for Android Apps - Devoxx14
Battery Optimization for Android Apps - Devoxx14
 
Mobile GPS Tracking
Mobile GPS TrackingMobile GPS Tracking
Mobile GPS Tracking
 
Project presentation (Loginradius SDK for Android)
Project presentation (Loginradius SDK for Android)Project presentation (Loginradius SDK for Android)
Project presentation (Loginradius SDK for Android)
 

Similar to Power Optimization Through Manycore Multiprocessing

CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CAST, Inc.
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationSlide_N
 
Architecting for Hyper-Scale Datacenter Efficiency
Architecting for Hyper-Scale Datacenter EfficiencyArchitecting for Hyper-Scale Datacenter Efficiency
Architecting for Hyper-Scale Datacenter EfficiencyIntel IT Center
 
System on Chip (SoC) for mobile phones
System on Chip (SoC) for mobile phonesSystem on Chip (SoC) for mobile phones
System on Chip (SoC) for mobile phonesJeffrey Funk
 
Apcbyschneider 27mai2011-110602085611-phpapp01
Apcbyschneider 27mai2011-110602085611-phpapp01Apcbyschneider 27mai2011-110602085611-phpapp01
Apcbyschneider 27mai2011-110602085611-phpapp01a4asif
 
05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_powerGennaro (Rino) Persico
 
Micro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectMicro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectHitesh Jani
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedDileep Bhandarkar
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
Architecting Cloud Solutions
Architecting Cloud SolutionsArchitecting Cloud Solutions
Architecting Cloud SolutionsAMD
 
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...mentoresd
 
Rocketick accelerated verilog simulations
Rocketick  accelerated verilog simulationsRocketick  accelerated verilog simulations
Rocketick accelerated verilog simulationschiportal
 
Architectures for mobile computing dec12
Architectures for mobile computing dec12Architectures for mobile computing dec12
Architectures for mobile computing dec12Rajveer Shekhawat
 
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...Webcast: Reduce latency, improve analytics and maximize asset utilization in ...
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...Emulex Corporation
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsRed_Hat_Storage
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 

Similar to Power Optimization Through Manycore Multiprocessing (20)

CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
 
Cell Technology for Graphics and Visualization
Cell Technology for Graphics and VisualizationCell Technology for Graphics and Visualization
Cell Technology for Graphics and Visualization
 
Architecting for Hyper-Scale Datacenter Efficiency
Architecting for Hyper-Scale Datacenter EfficiencyArchitecting for Hyper-Scale Datacenter Efficiency
Architecting for Hyper-Scale Datacenter Efficiency
 
System on Chip (SoC) for mobile phones
System on Chip (SoC) for mobile phonesSystem on Chip (SoC) for mobile phones
System on Chip (SoC) for mobile phones
 
Apcbyschneider 27mai2011-110602085611-phpapp01
Apcbyschneider 27mai2011-110602085611-phpapp01Apcbyschneider 27mai2011-110602085611-phpapp01
Apcbyschneider 27mai2011-110602085611-phpapp01
 
05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power05 2012 power_roadshow_software_on_power
05 2012 power_roadshow_software_on_power
 
Micro Server Design - Open Compute Project
Micro Server Design - Open Compute ProjectMicro Server Design - Open Compute Project
Micro Server Design - Open Compute Project
 
The SDN Opportunity
The SDN OpportunityThe SDN Opportunity
The SDN Opportunity
 
DCIM
DCIMDCIM
DCIM
 
Linaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updatedLinaro connect 2018 keynote final updated
Linaro connect 2018 keynote final updated
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
Architecting Cloud Solutions
Architecting Cloud SolutionsArchitecting Cloud Solutions
Architecting Cloud Solutions
 
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
Meeting SEP 2.0 Compliance: Developing Power Aware Embedded Systems for the M...
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
Embeddedsystem
EmbeddedsystemEmbeddedsystem
Embeddedsystem
 
Rocketick accelerated verilog simulations
Rocketick  accelerated verilog simulationsRocketick  accelerated verilog simulations
Rocketick accelerated verilog simulations
 
Architectures for mobile computing dec12
Architectures for mobile computing dec12Architectures for mobile computing dec12
Architectures for mobile computing dec12
 
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...Webcast: Reduce latency, improve analytics and maximize asset utilization in ...
Webcast: Reduce latency, improve analytics and maximize asset utilization in ...
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 

Power Optimization Through Manycore Multiprocessing

  • 1. Power Optimization Through Many-Core Multiprocessing: Delivering High Performance in a Low Power World
      ChipEx2012
      Haydn Povey, Marketing Director – Implementation & Security, ARM Processor Division
      May 2, 2012
  • 2. Billions of Connected Devices
      Performance expectations continue to increase exponentially, but power efficiency and scalability are becoming formidable challenges.

      Form Factor                  TAM (m), 2015
      Mobile Phones                1,750
      Media players                300
      Mobile Computers             750
      Desktop PCs                  150
      Digital TV/STB               500
      Automotive Infotainment      100
      Other*                       450
      Total                        4 billion

      *Includes PND, photo-frames, etc.
      Source: ABI Research, IDC, Gartner and ARM forecasts
  • 3. Historic Technology Drivers
      • Up to 1980s, mainframes/mini: Functionality
      • 1990s, the PC: Functionality / $
      • 2000s, notebooks: Functionality / (Power × $)
      • 2010s, mobile computing: Functionality / (Energy × $)
  • 4. Low Power Positioned for the Future
      • Going forward, low power is necessary for everything from microcontrollers to servers
      • Low power is a design philosophy
          • Mindset, style, culture and working practice
          • Not something you change or acquire easily
      • Low power is a design reality
          • ARM is an efficient architecture
          • None of the legacy or CISC complexity
      • Low cost is a design & manufacturing partnership
          • Time to volume, not time to niche markets
          • Speed-binning is not good enough for the mass market
      (Graphic: 2010s, mobile computing: Functionality / (Energy × $))
  • 5. Limitations with Multiprocessing
      • The cost of offering peak single-thread performance on each CPU quickly exceeds chassis thermal limits
      • System and software bottlenecks limit overall scalability
      • Single-die integration offered some roadmap
  • 6. Evolution to Many-Core
      • Base theorem
          • Simpler and smaller processor designs require exponentially less energy to accomplish the same amount of compute as a more complex and larger processor design
      • “Approximate rule of thumb”
          • To increase performance by 50% you double the power and area cost of the processor design
          • Quickly reaches the point of diminishing returns
  • 7. Challenge of Many-Core
      • Many-core definition
          • Use ‘lots’ of smaller, more efficient processors to achieve a higher aggregate performance than can be reached through multiprocessing
      • Smaller processors cannot execute the same single thread in the same time as a higher-performance processor, so they can’t execute existing applications effectively
      • Many threads cannot easily be decomposed into simpler, smaller tasks that would benefit from multiprocessing on the smaller processors
      • Software development challenge
  • 8. Software Data Decomposition
      • Each data item is independent
      • Split a large quantity of DATA into smaller chunks that can be operated on in parallel
      (Diagram: the data is split into TASK chunks, each paired with a CPU.)
  • 9. Software Task Decomposition
      • Each task item is functionally independent
      • Functionally independent tasks can be executed concurrently
      (Diagram: groups of TASK items, each group paired with a CPU.)
  • 10. Functional Block Partitioning
      • Functional blocks are serially dependent
          • But temporally independent
      • Distribute different functional blocks across available processors
          • Split into defined functional threads
          • Uses passing of data blocks between threads to allocate work
          • Requires code changes and fine tuning
      • Example: Real Time Video Encoding (simplified MPEG encoding functional block diagram)
          • Blocks shown along a TIME axis: Analogue Video Sampling, Remove Inter-Frame Redundancy, Remove Intra-Frame Redundancy, Quantise Samples, Run-Length Compress, Buffer Store, plus Motion Compensation
          • The blocks are distributed across CPU0, CPU1, CPU2 and CPU3
  • 11. Strategy Focus: The Thermal Wall
      • SoC sustained power is limited in mobile devices by thermals:
          • 1.5W to 2W with low-cost POP and stacked memories
          • 3W without stacked memories
      • Responsiveness is a must
      • Complex active management is needed
      (Figure: power against time. Labels: Power Burst for responsiveness (e.g. Browsing); T >= Tjmax, Tskin; “Opportunistic Residency”; Managed Sustained Power; Tj >= Tmax; Tj < Tmax; Un-managed Max Power (@Tjmax); Sustained performance (e.g. HD Video Record, Gaming); Power Optimised Low End (e.g. e-Mail, Voice, MP3).)
  • 12. Applying Nominal Use Case
      • Typical day for a smartphone user
          • 90 min voice calling
          • 60 min email / social networking
          • 30 min reading web
          • 50 min Angry Birds / other gaming
          • 90 min jogging while listening to music and logging GPS co-ordinates
          • 10 min video recording
          • 7 hrs sleep with music alarm clock
      • OS typically executing ~28 active processes
      • Apps synching in background
  • 13. Use Case Measurements
  • 14. Use Case Conclusion

      Profiled CPU States    Minutes    % of Active
      Deep Sleep             1186       n/a
      200 MHz                154        60%
      500 MHz                69         27%
      800 MHz                18         7%
      1000 MHz               4          2%
      1200 MHz               10         4%

      If the phone was ARM big.LITTLE™ enabled, active CPU time would be 12% big and 88% LITTLE.
  • 15. big.LITTLE Processing: Multiprocessing Capable, Many-Core Benefits
  • 16. “big” Processor – Cortex-A15
      • ARM Cortex™-A15 Processor
          • 3.5+ DMIPS/MHz
          • 1-4 core MPCore™ configurable
      • Advanced Capabilities
          • Full ARMv7A architecture
          • Thumb®-2, TrustZone®, VFP, NEON™
          • Virtualization, large address extensions
          • AMBA® 4 ACE™ coherency
      • High Performance
          • Targeting 1.5GHz mobile implementation on 28nm
          • Hard Macro Quad-core Implementation @ 2GHz on 28HPM process
  • 17. “LITTLE” Processor – Cortex-A7
      • ARM Cortex-A7 Processor
          • “LITTLE” to Cortex-A15 “big”
          • 1-4 core MPCore configurable
      • Same Architectural Capabilities
          • Full ARMv7A architecture
          • Thumb-2, TrustZone, VFP, NEON
          • Virtualization, large address extensions
          • AMBA 4 ACE Coherency
          • ISA identical to Cortex-A15 processor
      • High Performance
          • Up to 1.2GHz for mobile implementation on 28nm
  • 18. Comparison of big.LITTLE Pipelines
  • 19. Performance Comparison
  • 20. Power Efficiency Comparison
  • 21. Software Use Models
      • big.LITTLE Task Migration: one CPU active
          • Migrate between Cortex-A15 and Cortex-A7 depending on performance requirements
      • big.LITTLE MP: both CPUs can be active
          • Allocate threads that need high performance to Cortex-A15
          • Allocate threads that don’t require high performance to Cortex-A7 for best energy efficiency
      • AMBA 4 hardware coherency between Cortex-A15 and Cortex-A7
  • 22. Task Migration Mechanics
  • 23. CCI-400 Cache Coherent Interconnect
      • AMBA 4 compliant, 128-bit single layer at up to ½ Cortex-A15 frequency
          • 2 full AMBA 4 ACE slave interfaces
          • +3 ACE-Lite I/O coherent slave interfaces
          • x3 master interfaces
      • CCI interfaces:
          • AMBA 4 ACE and ACE-Lite manage all coherency, shareability and barriers
      (Block diagram: Quad Cortex-A15 and Quad Cortex-A7 clusters connect through ADB-400 bridges to 128b ACE ports on the CoreLink™ CCI-400 Cache Coherent Interconnect, which runs at 128 bit @ up to 0.5 Cortex-A15 frequency. Mali-T604 Graphics, a coherent I/O device, and DMA and LCD behind a NIC-400 (configurable AXI 4/AXI 3/AHB) connect through MMU-400 blocks to 128b ACE-Lite + DVM ports. GIC-400 is also shown. On the output side, 128b ACE-Lite ports lead to DMC-400 with PHYs for DDR3/2 and LPDDR2/3, and to a NIC-400 (configurable AXI 4/AXI 3/AHB/APB) serving other slaves.)
  • 24. Summary
      • Multiprocessing lets today’s applications scale while maintaining single-thread performance
          • It handles the multi-tasking of stacked usage scenarios well
      • Many-core brings the energy advantages of simpler and smaller processors, but with the challenges of software complexity and a lack of backwards compatibility with respect to single-thread performance
      • big.LITTLE processing, as delivered by the ARM Cortex-A15 and Cortex-A7, offers both the performance and compatibility advantages of multiprocessing and the power efficiency and scalability advantages of many-core processing
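
The residency figures on slide 14 can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the minutes per frequency state come from the slide, but the variable names and the 500 MHz cut-off between "LITTLE" and "big" work are assumptions made here, not something the deck states. Under that assumption the split comes out at roughly 12.5% big and 87.5% LITTLE, close to the 12% / 88% quoted on the slide.

```python
# Minutes spent in each active CPU frequency state (MHz -> minutes), from slide 14.
active_minutes = {200: 154, 500: 69, 800: 18, 1000: 4, 1200: 10}
deep_sleep_minutes = 1186

# ASSUMPTION (not from the deck): states at or below this frequency could be
# served by the LITTLE core, and anything above it needs the big core.
LITTLE_MAX_MHZ = 500

total_active = sum(active_minutes.values())

# Share of active time per frequency state, as a percentage.
share_pct = {mhz: 100.0 * mins / total_active for mhz, mins in active_minutes.items()}

little_minutes = sum(m for mhz, m in active_minutes.items() if mhz <= LITTLE_MAX_MHZ)
big_minutes = total_active - little_minutes

for mhz, pct in share_pct.items():
    print(f"{mhz:>5} MHz: {pct:4.0f}% of active time")
print(f"big:    {100.0 * big_minutes / total_active:.1f}% of active time")
print(f"LITTLE: {100.0 * little_minutes / total_active:.1f}% of active time")
```

The profiled minutes also add up to about one day (1186 + 255 = 1441 minutes), so the device spends most of the day in deep sleep and most of its active time at the lowest frequencies.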

Editor's Notes

  1. The performance requirements of handsets and other mobile devices continue to grow exponentially, with new applications, advanced gaming, and traditional PC-type functionality migrating rapidly to these platforms. While this capability enables the next wave of the digital revolution, it comes at the price of increased power usage and potential thermal challenges. This presentation will investigate the issues and compromises traditionally required to push performance to the next level, and the challenges we face as an industry if we do not architecturally innovate on the implementation of advanced systems. We will demonstrate key advances in future processor designs and highlight the advantages and challenges faced as we look to deliver high performance in the low power world.
  2. EXAMPLE: Digital camera sport mode (burst mode). Take a lot of pictures, then filter and JPEG-encode them on the go. Each picture is an independent work item and can be processed in parallel. Instead of processing the pictures one at a time, one after the other, you can process them in parallel. Quicker execution. Then switch off cores and go to sleep. Low leakage and no dynamic power consumption. ANOTHER EXAMPLE: Complex post-processing on a large RAW digital image. You can have more than one thread concurrently acting on the input data and writing to the output image (reads can overlap).
  3. EXAMPLE: You have more than one application running at the same time. On a single core your multitasking OS will time-slice. On a multi-core system things will happen in parallel. They will execute in less time and be more responsive (i.e. the UI).
  4. EXAMPLE: VIDEO CODEC: This works because a video codec processes a stream. Within a single frame, and within a group of frames, there are all sorts of dependencies, BUT this is a stream, so while you are storing the result of an encoded frame, you can already be calculating the maths of the following frames, sampling the next one, and so on. Each core can have a task allocated to it, and the code needs to be modified so that these tasks synchronise and communicate with each other. Distribute different functional blocks of the decoder across available processors. Multi-task pipeline, e.g. taskA -> taskB -> (multiple) TaskC -> taskD. Split into defined functional threads. Uses passing of data blocks between threads to allocate work.
  5. Start with a cheap package (high thermal resistance: 15 °C/W Thetajb, 30 °C/W Thetaja) and 60 °C Tjb (so we use Thetajb). 1.5 to 2W with the stacked memory limit (including the memory Tj max of 85 °C). 3W without memories (a 20 °C advantage to play with, assuming 105 °C max Tj for the SoC). NB: This is an issue we need to understand a lot better.
  6. What is DVM? Why does the slide say 3 masters and 2 slaves (it looks like the other way around)?
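
The power limits in note 5 appear to follow from the usual steady-state relation P = (Tj_max − T_board) / θjb. The sketch below checks that arithmetic. It assumes the "60C Tjb" in the note is the board reference temperature; that reading, and the function and constant names, are assumptions made here rather than anything the deck states.

```python
def max_sustained_power_w(tj_max_c: float, t_board_c: float, theta_jb_c_per_w: float) -> float:
    """Steady-state power budget in watts: (Tj_max - T_board) / theta_jb."""
    if theta_jb_c_per_w <= 0:
        raise ValueError("thermal resistance must be positive")
    return (tj_max_c - t_board_c) / theta_jb_c_per_w


THETA_JB_C_PER_W = 15.0  # junction-to-board thermal resistance, from note 5
T_BOARD_C = 60.0         # ASSUMPTION: the note's "60C Tjb" read as board temperature

# With stacked memory the memory's 85 C junction limit applies;
# without it, the SoC's 105 C limit applies.
with_stacked_memory_w = max_sustained_power_w(85.0, T_BOARD_C, THETA_JB_C_PER_W)
without_stacked_memory_w = max_sustained_power_w(105.0, T_BOARD_C, THETA_JB_C_PER_W)

print(f"with stacked memory:    {with_stacked_memory_w:.2f} W")     # 1.67 W
print(f"without stacked memory: {without_stacked_memory_w:.2f} W")  # 3.00 W
```

Under these assumptions the results land inside the 1.5W to 2W range quoted for stacked memories and on the 3W figure quoted without them, which suggests the 20 °C of extra junction headroom accounts for the difference.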