2. Two x86 Cores Tuned for Target Markets. “Bulldozer”: Performance & Scalability; Mainstream Client and Server Markets. “Bobcat”: Flexible, Low Power & Small; Low Power Markets; Cloud Clients Optimized; Small Die Area.
3. The Bulldozer Architecture. “Bulldozer”: An innovative design that delivers true core functionality by pairing two integer execution cores with components that can be shared as needed. Instruction set extensions to increase the capability of the design. Extensive new power efficiency innovations. Manufactured on the latest 32nm SOI technology. [Block diagram: shared Fetch and Decode; two Integer Schedulers, each with four pipelines and an L1 DCache; one shared FP Scheduler with two 128-bit FMACs; Shared L2 Cache.]
10. Sharing Resources. The Bulldozer architecture has shared and dedicated components. The shared components help reduce power consumption and help reduce die space (cost). The dedicated components help increase performance and scalability. Bulldozer dynamically switches between shared and dedicated components to maximize performance per watt. [Diagram legend: dedicated components; shared at the module level; shared at the chip level. Diagram blocks: Fetch, Decode, FP Scheduler, two Int Schedulers (Core 1, Core 2) each with four pipelines and an L1 DCache, two 128-bit FMACs, Shared L2 Cache, Shared L3 Cache and NB.]
11. Building a Bulldozer-Based Chip. Each chip is composed of multiple Bulldozer modules. Module divisions are transparent to shared hardware, operating system or application. The modular architecture speeds chip development and increases product flexibility. [Diagram: Bulldozer modules (Fetch, Decode, Int Schedulers, FP Scheduler) combined with Shared L3 Cache and NB, Integrated Memory Controller, and Integrated Northbridge Controller.]
12. Bulldozer Summary. “Bulldozer” is the next generation of AMD high-performance processor core technology. This new core is a completely new design from the ground up. Bulldozer will be utilized in client and server designs in 2011. AMD delivers 33% more cores and an estimated 50% increase in throughput in the same power envelope as Magny-Cours*. [Block diagram repeated from slide 3.] *Based on internal AMD modeling using benchmark simulations.
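As a rough sanity check, the two headline figures on this slide can be combined to see what they imply per core. The sketch below uses only the slide's numbers (33% more cores, an estimated 50% more throughput, same power envelope). The function name and the idea of dividing throughput evenly across cores are illustrative assumptions, not something the slide states.

```python
def implied_ratios(core_gain: float, throughput_gain: float) -> dict:
    """Turn the slide's percentage gains into ratios versus the baseline part.

    Assumption (not from the slide): throughput is spread evenly over cores,
    so per-core throughput is simply total throughput divided by core count.
    """
    cores = 1.0 + core_gain            # e.g. 1.33x the cores
    throughput = 1.0 + throughput_gain  # e.g. 1.50x the throughput
    return {
        "throughput_per_core": throughput / cores,
        # Same power envelope, so throughput per watt scales with throughput.
        "throughput_per_watt": throughput,
    }


ratios = implied_ratios(core_gain=0.33, throughput_gain=0.50)
print(f"per core: {ratios['throughput_per_core']:.2f}x")
print(f"per watt: {ratios['throughput_per_watt']:.2f}x")
```

Under that even-split assumption, the estimate works out to roughly 1.13x the throughput per core, with the rest of the gain coming from the extra cores.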
13. Two x86 Cores Tuned for Target Markets. “Bulldozer”: Performance & Scalability; Mainstream Client and Server Markets. “Bobcat”: Flexible, Low Power & Small; Low Power Markets; Cloud Clients Optimized; Small Die Area.
14. Bobcat Design Goals. A small, efficient, low power x86 core. Excellent performance. Synthesizable with a small number of custom arrays. Easily portable across process technologies.
22. Bobcat Core Overview. Advanced Micro-architecture: dual x86 decode; advanced branch predictor; full OOO instruction execution; full OOO load/store engine; high performance floating point; AMD64 64-bit ISA; SSE1, 2, 3 and SSSE3 ISA; Secure Virtualization; 32KB L1s. Low Power Design: power-optimized execution; micro-architecture that minimizes data movement and unnecessary reads; clock gating and power gating; system low power states. Small Core: area-efficient balance of high performance and low power. [Block diagram, Bobcat Low Power Core: ICACHE, Fetch, Decode, BU, L2, FP Scheduler, Address Scheduler, Integer Scheduler, A Pipe, M Pipe, two I Pipes, Store Pipe, Load Pipe, DCACHE.]
26. Single die design. [Diagram: System Memory; SIMD Engine Array; x86 CPU Cores; High Performance Bus & Memory Controller; Unified Video Decoder; Platform Interfaces.]
27. Bobcat Summary. Bobcat is the CPU engine for AMD’s first APU. Estimated 90% of the performance of AMD’s current mainstream notebook CPU in less than half the area and a fraction of the power*. Highly portable across designs and manufacturing technologies. Sub-one watt capable core. *Based on internal AMD modeling using benchmark simulations.
Before we start: There is a lot of technical detail available below what we are about to show you; this presentation is intended to give you a high-level overview of both designs and AMD’s expectations for each. The engineering detail will be presented by the two chief architects for the designs at the upcoming HotChips conference on the Stanford campus next week. Please feel free to ask detailed questions along the way if you would like to hear more about a specific feature or operation. At a higher level, this shows that innovation at AMD remains alive and well. Please think of these core architectures within the context of the new, revitalized AMD built around our focus as a design company since the spin-off of GlobalFoundries, our new VISION platforms and marketing program, and our Fusion APU strategy. “Bobcat” and “Bulldozer” are the latest chapters in that story and form a solid foundation for AMD products for years to come.
The two cores, although both x86 compatible, are completely different for a reason. The workloads, end equipment markets and usage scenarios require different approaches, and that’s what AMD recognized at the outset of this effort. Think of “Bulldozer”, just as the name implies, as the heavy lifter. It will appear in server, as well as mainstream and high performance client products. “Bobcat” is small and highly efficient. It utilizes those characteristics to address the highly portable netbook / notebook markets. So, two different designs, with different goals in mind.
So starting with Bulldozer, here’s a block diagram that shows its distinguishing features. We are taking two of the most frequently used parts of a processor, the integer cores, and adding a hefty, shared floating point capability to deliver two robust threads much more efficiently than Hyper-Threading, where a single integer core is used. We have also added a number of instruction set extensions to increase the design’s capabilities and done extensive work on power management to improve performance per watt even further. The 32nm process technology delivers additional savings in terms of area and power consumption; this is our first process technology to utilize high-K metal gate.
The previous slide hinted at a key differentiator of Bulldozer that bears more explanation. A big conversation in the industry these last few years is how to continue to increase processing performance as we reach plateaus in clock speed. Essentially there have been two approaches used: SMT, which stands for Simultaneous Multi-Threading, and CMP, which stands for Chip Multi-Processing. CMP is probably the easiest to understand, because it can be described as “if one core is good, two must be better”, and it is. So CMP architectures take a complete core and replicate it. SMT is a little more complex to picture, but because of the way instructions are decoded and executed, it’s possible to have two tasks running concurrently on a single core. Bulldozer takes a third approach.
On the first Bulldozer slide we mentioned “true core functionality”, so what exactly does that mean? There are two complete integer units in the Bulldozer design for the most common type of compute tasks, so it functions like a dual-core design, allowing maximum performance rather than pushing two threads through a single core. However, we don’t replicate everything on the core like a CMP either. Floating point operations on Bulldozer use a shared scheduler and two 128-bit multiply-accumulate (FMAC) units. Extensive research went into analyzing workloads ahead of this design, so we feel the division between shared and dedicated components is the right one. And by the way, the idea of sharing hardware is hardly new, right? Shared cache, the Northbridge, etc. have been shared across multi-core designs for years already.
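To make the shared/dedicated split concrete, here is a toy model of one module's floating point resources. Only the figures of two integer cores and two 128-bit FMACs behind one shared FP scheduler come from the slides. The allocation policy (an even split among the cores that currently have FP work, with an idle core's share going to the other) and all names are assumptions for illustration, not a description of the real scheduler.

```python
from dataclasses import dataclass

# From the slides: each module has two 128-bit FMACs behind one FP scheduler.
FMAC_WIDTH_BITS = 128
FMACS_PER_MODULE = 2


@dataclass
class ModuleState:
    """Which of the module's two integer cores currently has FP work queued."""
    core0_fp_active: bool
    core1_fp_active: bool


def fp_bits_per_core(state: ModuleState) -> tuple:
    """Return the FP datapath width (in bits) available to (core 0, core 1).

    Toy policy (an assumption): the shared FMACs are divided evenly among
    the cores that need them; a core with no FP work gets nothing.
    """
    active = [state.core0_fp_active, state.core1_fp_active]
    n_active = sum(active)
    if n_active == 0:
        return (0, 0)
    share = FMACS_PER_MODULE * FMAC_WIDTH_BITS // n_active
    return tuple(share if is_active else 0 for is_active in active)


both_busy = fp_bits_per_core(ModuleState(True, True))
one_busy = fp_bits_per_core(ModuleState(True, False))
print("both cores doing FP:", both_busy)
print("only core 0 doing FP:", one_busy)
```

The point of the sketch is the contrast with the integer side: each core always has its own integer scheduler and pipelines, while the FP hardware is a pool that follows demand.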
You can see that larger view of shared hardware components here as we raise our view up to the chip level. On an 8-core Bulldozer design you can see how Bulldozer “modules” are grouped together to share the L3 cache and Northbridge, and combined with a memory controller and Northbridge controller to form the major components of the chip. And again, the OS and applications see true cores; the shared floating point components and L2 cache are transparent to the code.
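The chip-level grouping can be sketched as a small data structure. The two-cores-per-module pairing, the per-module L2, and the chip-wide L3 come from the slides. The core numbering (consecutive core IDs sharing a module) and the labels are hypothetical choices made for this example.

```python
CORES_PER_MODULE = 2  # from the slides: each module pairs two integer cores


def build_topology(total_cores: int) -> list:
    """Describe which module, L2, and L3 each core belongs to.

    Assumption for illustration: cores are numbered so that consecutive
    IDs (0 and 1, 2 and 3, ...) sit in the same module.
    """
    if total_cores % CORES_PER_MODULE != 0:
        raise ValueError("core count must be a multiple of the module size")
    topology = []
    for core_id in range(total_cores):
        module_id = core_id // CORES_PER_MODULE
        topology.append({
            "core": core_id,
            "module": module_id,
            "l2": f"L2-{module_id}",  # shared at the module level
            "l3": "L3-shared",        # shared at the chip level
        })
    return topology


def shares_l2(topology: list, core_a: int, core_b: int) -> bool:
    """True when two cores sit in the same module and so share an L2."""
    return topology[core_a]["l2"] == topology[core_b]["l2"]


chip = build_topology(8)
modules = {entry["module"] for entry in chip}
print(f"{len(chip)} cores in {len(modules)} modules")
print("cores 0 and 1 share an L2:", shares_l2(chip, 0, 1))
print("cores 1 and 2 share an L2:", shares_l2(chip, 1, 2))
```

An operating system would only ever see the flat list of eight cores; the module and cache fields are the part that stays hidden from the code.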
So that covers Bulldozer; now let’s cover AMD’s new core designed specifically for the low-power x86 market. “Bobcat” is small and highly efficient. It utilizes those characteristics to address the highly portable netbook / notebook markets.
Bobcat is a little more straightforward to understand than Bulldozer, but it too has some highly differentiated features. These were set as goals from the very beginning because of AMD’s understanding of the final products’ requirements.
So those were the goals. Where did we end up? Bobcat can operate below one watt (with a resulting reduction in performance). That’s not a statement about any resulting products, but it does give you some sense of the core’s power envelope. The next bullets here are critical: out-of-order execution means higher performance than an in-order execution core like Atom, pure and simple. Synthesizable means it uses few custom logic arrays, which are more dependent on the specifics of the underlying manufacturing technology for optimal performance, and that it can be more easily integrated into SoC designs for faster turnaround of new variations. There are no limitations on the instruction set either, including support for virtualization. AMD estimates 90% of today’s mainstream CPU performance in less than half the silicon area and a fraction of the power. Bobcat will appear early next year in Ontario, which is ahead of schedule.
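The area claim can be turned into a simple lower bound on performance per unit of silicon. The sketch below uses only the two quoted figures (an estimated 90% of the performance, less than half the area). The function name is illustrative, and the result is a bound under AMD's own estimate, not a measured value.

```python
def perf_per_area_lower_bound(relative_perf: float,
                              relative_area_upper_bound: float) -> float:
    """Lower bound on performance per area, relative to the baseline CPU.

    If the area is strictly less than the given upper bound, the true
    ratio is strictly greater than the value returned here.
    """
    return relative_perf / relative_area_upper_bound


# Quoted estimate: 90% of the performance in less than half the area.
bound = perf_per_area_lower_bound(relative_perf=0.90,
                                  relative_area_upper_bound=0.50)
print(f"performance per area: more than {bound:.1f}x the baseline")
```

In other words, the estimate implies better than 1.8x the performance per unit of die area compared with the mainstream notebook CPU it is measured against.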
Technical details if needed.
The need for an optimal, energy-efficient balance of CPU and GPU represents the beginning of a new era of computing in 2011: the era of the accelerated processing unit, or APU, which combines both on a single piece of silicon. The Fusion of CPU and GPU compute power is what the next chapter in visual computing requires: a powerful visual computing experience at home or on the go without compromise. Our AMD Fusion™ design is driven by mobility and is based on a low-power visual compute architecture that will enhance active and resting battery life while increasing both CPU and GPU performance. This is the culmination of the vision of ‘One AMD’, and only AMD can deliver the GPU and CPU combination that will be the future of computing.