The document discusses the evolution of computer architectures from early technological achievements such as the transistor and the integrated circuit, and describes transistor densities increasing in line with Moore's Law. Because heat dissipation limits further clock-speed increases, future performance gains will come from parallelism: rising core counts at lower supply voltages rather than faster clocks. The document also outlines the challenges of scaling to exascale systems by 2018.
7. Power Density
[Chart: power density in Watts/cm², log scale from 1 to 1000, across process generations for the i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III and Pentium® 4, with reference lines for a hot plate, a rocket nozzle, a nuclear reactor and the Sun's surface.]
* “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” – Fred Pollack, Intel Corp. Micro32 conference keynote, 1999.
9. Technology Outlook (Shekhar Borkar, Micro37)

High Volume Manufacturing   2004   2006   2008   2010   2012   2014   2016   2018
Technology Node (nm)          90     65     45     32     22     16     11      8
Integration Capacity (BT)      2      4      8     16     32     64    128    256
Delay = CV/I scaling         0.7   ~0.7   >0.7   (delay scaling will slow down)
Energy/Logic Op scaling    >0.35   >0.5   >0.5   (energy scaling will slow down)
Bulk Planar CMOS            High Probability  →  Low Probability
Alternate, 3G etc.          Low Probability   →  High Probability
Variability                 Medium  →  High  →  Very High
ILD (K)                       ~3     <3    (reduce slowly towards 2-2.5)
RC Delay                       1      1      1      1      1      1      1      1
Metal Layers                 6-7    7-8    8-9    (0.5 to 1 layer per generation)
10. We have seen increasing numbers of gates on a chip and increasing clock speeds. Heat is becoming an unmanageable problem: Intel processors now dissipate more than 100 watts. We will not see dramatic increases in clock speed in the future; however, the number of gates on a chip will continue to grow. Rather than packing more transistors into a single core and pushing its clock rate higher, designs will lower the voltage and spend the extra transistors on more cores. [Diagram: evolution from a single core with cache, to dual- and quad-core chips sharing a cache, to many-core chips.]
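The voltage-versus-clock trade-off above follows from the standard dynamic-power relation for CMOS, P ≈ C·V²·f. A minimal sketch (all numbers are illustrative assumptions, not from the slides) of why two slower, lower-voltage cores can match one fast core's throughput at lower power:

```python
# Back-of-envelope sketch: dynamic CMOS switching power scales as
# P ~ C * V^2 * f, so spending transistors on extra cores at lower
# voltage/frequency beats raising the clock. Numbers are illustrative.

def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power: P = C * V^2 * f (normalized units)."""
    return capacitance * voltage ** 2 * frequency

# One core at full voltage and full clock.
single = dynamic_power(capacitance=1.0, voltage=1.0, frequency=1.0)

# Two cores, each at 85% voltage and half the clock: the same
# aggregate throughput under ideal parallel scaling (2 * 0.5 = 1.0).
dual = 2 * dynamic_power(capacitance=1.0, voltage=0.85, frequency=0.5)

print(f"single fast core : {single:.4f}")  # 1.0000
print(f"two slower cores : {dual:.4f}")    # 0.7225
```

The catch, of course, is that the workload must actually be parallelizable to realize the "2 × 0.5" aggregate throughput.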
17. BSC-CNS and international initiatives: IESP. Build an international plan for developing the next generation of open-source software for scientific high-performance computing. Improve the world's simulation and modeling capability by improving the coordination and development of the HPC software environment.
19. Education for Parallel Programming: from multi-core to many-core programming, and eventually massively parallel programming for everyone — as the games industry already practices. [Image: a multicore-based pacifier.]
22. In 50 Years ... ENIAC, Eckert & Mauchly, 1946 ... 18,000 vacuum tubes. Pentium III playing a DVD, 1998 ... 24 M transistors.
23. Technology Trends: Microprocessor Capacity. Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every 18 months — a trend now called “Moore's Law”: 2x transistors per chip every 1.5 years. Microprocessors have become smaller, denser, and more powerful, and the same exponential trends apply not just to processors but also to bandwidth, storage, and more.
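The doubling arithmetic can be checked against the deck's own data points. A hedged sketch — the 1971 Intel 4004 baseline (~2,300 transistors) is an assumption added for illustration, and the comparison target is the 24 M-transistor Pentium III of 1998 mentioned two slides earlier:

```python
# Sketch of Moore's Law doubling arithmetic. The slide states 2x
# transistors every 1.5 years; the 1971 Intel 4004 baseline (~2,300
# transistors) is an illustrative assumption, not from the deck.

def transistors(start_count, start_year, year, doubling_years):
    """Project a transistor count under a fixed doubling period."""
    return start_count * 2 ** ((year - start_year) / doubling_years)

# Projecting the 4004 forward to 1998 (Pentium III era, 24 M
# transistors per the deck): an 18-month doubling overshoots badly,
# while the ~2-year doubling Moore settled on in 1975 lands close.
print(f"1.5-year doubling: {transistors(2300, 1971, 1998, 1.5):,.0f}")
print(f"2-year doubling  : {transistors(2300, 1971, 1998, 2.0):,.0f}")
```

This is a useful sanity check on the "every 18 months" figure often quoted: for logic transistor counts, the two-year period fits the historical record better.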
29. MareIncognito: Project structure
Applications — 4 relevant apps: Materials: SIESTA; Geophysics imaging: RTM; Comp. Mechanics: ALYA; Plasma: EUTERPE; plus general kernels.
Performance analysis tools — automatic analysis; coarse/fine grain prediction; sampling; clustering; integration with Peekperf.
Interconnect — contention, collectives; overlap of computation/communication; slimmed networks; direct versus indirect networks.
Processor and node — contribution to new Cell design; support for programming model; support for load balancing; support for performance tools; issues for future processors.
Load balancing — coordinated scheduling: run time, process, job; power efficiency.
Programming models — StarSs: CellSs, SMPSs; OpenMP++; MPI + OpenMP/StarSs.
Models and prototype.
Access latency for main memory, even using a modern SDRAM with a CAS latency of 2, will typically be around 9 cycles of the **memory system clock** -- the sum of:
- the latency between the FSB and the chipset (Northbridge) (± 1 clock cycle)
- the latency between the chipset and the DRAM (± 1 clock cycle)
- the RAS-to-CAS latency (2-3 clocks, charging the right row)
- the CAS latency (2-3 clocks, getting the right column)
- 1 cycle to transfer the data
- the latency to get the data back from the DRAM output buffer to the CPU (via the chipset) (± 2 clock cycles)
Assuming a typical 133 MHz SDRAM memory system (e.g. PC133 or DDR266/PC2100) and a 1.3 GHz processor, this makes 9 × 10 = 90 cycles of the CPU clock to access main memory! Yikes, you say! And it gets worse – a 1.6 GHz processor would take it to 108 cycles, a 2.0 GHz processor to 135 cycles, and even if the memory system were increased to 166 MHz (and still stayed CL2), a 3.0 GHz processor would wait a staggering 162 cycles! Caches make the memory system seem almost as fast as the L1 cache, yet as large as main memory. A modern primary (L1) cache has a latency of just two or three **processor cycles**, dozens of times faster than accessing main memory, and modern primary caches achieve hit rates of around 90% for most applications. So 90% of the time, accessing memory takes only a couple of cycles. Good overview: http://www.pattosoft.com.au/Articles/ModernMicroprocessors/
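The latency arithmetic above can be sketched directly. The component breakdown and clock ratios come from the text; the final AMAT line uses the standard average-memory-access-time model (hit time + miss rate × miss penalty), added here as an illustration and not taken from the source:

```python
# Sketch of the main-memory latency arithmetic from the text:
# ~9 memory-bus cycles per access, converted to CPU cycles via the
# CPU-to-memory clock ratio. (The text rounds 1300/133 to 10.)

def memory_latency_cpu_cycles(cpu_mhz, mem_mhz, mem_cycles=9):
    """CPU cycles spent on one main-memory access (CL2 SDRAM)."""
    return mem_cycles * cpu_mhz / mem_mhz

# The examples from the text (exact ratios, before rounding):
print(memory_latency_cpu_cycles(1300, 133))   # ~88, rounded to 90
print(memory_latency_cpu_cycles(2000, 133))   # ~135
print(memory_latency_cpu_cycles(3000, 166))   # ~163

# Average memory access time with a 2-cycle L1 and a 90% hit rate,
# using the standard model AMAT = hit_time + miss_rate * miss_penalty:
amat = 2 + 0.10 * memory_latency_cpu_cycles(1300, 133)
print(f"AMAT: {amat:.1f} CPU cycles")
```

This makes the text's closing point concrete: with a 90% hit rate, the *average* access costs an order of magnitude fewer cycles than a raw main-memory access.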
It is the conclusion of this TTA that, in the very near future (in fact some early examples are clearly in evidence right now), virtual worlds will extend their reach well beyond their current subject matter of online fantasy gaming to incorporate all manner of business and commerce. This evolution will quickly encompass many industries and business processes in which IBM has traditionally had a significant business interest. In the education industry, it is not at all a stretch to imagine a university physics professor convening a kinematics lecture in a virtual world in which the professor could alter the force of gravity and move large, virtual objects to demonstrate environments on other planets. Closer to our industry, an IBM Industry Solution sales specialist could arrange to meet a client in a virtual world populated by highly realistic (virtual) world venues containing software solutions created by IBM and select business partners. In these virtual sales worlds, clients would interact with the solutions in the same manner as real-world users, exploiting all the solution's functional capacities. For example, a virtual mobile workforce solution could be demonstrated from multiple perspectives in the context of real business scenarios - the control center, the mobile vehicle, etc. The solution demonstration would totally immerse the client in the solution experience, thereby creating an unparalleled selling tool. The possibilities are limitless. From top left, clockwise: (1) World of Warcraft: A Tavern. This is a symbolic representation of commerce and advertising within games. Many people run their own businesses within virtual worlds, trading both virtual and real items for virtual and real currencies. Microsoft's acquisition of Massive Inc.
has also now secured them a huge advertising ecosystem of game development companies, advertising agencies and leading brands, using online video games as another advertising channel for directed and personalized ads and product-placement deals. The tavern represents the real-world metaphors that build community within virtual worlds, much as the 18th-century coffee houses led to the formation of stock exchanges. Incidentally, there is a game advertising summit in San Francisco, June 9th, 2006. (2) Hazmat: Hotzone: a project based at the Entertainment Technology Center at Carnegie Mellon University, is one of the earliest serious-game projects and now has several scenarios up and running using Unreal Tournament-based graphics and gameplay. Intended users: fire-department personnel who handle HazMat response. HazMat uses multiplayer gaming technology and augmented communication practices to assist with the team-based training vital to HazMat and other disaster-response practices. (3) Virtual Iraq: Not only is the army using virtual-world simulations for the training of troops and engagement planning, but also for the treatment of Post-Traumatic Stress Disorder (PTSD), through the ability to “relive” traumatic events in simulation. (http://www.washingtonpost.com/ac2/wp-dyn/A58360-2005Mar22?language=printer) (4) Simulation of forest-fire disasters and how to combat them. (5) Virtual Acropolis: an example of using virtual environments as an educational and research tool for the humanities, in this case ancient history. Highly detailed models, created collaboratively by historians and researchers, capture world heritage sites for a variety of uses, including tourism, education, and simulation of “what-if” scenarios. Imagine teaching the history of a famous era or battle by immersing the student in a highly realistic simulation complete with the architecture, artifacts and even populace of the period.
These may also help the study of social history and sociological development and evolution via large-scale community participation. (6) Food Force: From the United Nations World Food Programme (WFP), Food Force is an educational video game telling the story of a hunger crisis on the fictitious island of Sheylan. Comprising 6 mini-games or “missions”, the game takes young players from an initial crisis assessment through to delivery and distribution of food aid, with each sequential mission addressing a particular aspect of this challenging process. (http://www.food-force.com/) (7) Yourself!Fitness: a complete fitness program on a disc - exercise, diet, motivation, and fitness tracking are all included. Your host is Maya, a dynamically generated digital personality who guides you through all aspects of the application. You need nothing more than an Xbox and a television set to partake. (http://www.yourselffitness.com/) (8) Pulse!!: a virtual clinical learning lab and simulation for training first responders, and medical and nursing students, in treatments. (http://www.businessweek.com/innovate/content/apr2006/id20060410_051875.htm?chan=innovation_game+room_features) (10) Another picture of World of Warcraft, to illustrate the breadth, diversity and scale of virtual environments. It is easy to take for granted that this huge architectural vista and the tavern above are parts of a single virtual world; rendering both within WoW challenges the engine to deal with a broad spectrum of conditions. Why is this important? It means that the same middleware engine can be used for a broad variety of simulation environments and applications these days, rather than purpose-built or specialized simulations for specific scenarios, and these engines are configurable through XML and scripting mechanisms.
(centre) Google Earth: now being offered as Enterprise Services for a variety of applications including real estate, architecture and engineering, insurance, and media. Google's provision of 3D modelling tools and an open repository for free is a significant step toward making Google Earth a platform for application development, using it as a visualization engine and the MySpace of the future. NEED FOR STANDARDS: Multiple virtual worlds - interconnected and interdependent, yet independently operated - require open standard interfaces to allow: avatar portability; property portability; security; metering, billing, separations, settlements; distributed problem determination; distributed systems management.
(Please note - this slide includes 2 animation steps) An exciting question to ask is: where is this research heading? In this slide you can see what is probably a familiar chart depicting the progress that has been made in supercomputing since the early 90s. (At each time point, the green line shows the 500th-fastest supercomputer, the dark blue line the fastest supercomputer, and the light blue line the summed power of the top 500 machines.) These lines show a clear trend, which we've extrapolated out 10 years. [ANIMATE SLIDE] The IBM team's latest simulation results fall here on the graph. These latest results represent a model at about 4.5 percent of the scale of the cerebral cortex, which was run at 1/83 of real time. The machine used provided 144 TB of memory and 0.5 PFLOP/s. [ANIMATE SLIDE] Turning to the future, you can see that running human-scale cortical simulations will require 4 PB of memory, and running them in real time will require over 1 EFLOP/s. If the current trends in supercomputing continue, however, the IBM team believes they will have the ability to perform such simulations in the not-too-distant future.
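The slide's extrapolation can be checked with back-of-envelope arithmetic using only the figures quoted in the notes (4.5% of cortex scale, 1/83 of real time, 144 TB, 0.5 PFLOP/s), assuming memory and compute scale roughly linearly with model size and simulation speed:

```python
# Back-of-envelope check of the slide's extrapolation, using only
# the figures quoted in the notes. Assumes memory and compute scale
# linearly with model size and simulation speed (a simplification).

scale = 0.045     # fraction of the cerebral cortex simulated
slowdown = 83     # the run executed at 1/83 of real time
mem_tb = 144      # memory used by the reported run (TB)
pflops = 0.5      # sustained compute of the reported run (PFLOP/s)

# Full-scale memory: 144 TB / 0.045 ~ 3.2 PB (the slide rounds up
# to 4 PB).
full_mem_pb = mem_tb / scale / 1000

# Full-scale, real-time compute: 0.5 * 83 / 0.045 ~ 922 PFLOP/s,
# i.e. on the order of the 1 EFLOP/s the slide states.
full_pflops = pflops * slowdown / scale

print(f"memory : {full_mem_pb:.1f} PB")
print(f"compute: {full_pflops:.0f} PFLOP/s")
```

The linear-scaling assumption is generous (communication costs typically grow faster than linearly), which is consistent with the slide quoting "over" 1 EFLOP/s.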