[Harvard CS264] 15a - The Onset of Parallelism, Changes in Computer Architecture and Microsoft's Role in the Transition (David Rich, Microsoft Research)
1. David Rich
April 2011
The Onset of Parallelism
Changes in computer architecture and Microsoft’s role in the transition
3. Your introduction: some questions…
- What kind of software do you see yourself working on in the future? Scientific? Web? Games? Business?
- Have you worked on a distributed app? MPI?
- Have you used Visual Studio?
- Which will limit performance in the future: power consumption? Latency? Lack of parallelism? Bugs?
9.
- Made in 1922 by Robert Flaherty
- Considered to be the first full-length documentary, though some scenes were staged
- http://en.wikipedia.org/wiki/Nanook_of_the_north
11. Job Specialization
Bricklayer / Masonry; Carpenter (construction); Caulker / Pointer / Cleaners; Cement Mason (construction); Construction Lineman; Drywall Finisher/Taper; Electrician, Construction; Electrician, HVAC--Environmental Control System Servicer & Installer; Electrician, General Journeyman (Inside); Electrician, Limited Energy Technician A; Electrician, Limited Energy Technician B; Electrician, Limited Renewable Energy Technician; Electrician, Limited Residential; Electrician, Sign Maker-Erector / Sign Hanger / Sign Assembler-Fabricator; Exterior/Interior Specialist (metal framing & drywall); Finisher, Masonry; Floorcoverer; Glazier (construction); Heat / Frost Insulator; Heavy Duty Repairer; Industrial Pipefitter; Industrial Welder; Ironworker, Structural; Laborer; Marble Setter, Masonry; Elevator Mechanic; Millwright; Machinery Erector; Operating Engineer; Painter--Decorator / Traffic Control Painter; Pile Driver; Pipefitter; Plasterer; Plumber; Renewable Energy Technician; Roofer; Scaffold Erector; Sheet Metal Worker; Solar Heating/Cooling Systems Installer; Sprinkler Fitter; Steamfitter; Technical Engineer; Terrazzo Worker, Masonry; Tilesetter, Masonry; Tree Trimmer, Power Line; Truck Driver (Heavy)
• What about?
  – Architect
  – Surveyor
  – Inspector
• Or people that work in the companies that produce pre-fab components?
  – Pipes, wires, windows, fixtures, etc.
13. Preparing for the Future: What Will Your Machine Look Like in 5 to 10 Years?
- Look at the Top500, predict and divide:
  1. At any point in time, most organizations can afford a machine that is 1/1000th the size of the #1 machine on the Top500
  2. Exaflop comes from 2x efficiency, 2x frequency and 100x the cores

         Today’s #1       Test: Is this within     Exaflop       Your Future
         (Tianhe-1A)      your budget? (1/1000th)                Platform
Perf     2.5 PFs          2.5 TFs                  1000 PFs      1 PF
Nodes    7,168            7                        500,000?      500?
Cores    x86: 86,016      x86: 86 (~14 Xeons)      130 Million   130 Thousand
         GPU: 3,211,164   GPU: 3,211 (~7 Tesla)
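The slide’s two rules of thumb are plain arithmetic. Below is a minimal Python sketch of them, assuming (as the table does) that the 1/1000th factor applies equally to performance, node count and core count; the names `affordable` and `project_exaflop` are illustrative only, not from any tool.

```python
# Rules of thumb from the slide, expressed as arithmetic.
# Assumption: the 1/1000th factor scales every metric uniformly.

AFFORDABLE_FRACTION = 1 / 1000  # most organizations: 1/1000th of Top500 #1


def affordable(machine):
    """Scale each metric of a machine down to what most organizations can buy."""
    return {metric: value * AFFORDABLE_FRACTION for metric, value in machine.items()}


def project_exaflop(pflops, efficiency=2, frequency=2, cores=100):
    """Multiply today's performance by the slide's growth factors (2 x 2 x 100 = 400x)."""
    return pflops * efficiency * frequency * cores


tianhe_1a = {"pflops": 2.5, "nodes": 7_168, "x86_cores": 86_016}
exaflop_machine = {"pflops": project_exaflop(tianhe_1a["pflops"]), "nodes": 500_000}

if __name__ == "__main__":
    print(project_exaflop(2.5))        # 1000.0 PF, i.e. one exaflop
    print(affordable(tianhe_1a))       # about 2.5 TF, 7 nodes, 86 x86 cores
    print(affordable(exaflop_machine)) # about 1 PF on 500 nodes
```

Note that 2.5 PF × 400 lands exactly on 1,000 PF, which is why the 2x/2x/100x split works out to an exaflop, and dividing that by 1,000 gives the 1 PF "future platform".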
14. Core Counts On the Rise
[Figure: Number of Cores in Top500 #1 Over Time, Jun 93 to Nov 10 (y-axis: cores). Labeled systems: Fujitsu, ASCI Red, ASCI White, Earth Simulator, Blue Gene, RoadRunner, Jaguar, Tianhe-1A; annotation: “GPUs Get to #1...”]
15. Good news: everybody gets a petaflop!
Bad news: you have to find 200,000-way parallelism
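Why 200,000-way parallelism is the hard part can be quantified with Amdahl’s law, which is standard background rather than something stated on the slide: with a serial fraction s, the speedup on N workers is 1 / (s + (1 - s) / N). A short Python sketch, where the serial fractions tried are arbitrary illustrative values:

```python
def amdahl_speedup(serial_fraction, workers):
    """Amdahl's law: S(N) = 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


WAYS = 200_000  # degree of parallelism from the slide

if __name__ == "__main__":
    # Illustrative serial fractions: 1%, 0.1% and 0.001% of the work.
    for s in (0.01, 0.001, 0.00001):
        speedup = amdahl_speedup(s, WAYS)
        print(f"serial fraction {s:g}: {speedup:,.0f}x speedup on {WAYS:,} cores")
```

A 1% serial fraction caps the speedup at roughly 100x, and even 0.001% serial work limits 200,000 cores to about 66,667x, a third of ideal, so nearly the whole application, not just its hot loop, has to be parallel.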
26. [Figure: Microsoft online services at ages from 2 to 15 years, with their scale: 12 million users; 2 billion emails/day; 5 billion conference minutes/yr; 12 billion queries/mo.; 40 petabytes/mo.; 500 million active Windows Live IDs; 550 million users/mo.; 9.9 billion messages/day via WL Messenger; over 1 million BPOS users in 36 countries; 450 million users]
27. Microsoft’s Datacenter Evolution
- Generation 1: Co-Location
- Generation 2: Quincy and San Antonio
- Generation 3: Chicago and Dublin
- Generation 4: Modular Datacenter
[Figure: the generations compared by Facility, PAC, Server, Capacity, Time to Market and Lower TCO]
28. Generation 3: Chicago Data Center
- $500M+ investment
- 3000 construction-related jobs
- 707,000 sq ft
- 7.5 miles of chilled water piping
- 1.5 million person-hours of labor
- 3400 tons of steel
- 190 miles of conduit
- 2400 tons of copper
- 26,000 cubic yards of concrete
29. Visual Studio
- Visual Studio is used by over half of the professional programmers in the world
- VS2010, released a year ago, has been downloaded over 7 million times (more than 4 million extension downloads)
- Main point: when we release a new capability into Visual Studio, it automatically gets large adoption
- (story about the ISC developers)
31. GPU Hardware Evolution
Year Version Defining Feature
1996 DirectX3 Hardware rasterization
1997 DirectX5 2 Shading options to select
1998 DirectX6 Multi-texture operations
1999 DirectX7 Vertex Processing in hardware
2000 DirectX8 Programmable Shaders: Vertex and Pixel
2002 DirectX9 High Level Shading Language, 32 instr
2003 DirectX9c 1000s of instructions per shader
2006 DirectX10 Unified Shaders: consistent shader models
2009 DirectX11 Compute Shader: explicit SIMD, random I/O
33. The GPGPU Software Stack
- High-level tools and libraries: PGI “x86 CUDA”, CAPS, Culatools, Volara, Acceleware
- Low-level programming: CUDA, OpenCL, DirectCompute
- Hardware: GPU (AMD & NVIDIA); multicore x86 (AMD & Intel)
- Windows has broad support at all levels:
  • Supports all HW
  • Each of CUDA, OpenCL and DirectCompute
  • Almost all high-level tools and libraries
34. DirectCompute
- What is DirectCompute?
  • Microsoft’s GPGPU programming solution
  • API of the DirectX family
  • Component of the Direct3D API
- Why use DirectCompute over other APIs?
  • Interoperability with the rest of the 2D, 3D and video rendering APIs (display computed results)
  • Cross-hardware compatibility
  • Feature compatibility guarantees
  • Access to fixed-function hardware
- Used extensively by the gaming community
http://msdn.microsoft.com/directx
35. GPGPU Development on Windows
- Choice: CUDA, OpenCL or DirectCompute
- Tools and libraries: Nsight and Visual Studio, PGI, CAPS, MATLAB, Jacket, PyCUDA, Quantifi, CUDA.NET, Culatools, NAG, Scicomp… and many others
- NVIDIA reports that over 80% of CUDA SDK downloads are for Windows
37. MATLAB
[Figure: a desktop computer with Parallel Computing Toolbox connected to a computer cluster running MATLAB Distributed Computing Server workers on Windows HPC Server]
39. [Figure: On Premise Cluster Computing stack. Applications: Excel, MPI applications, SOA applications, HPC ISV / OSS applications. Middleware: HPC Pack, SOA. Operating systems: HPC Edition.]
*Note that in SP1, MPI applications are not supported on Azure.
40. Performance Parity Between Linux and Windows
1 million active cells, 1000 wells, Blackoil: elapsed time [secs] by core count
Cores            1        2        4        6        8        16       24       32       48
RedHat 5 U3      5200.43  3385.17  3095.72  2281.25  1790.59  1014.42  776.71   638.43   621.42
Win HPC R2 SP1   5404.38  3298.55  3175.9   2171.37  1736.11  992.82   745.43   610.88   549.74
Make your choice based on features and TCO…
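Speedup and parallel efficiency make the table easier to compare than raw seconds. A small Python sketch that derives both from the slide’s numbers, taking each OS’s own 1-core run as its baseline (that baseline choice is an assumption of this sketch):

```python
# Elapsed times [secs] from the slide: 1 million active cells, 1000 wells, Blackoil.
CORES = [1, 2, 4, 6, 8, 16, 24, 32, 48]
ELAPSED = {
    "RedHat 5 U3": [5200.43, 3385.17, 3095.72, 2281.25, 1790.59,
                    1014.42, 776.71, 638.43, 621.42],
    "Win HPC R2 SP1": [5404.38, 3298.55, 3175.9, 2171.37, 1736.11,
                       992.82, 745.43, 610.88, 549.74],
}


def scaling(times, cores):
    """Return (cores, speedup, efficiency) rows relative to the 1-core run."""
    baseline = times[0]
    rows = []
    for n, t in zip(cores, times):
        speedup = baseline / t
        rows.append((n, speedup, speedup / n))
    return rows


if __name__ == "__main__":
    for os_name, times in ELAPSED.items():
        n, speedup, efficiency = scaling(times, CORES)[-1]
        print(f"{os_name}: {speedup:.2f}x on {n} cores ({efficiency:.0%} efficiency)")
```

At 48 cores this gives roughly 8.4x for RedHat and 9.8x for Windows, i.e. about 17% and 20% parallel efficiency; both curves flatten between 32 and 48 cores.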
42.
Excel SOA Client
- Connects to the cluster as a SOA client
- VSTO code in the workbook calls out to a SOA service
- Input and output managed by the Excel developer
Excel Workbook on the Cluster (NEW)
- Run multiple instances of Excel 2010 on an HPC cluster
- Each instance runs an iteration of the same workbook
- Can be launched from Excel 2010 or a Windows program
- Excel dialog suppression
Excel UDF on the Cluster (NEW)
- Run User Defined Functions in parallel on a cluster
- Excel 2010 includes a new API and options for HPC
- Support for .XLL files developed through the Excel SDK
- Easy to develop on a desktop and then deploy to a cluster
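The “workbook on the cluster” and “UDF on the cluster” cases are both parametric sweeps: the same calculation, run independently for many inputs, with results collected in input order. A minimal local Python sketch of that shape using `concurrent.futures`; `workbook_iteration` is a hypothetical stand-in for one workbook recalculation or UDF call, and nothing here uses the HPC Pack or Excel APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def workbook_iteration(params):
    """Hypothetical stand-in for one workbook recalculation or UDF call."""
    principal, rate, years = params
    return principal * (1 + rate) ** years  # e.g. a compound-growth cell


def run_sweep(param_rows, max_workers=4):
    """Evaluate every parameter row independently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(workbook_iteration, param_rows))


if __name__ == "__main__":
    rows = [(1000.0, pct / 100, 10) for pct in range(1, 6)]  # 1%..5% interest
    for row, value in zip(rows, run_sweep(rows)):
        print(row, round(value, 2))
```

Threads keep the sketch portable; for CPU-bound Python work a `ProcessPoolExecutor` is the usual swap, and on a cluster the job scheduler plays the role of the pool.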
43.
- Use Azure servers to run HPC compute jobs
- Can be used to “burst out” to the cloud to handle peak demand
- Can create clusters that include dedicated on-premise servers, non-dedicated workstations and shared Azure servers
- Jobs can run unchanged across all 3 types of compute nodes (no support for MPI in SP1)
- Azure nodes are added to the cluster using the Administration console (just like workstation nodes)
[Figure: HPC Clients, Head & Broker Nodes and Azure, linked by Jobs and Requests through the Azure Gateway]
44. Compute Nodes On-Premise and in Azure Simultaneously
• “Burst” into the cloud on demand while keeping control over data and corporate policies
• Pay only for what you use
• A stepping stone to hybrid Azure and public clouds
• Dynamically adjust how much runs on-premise and in the cloud
[Figure: HPC Head Node, Broker Node, Desktops and On-premise Compute Nodes, together with Azure Compute Proxies and Compute Instances]
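The burst idea can be illustrated with a toy placement policy. The Python sketch below is hypothetical and is not how HPC Pack schedules work: it fills on-premise cores first, overflows to Azure, and keeps MPI jobs on-premise to mirror the SP1 note that MPI is not supported on Azure nodes. `Job` and `place_jobs` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    is_mpi: bool = False


def place_jobs(jobs, on_prem_cores, azure_cores):
    """Hypothetical greedy burst policy: on-premise first, overflow to Azure.

    MPI jobs never go to Azure (no MPI on Azure in SP1); jobs that fit
    nowhere are queued.
    """
    placement = {"on_prem": [], "azure": [], "queued": []}
    free_local, free_cloud = on_prem_cores, azure_cores
    for job in jobs:
        if job.cores <= free_local:
            placement["on_prem"].append(job.name)
            free_local -= job.cores
        elif not job.is_mpi and job.cores <= free_cloud:
            placement["azure"].append(job.name)
            free_cloud -= job.cores
        else:
            placement["queued"].append(job.name)
    return placement


if __name__ == "__main__":
    jobs = [Job("sweep-1", 64), Job("mpi-sim", 64, is_mpi=True),
            Job("sweep-2", 48), Job("sweep-3", 16)]
    print(place_jobs(jobs, on_prem_cores=96, azure_cores=64))
```

With 96 local and 64 Azure cores, `sweep-1` and `sweep-3` run on-premise, `sweep-2` bursts to Azure, and `mpi-sim` waits. This is first-fit in submission order, not an optimal packing; a real scheduler would also weigh cost, priority and where the data lives.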
45. Parallel Development
“Combined with Intel Parallel Studio, I think it is
reasonable to say that Windows has the richest and most
complete set of tools for multicore programming”. --
James Reinders, Intel, 12-April-2010
46. Solution Begins with DEVELOPERS
Make it easier to express and manage the correctness, efficiency and maintainability of parallelism on Microsoft platforms for developers of all skill levels
• Enable developers to express parallelism easily and focus on the problem to be solved
• Improve the efficiency and scalability of parallel applications
• Simplify the process of designing and testing parallel applications
47. Visual Studio 2010
Tools, Programming Models, Runtimes
- Tools (Visual Studio IDE): Parallel Debugger Tool Windows; Profiler Concurrency Analysis
- Programming models, managed (.NET Framework 4): Parallel LINQ, Task Parallel Library, Data Structures
- Programming models, native (Visual C++ 10): Parallel Pattern Library, Agents Library, Data Structures
- Runtimes, managed: ThreadPool (Task Scheduler, Resource Manager)
- Runtimes, native: Concurrency Runtime (Task Scheduler, Resource Manager)
- Operating system (Windows): Threads, UMS Threads
[Legend: Managed, Native, Tooling]
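On the native side, the Agents Library encourages agents that communicate by passing messages through buffers instead of sharing mutable state. The Python sketch below shows that style with two threads and a `queue.Queue`; it is an analogy only, loosely mirroring the Concurrency Runtime’s `unbounded_buffer` with `send`/`receive`, not that library’s API, and the sentinel-based shutdown is one common convention among several.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the message stream


def producer(outbox, items):
    """Agent 1: sends each work item as a message, then the sentinel."""
    for item in items:
        outbox.put(item)
    outbox.put(_DONE)


def consumer(inbox, results):
    """Agent 2: receives messages until the sentinel, recording squares."""
    while True:
        item = inbox.get()
        if item is _DONE:
            break
        results.append(item * item)


def run_pipeline(items):
    buffer = queue.Queue()  # unbounded FIFO, playing the role of a message buffer
    results = []            # only the consumer writes; read after join()
    agents = [
        threading.Thread(target=producer, args=(buffer, items)),
        threading.Thread(target=consumer, args=(buffer, results)),
    ]
    for agent in agents:
        agent.start()
    for agent in agents:
        agent.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4]))
```

With one producer and one consumer on a FIFO queue, results come back in send order; adding more consumers would raise throughput but give up that ordering.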