The Rise of Parallel Computing
Ben Baker
Moore’s Law

"The number of transistors incorporated in a chip
will approximately double every 24 months."

             Gordon Moore, Intel Co-Founder
                 Originally published in 1965
So What’s the Problem?

‱ Can continue to increase transistors per Moore’s Law
‱ Cannot continue to increase power or chips will melt
   – Power steadily rose with new chips until ~2005 – now 1 volt
‱ Cannot continue to scale processor frequency
   – Have you seen any 10 GHz chips?


            Moore’s Law gave no prediction of
            continued performance increases
Time to “Take the Leap”
“We have reached the limit of what is possible with
one or more traditional, serial central processing
units, or CPUs. It is past time for the computing
industry – and everyone who relies on it for
continued improvements in productivity, economic
growth and social progress – to take the leap into
parallel processing.”

        Bill Dally - Chief Scientist at NVIDIA and Professor at Stanford University
http://www.forbes.com/2010/04/29/moores-law-computing-processing-opinions-contributors-bill-dally.html
Additional Resources

‱ Stanford course available on iTunes U
  http://itunes.apple.com/us/itunes-u/programming-massively-parallel/id384233322
     – Programming Massively Parallel Processors with CUDA
     – Lectures 1 and 13 are great introductions
          ‱ Lecture 13 – The Future of Throughput Computing (Bill Dally)
          ‱ Lecture 1 – Introduction to Massively Parallel Computing
Guiding Principles
‱ Performance = Parallelism
  – Single-threaded processor performance has flatlined
    at 0-5% annual growth since ~2005
‱ Efficiency = Locality
  – Chips are power limited, with most power spent
    moving data around
Three Types of Parallelism
‱ Instruction-level parallelism
  – Out-of-order execution, branch prediction, etc.
  – Opportunities decreasing
‱ Data-level parallelism
  – SIMD (Single Instruction Multiple Data), GPUs, etc.
  – Opportunities increasing
‱ Thread-level parallelism
  – Multithreading, multi-core CPUs, etc.
  – Opportunities increasing
Taking the Leap

‱ Three things are required
  – Lots of processors
  – Efficient memory storage
  – A programming system that abstracts it
CPU VS. GPU ARCHITECTURE
CPU
‱ General purpose processors
‱ Optimized for instruction level parallelism
‱ A few large processors capable of multi-threading

GPU
‱ Special purpose processors
‱ Optimized for data level parallelism
‱ Many smaller processors executing single instructions on multiple
  data (SIMD)
High Performance GPU Computing
‱ GPU performance is improving faster than CPU performance
‱ Being used in industry for weather simulation,
  medical imaging, computational finance, etc.
‱ Amazon is now offering access to NVIDIA Tesla
  GPUs in the cloud as a service ($ vs. Âą per hour)
‱ GPUs are being used as general purpose parallel
  processors – http://gpgpu.org
Examples

‱   CUDA – NVIDIA
‱   C++ AMP – Microsoft
‱   OpenCL – Open standard (Khronos)
‱   NPP – NVIDIA (Research done at FamilySearch)
CUDA
‱ Compute Unified Device Architecture
‱ Proprietary NVIDIA extensions to C for
  running code on NVIDIA GPUs
‱ Other language bindings
  – Java – jCUDA, JCuda, JCublas, JCufft
  – Python – PyCUDA, KappaCUDA
  – .NET – CUDAfy.NET, CUDA.NET
  – Ruby – KappaCUDA
  – More – Fortran, Perl, Mathematica, MATLAB, etc.
C for CUDA Example
// Compute vector sum c = a + b
// Each thread performs one pair-wise addition
__global__ void vector_add(float* A, float* B, float* C)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    // Allocate and initialize host (CPU) memory
    float *hostA = ..., *hostB = ...;

    // Allocate device (GPU) memory
    float *deviceA, *deviceB, *deviceC;
    cudaMalloc((void**) &deviceA, N * sizeof(float));
    cudaMalloc((void**) &deviceB, N * sizeof(float));
    cudaMalloc((void**) &deviceC, N * sizeof(float));

    // Copy host memory to device
    cudaMemcpy(deviceA, hostA, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(deviceB, hostB, N * sizeof(float), cudaMemcpyHostToDevice);

    // Run N/256 blocks of 256 threads each (assumes N is a multiple of 256)
    vector_add<<<N/256, 256>>>(deviceA, deviceB, deviceC);
}
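The index arithmetic in the kernel can be traced on the CPU. The following is a minimal C++ sketch, not CUDA: two nested loops stand in for the GPU's blocks and threads, and the names `blocks_for` and `emulate_vector_add` are hypothetical helpers introduced only for illustration. It also shows one common way to handle an N that is not a multiple of the block size (round the block count up and guard the index), which the slide's launch does not need because it assumes N divides evenly by 256.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// CPU emulation of the kernel launch: the outer loop plays the role of
// blockIdx.x and the inner loop plays the role of threadIdx.x.
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t block_dim) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    const std::size_t grid_dim = blocks_for(n, block_dim);
    std::vector<float> c(n);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same global index formula as the kernel.
            const std::size_t i = thread_idx + block_dim * block_idx;
            if (i < n) {  // guard: the last block may be only partly used
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

On a real GPU the loop iterations run concurrently rather than in sequence; the sketch only shows which element each (block, thread) pair is responsible for.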
Heterogeneous Computing with Microsoft C++ AMP
‱ AMP = Accelerated Massive Parallelism
‱ Designed to take advantage of all the available compute
  resources (CPU, integrated & discrete GPUs)
‱ Coming in the next version of Visual Studio and C++ in
  the next year or two
‱ Cool demo
   http://hothardware.com/News/Microsoft-Demos-C-AMP-Heterogeneous-Computing-at-AFDS/
EXAMPLE – C++ AMP
void MatrixMult(float* C, const vector<float>& A, const vector<float>& B, int M, int N, int W)
{
   for (int y = 0; y < M; y++) {
      for (int x = 0; x < N; x++) {
         float sum = 0;
         for (int i = 0; i < W; i++)
            sum += A[y*W + i] * B[i*N + x];
         C[y*N + x] = sum;
      }
   }
}



void MatrixMult(float* C, const vector<float>&A, const vector<float>&B, int M, int N, int W)
{
   array_view<const float, 2> a (M, W, A), b(W, N, B);
   array_view<writeonly<float>, 2> c(M, N, C);

    parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
        float sum = 0;
        for (int i = 0; i < a.x; i++)
           sum += a(idx.y, i) * b(i, idx.x);
        c[idx] = sum;
    });
}
OpenCL

‱ Royalty free, cross-platform, vendor neutral
‱ Managed by Khronos OpenCL working group (www.khronos.org/opencl)
‱ Design goal to use all computational resources
  – GPUs and CPUs are peers
‱ Based on C
‱ Abstract the specifics of underlying hardware
Example – OpenCL
void trad_mul(int n, const float *a, const float* b, float* c)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] * b[i];
}



kernel void dp_mul(global const float *a, global const float* b, global float* c)
{
  int id = get_global_id(0);
  c[id] = a[id] * b[id];
} // Execute over "n" work-items
Image Processing Flow at FamilySearch
Image Capture (Uncompressed TIFF; microfilm scanners, digital cameras)
  → Image Post-Processing (DPC)
      → Preservation Storage (Lossless JPEG-2000)
      → Distribution Storage (JPEG – original size, JPEG – thumbnails)
Digital Processing Center (DPC)
‱ Collection of servers in a data center used by FamilySearch
  to continuously process millions of images annually
‱ Image post-processing operations performed include
   – Automatic skew correction
   – Automatic document cropping
   – Image sharpening
   – Image scaling (thumbnail creation)
   – Encoding into other image formats
‱ CPU is a current bottleneck (~12 sec/image)
‱ Processing requirements continuously rising (number of
  images, image size and number of color channels)
Computer Graphics vs. Computer Vision
‱ Approximate inverses of each other:
   – Computer graphics – converting “numbers into pictures”
   – Computer vision – converting “pictures into numbers”
‱ GPUs have traditionally been used for computer
  graphics (e.g., graphics-intensive computer games)
‱ Recent research, hardware and software are using
  GPUs for computer vision (e.g., Using Graphics
  Devices in Reverse)
‱ GPUs generally work well when there is ample
  data-level parallelism
IMPLEMENTATION OPTIONS
Rack Mount Servers
‱ Several vendors provide solutions. (Ex. One is a 3U rack mount unit
  capable of holding 16 GPUs connected to 8 servers)
‱ “Compared to typical quad-core CPUs, Tesla 20 series computing
  systems deliver equivalent performance at 1/10th the cost
  and 1/20th the power consumption.” (NVIDIA)

Personal Supercomputer
‱ GPUs for computing can be placed in a standard workstation. Several
  vendors provide solutions.
‱ Each Tesla GPU requires
   – Available double-wide PCIe slot
   – Two 6-pin or one 8-pin PCIe power connectors and sufficient wattage
   – Recommend 4GB RAM per card, at least 2.33 GHz quad-core CPU and
     64-bit Linux or Windows
‱ “250x the computing performance of a standard workstation” (NVIDIA)
Image Processing Performance with IPP and NPP
‱ FamilySearch currently uses Intel’s IPP
   – Intel Performance Primitives
   – Optimize operations on Intel CPUs
   – Closed source, licensed

‱ NVIDIA has produced a similar library called NPP
   – NVIDIA Performance Primitives
   – Optimize operations on NVIDIA GPUs (CUDA underneath)
   – Higher level abstraction to perform image processing on GPUs
   – No license required for the SDK
EXAMPLE – NPP
IPP (CPU)

[Create padded image]

[Create Gaussian kernel]

// Allocate blurred image of appropriate size
Ipp8u* blurredImg = ippiMalloc_8u_C1(img.getWidth(),
     img.getHeight(), &blurredImgStepSz);

// Perform the filter
ippiFilter32f_8u_C1R(paddedImgData,
     paddedImage.getStepSize(), blurredImg,
     blurredImgStepSz, imgSz, kernel, kernelSize,
     kernelAnchor);

NPP (GPU)

// Declare a host object for an 8-bit grayscale image
npp::ImageCPU_8u_C1 hostSrc;

// Load grayscale image from disk
npp::loadImage(sFilename, hostSrc);

// Declare a device image and upload from host
npp::ImageNPP_8u_C1 deviceSrc(hostSrc);

[Create padded image]

[Create Gaussian kernel]

// Copy kernel to GPU
cudaMemcpy2D(deviceKernel, 12, hostKernel, kernelSize.width
     * sizeof(Npp32s), kernelSize.width * sizeof(Npp32s),
     kernelSize.height, cudaMemcpyHostToDevice);

// Allocate blurred image of appropriate size (on GPU)
npp::ImageNPP_8u_C1 deviceBlurredImg(imgSz.width,
     imgSz.height);

// Perform the filter
nppiFilter_8u_C1R(paddedImg.data(widthOffset,
     heightOffset), paddedImg.pitch(),
     deviceBlurredImg.data(), deviceBlurredImg.pitch(),
     imgSz, deviceKernel, kernelSize, kernelAnchor,
     divisor);

// Declare a host image for the result
npp::ImageCPU_8u_C1
     hostBlurredImg(deviceBlurredImg.size());

// Copy the device result data into it
deviceBlurredImg.copyTo(hostBlurredImg.data(),
     hostBlurredImg.pitch());
Performance Testing Methodology
‱ Test System Specifications
    – Dual Quad Core Intel¼ Xeon¼ 2.80GHz i7 CPUs (8 cores
      total)
    – 6 GB RAM
    – 64-bit Windows 7 operating system
    – Single Tesla C1060 Compute Processor (240 processing cores
      total)
    – PCI-Express x16 Gen2 slot
‱ Three representative grayscale images of increasing size
    – Small image – 1726 x 1450 (2.5 megapixels)
    – Average image – 4808 x 3940 (18.9 megapixels)
    – Large image – 8966 x 6132 (55.0 megapixels)
‱ Results for each image repeated 3 times and averaged
‱ Transfer time to/from the GPU is considered part of all
  GPU operations
‱ Combining operations minimizes GPU/CPU transfers
‱ 5 – 6x speed up, increasing slightly with image size
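Why combining operations helps can be seen with a simple cost model. The sketch below is an illustration under stated assumptions, not a measurement from these tests: it assumes each GPU operation run on its own pays one upload and one download, while a combined pipeline pays them once. The function names and any timings passed in are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Total time when every operation is run separately: each one uploads its
// input, runs its kernel, and downloads its result.
double separate_ms(const std::vector<double>& kernel_ms,
                   double upload_ms, double download_ms) {
    assert(upload_ms >= 0.0 && download_ms >= 0.0);
    double total = 0.0;
    for (double k : kernel_ms) {
        total += upload_ms + k + download_ms;
    }
    return total;
}

// Total time when the operations are chained on the GPU: one upload,
// all kernels back to back, one download.
double combined_ms(const std::vector<double>& kernel_ms,
                   double upload_ms, double download_ms) {
    assert(upload_ms >= 0.0 && download_ms >= 0.0);
    const double kernels =
        std::accumulate(kernel_ms.begin(), kernel_ms.end(), 0.0);
    return upload_ms + kernels + download_ms;
}
```

With made-up inputs of three kernels taking 2, 3 and 5 ms and 10 ms per transfer in each direction, the separate path costs 70 ms and the combined path 30 ms; the gap grows with the number of chained operations.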
AMDAHL’S LAW

Speeding up 25% of an overall process by 10x is less of an overall
improvement than speeding up 75% of an overall process by 1.5x
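The comparison can be checked with the usual statement of Amdahl's law, speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time being accelerated and s is the speedup of that fraction. A minimal C++ sketch follows; the function name `amdahl_speedup` is introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction p of the run time is
// accelerated by a factor s and the remaining (1 - p) is unchanged.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

Here `amdahl_speedup(0.25, 10.0)` is 1 / 0.775, about 1.29, while `amdahl_speedup(0.75, 1.5)` is 1 / 0.75, about 1.33, so the modest speedup applied to the larger share of the work wins.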
Takeaways
‱ Significant performance increases can be realized through
  parallelization – may become the only way in the future
‱ GPUs are transforming into general purpose data-parallel
  computational coprocessors and outstripping advances in
  multi-core CPUs
‱ Languages, tools and APIs for parallel computing remain relatively
  immature, but are improving rapidly
‱ Relatively small learning curve
    – For image processing, NPP’s API nearly perfectly matches Intel’s IPP
    – New paradigms around copying to/from GPU and allocating memory
    – Can use programming languages familiar to developers without
      understanding intricacies of GPU architectures
    – Does require rethinking of algorithms to be parallel and building the
      computation around the data

Mais conteĂșdo relacionado

Mais procurados

Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Angela Mendoza M.
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009Randall Hand
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unitShashwat Shriparv
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with JavaKelum Senanayake
 
GPU Computing
GPU ComputingGPU Computing
GPU ComputingKhan Mostafa
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computerTejhaskar Ashok Kumar
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensOscar Law
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)Fatima Qayyum
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentationVishal Singh
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Rob Gillen
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)Amal R
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An IntroductionDhan V Sagar
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing ClusterJax Jargalsaikhan
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...Edge AI and Vision Alliance
 

Mais procurados (20)

Lec04 gpu architecture
Lec04 gpu architectureLec04 gpu architecture
Lec04 gpu architecture
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
 
NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009NVidia CUDA Tutorial - June 15, 2009
NVidia CUDA Tutorial - June 15, 2009
 
Parallel Computing on the GPU
Parallel Computing on the GPUParallel Computing on the GPU
Parallel Computing on the GPU
 
GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
Cuda
CudaCuda
Cuda
 
GPU Programming with Java
GPU Programming with JavaGPU Programming with Java
GPU Programming with Java
 
GPU Computing
GPU ComputingGPU Computing
GPU Computing
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computer
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware Challegens
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
CPU vs. GPU presentation
CPU vs. GPU presentationCPU vs. GPU presentation
CPU vs. GPU presentation
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
GPU - An Introduction
GPU - An IntroductionGPU - An Introduction
GPU - An Introduction
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 

Destaque

Viewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View PaperViewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View Paperbakers84
 
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 PresentationA Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 Presentationbakers84
 
A Whirlwind Tour of FamilySearch Resources - 2012
A Whirlwind Tour of FamilySearch Resources - 2012A Whirlwind Tour of FamilySearch Resources - 2012
A Whirlwind Tour of FamilySearch Resources - 2012bakers84
 
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL ListA Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL Listbakers84
 
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'bakers84
 
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach WebinarFamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach Webinarbakers84
 
How to Get More Involved in Your Family History Despite Limited Available Time
How to Get More Involved in Your Family History Despite Limited Available TimeHow to Get More Involved in Your Family History Despite Limited Available Time
How to Get More Involved in Your Family History Despite Limited Available Timebakers84
 
Merging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - PresentationMerging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - Presentationbakers84
 
Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!bakers84
 
FamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - PresentationFamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - Presentationbakers84
 
Start and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - PresentationStart and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - Presentationbakers84
 
What I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family HistoryWhat I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family Historybakers84
 
The Evolution of Technology and Family History
The Evolution of Technology and Family HistoryThe Evolution of Technology and Family History
The Evolution of Technology and Family Historybakers84
 

Destaque (13)

Viewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View PaperViewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View Paper
 
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 PresentationA Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
 
A Whirlwind Tour of FamilySearch Resources - 2012
A Whirlwind Tour of FamilySearch Resources - 2012A Whirlwind Tour of FamilySearch Resources - 2012
A Whirlwind Tour of FamilySearch Resources - 2012
 
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL ListA Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
 
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
 
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach WebinarFamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
 
How to Get More Involved in Your Family History Despite Limited Available Time
How to Get More Involved in Your Family History Despite Limited Available TimeHow to Get More Involved in Your Family History Despite Limited Available Time
How to Get More Involved in Your Family History Despite Limited Available Time
 
Merging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - PresentationMerging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - Presentation
 
Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!
 
FamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - PresentationFamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - Presentation
 
Start and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - PresentationStart and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - Presentation
 
What I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family HistoryWhat I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family History
 
The Evolution of Technology and Family History
The Evolution of Technology and Family HistoryThe Evolution of Technology and Family History
The Evolution of Technology and Family History
 

Semelhante a The Rise of Parallel Computing

lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxssuser413a98
 
GPU Algorithms and trends 2018
GPU Algorithms and trends 2018GPU Algorithms and trends 2018
GPU Algorithms and trends 2018Prabindh Sundareson
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unitDayakar Siddula
 
Computing using GPUs
Computing using GPUsComputing using GPUs
Computing using GPUsShree Kumar
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)GPU Renderfarm with Integrated Asset Management & Production System (AMPS)
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)Budianto Tandianus
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCLUnai Lopez-Novoa
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-isctembreternitz
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de InformĂĄtica UCM
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningSri Ambati
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science Domino Data Lab
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–NVIDIA Taiwan
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014Puppet
 

Semelhante a The Rise of Parallel Computing (20)

lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 
GPU Algorithms and trends 2018
GPU Algorithms and trends 2018GPU Algorithms and trends 2018
GPU Algorithms and trends 2018
 
19564926 graphics-processing-unit
19564926 graphics-processing-unit19564926 graphics-processing-unit
19564926 graphics-processing-unit
 
Computing using GPUs
Computing using GPUsComputing using GPUs
Computing using GPUs
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)GPU Renderfarm with Integrated Asset Management & Production System (AMPS)
GPU Renderfarm with Integrated Asset Management & Production System (AMPS)
 
Programming Models for Heterogeneous Chips
Programming Models for  Heterogeneous ChipsProgramming Models for  Heterogeneous Chips
Programming Models for Heterogeneous Chips
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine Learning
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–
GTC Taiwan 2017 朹 Google Cloud ç•¶äž­äœżç”š GPU é€ČèĄŒæ•ˆèƒœæœ€äœłćŒ–
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014
 

The Rise of Parallel Computing

  • 1. The Rise of Parallel Computing Ben Baker
  • 2. Moore’s Law "The number of transistors incorporated in a chip will approximately double every 24 months." Gordon Moore, Intel Co-Founder. Originally published in 1965
  • 3.
  • 4. So What’s the Problem?
      ‱ Can continue to increase transistors per Moore’s Law
      ‱ Cannot continue to increase power or chips will melt
          – Power steadily rose with new chips until ~2005 – now 1 volt
      ‱ Cannot continue to scale processor frequency
          – Have you seen any 10 GHz chips?
      Moore’s Law gave no prediction of continued performance increases
  • 5. Time to “Take the Leap”
      “We have reached the limit of what is possible with one or more traditional, serial central processing units, or CPUs. It is past time for the computing industry – and everyone who relies on it for continued improvements in productivity, economic growth and social progress – to take the leap into parallel processing.”
      Bill Dally - Chief Scientist at NVIDIA and Professor at Stanford University
      http://www.forbes.com/2010/04/29/moores-law-computing-processing-opinions-contributors-bill-dally.html
  • 6. Additional Resources
      ‱ Stanford course available on iTunes U
      ‱ http://itunes.apple.com/us/itunes-u/programming-massively-parallel/id384233322
          – Programming Massively Parallel Processors with CUDA
          – Lectures 1 and 13 are great introductions
              ‱ Lecture 13 – The Future of Throughput Computing (Bill Dally)
              ‱ Lecture 1 – Introduction to Massively Parallel Computing
  • 7. Guiding Principles
      ‱ Performance = Parallelism
          – Single-threaded processor performance has flat-lined at 0-5% annual growth since ~2005
      ‱ Efficiency = Locality
          – Chips are power limited with most power spent moving data around
  • 8. Three Types of Parallelism
      ‱ Instruction-level parallelism
          – Out of order execution, branch prediction, etc.
          – Opportunities decreasing
      ‱ Data-level parallelism
          – SIMD (Single Instruction Multiple Data), GPUs, etc.
          – Opportunities increasing
      ‱ Thread-level parallelism
          – Multithreading, multi-core CPUs, etc.
          – Opportunities increasing
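To make the thread-level case concrete, here is a minimal C++ sketch that splits a sum across several CPU threads. It is an illustration only and not part of the original deck; the name `parallel_sum` and the chunking scheme are assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the slides): sums a vector by giving each
// thread one contiguous chunk of the data.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    // Round up so the chunks together cover every element.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t first = std::min(data.size(), t * chunk);
            const std::size_t last = std::min(data.size(), first + chunk);
            // Each thread writes only its own slot, so no locking is needed.
            partial[t] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(first),
                data.begin() + static_cast<std::ptrdiff_t>(last), 0.0);
        });
    }
    for (auto& w : workers) w.join();

    // Combine the per-thread partial sums serially.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Calling `parallel_sum(values, std::thread::hardware_concurrency())` would use one thread per hardware thread where that count is reported as nonzero. The final combine step stays serial, which is one reason such speedups are less than linear.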
  • 9. Taking the Leap
      ‱ Three things are required
          – Lots of processors
          – Efficient memory storage
          – Programming system that abstracts it
  • 10. CPU VS. GPU ARCHITECTURE
      CPU
      ‱ General purpose processors
      ‱ Optimized for instruction level parallelism
      ‱ A few large processors capable of multi-threading
      GPU
      ‱ Special purpose processors
      ‱ Optimized for data level parallelism
      ‱ Many smaller processors executing single instructions on multiple data (SIMD)
  • 11. High Performance GPU Computing
      ‱ GPUs are getting faster more quickly than CPUs
      ‱ Being used in industry for weather simulation, medical imaging, computational finance, etc.
      ‱ Amazon is now offering access to NVIDIA Tesla GPUs in the cloud as a service ($ vs. Âą per hour)
      ‱ GPUs are being used as general purpose parallel processors
          – http://gpgpu.org
  • 12. Examples
      ‱ CUDA – NVIDIA
      ‱ C++ AMP – Microsoft
      ‱ OpenCL – Open source
      ‱ NPP – NVIDIA (Research done at FamilySearch)
  • 13. CUDA
      ‱ Compute Unified Device Architecture
      ‱ Proprietary NVIDIA extensions to C for running code on NVIDIA GPUs
      ‱ Other language bindings
          – Java – jCUDA, JCuda, JCublas, JCufft
          – Python – PyCUDA, KappaCUDA
          – .NET – CUDAfy.NET, CUDA.NET
          – Ruby – KappaCUDA
          – More – Fortran, Perl, Mathematica, MATLAB, etc.
  • 14. C for CUDA Example
      // Compute vector sum c = a + b
      // Each thread performs one pair-wise addition
      __global__ void vector_add(float* A, float* B, float* C)
      {
          int i = threadIdx.x + blockDim.x * blockIdx.x;
          C[i] = A[i] + B[i];
      }

      int main()
      {
          // Allocate and initialize host (CPU) memory
          float* hostA = 
, *hostB = 
;

          // Allocate device (GPU) memory
          float *deviceA, *deviceB, *deviceC;
          cudaMalloc((void**) &deviceA, N * sizeof(float));
          cudaMalloc((void**) &deviceB, N * sizeof(float));
          cudaMalloc((void**) &deviceC, N * sizeof(float));

          // Copy host memory to device
          cudaMemcpy(deviceA, hostA, N * sizeof(float), cudaMemcpyHostToDevice);
          cudaMemcpy(deviceB, hostB, N * sizeof(float), cudaMemcpyHostToDevice);

          // Run N/256 blocks of 256 threads each (assumes N is a multiple of 256)
          vector_add<<<N/256, 256>>>(deviceA, deviceB, deviceC);
      }
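The index arithmetic in the kernel can be checked on a CPU. The following C++ sketch emulates the launch with two nested loops standing in for `blockIdx.x` and `threadIdx.x`. It is an illustration under assumptions: `blocks_for` and `vector_add_emulated` are hypothetical names, and the rounded-up block count with a bounds check is a common pattern for sizes that are not a multiple of the block size, not something shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements,
// rounded up so a final partial block is not dropped.
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of the launch: the outer loop plays the role of blockIdx.x
// and the inner loop plays the role of threadIdx.x.
void vector_add_emulated(const std::vector<float>& a,
                         const std::vector<float>& b,
                         std::vector<float>& c,
                         std::size_t block_dim) {
    assert(a.size() == b.size() && a.size() == c.size());
    const std::size_t n = a.size();
    const std::size_t grid_dim = blocks_for(n, block_dim);
    for (std::size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            // Same global index formula as the kernel on the slide.
            const std::size_t i = thread_idx + block_dim * block_idx;
            if (i < n) {  // threads past the end of the data do nothing
                c[i] = a[i] + b[i];
            }
        }
    }
}
```

With 1000 elements and 256 threads per block, `blocks_for` gives 4 blocks, so the last block has 24 idle threads that the bounds check skips.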
  • 15. Heterogeneous Computing with Microsoft C++ AMP
      ‱ AMP = Accelerated Massive Parallelism
      ‱ Designed to take advantage of all the available compute resources (CPU, integrated & discrete GPUs)
      ‱ Coming in the next version of Visual Studio and C++ in the next year or two
      ‱ Cool demo http://hothardware.com/News/Microsoft-Demos-C-AMP-Heterogeneous-Computing-at-AFDS/
  • 16. EXAMPLE – C++ AMP
      // Serial version
      void MatrixMult(float* C, const vector<float>& A, const vector<float>& B, int M, int N, int W)
      {
          for (int y = 0; y < M; y++) {
              for (int x = 0; x < N; x++) {
                  float sum = 0;
                  for (int i = 0; i < W; i++)
                      sum += A[y*W + i] * B[i*N + x];
                  C[y*N + x] = sum;
              }
          }
      }

      // C++ AMP version
      void MatrixMult(float* C, const vector<float>& A, const vector<float>& B, int M, int N, int W)
      {
          array_view<const float, 2> a(M, W, A), b(W, N, B);
          array_view<writeonly<float>, 2> c(M, N, C);
          parallel_for_each(c.grid, [=](index<2> idx) restrict(direct3d) {
              float sum = 0;
              for (int i = 0; i < a.x; i++)
                  sum += a(idx.y, i) * b(i, idx.x);
              c[idx] = sum;
          });
      }
  • 17. OpenCL
      ‱ Royalty free, cross-platform, vendor neutral
      ‱ Managed by Khronos OpenCL working group (www.khronos.org/opencl)
      ‱ Design goal to use all computational resources
          – GPUs and CPUs are peers
      ‱ Based on C
      ‱ Abstract the specifics of underlying hardware
  • 18. Example – OpenCL
      // Traditional loop
      void trad_mul(int n, const float *a, const float* b, float* c)
      {
          for (int i = 0; i < n; i++)
              c[i] = a[i] * b[i];
      }

      // Data-parallel kernel
      kernel void dp_mul(global const float *a, global const float* b, global float* c)
      {
          int id = get_global_id(0);
          c[id] = a[id] * b[id];
      } // Execute over "n" work-items
  • 19. Image Processing Flow at FamilySearch
      [Diagram] Image Capture (Uncompressed TIFF): Microfilm Scanners, Digital Cameras; Image Post-Processing (DPC); Preservation Storage (Lossless JPEG-2000); Distribution Storage (JPEG - original size, JPEG - thumbnails)
  • 20. Digital Processing Center (DPC)
      ‱ Collection of servers in a data center used by FamilySearch to continuously process millions of images annually
      ‱ Image post processing operations performed include
          – Automatic skew correction
          – Automatic document cropping
          – Image sharpening
          – Image scaling (thumbnail creation)
          – Encoding into other image formats
      ‱ CPU is a current bottleneck (~12 sec/image)
      ‱ Processing requirements continuously rising (number of images, image size and number of color channels)
  • 21. Computer Graphics vs. Computer Vision
      ‱ Approximate inverses of each other:
          – Computer graphics – converting “numbers into pictures”
          – Computer vision – converting “pictures into numbers”
      ‱ GPUs have traditionally been used for computer graphics
          – (Ex. Graphics intensive computer games)
      ‱ Recent research, hardware and software are using GPUs for computer vision (Ex. Using Graphics Devices in Reverse)
      ‱ GPUs generally work well when there is ample data-level parallelism
  • 22. IMPLEMENTATION OPTIONS
      Rack Mount Servers
      ‱ Several vendors provide solutions. (Ex. One is a 3U rack mount unit capable of holding 16 GPUs connected to 8 servers)
      ‱ “Compared to typical quad-core CPUs, Tesla 20 series computing systems deliver equivalent performance at 1/10th the cost and 1/20th the power consumption.” (NVIDIA)
      Personal Supercomputer
      ‱ GPUs for computing can be placed in a standard workstation. Several vendors provide solutions.
      ‱ Each Tesla GPU requires
          – Available double-wide PCIe slot
          – Two 6-pin or one 8-pin PCIe power connectors and sufficient wattage
          – Recommend 4GB RAM per card, at least 2.33 GHz quad-core CPU and 64-bit Linux or Windows
      ‱ “250x the computing performance of a standard workstation” (NVIDIA)
  • 23. Image Processing Performance with IPP and NPP
      ‱ FamilySearch currently uses Intel’s IPP
          – Intel Performance Primitives
          – Optimize operations on Intel CPUs
          – Closed source, licensed
      ‱ NVIDIA has produced a similar library called NPP
          – NVIDIA Performance Primitives
          – Optimize operations on NVIDIA GPUs (CUDA underneath)
          – Higher level abstraction to perform image processing on GPUs
          – No license for SDK
  • 24. EXAMPLE – NPP
      NPP (GPU):
      // Declare a host object for an 8-bit grayscale image
      npp::ImageCPU_8u_C1 hostSrc;
      // Load grayscale image from disk
      npp::loadImage(sFilename, hostSrc);
      // Declare a device image and upload from host
      npp::ImageNPP_8u_C1 deviceSrc(hostSrc);
      
 [Create padded image]
      
 [Create Gaussian kernel]
      // Copy kernel to GPU
      cudaMemcpy2D(deviceKernel, 12, hostKernel,
                   kernelSize.width * sizeof(Npp32s),
                   kernelSize.width * sizeof(Npp32s),
                   kernelSize.height, cudaMemcpyHostToDevice);
      // Allocate blurred image of appropriate size (on GPU)
      npp::ImageNPP_8u_C1 deviceBlurredImg(imgSz.width, imgSz.height);
      // Perform the filter
      nppiFilter_8u_C1R(paddedImg.data(widthOffset, heightOffset), paddedImg.pitch(),
                        deviceBlurredImg.data(), deviceBlurredImg.pitch(),
                        imgSz, deviceKernel, kernelSize, kernelAnchor, divisor);
      // Declare a host image for the result
      npp::ImageCPU_8u_C1 hostBlurredImg(deviceBlurredImg.size());
      // Copy the device result data into it
      deviceBlurredImg.copyTo(hostBlurredImg.data(), hostBlurredImg.pitch());

      IPP (CPU), the matching steps shown alongside on the slide:
      
 [Create padded image]
      
 [Create Gaussian kernel]
      // Allocate blurred image of appropriate size
      Ipp8u* blurredImg = ippiMalloc_8u_C1(img.getWidth(), img.getHeight(), &blurredImgStepSz);
      // Perform the filter
      ippiFilter32f_8u_C1R(paddedImgData, paddedImage.getStepSize(), blurredImg,
                           blurredImgStepSz, imgSz, kernel, kernelSize, kernelAnchor);
  • 25. Performance Testing Methodology
      ‱ Test System Specifications
          – Dual Quad Core IntelÂź XeonÂź 2.80GHz i7 CPUs (8 cores total)
          – 6 GB RAM
          – 64-bit Windows 7 operating system
          – Single Tesla C1060 Compute Processor (240 processing cores total)
          – PCI-Express x16 Gen2 slot
      ‱ Three representative grayscale images of increasing size
          – Small image – 1726 x 1450 (2.5 megapixels)
          – Average image – 4808 x 3940 (18.9 megapixels)
          – Large image – 8966 x 6132 (55.0 megapixels)
      ‱ Results for each image repeated 3 times and averaged
      ‱ Transfer time to/from the GPU is considered part of all GPU operations
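Because transfer time counts toward every GPU measurement, it helps to see how much data each test image represents. The sketch below derives the pixel counts from the listed dimensions. It assumes 8 bits per pixel and a single channel (the images are described only as grayscale) and ignores row padding; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pixel count of an image. Under the assumption of 8-bit, single-channel
// data with no row padding, this is also the number of bytes copied in
// each direction between host and device.
std::uint64_t pixel_count(std::uint64_t width, std::uint64_t height) {
    assert(width > 0 && height > 0);
    return width * height;
}

// Size in megapixels (millions of pixels), as quoted on the slide.
double megapixels(std::uint64_t width, std::uint64_t height) {
    return static_cast<double>(pixel_count(width, height)) / 1e6;
}
```

For the three test images this gives 2,502,700, 18,943,520 and 54,979,512 pixels, which round to the 2.5, 18.9 and 55.0 megapixels listed above.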
  • 26.
      ‱ Combining operations minimizes GPU/CPU transfers
      ‱ 5–6x speedup, increasing slightly with image size
  • 27. AMDAHL’S LAW
      Speeding up 25% of an overall process by 10x is less of an overall improvement than speeding up 75% of an overall process by 1.5x
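The comparison can be checked with the usual form of Amdahl's Law: overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that is accelerated and s is the speedup of that fraction. The short C++ sketch below uses a hypothetical function name, `amdahl_speedup`.

```cpp
#include <cassert>

// Overall speedup when a fraction p of the run time is accelerated by a
// factor s and the remaining (1 - p) is unchanged.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

`amdahl_speedup(0.25, 10.0)` is 1 / 0.775, about 1.29x overall, while `amdahl_speedup(0.75, 1.5)` is 1 / 0.75, about 1.33x. The modest speedup applied to the larger fraction wins.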
  • 28. Takeaways
      ‱ Significant performance increases can be realized through parallelization – may become the only way in the future
      ‱ GPUs are transforming into general purpose data-parallel computational coprocessors and outstripping advances in multi-core CPUs
      ‱ Languages, tools and APIs for parallel computing remain relatively immature, but are improving rapidly
      ‱ Relatively small learning curve
          – For image processing, NPP’s API nearly perfectly matches Intel’s IPP
          – New paradigms around copying to/from GPU and allocating memory
          – Can use programming languages familiar to developers without understanding intricacies of GPU architectures
          – Does require rethinking of algorithms to be parallel and building the computation around the data

Editor's Notes

  1. Don’t claim to be an expert
  2. Source of much of what I will present – gives a lot more details, coming from people who know a lot more than I do
  3. Even CPUs realize performance is about parallelism – multi-core CPUs. Power required increases exponentially with distance – Bill Dally says that lots of arithmetic units are actually not hot
  4. GPUs initially only for computer graphics acceleration
  5. Of course want something that is open
  6. Number of images increasing as is size, more color, etc.
  7. Data center servers for large scale places like FamilySearch; workstations could be put in smaller installations such as an archive. Based on limited survey (most sites don’t list prices): ~$5-6K list price for 1U server or personal supercomputer w/2 Teslas; ~$8-9K list price for 1U server or personal supercomputer w/4 Teslas; ~$1200 per Tesla
  8. NVIDIA directly going at IPP. Imaging library structured so that we could create implementation for GPUs to run on a single GPU based server concurrent with current system
  9. Rotating, cropping, sharpening and scaling operations parallelized on GPU