High Performance Content-Based Matching Using GPUs. Alessandro Margara and Gianpaolo Cugola (margara@elet.polimi.it, cugola@elet.polimi.it), Dip. Elettronica e Informazione (DEI), Politecnico di Milano
The Problem: Content-Based Matching. [Slide diagram: publishers send events to a content-based matching engine, which forwards them to the interested subscribers. An event is a set of attributes, e.g. Light=50, Room="Bedroom", Sender="Sensor1". A subscriber's predicate is a set of filters, e.g. (Smoke=true and Room="Kitchen") or (Light>30 and Room="Bedroom"), where each elementary comparison such as Light>30 is a constraint.]
Programming GPUs: CUDA. Introduced by Nvidia in 2006, CUDA is a general-purpose parallel computing architecture: a new instruction set and a new programming model, programmable using high-level languages such as CUDA C (a C dialect).
Programming Model: Basics. The device (GPU) acts as a coprocessor for the host (CPU) and has its own separate memory space. Input data must be copied from main memory to GPU memory before starting a computation, and results must be copied back to main memory when the computation finishes. These copies are often the most expensive operations: they move data across the PCI Express bus (bandwidth matters, but so does latency) and they require serialization of the data structures, which must therefore be kept simple.
Typical Workflow: allocate memory on the device; serialize and copy data to the device; execute one or more kernels on the device; wait for the device to finish processing; copy results back. A host-side sketch of these steps follows.
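A minimal CUDA C sketch of this workflow, assuming a placeholder kernel and illustrative names (match_kernel, h_in, h_out); this is not the authors' code, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

/* Placeholder kernel; the real matching kernel is sketched later in the deck. */
__global__ void match_kernel(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

/* Host side: the five workflow steps listed above. */
void process_on_gpu(const int *h_in, int *h_out, int n) {
    int *d_in, *d_out;
    size_t bytes = (size_t)n * sizeof(int);

    cudaMalloc((void **)&d_in, bytes);                        /* 1. allocate device memory */
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);    /* 2. serialize and copy in  */

    match_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);   /* 3. run one or more kernels */

    cudaDeviceSynchronize();                                  /* 4. wait for the device     */
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);  /* 5. copy results back       */

    cudaFree(d_in);
    cudaFree(d_out);
}
```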
Programming Model: Fundamentals. Single Program Multiple Threads implementation strategy: a single kernel (function) is executed by multiple threads in parallel. Threads are organized in blocks: threads within different blocks operate independently, while threads within the same block cooperate to solve a single sub-problem. The runtime provides a blockId and a threadId variable to uniquely identify each running thread; accessing such variables is the only way to differentiate the work done by different threads.
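As a concrete illustration (not taken from the paper), the sketch below shows how a CUDA C kernel uses the built-in blockIdx and threadIdx variables (the blockId/threadId of the slide) to let each thread pick its own element; the names scale_kernel, in, out and the block size of 256 are made up.

```cuda
__global__ void scale_kernel(const float *in, float *out, float factor, int n) {
    /* Global index of this thread: block offset plus position within the block. */
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)                      /* guard: the grid may be slightly larger than n */
        out[tid] = factor * in[tid];  /* every thread runs this same code              */
}

/* Launch example: one thread per element, blocks of 256 threads.      */
/* scale_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, 2.0f, n);       */
```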
Programming Model: Memory Management. Memory is organized hierarchically. All threads have access to the same common global memory: large (512 MB to 6 GB) but slow (DRAM), it stores the information received from the host and is persistent across different kernel calls. Threads within a block coordinate through a shared memory: implemented on-chip, fast but limited (16-48 KB), it is the only "cache" available; there is no hardware/system support, so it must be explicitly controlled by the application code. Each thread also has its own local memory.
More on Memory Management. Without hardware-managed caches, accesses to global memory can easily become a bottleneck. Issues to consider when designing algorithms and data structures: maximize the use of shared (block-local) memory without exceeding its size, and make threads with contiguous ids access contiguous global memory regions, so that the hardware can combine them into a few wide memory accesses. A sketch illustrating both guidelines follows.
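The block reduction below is an illustrative example, not from the paper: each block stages its slice of the input in shared memory, and consecutive threads read consecutive global addresses so the hardware can coalesce the loads into a few wide transactions.

```cuda
#define TILE 256   /* must match the block size used at launch */

__global__ void block_sum(const float *in, float *block_sums, int n) {
    __shared__ float tile[TILE];                      /* fast, block-local memory      */
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  /* thread i reads address i:     */
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0.0f;   /* a coalesced access pattern    */
    __syncthreads();

    /* Tree reduction entirely in shared memory: threads of the block cooperate. */
    for (int stride = TILE / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)                             /* one partial sum per block */
        block_sums[blockIdx.x] = tile[0];
}
```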
Hardware Implementation. An array of Streaming Multiprocessors (SMs), each containing many (extremely simple) processing cores. Each SM executes threads in groups of 32 called warps; scheduling is performed in hardware with zero overhead. The architecture is optimized for data-parallel problems: maximum efficiency is reached only if all threads in a warp agree on the execution path (see the example below).
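A small, hypothetical illustration of what "agreeing on the execution path" means: in the first branch below, lanes of the same warp diverge and the two paths are executed one after the other; in the second, the condition is uniform across each warp, so there is no serialization.

```cuda
__global__ void divergence_demo(float *data) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (threadIdx.x % 2 == 0)     /* divergent: even and odd lanes of one warp   */
        data[tid] *= 2.0f;        /* take different paths, which the hardware    */
    else                          /* must execute serially                       */
        data[tid] += 1.0f;

    if (blockIdx.x % 2 == 0)      /* uniform within a warp: all 32 lanes share   */
        data[tid] -= 0.5f;        /* the same blockIdx, so no serialization      */
}
```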
Some Numbers. NVIDIA GTX 460: 1 GB RAM (global memory), 7 Streaming Multiprocessors, 48 cores per SM, up to 48 warps (32 threads each) managed per SM. Up to 10,752 threads managed concurrently; up to 336 threads running concurrently. Today's cheap GPU: less than $160.
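The two headline figures follow directly from the per-SM limits quoted above:

\[
7\ \text{SMs} \times 48\ \tfrac{\text{warps}}{\text{SM}} \times 32\ \tfrac{\text{threads}}{\text{warp}} = 10{,}752\ \text{threads managed concurrently},
\qquad
7\ \text{SMs} \times 48\ \tfrac{\text{cores}}{\text{SM}} = 336\ \text{threads running concurrently}.
\]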
Existing Algorithms. Two approaches: counting algorithms and tree-based algorithms. Both rely on complex data structures designed to optimize sequential execution (trees, maps, ...), with lots of pointers. They hardly fit the data-parallel programming model.
Algorithm Description. [Slide example of the counting approach: filters F1: A>10 and B=20 and F2: B>15 and C<30 belong to subscriber S1, while filter F3: D=20 belongs to S2. For the event A=12, B=20, the per-filter counters of satisfied constraints reach 2 for F1, 1 for F2 and 0 for F3, so only F1 (and hence S1) is matched.]
Algorithm Description. Constraints on the same attribute name are stored in arrays on the GPU (contiguous memory regions). When processing an event E, the CPU selects all the relevant constraint arrays, based on the names of the attributes in E.
Algorithm Description. Bi-dimensional organization of threads: one thread for each attribute/constraint pair. Threads in the same block evaluate the same attribute, so its value can be copied into shared memory; threads with contiguous ids access contiguous constraints, so their accesses are combined into a few wide memory operations. Filter counts are updated with an atomic operation. [Slide example of event attributes: B=32, C=21, A=7.] A kernel sketch follows.
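The sketch below is a hedged reconstruction of such a kernel, not the authors' code: a structure-of-arrays layout (op, val, filter_of) is assumed for the constraints of each attribute, blockIdx.y selects the event attribute, and each satisfied constraint bumps the counter of its owning filter with atomicAdd.

```cuda
enum Op { EQ, LT, GT };                       /* constraint operators (assumed)    */

struct ConstraintArray {                      /* all constraints on one attribute  */
    const int   *op;                          /* operator of constraint i          */
    const float *val;                         /* constant of constraint i          */
    const int   *filter_of;                   /* filter that owns constraint i     */
    int          count;
};

/* Grid: blockIdx.y = index of the event attribute being evaluated,
         blockIdx.x * blockDim.x + threadIdx.x = index of the constraint. */
__global__ void match_event(const ConstraintArray *cas, const float *attr_values,
                            unsigned int *filter_count) {
    __shared__ float v;                       /* attribute value, shared by the block */
    if (threadIdx.x == 0)
        v = attr_values[blockIdx.y];
    __syncthreads();

    ConstraintArray ca = cas[blockIdx.y];
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* one thread per            */
    if (i >= ca.count) return;                       /* attribute/constraint pair */

    int satisfied;                            /* contiguous threads read contiguous   */
    switch (ca.op[i]) {                       /* constraints: coalesced accesses      */
    case EQ: satisfied = (v == ca.val[i]); break;
    case LT: satisfied = (v <  ca.val[i]); break;
    default: satisfied = (v >  ca.val[i]); break;
    }
    if (satisfied)                            /* count satisfied constraints per filter */
        atomicAdd(&filter_count[ca.filter_of[i]], 1u);
}
```

A second step (not shown here) would then compare each filter's counter against its total number of constraints and set the corresponding entry of the interface selection vector, consistent with the counting example above.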
Improvement. Problem: before processing each event we need to reset the filter counts and the interface selection vector. Naïve version: use a memset, but the extra communication with the GPU introduces additional delay. Solution: keep two copies of the filter counts and of the interface vector; while processing an event, one copy is used and the other is reset for the next event, inside the same kernel, with no communication overhead (see the sketch below).
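One possible shape for this trick, given as a hypothetical sketch that reuses the ConstraintArray type from the previous block: the kernel that evaluates event k also zeroes the counter buffer that event k+1 will use, so no memset and no extra host/device round trip are needed.

```cuda
/* Two counter buffers live permanently on the GPU; the host simply swaps the
   roles of "current" and "next" after every event.                             */
__global__ void match_and_reset(const ConstraintArray *cas, const float *attr_values,
                                unsigned int *count_cur,   /* used for this event  */
                                unsigned int *count_next,  /* zeroed for the next  */
                                int num_filters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    /* Reset the next event's counters in the same launch, reusing the threads   */
    /* of the first row of blocks: no separate memset, no extra communication.   */
    if (blockIdx.y == 0 && i < num_filters)
        count_next[i] = 0;

    /* ... constraint evaluation exactly as in the previous sketch, but          */
    /* accumulating into count_cur with atomicAdd ...                            */
}

/* Host side, after each event:                                                  */
/* unsigned int *tmp = count_cur; count_cur = count_next; count_next = tmp;      */
```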
Results: Default Scenario. Comparison against a state-of-the-art sequential implementation, SFF (Siena) 1.9.4, running on an AMD CPU @ 2.8 GHz. The default scenario is relatively "simple": 10 interfaces, 25k filters, 1M constraints; the analysis then varies several parameters. We measure latency, i.e. the processing time for a single event. [Chart: about 7x speedup in the default scenario.]
Results: Number of Constraints. [Chart: latency while varying the number of constraints; speedup up to 10x.]
Results: Number of Filters. [Chart: latency while varying the number of filters; speedup up to 13x.]
Results. What is the time needed to install subscriptions? It requires serializing the data structures and copying them from CPU memory to GPU memory, but the data structures are simple. Memory requirements? 35 MB in the default scenario, and up to 200 MB across all our tests: not a problem for a modern GPU.
Results. We measured the latency when processing a single event: 0.14 ms of processing time → 7000 events/s? What about the maximum throughput? [Chart: measured maximum throughput of 9400 events/s.]
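The arrow on the slide is just the inverse of the measured latency; a quick check:

\[
\frac{1\ \text{event}}{0.14\ \text{ms}} \approx 7{,}140\ \text{events/s},
\]

so the measured maximum throughput of 9,400 events/s is higher than what single-event latency alone would suggest.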
Conclusions. The GPU brings benefits in a wide range of scenarios, in particular under the most challenging workloads. An additional advantage: it leaves the CPU free to perform other tasks, e.g. communication-related tasks. The implementation is available for download and includes a translator from Siena subscriptions/messages. More info at http://home.dei.polimi.it/margara
Future Work. We are currently working with multi-core CPUs, using OpenMP. We are also testing our algorithm within a real system, with both GPUs and multi-core CPUs, to take communication overhead into account and to measure both latency and throughput. We plan to explore the advantages of GPUs with probabilistic (as opposed to exact) matching, using encoded filters (Bloom filters) to balance performance against the percentage of false positives.
Questions?
