Qnorm: A library of parallel methods for gene-expression Q-normalization
José Manuel Mateos-Duran, Pjotr Prins, Andrés Rodríguez and Oswaldo Trelles
The Bioinformatics Open Source Conference (BOSC)
European Concerted Research Action (COST)
Bioinformatics new generation open source
Improving open source software for high performance computing in Biology

Problem: New high-throughput (HT) technologies in several areas of the life sciences produce enormous amounts of data, creating a bottleneck in our ability to process and analyse the data.
Solution: Increase communication between the Bioinformatics, HPC and OSS communities to adapt or develop capable software tools.
Quantile normalization

1) Load data to memory.
2) Order each column of R, producing a set of indexes I[G][E] = p (where p is the original position of the value in the column).
3) Obtain A[G], the average value for each row.
4) Assign the average value to all entries: O[g][e] = A[g], g = 1 to G; e = 1 to E.
5) Sort each column O[g][E] by the index I[g][E] (reproduce the original order).
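The five steps above can be sketched in plain C. This is a minimal sequential illustration, not the Qnorm library code: the names `Entry`, `cmp_entry` and `qnorm`, the row-major layout `data[g*E + e]`, and the use of `qsort` are all assumptions made for the example. Tied values are not treated specially here.

```c
#include <stdlib.h>

/* Hypothetical helper type: a value paired with its original row. */
typedef struct { double value; int row; } Entry;

static int cmp_entry(const void *a, const void *b) {
    double x = ((const Entry *)a)->value;
    double y = ((const Entry *)b)->value;
    return (x > y) - (x < y);
}

/* Quantile-normalize a G x E matrix in place (row-major, data[g*E + e]).
   Returns 0 on success, -1 if memory could not be allocated. */
int qnorm(double *data, int G, int E) {
    Entry  *col = malloc((size_t)G * sizeof *col);
    int    *idx = malloc((size_t)G * (size_t)E * sizeof *idx); /* I[rank][e] */
    double *avg = calloc((size_t)G, sizeof *avg);              /* A[rank]    */
    if (!col || !idx || !avg) { free(col); free(idx); free(avg); return -1; }

    for (int e = 0; e < E; e++) {
        /* Step 2: sort one column, remembering original positions. */
        for (int g = 0; g < G; g++) {
            col[g].value = data[g * E + e];
            col[g].row = g;
        }
        qsort(col, (size_t)G, sizeof *col, cmp_entry);
        for (int r = 0; r < G; r++) {
            idx[r * E + e] = col[r].row;
            avg[r] += col[r].value;      /* partial sum for step 3 */
        }
    }
    /* Step 3: average of each rank across all columns. */
    for (int r = 0; r < G; r++) avg[r] /= E;

    /* Steps 4 and 5: write averages back at the original positions. */
    for (int e = 0; e < E; e++)
        for (int r = 0; r < G; r++)
            data[idx[r * E + e] * E + e] = avg[r];

    free(col); free(idx); free(avg);
    return 0;
}
```

For a 3 x 2 input with rows (5, 4), (2, 1), (3, 6), the rank averages are 1.5, 3.5 and 5.5, and every column of the result contains exactly those three values in its original rank order.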
Code reorganization
Parallel prototype

{
  nE = LoadProject(fname, fList);
  // [STEP 1] for each experiment
  for (i = 0; i < nE; i++) {
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // manage the index in memory or on disk
  }
  // global average
  for (i = 0; i < nG; i++)
    AvG[i].Av /= AvG[i].num;

  // [STEP 2] produce the ORDERED output file
  // prepare the output file and a one-column 'dataOut' array
  for (i = 0; i < nE; i++) {
    // get the column index (from memory or disk)
    for (j = 0; j < nG; j++) {
      // prepare OUT array
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // file positioning and writing the vector
  }
}
Shared memory version

{
  nE = LoadProject(fname, fList);
  // for each experiment
  for (i = 0; i < nE; i++) {
    LoadFile(fList, i, dataIn);
    Qnorm1(dataIn, dIndex, fList[i].nG);
    PartialRowAccum(AvG, dataIn, nG);
    // manage the index in memory or on disk
  }
  // global average
  for (i = 0; i < nG; i++)
    AvG[i].Av /= AvG[i].num;

  // [STEP 2] produce the ORDERED output file
  // prepare the output file and a one-column 'dataOut' array
  for (i = 0; i < nE; i++) {
    // get the column index (from memory or from disk)
    for (j = 0; j < nG; j++) {
      // complete output vector
      dataOut[dIndex[j]] = AvG[j].Av;
    }
    // file positioning and writing the vector
  }
}

OpenMP directives shown on the slide:
  #pragma omp parallel shared(From, To, Range)
  // open general parallel section
  #pragma omp parallel shared(From, To, Range)
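As a rough illustration of the shared memory idea, the row-averaging step can be written as an OpenMP loop. This sketch is an assumption, not the code from the slides: it parallelizes over rows rather than over experiments, so each iteration writes only its own `avg[r]` and no synchronization is needed. The function name `row_average` and the layout `sorted[e*nG + r]` are hypothetical. Without OpenMP support the pragma is ignored and the loop runs sequentially.

```c
#include <stddef.h>

/* Average the rank-ordered values across experiments.
   sorted[e*nG + r] holds the r-th smallest value of experiment e
   (hypothetical layout chosen for this sketch). */
void row_average(const double *sorted, int nE, int nG, double *avg) {
    /* Each iteration writes only avg[r], so iterations are independent. */
    #pragma omp parallel for
    for (int r = 0; r < nG; r++) {
        double sum = 0.0;
        for (int e = 0; e < nE; e++)
            sum += sorted[(size_t)e * nG + r];
        avg[r] = sum / nE;
    }
}
```

Parallelizing over experiments instead, as the per-experiment loop on the slide suggests, would have several threads adding into the same accumulator and would need a reduction or per-thread partial sums.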
Message Passing version

Master:
  Get Parameters, Initialize
  CalculateBlocks(nP, IniBlocks)
  Broadcast(IniBlocks)
  while (ThereIsBlocks) {
    Receive(ResultBlock, who)
    AverageBlock(ResultBlock)
    if (!SendAllBlocks) {
      CalculateNextBlock(NextBlock)
      Send(who, NextBlock)
    }
  }
  Broadcast(ENDsignal)
  ReportResults

Slave(s):
  Start with params
  Receive(Block)
  while (!ENDsignal) {
    for each experiment in block {
      LoadExperiment(Exp)
      SortExperiment(Exp)
      AcumulateAverage(Exp);
    }
    AverageBlock(ResultBlock)
    Send(ResultBlock)
    Receive(Block);
  }
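The block hand-out and merge logic of the master/slave scheme can be sketched without a message passing library. In the sketch below, plain function calls stand in for Send and Receive, and everything runs in one process; `Block`, `next_block`, `accumulate_block` and `run_blocks` are hypothetical names, and the input is assumed to be already sorted per experiment, with nE > 0.

```c
/* A contiguous range of experiments handed to one worker. */
typedef struct { int first; int count; } Block;

/* Master: hand out the next block; count == 0 means no work is left. */
Block next_block(int *cursor, int nE, int block_size) {
    Block b = { *cursor, 0 };
    int left = nE - *cursor;
    b.count = left < block_size ? left : block_size;
    *cursor += b.count;
    return b;
}

/* Worker: add the rank-ordered columns of one block into partial[]. */
void accumulate_block(const double *sorted, int nG, Block b, double *partial) {
    for (int e = b.first; e < b.first + b.count; e++)
        for (int r = 0; r < nG; r++)
            partial[r] += sorted[e * nG + r];
}

/* Master loop: the call to accumulate_block marks the point where a
   real implementation would send a block and receive a result. */
void run_blocks(const double *sorted, int nE, int nG, int block_size,
                double *avg, double *scratch) {
    int cursor = 0;
    for (int r = 0; r < nG; r++) avg[r] = 0.0;
    for (;;) {
        Block b = next_block(&cursor, nE, block_size);
        if (b.count == 0) break;
        for (int r = 0; r < nG; r++) scratch[r] = 0.0;
        accumulate_block(sorted, nG, b, scratch);      /* worker side   */
        for (int r = 0; r < nG; r++) avg[r] += scratch[r]; /* merge     */
    }
    for (int r = 0; r < nG; r++) avg[r] /= nE;         /* global average */
}
```

Because partial sums are simply added, the order in which result blocks arrive does not affect the final averages, which is what lets the master hand the next block to whichever worker reports first.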
GPU version
NVIDIA CUDA Programming Model

CPU:
  nE = LoadProject(fname, fList);
  for (i = 0; i < nE; i++) { // for each Exp
    LoadFile(fList, i, dataIn);
    CopyToGPU(dataIn);
    <<kernel>> QSortGPU(dataIn, dIndex)
    CopyFromGPU(dIndex);
    WriteToDisk(dIndex);
    <<kernel>> RowAccum(dataIn, AvG)
  }
  <<kernel>> GlobalAvg(AvG, nE)
  CopyFromGPU(AvG);
  // Step 2: produce output file
  // using indexes and global average

GPU kernels:
  QSortGPU(dataIn, dIndex)
  RowAccum(dataIn, AvG)
  GlobalAvg(AvG, nE)
Hardware & Data

Input: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays.
Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).

Pablo: Shared Memory Cluster, up to 256 nodes / JS20-IBM, 512 CPUs, 1 TB distributed memory.
  Each node: 2 CPUs IBM PowerPC single-core 970FX, 64 bits, 2 GHz; 4 GB RAM.
  HD: 40 GB (local)
  Interconnection network: Myrinet

Picasso: Shared Memory Cluster, up to 64 nodes, Superdome HP, 128 CPUs, 128 GB shared memory.
  Each node: 2 CPUs Intel Itanium-2 Dual Core, 1.6 GHz

Almeria:
  CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2
  GPU: GeForce 9800 GX2, 600/1500 MHz, 2x1 GHz DDR3, 1 GB & 512 bits
  HD: 2 x 72 GB (RAID 0), Western Digital Raptors, 10000 RPM
Benchmarking

Result panels: Distributed memory; Shared memory; GPU
2.9x total speed-up; 5.5x processing speed-up
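The gap between the two speed-up figures can be read with a simple Amdahl-style model. The sketch below assumes, and the slide does not state this, that both figures describe the same run and that the non-processing part (I/O, transfers) was not accelerated at all. Under that model, total = 1 / ((1 - f) + f / part), where f is the fraction of the original run time spent in processing. The function name `accelerated_fraction` is hypothetical.

```c
/* Fraction f of the original run time spent in the accelerated part,
   solved from: total = 1 / ((1 - f) + f / part).
   Assumes the remaining (1 - f) of the time was not sped up. */
double accelerated_fraction(double total, double part) {
    return (1.0 - 1.0 / total) / (1.0 - 1.0 / part);
}
```

With total = 2.9 and part = 5.5 this gives f of roughly 0.80: under these assumptions about a fifth of the original run time was outside the accelerated processing, and that share is what limits the total speed-up.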
Conclusions

Background
  Application domain: bioinformatics (diverse, dispersed, heterogeneous, huge data…)
  I/O and memory oriented applications
  Large collection of sequential code unable to deal with computational demands

Aims
  Characterizing the application domain
  Starting up a library of (common) parallel procedures

Benchmarking
  Performance is strongly related to code dependencies
  Parallel models (shared, distributed, etc.) are appropriate for different code structures
  Shared memory is good but expensive
  GPU-based solutions seem to be a good alternative for local installations
  I/O-bound applications should look for performance gains in the I/O device


Trelles_QnormBOSC2009

  • 1. Qnorm: A library of parallel methods for gene-expression Q-normalization. José Manuel Mateos-Duran, Pjotr Prins, Andrés Rodríguez and Oswaldo Trelles. The Bioinformatics Open Source Conference (BOSC)
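  • 2. European Concerted Research Action (COST): Improving open source software for high performance computing in Biology. Problem: New HT technologies in several areas of life sciences produce enormous amounts of data, creating a bottleneck in our ability to process and analyse the data. Solution: Increase communication between the Bioinformatics, HPC and OSS communities for adapting/developing capable software tools.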
  • 3. Quantile normalization
       1) Load data to memory.
       2) Order each column of R, producing a set of indexes I[G][E] = p (where p is the original position of the value in the column).
       3) Obtain A[G], the average value for each row.
       4) Assign the average value to all entries: O[g][e] = A[g], g = 1 to G; e = 1 to E.
       5) Sort each column O[g][E] by the index I[g][E] (reproduce the original order).
  • 4. Parallel prototype: Code reorganization

       {
         nE = LoadProject(fname, fList);
         for (i = 0; i < nE; i++) {           // for each experiment [STEP 1]
           LoadFile(fList, i, dataIn);
           Qnorm1(dataIn, dIndex, fList[i].nG);
           PartialRowAccum(AvG, dataIn, nG);
           // Manage the index in memory or on disk
         }
         for (i = 0; i < nG; i++)              // global average
           AvG[i].Av /= AvG[i].num;

         // Produce the ORDERED output file [STEP 2]
         // Prepare the output file and a one-column 'dataOut' array
         for (i = 0; i < nE; i++) {
           // Get the column index (from memory or disk)
           for (j = 0; j < nG; j++)            // prepare OUT array
             dataOut[dIndex[j]] = AvG[j].Av;
           // File positioning and writing the vector
         }
       }
  • 5. Shared memory version

       {
         nE = LoadProject(fname, fList);
         for (i = 0; i < nE; i++) {           // for each experiment
           LoadFile(fList, i, dataIn);
           Qnorm1(dataIn, dIndex, fList[i].nG);
           PartialRowAccum(AvG, dataIn, nG);
           // Manage the index in memory or on disk
         }
         for (i = 0; i < nG; i++)              // global average
           AvG[i].Av /= AvG[i].num;

         // Produce the ORDERED output file [STEP 2]
         // Prepare the output file and a one-column 'dataOut' array
         for (i = 0; i < nE; i++) {
           // Get the column index (from memory or from disk)
           for (j = 0; j < nG; j++)            // complete output vector
             dataOut[dIndex[j]] = AvG[j].Av;
           // File positioning and writing the vector
         }
       }

       OpenMP directives annotated on the slide:
       #pragma omp parallel shared(From, To, Range)   // open general parallel section
       #pragma omp parallel shared(From, To, Range)
  • 6. Message Passing version

       Master:
         Get Parameters, Initialize
         CalculateBlocks(nP, IniBlocks)
         Broadcast(IniBlocks)
         while (ThereIsBlocks) {
           Receive(ResultBlock, who)
           AverageBlock(ResultBlock)
           if (!SendAllBlocks) {
             CalculateNextBlock(NextBlock)
             Send(who, NextBlock)
           }
         }
         Broadcast(ENDsignal)
         ReportResults

       Slave(s):
         Start with params
         Receive(Block)
         while (!ENDsignal) {
           for each experiment in block {
             LoadExperiment(Exp)
             SortExperiment(Exp)
             AcumulateAverage(Exp);
           }
           AverageBlock(ResultBlock)
           Send(ResultBlock)
           Receive(Block);
         }
  • 7. GPU version (NVIDIA CUDA Programming Model)

       CPU:
         nE = LoadProject(fname, fList);
         for (i = 0; i < nE; i++) {           // for each experiment
           LoadFile(fList, i, dataIn);
           CopyToGPU(dataIn);
           <<kernel>> QSortGPU(dataIn, dIndex)
           CopyFromGPU(dIndex);
           WriteToDisk(dIndex);
           <<kernel>> RowAccum(dataIn, AvG)
         }
         <<kernel>> GlobalAvg(AvG, nE)
         CopyFromGPU(AvG);
         // Step 2: Produce the output file
         // using the indexes and the global average

       GPU kernels:
         QSortGPU(dataIn, dIndex)
         RowAccum(dataIn, AvG)
         GlobalAvg(AvG, nE)
  • 8. Hardware & Data
       Input: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays.
       Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
       Pablo: Shared Memory Cluster, up to 256 nodes, JS20-IBM, 512 CPUs, 1 TB distributed memory.
         Each node: 2 CPUs IBM PowerPC single-core 970FX, 64 bits, 2 GHz, 4 GB RAM.
         HD: 40 GB (local). Interconnection network: Myrinet.
       Picasso: Shared Memory Cluster, up to 64 nodes, Superdome HP, 128 CPUs, 128 GB SM.
         Each node: 2 CPUs Intel Itanium-2 Dual Core, 1.6 GHz.
       Almeria:
         CPU: Intel Core 2 Quad Q9450, 2.66 GHz, 1.33 GHz FSB, 12 MB L2.
         GPU: GeForce 9800 GX2, 600/1500 MHz, 2x1 GHz DDR3, 1 GB & 512 bits.
         HD: 2 x 72 GB (RAID 0), Western Digital Raptors, 10000 RPM.
  • 9. Benchmarking
       Input: Affymetrix raw CEL files (GPL3718), 6.5M probes x 470 arrays.
       Convert CEL files: Ben Bolstad's Affyio (part of R/Bioconductor and my Biolib).
       [Figure panels: Distributed memory; Shared memory; GPU. Labels: 2.9x total speed-up, 5.5x processing speed-up.]
  • 10. Conclusions
       Background
         Application domain: bioinformatics (diverse, disperse, heterogeneous, huge data…).
         I/O- and memory-oriented applications.
         Large collection of sequential code unable to deal with computational demands.
       Aims
         Characterize the application domain.
         Start up a library of (common) parallel procedures.
       Benchmarking
         Performance is strongly related to code dependencies.
         Parallel models (shared, distributed, etc.) are appropriate for different code structures.
         Shared memory is good but expensive.
         GPU-based solutions seem to be a good alternative for local installations.
         I/O-bound applications should seek performance in the I/O device.
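
The five quantile-normalization steps listed on slide 3 can be sketched in plain sequential C as below. This is only an illustrative in-memory version, not the Qnorm library code: the names `Entry`, `cmp_entry` and `qnorm`, the row-major layout `data[g*E + e]`, and the tie-breaking by original position are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a value plus the row it originally came from. */
typedef struct {
    double value;
    int    pos;
} Entry;

static int cmp_entry(const void *a, const void *b) {
    const Entry *x = a, *y = b;
    if (x->value < y->value) return -1;
    if (x->value > y->value) return 1;
    return x->pos - y->pos;              /* assumed tie-break: original order */
}

/* Quantile-normalize a G x E matrix (G genes, E experiments) in place.
   Step 1 (loading) is assumed done: data[g*E + e] is already in memory.
   Returns 0 on success, -1 if memory allocation fails. */
int qnorm(double *data, int G, int E) {
    assert(data != NULL && G > 0 && E > 0);
    Entry  *col = malloc((size_t)G * sizeof *col);
    int    *idx = malloc((size_t)G * (size_t)E * sizeof *idx);
    double *avg = calloc((size_t)G, sizeof *avg);
    if (!col || !idx || !avg) { free(col); free(idx); free(avg); return -1; }

    for (int e = 0; e < E; e++) {
        /* Step 2: sort column e and remember each value's original row. */
        for (int g = 0; g < G; g++) {
            col[g].value = data[(size_t)g * E + e];
            col[g].pos   = g;
        }
        qsort(col, (size_t)G, sizeof *col, cmp_entry);
        for (int r = 0; r < G; r++) {
            idx[(size_t)r * E + e] = col[r].pos;
            avg[r] += col[r].value;      /* partial row sum over sorted columns */
        }
    }

    /* Step 3: average of each row of the sorted matrix. */
    for (int r = 0; r < G; r++)
        avg[r] /= E;

    /* Steps 4 and 5: assign the row average and restore the original order. */
    for (int e = 0; e < E; e++)
        for (int r = 0; r < G; r++)
            data[(size_t)idx[(size_t)r * E + e] * E + e] = avg[r];

    free(col); free(idx); free(avg);
    return 0;
}
```

Calling `qnorm(m, 3, 2)` on a 3 x 2 matrix replaces every value with the mean of the values that share its rank across the two columns. Unlike the reorganized code on slide 4, this sketch keeps the whole index in memory rather than on disk.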

Editor's Notes

  1. Now, let's address the use of parallel processing in bioinformatics applications.
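
As one way to picture the shared memory version from slide 5, the sketch below marks the loops of a quantile normalization with OpenMP directives. It is an assumption-laden illustration rather than the authors' code: it uses `#pragma omp parallel for` instead of the `parallel shared(From, To, Range)` sections annotated on the slide, keeps all sorted columns in memory, and the names `Entry`, `cmp_entry` and `qnorm_shared` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a value plus the row it originally came from. */
typedef struct {
    double value;
    int    pos;
} Entry;

static int cmp_entry(const void *a, const void *b) {
    const Entry *x = a, *y = b;
    if (x->value < y->value) return -1;
    if (x->value > y->value) return 1;
    return x->pos - y->pos;              /* assumed tie-break: original order */
}

/* Quantile-normalize a G x E row-major matrix in place.
   Returns 0 on success, -1 if memory allocation fails. */
int qnorm_shared(double *data, int G, int E) {
    assert(data != NULL && G > 0 && E > 0);
    /* sorted holds E columns of G entries each, stored column by column. */
    Entry  *sorted = malloc((size_t)G * (size_t)E * sizeof *sorted);
    double *avg    = malloc((size_t)G * sizeof *avg);
    if (!sorted || !avg) { free(sorted); free(avg); return -1; }

    /* Columns are independent, so each can be sorted by a different thread. */
    #pragma omp parallel for
    for (int e = 0; e < E; e++) {
        Entry *col = sorted + (size_t)e * G;
        for (int g = 0; g < G; g++) {
            col[g].value = data[(size_t)g * E + e];
            col[g].pos   = g;
        }
        qsort(col, (size_t)G, sizeof *col, cmp_entry);
    }

    /* Each iteration owns one avg[r], so no two threads write the same slot. */
    #pragma omp parallel for
    for (int r = 0; r < G; r++) {
        double sum = 0.0;
        for (int e = 0; e < E; e++)
            sum += sorted[(size_t)e * G + r].value;
        avg[r] = sum / E;
    }

    /* Each thread writes only to its own column of data. */
    #pragma omp parallel for
    for (int e = 0; e < E; e++)
        for (int r = 0; r < G; r++)
            data[(size_t)sorted[(size_t)e * G + r].pos * E + e] = avg[r];

    free(sorted); free(avg);
    return 0;
}
```

Compiled without OpenMP support the directives are ignored and the function runs sequentially with the same result. Splitting the work into a per-column sort and a per-row average avoids the shared accumulator that a direct parallelization of `PartialRowAccum` would need to protect.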