Parallel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 1
Parallel Computing System For Image Intelligent Processing
 ABSTRACT:
In this paper, a parallel computing system for image intelligent processing is
described. The system has two main parts:
a parallel computer and a set of software tools.
The parallel computer consists of a host processor, a SIMD coprocessor, a
stream memory, and a controller for this memory. Furthermore, the parallel computer
can be extended by organizing the host processors and the
SIMD coprocessors in different ways. Programmers write image intelligent processing programs in a
C-like language, and a set of software tools maps these programs to code that runs on
the parallel computer. These two parts work together to provide high performance
for image intelligent processing.
 INTRODUCTION
In this paper, we describe a parallel computing system for image intelligent
processing applications development and show how high performance image
intelligent processing applications are realized on this system. This system includes a
parallel computer and a set of software tools.
 How parallel computing system for image intelligent processing takes place?
Image intelligent processing applications are programmed in a C-like language.
This C-like language includes traditional serial statements and data-parallel
statements.
The serial statements form the stream programs.
The data-parallel statements form the kernel programs.
The stream programs run on the host processor.
The kernel programs run on the SIMD coprocessor.
Image data and kernel programs are transferred from main memory to stream memory.
Finally, the image data arrive at the coprocessor's local register file and the kernel programs
arrive at the coprocessor's program memory.
The whole transfer is controlled by the stream memory controller.
 SIMD:
Single instruction, multiple data (SIMD) is, as the name suggests, a type
of parallel computing architecture classified under Flynn's taxonomy. It describes computers
with multiple processing elements that perform the same operation on multiple data points
simultaneously.
Typically such a machine consists of many simple processors, each with a local memory in which it keeps the
data it will work on. Each processor simultaneously performs the same instruction on its
local data, progressing through the instructions in lock-step, with the instructions issued by the
controller processor. The processors can communicate with each other in order to perform shifts
and other array operations.
The number of data elements processed in parallel depends on the size of the
elements and the capacity of the available data processing resources. Usually this capacity
is fixed for a given architecture.
Thus, such machines exploit data-level parallelism. SIMD is particularly applicable to common
tasks like adjusting the contrast of a digital image or the volume of digital audio. Most
modern CPU designs include SIMD instructions to improve the performance
of multimedia workloads.
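The lock-step behaviour described above can be illustrated with a small Python sketch. It is only a software model, not the hardware discussed in this paper: the names simd_execute and adjust_contrast, the gain of 1.5 and the mid-grey value 128 are assumptions chosen for illustration. One instruction is broadcast by a controller and applied to the local datum of every processing element.

```python
def simd_execute(instruction, local_data):
    """Model one SIMD step: the same instruction is applied to every PE's local datum."""
    return [instruction(value) for value in local_data]


def adjust_contrast(pixel, gain=1.5, mid=128):
    """Hypothetical per-pixel contrast stretch around mid-grey, clamped to 0..255."""
    value = round(mid + gain * (pixel - mid))
    return max(0, min(255, value))


# Each list entry stands for the pixel held in one PE's local memory.
pixels = [0, 64, 128, 192, 255]
stretched = simd_execute(adjust_contrast, pixels)
```

In real SIMD hardware the list comprehension would be a single instruction executed by all processing elements at once; the sketch only shows that the operation is identical for every element and that no element depends on another.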
 APPLICATIONS
We want to implement image intelligent processing on this system. Tasks in this kind of
image processing can be divided into three levels. They are:
Low level tasks,
Middle level tasks and
High level tasks.
Low level tasks in image processing are also called preprocessing, which is used to
organize data for later image processing. Preprocessing mainly includes image enhancement
and image recovery. From a computational point of view, preprocessing can be viewed as several
kinds of processes: single-pixel-dependent processing (such as thresholding,
quantization, and coding), partial-pixel processing (such as filtering and convolution),
area-pixel processing, and whole-image processing (digital transforms such as the
FFT).
All of the above processing shares the following characteristics. The first is spatial
invariance, that is to say, the same instruction can be executed on all of the image
pixels. We call this characteristic data independence. The second is locality, i.e., for each
pixel the processing result depends only on its neighboring pixels. The third is the fixed data
structure, i.e., the image processing result is still an image array. From all of the above we
conclude that image preprocessing exhibits obvious data parallelism.
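The characteristics listed above can be made concrete with two low level operations written in plain Python. This is a minimal sketch, not code from the system described here; the function names threshold and box_filter3 and the 3x3 window are assumptions. Thresholding depends on a single pixel, the mean filter depends only on neighboring pixels, and both return an image array of the same shape, so every output pixel could be computed by a separate processing element.

```python
def threshold(image, t):
    """Single-pixel processing: each output depends on exactly one input pixel."""
    return [[255 if p >= t else 0 for p in row] for row in image]


def box_filter3(image):
    """Neighborhood processing: 3x3 mean over the pixels that lie inside the image."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
                        count += 1
            out[y][x] = total // count  # integer mean of the valid neighbors
    return out


image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
binary = threshold(image, 50)
smooth = box_filter3(image)
```

The two loops over y and x carry no dependency between iterations, which is the data independence the text refers to.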
The aim of middle level processing is to find the outstanding features and the
proper primitive elements in the images. Middle level processing includes image
partitioning and the description of the resulting partitions. It corresponds to a
transformation from image to feature. So, middle level processing is a mixture of data
parallelism and task parallelism.
The aim of high level processing is to recognize the content of the image. This kind
of processing can be divided into two stages: symbol description and scene
understanding. From these features, we can conclude that high level processing exhibits
obvious task parallelism.
When a parallel computer system matches these typical features of image processing,
the performance of the system can be greatly enhanced.
 PARALLEL COMPUTER
The parallel computer includes a host processor, a SIMD coprocessor, a
stream memory, and a controller for this stream memory, as illustrated in Fig. 1.
Fig. 1: Parallel computer organization
The main parts of the parallel computer organization are:
The host processor, which is the main controller of the whole system.
The stream memory, which sits between main memory and
the coprocessor's local register file. Its function is to raise the memory system's speed
to match the coprocessor's requirements.
 Parallel computer organization:
The host processor is the main controller of the whole system. We call the host
processor together with the stream memory controller the serial processor. The stream memory
controller manages data transfer between the stream memory and the memory of the serial
processor. The purpose of this memory management is to raise the memory system's data
transfer speed to match the coprocessor's computing requirements.
In Fig. 1, the host processor, the stream memory controller, the stream memory, and the
SIMD coprocessor are usually on the same chip, and the stream memory is usually built
from register files.
 Parallel Computer Architecture:
Fig. 2: Parallel computer architecture
The parallel computer is a new-generation image processing computer. As illustrated in
Fig. 2, this computer includes a key component called the SIMD coprocessor.
We usually refer to the SIMD coprocessor as the PE (Processing Element) array.
The host of the parallel computer can be a PC or a workstation.
The PE array connects to the host. The host processor manages the whole computer.
The host transfers kernel source programs and data to the PE array's program
memory and stream memory.
Then the kernel programs are started on the PE array.
Finally, the computing results are fetched from stream memory to the host for
further analysis.
The host also performs initialization, diagnosis, and testing of the PE array. The PE array is
well suited to matrix computing, so we can view the PE array as an accelerator for
matrix computing.
The PE array is built from many processing elements arranged in a regular array. The
PE array includes NxN PEs. Each PE includes an ALU, a local register file, a buffer, and a router
that handles data communication. All of the PEs are connected by a conventional ring grid.
The number of PEs is decided by the computing requirement and the level of VLSI
realization. Because the PE array is regular and each PE is simple, the PE array chip can be
realized in a small size. Furthermore, with many PE array chips we can build a parallel
computer.
The parallel computer provides a three-level memory hierarchy. The three levels
are:
the local register files of the PEs,
the stream memory and
the main memory.
This hierarchy is used to exploit the parallelism and locality of image applications.
The architecture of the parallel computer meets the computation and bandwidth
demands of image applications by directly processing the naturally occurring data
streams within image applications. The PE array is designed to be a coprocessor that
operates on image data streams. Many PEs can provide high performance for real-time
image processing.
The stream memory effectively isolates the ALUs from the main memory, making the
PE array a load/store architecture for data streams. All data stream operations transfer
data streams to or from the stream memory. For instance, the PE array transfers data
streams directly out of and into the stream memory, which isolates PE array transfers from main
memory accesses and computation. This simplifies the design of the processor. Meanwhile,
the stream memory and the PE array can work in a pipeline, so the stream memory
increases the transfer speed of data streams.
 Parallel computer extension:
The parallel computer described above is a basic image processing computer. The host
processor is a SISD computer. The coprocessor is a SIMD chip attached to the serial computer, and it can be
an 8-PE chip or a 16-PE chip. We can place 8 or 16 such PE chips on a board, so that
one parallel computer contains 64 PEs (8 chips of 8 PEs) or 256 PEs (16 chips of 16 PEs).
Furthermore, 8, 16, or more of these 256-PE boards (for example, PCI boards) can be
mounted on a computer main board. With this kind of
extension we obtain a parallel computer that can match the
performance requirements of image processing applications.
Meanwhile, from the applications section we know that image processing applications have
both data parallelism and task parallelism. In order to build a parallel computing
system that matches the different kinds of parallelism in image processing, we can use
the basic image processing computers to construct an intelligent parallel computer for
image intelligent processing. For example, by connecting several basic parallel
computers with cables we can construct a true task-parallel computer that can match the three
kinds of parallelism in image intelligent processing.
In summary, the parallel computer can be extended in four ways:
1. Extend the number of PEs on a chip.
2. Extend the number of PE chips on a board.
3. Extend the number of boards on a computer main board.
4. Extend the number of computer main boards.
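The four extension levels multiply, which the following short Python sketch makes explicit. The function name total_pes and its parameter names are assumptions introduced for illustration; the figures 64 and 256 are the ones quoted in the text.

```python
def total_pes(pes_per_chip, chips_per_board, boards_per_main_board=1, main_boards=1):
    """Total number of PEs when each extension level scales the one below it."""
    return pes_per_chip * chips_per_board * boards_per_main_board * main_boards


small_board = total_pes(8, 8)        # 8 chips of 8 PEs on one board
large_board = total_pes(16, 16)      # 16 chips of 16 PEs on one board
eight_boards = total_pes(16, 16, 8)  # eight 256-PE boards on one main board
```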
 SOFTWARE
The basic image processing computer decomposes an application into a series of kernel
programs that operate on image data. We call this kind of image data a stream. Streams
are sets of sequential data records that lend themselves to high-performance computation
through their regular and predictable structure. In simple words, a stream is a data
collection that can be operated on in parallel.
  Stream programs are written in the SISD statements of a C-like language.
  The kernel programs are written in the SIMD statements of the C-like language.
The use of these two parts in the C-like language for image processing allows the
programmer to use the familiar structure of high-level C. The two-level
programming system reflects the division of programs on the basic image processing
computer:
The kernel programs run on the SIMD coprocessor, while the stream program, running
on the serial processor, corresponds to a high-level description of the image processing.
Fig. 3: Software partitions
Several program parts work together to execute an image application, as
illustrated in Fig. 3.
Stream programs run on the serial processor, call the kernel programs, and initiate the
transfer of data streams. They instruct the SIMD coprocessor to run kernel
programs and then send out the resulting data.
 Stream Program And Stream Scheduler:
Stream programs are written in the C-like language with calls into kernel programs.
Kernel program calls are treated as functions that take input and output streams as
arguments. The stream scheduler allocates space in the serial processor's memory and in
stream memory to store the streams that are passed between kernel programs. The
scheduler determines when to place streams in the stream memory and when to
move them to and from the serial processor's memory, so as to take advantage of
producer-consumer locality between different kernel programs. Efficient stream programs
minimize traffic between the stream memory and the serial processor's memory. When the
image data is larger than the stream memory, the scheduler breaks the large stream into
smaller batches. One batch is loaded into stream memory at a time. The kernel program
executes on this batch before computation begins on the next batch. In this way, raw image
data arrives in the stream memory and passes between kernel programs through
temporary storage in the stream memory. Expensive memory traffic is incurred only when
final results are written out.
To ensure high resource utilization, the stream scheduler performs software
pipelining at the stream level, where kernel programs are executed concurrently with memory
operations.
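The batching behaviour described above can be sketched in a few lines of Python. This is a simplified sequential model under stated assumptions, not the scheduler of the system: the name run_batched and its parameters are hypothetical, kernels are modelled as per-element functions, and the software pipelining of memory transfers is left out.

```python
def run_batched(stream, stream_memory_capacity, kernels):
    """Process a large stream in batches that fit into stream memory.

    Intermediate results of each batch are handed from kernel to kernel without
    going back to main memory; only the final batch result is written out.
    """
    results = []
    for start in range(0, len(stream), stream_memory_capacity):
        # Load one batch from main memory into (modelled) stream memory.
        batch = stream[start:start + stream_memory_capacity]
        for kernel in kernels:
            # Producer-consumer locality: the output of one kernel feeds the next.
            batch = [kernel(element) for element in batch]
        # Write the final results of this batch back to main memory.
        results.extend(batch)
    return results


pixels = list(range(10))
processed = run_batched(pixels, 4, [lambda p: p + 1, lambda p: p * 2])
```

With a capacity of 4, the ten elements are handled as batches of 4, 4 and 2, yet the result is the same as if the whole stream had been processed at once.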
 Kernel Program:
Kernel programs are user-specified programs that are executed over the set of input
streams to produce the elements of the set of output streams. Kernel programs operate
implicitly over the entire set of input streams. The kernel executes once for each
element in the input stream. The run-time guarantees that all input streams are the
same length. If a kernel is called with input streams of differing lengths, the kernel
faults immediately, producing no output. Kernel programs are declared similarly to standard
C functions. An example kernel is shown below:
void kernel foo(float a<>, out float b<>, float p)
{
    // executed once for each element of the input stream a
    b = a + p;
}
The parameter 'a' is the input stream,
'b' is the output stream,
'p' is a constant parameter.
 The body of the kernel foo is executed for each input element of 'a' in
parallel, producing a new stream 'b', which contains a set of floats that are p larger
than the corresponding 'a' elements.
 The keyword kernel is a function descriptor indicating that this is a kernel
function, not a normal C function.
 Kernel functions are called in the same way as normal C functions; however,
the stream arguments must be declared as streams.
Programmers can distribute several kernels over different PE array chips, so that
several kernels run in parallel. This kind of parallelism matches the task parallelism of
image intelligent processing.
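The semantics of the kernel foo can be emulated in Python to show what the run-time does with it. This is only an illustrative model: run_kernel is a hypothetical helper, the kernel body is written as an ordinary function that returns the output element, and the length check stands in for the fault described above.

```python
def run_kernel(kernel_body, input_streams, *constants):
    """Apply kernel_body once per element position across all input streams."""
    lengths = {len(stream) for stream in input_streams}
    if len(lengths) > 1:
        # Models the fault raised for input streams of differing lengths.
        raise ValueError("kernel fault: input streams differ in length")
    return [kernel_body(*elements, *constants) for elements in zip(*input_streams)]


def foo(a, p):
    """Python stand-in for the kernel body 'b = a + p'."""
    return a + p


a = [1.0, 2.0, 3.0]
b = run_kernel(foo, [a], 0.5)
```

Each element of b is computed from the element of a at the same position only, so the calls to foo are independent and could run on separate PEs.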
 Parallel PEs Communication Function:
The permute operation is used to explicitly exchange values between PEs.
The permute operation has the form permute(PE-permutation, expression). It
takes as arguments a constant describing the permutation and the value to be
permuted. The permutation is an integer in which the nth nibble specifies the
index of the PE from which the nth PE gets the value of the permuted variable. For
instance, "permute(0x65432107, x)" rotates the value of x in each of the PEs to the left by
specifying that PE 7 gets the value from PE 6 and so on.
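The nibble encoding of the permutation can be checked with a short Python model. It assumes eight PEs and that nibble 0 is the least significant one, which is consistent with the example in which PE 7 reads from PE 6; the list-based permute function below is a stand-in for the hardware operation, not its implementation.

```python
def permute(permutation, values):
    """Return the values each PE holds after the permutation.

    The n-th nibble of `permutation` (counted from the least significant end)
    is the index of the PE from which PE n fetches its new value.
    """
    return [values[(permutation >> (4 * n)) & 0xF] for n in range(len(values))]


# x[n] is the value of x held by PE n before the operation.
x = [10, 11, 12, 13, 14, 15, 16, 17]
rotated = permute(0x65432107, x)
unchanged = permute(0x76543210, x)
```

In this model PE 7 ends up with the value that PE 6 held, PE 1 with the value of PE 0, and PE 0 wraps around to the value of PE 7.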
 System Working Process:
Two different types of programs are compiled and loaded on the basic image
processing parallel system. Both the stream programs and the stream scheduler are compiled
for the serial processor from the C-like language. On boot, the serial processor reads a
small boot program from boot RAM and then executes the full stream programs. The
compiler also compiles the kernel programs for the SIMD coprocessor. The stream scheduler
loads the compiled kernel programs into the SIMD coprocessor's microcode store.
For the extended parallel computer, the programmer must partition the application
into parallel tasks in advance. The application programmer divides the image processing
application into task-parallel and data-parallel parts according to the three
levels of image processing. For the task-parallel programs, communication can be implemented
with MPI tools.
 CONCLUSION
Research on parallel computing systems for image intelligent processing is
popular. This article discussed several aspects of a parallel computing system for image
intelligent processing. The system has two functions.
1. The system can be used as an evaluation platform for parallel image
processing systems.
2. The system can be used as a development and debugging platform for
image processing applications.
Through the analysis and evaluation of several kinds of image applications we can obtain
suggestions for improving the parallel computer's architecture, which could help us
build new parallel systems for image intelligent processing with higher speed and higher
performance.
 FUTURE WORKS
In Fig. 1, the serial processor is a SISD processor and the coprocessor is a
SIMD processor. When we deploy the SISD processor and the SIMD coprocessor on a
board or a chip, we get a single computer system. When several boards or chips of this kind
are organized together and another host processor is set up as a management node, we
obtain an overall MIMD processor that supports data parallelism, task parallelism, and
pipeline parallelism. Image processing performance will then be greatly improved. We also need
new software technology to support these kinds of parallelism.
There are many multi-core processors, and the parallel computer is one kind of multi-core
processor. Much research on the software components of multi-core processors is
under way, covering runtime systems, programming languages, and compilers. These remain a
challenge in the parallel computing area.
Parellel Computing system for image intelligent processing
M.SC.IT-PART-II, Sathaye College Page 11

Mais conteúdo relacionado

Mais procurados

memory Interleaving and low order interleaving and high interleaving
memory Interleaving and low order interleaving and high interleavingmemory Interleaving and low order interleaving and high interleaving
memory Interleaving and low order interleaving and high interleaving
Jawwad Rafiq
 
Computer Oraganisation and Architecture
Computer Oraganisation and ArchitectureComputer Oraganisation and Architecture
Computer Oraganisation and Architecture
yogesh1617
 

Mais procurados (20)

Computer System Architecture
Computer System ArchitectureComputer System Architecture
Computer System Architecture
 
Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Coa module1
Coa module1Coa module1
Coa module1
 
Unit 4-input-output organization
Unit 4-input-output organizationUnit 4-input-output organization
Unit 4-input-output organization
 
Memory organization
Memory organizationMemory organization
Memory organization
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Basic+machine+organization
Basic+machine+organizationBasic+machine+organization
Basic+machine+organization
 
POLITEKNIK MALAYSIA
POLITEKNIK MALAYSIAPOLITEKNIK MALAYSIA
POLITEKNIK MALAYSIA
 
Computer organisation -morris mano
Computer organisation  -morris manoComputer organisation  -morris mano
Computer organisation -morris mano
 
Chap2 comp architecture
Chap2 comp architectureChap2 comp architecture
Chap2 comp architecture
 
Unit 5-lecture 5
Unit 5-lecture 5Unit 5-lecture 5
Unit 5-lecture 5
 
Computer architecture and organization
Computer architecture and organizationComputer architecture and organization
Computer architecture and organization
 
Ch8
Ch8Ch8
Ch8
 
memory Interleaving and low order interleaving and high interleaving
memory Interleaving and low order interleaving and high interleavingmemory Interleaving and low order interleaving and high interleaving
memory Interleaving and low order interleaving and high interleaving
 
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSINGADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
 
Computer Oraganisation and Architecture
Computer Oraganisation and ArchitectureComputer Oraganisation and Architecture
Computer Oraganisation and Architecture
 
Computer Organization Lecture Notes
Computer Organization Lecture NotesComputer Organization Lecture Notes
Computer Organization Lecture Notes
 
Ntroduction to computer architecture and organization
Ntroduction to computer architecture and organizationNtroduction to computer architecture and organization
Ntroduction to computer architecture and organization
 
Bindura university of science education
Bindura university of science educationBindura university of science education
Bindura university of science education
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 

Semelhante a parallel processing

Multilevel arch & str org.& mips, 8086, memory
Multilevel arch & str org.& mips, 8086, memoryMultilevel arch & str org.& mips, 8086, memory
Multilevel arch & str org.& mips, 8086, memory
Mahesh Kumar Attri
 
number system understand
number system  understandnumber system  understand
number system understand
rickypatel151
 
11. Computer Systems Hardware 1
11. Computer Systems   Hardware 111. Computer Systems   Hardware 1
11. Computer Systems Hardware 1
New Era University
 

Semelhante a parallel processing (20)

Multilevel arch & str org.& mips, 8086, memory
Multilevel arch & str org.& mips, 8086, memoryMultilevel arch & str org.& mips, 8086, memory
Multilevel arch & str org.& mips, 8086, memory
 
Cliff sugerman
Cliff sugermanCliff sugerman
Cliff sugerman
 
COMPUTER COMPONENTS
COMPUTER COMPONENTSCOMPUTER COMPONENTS
COMPUTER COMPONENTS
 
Par com
Par comPar com
Par com
 
IT Week 3
IT Week 3IT Week 3
IT Week 3
 
System components
System componentsSystem components
System components
 
Computer.pptx
Computer.pptxComputer.pptx
Computer.pptx
 
How a cpu works1
How a cpu works1How a cpu works1
How a cpu works1
 
How a cpu works1
How a cpu works1How a cpu works1
How a cpu works1
 
Learn about computer hardware and software
Learn about computer hardware and softwareLearn about computer hardware and software
Learn about computer hardware and software
 
System Software ( Os )
System Software ( Os )System Software ( Os )
System Software ( Os )
 
Pankaj kumar
Pankaj kumar Pankaj kumar
Pankaj kumar
 
Digital Computer
Digital ComputerDigital Computer
Digital Computer
 
Unit2fit
Unit2fitUnit2fit
Unit2fit
 
The Basics of Cell Computing Technology
The Basics of Cell Computing TechnologyThe Basics of Cell Computing Technology
The Basics of Cell Computing Technology
 
number system understand
number system  understandnumber system  understand
number system understand
 
Computer fundamental
Computer fundamentalComputer fundamental
Computer fundamental
 
11. Computer Systems Hardware 1
11. Computer Systems   Hardware 111. Computer Systems   Hardware 1
11. Computer Systems Hardware 1
 
In the heart of computers
In the heart of computersIn the heart of computers
In the heart of computers
 
Computer and its types
Computer and its typesComputer and its types
Computer and its types
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Último (20)

Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 

parallel processing

  • 1. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 1 Parallel Computing System For Image Intelligent Processing  ABSTRACT: In this paper, a parallel computing system for image intelligent processing is described. This parallel computing system includes two main parts: A parallel computer and a set of software tools. The parallel computer is consinicted by a host processor, a SIMD coprocessor, a stream memory and a controller of this memory. Furthermore, the parallel computer can be extended by different kinds of organization with the host processors and the SIMD coprocessors. Programmers write image intelligent processing programs in a C- like language and a set of software tools maps these programs to code that runs on the parallel computer. These two parts work together to provide a high performance for image intelligent processing.  INTRODUCTION In this paper, we describe a parallel computing system for image intelligent processing applications development and show how high performance image intelligent processing applications are realized on this system. This system includes a parallel computer and a set of software tools.  How parallel computing system for image intelligent processing takes place? Image intelligent processing applications are progmmmed using a C-like language. This C-like language includes traditional serial statements and data parallel statements. The serial statements part as stream programs. The data parallel statements part as kernel programs. The stream programs run on the host processor. The kernel programs run on the SIMD coprocessor. Image data and kernel programs are transferred from main memory to stream memory. Finally, image data arrive at coprocessor’s local register file and the kernel programs arrive at coprocessor’s program memory. The whole transformation is controlled by the stream memory controller.
  • 2. Parellel Computing system for image intelligent processing M.SC.IT-PART-II, Sathaye College Page 2  SMID : Single instruction multiple data (SIMD), as the name suggests, A type of parallel computingarchitecture that is classified under Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Typically this consists of many simple processors, each with a local memoryin which it keeps the data which it will work on. Each processor simultaneously performs the same instruction on its local data progressing through the instructions in lock-step, with the instructions issued by the controller processor. The processors can communicate with each other in order to perform shifts and other array operations The number of parallel data elements processed at the same time is dependent on the size of the elements and the capacity of the data processing resources available. Usually the capacity of the data processing resources is fixed for a given architecture Thus, such machines exploit data level parallelism. SIMD is particularly applicable to common tasks like adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.
  • 3. APPLICATIONS
    We want to implement image intelligent processing on this system. Tasks in this kind of image processing can be divided into three levels: low level tasks, middle level tasks and high level tasks.
    Low level tasks in image processing are also called preprocessing, which organizes data for later image processing. Preprocessing mainly includes image enhancement and image restoration. From a computational point of view, preprocessing can be viewed as three kinds of processing: single-pixel processing, where each result depends on one pixel (such as thresholding, quantization and coding); partial-pixel (area) processing (such as filtering and convolution); and whole-image processing (digital transforms such as the FFT).
    All of the above processing shares the following characteristics. The first is spatial invariance, that is to say the same instruction can be executed on all of the image pixels. We call this characteristic data independence. The second is locality, i.e., the processing result for each pixel depends only on its neighboring pixels. The third is the fixed data structure, i.e., the result of image processing is still an image array. From the above we conclude that image preprocessing has obvious data parallelism.
    The aim of middle level processing is to find the salient features and the proper primitive elements in the images. Middle level processing includes image segmentation and the description of the segments. It corresponds to a transformation from image to feature. So, middle level processing is a mixture of data parallelism and task parallelism.
    The aim of high level processing is to recognize the content of the image. This kind of processing can be divided into two stages: symbol description and scene understanding.
    From these features, we can conclude that high level processing has obvious task parallelism. When a parallel computer system matches these typical features of image processing, its performance can be greatly enhanced.
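To make the data-parallel character of low level processing concrete, the sketch below shows a single-pixel operation (thresholding) and a neighborhood operation (a 3x3 mean filter). It is a minimal illustration, not code from the described system: the function names, the sample image and the border policy (border pixels average only the neighbors that exist) are assumptions.

```python
def threshold(image, level):
    """Point operation: each output pixel depends on exactly one input pixel."""
    return [[255 if p >= level else 0 for p in row] for row in image]


def mean_filter_3x3(image):
    """Neighborhood operation: each output pixel depends on its 3x3 window.

    Border pixels average only the neighbors that exist (an assumed policy).
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) // len(window)
    return out


# Hypothetical 3x3 image with one bright pixel in the center.
image = [[10, 10, 10],
         [10, 100, 10],
         [10, 10, 10]]
binary = threshold(image, 50)
smooth = mean_filter_3x3(image)
```

In both functions the same instruction sequence runs for every pixel, each result depends on at most a small neighborhood, and the output is again an image array of the same size. These are the three characteristics listed above.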
  • 4. PARALLEL COMPUTER
    The parallel computer includes a host processor, a SIMD coprocessor, a stream memory and a controller for this stream memory, as illustrated in Fig. 1.
    Fig. 1: Parallel computer organization
    The host processor is the main controller of the whole system. The stream memory sits between main memory and the coprocessor's local register file. Its function is to raise the memory system's speed to match the coprocessor's requirements.
    Parallel computer organization:
    We call the host processor together with the stream memory controller a serial processor. The stream memory controller manages data transfer between the stream memory and the memory of the serial processor. The purpose of this memory management is to raise the memory system's data transfer speed to match the coprocessor's computing requirements. In Fig. 1, the host processor, the stream memory controller, the stream memory and the SIMD coprocessor are always on the same chip, and the stream memory is always built from register files.
  • 5. Parallel Computer Architecture:
    Fig. 2: Parallel computer architecture
    The parallel computer is a new generation image processing computer. As illustrated in Fig. 2, this computer includes a key component called the SIMD coprocessor. We always refer to the SIMD coprocessor as the PE (Processing Element) array. The host of the parallel computer can be a PC or a workstation. The PE array connects to the host.
    The host processor manages the whole computer. The host transfers kernel programs and data to the PE array's program memory and to the stream memory. Then the kernel programs are started on the PE array. Finally, the computing results are fetched from the stream memory to the host for further analysis. The host also performs initialization, diagnosis and testing of the PE array.
    The PE array is well suited to matrix computing, so we can view it as an accelerator for matrix computing. The PE array is built from many processing elements arranged in a regular array of NxN PEs. Each PE includes an ALU, a local register file, a buffer and a router that handles data communication. All of the PEs are connected by a conventional ring grid. The number of PEs is decided by the computing requirement and the level of VLSI realization. Because the PE array is regular and each PE is simple, the PE array chip can be realized in a small size. Furthermore, with many PE array chips we can build a parallel computer.
  • 6. The parallel computer provides a three-level memory hierarchy: the local register files of the PEs, the stream memory and the main memory. This hierarchy is used to exploit the parallelism and locality of image applications.
    The architecture of the parallel computer meets the computation and bandwidth demands of image applications by directly processing the data streams that occur naturally within them. The PE array is designed as a coprocessor that operates on image data streams. Many PEs can provide high performance for real-time image processing.
    The stream memory effectively isolates the ALUs from the main memory, making the PE array a load/store architecture for data streams. All data stream operations transfer data streams to or from the stream memory. For instance, the PE array transfers data streams directly out of and into the stream memory, which isolates PE array transfers from main memory accesses and computation. This simplifies the design of the processor. Meanwhile, the stream memory and the PE array can work as a pipeline, so the stream memory increases the transfer speed of data streams.
    Parallel computer extension:
    The parallel computer described above is a basic image processing computer. The host processor is a SISD computer. The coprocessor is a SIMD processor attached to this serial computer and can be an 8-PE chip or a 16-PE chip. We can place 8 or 16 PE chips on a board. Organized this way, one parallel computer contains 64 or 256 PEs. Furthermore, 8, 16 or more boards of 256 PEs each (for example PCI boards) can be mounted on a computer main board. With this kind of extension we obtain a parallel computer that can match the performance requirements of image processing applications.
    Meanwhile, we know from the applications section that image processing applications have both data parallelism and task parallelism. To build a parallel computing system that matches the different kinds of parallelism in image processing, we can use basic image processing computers as building blocks for an intelligent parallel computer for image intelligent processing. For example, by connecting several basic parallel computers together with cables we can build a true task parallel computer that matches the three kinds of parallelism in image intelligent processing.
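The PE counts quoted for the extended computer follow from multiplying the number of units at each level. The short sketch below works through that arithmetic. The function and parameter names are hypothetical, and it assumes every chip, board and main board is fully and uniformly populated.

```python
def total_pes(pes_per_chip, chips_per_board, boards=1, main_boards=1):
    """Total PE count, assuming every level is uniformly populated."""
    return pes_per_chip * chips_per_board * boards * main_boards


# Figures taken from the text: 8 chips of 8 PEs, or 16 chips of 16 PEs, per board.
small_board = total_pes(8, 8)
large_board = total_pes(16, 16)

# Mounting 8 or 16 of the 256-PE boards on one main board.
eight_boards = total_pes(16, 16, boards=8)
sixteen_boards = total_pes(16, 16, boards=16)
```

Under these assumptions a single board holds 64 or 256 PEs, and a main board carrying 8 or 16 of the larger boards holds 2048 or 4096 PEs.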
  • 7. In short, the parallel computer can be extended in four ways:
    1. Extend the number of PEs on a chip.
    2. Extend the number of PE chips on a board.
    3. Extend the number of boards on a computer main board.
    4. Extend the number of computer main boards.
    SOFTWARE
    The basic image processing computer decomposes an application into a series of kernel programs that operate on image data. We call this kind of image data streams. Streams are sets of sequential data records that lend themselves to high-performance computation through their regular and predictable structure. In simple words, a stream is a data collection that can be operated on in parallel.
    Stream programs are written in the SISD statements of a C-like language.
    Kernel programs are written in the SIMD statements of the C-like language.
    Using these two parts of the C-like language for image processing allows the programmer to keep the familiar structure of the high-level C language.
    The two-level programming system reflects the division of programs on the basic image processing computer: kernel programs run on the SIMD coprocessor, while the stream program, which runs on the serial processor, corresponds to a high-level description of the image processing.
    Fig. 3: Software partitions
  • 8. Several program parts work together to execute an image application, as illustrated in Fig. 3. Stream programs run on the serial processor, call the kernel programs and initiate the transfer of data streams. Stream programs instruct the SIMD coprocessor to run kernel programs and then send out the resulting data.
    Stream Program And Stream Scheduler:
    Stream programs are written in the C-like language with calls into kernel programs. Kernel program calls are treated as functions that take input and output streams as arguments. The stream scheduler allocates space in the serial processor's memory and in the stream memory to store the streams that are passed between kernel programs. The scheduler determines when to place streams in the stream memory and when to move them to and from the serial processor's memory, so as to take advantage of producer-consumer locality between different kernel programs. Efficient stream programs minimize traffic between the stream memory and the serial processor's memory.
    When the image data is larger than the stream memory, the scheduler breaks the large stream into smaller batches. One batch is loaded into the stream memory at a time, and the kernel program executes on this batch before computation begins on the next one. In this way, raw image data arrives in the stream memory and passes between kernel programs through temporary storage in the stream memory. Expensive memory traffic is incurred only when final results are written out. To ensure high resource utilization, the stream scheduler performs software pipelining at the stream level, so that kernel programs execute concurrently with memory operations.
    Kernel Program:
    Kernel programs are user-specified programs that are executed over the set of input streams to produce elements on the set of output streams.
    Kernel programs operate implicitly over the entire set of input streams. The kernel executes once for each element in the input stream. The run-time guarantees that all input streams are the same length: if a kernel is called with input streams of differing lengths, the kernel faults immediately and produces no output. Kernel programs are declared much like standard C functions. An example kernel is shown below:
    void kernel foo (float a<>, out float b<>, float p) { b = a + p; }
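The kernel call semantics and the scheduler's batching described on this slide can be modeled in ordinary Python. This is a sketch under stated assumptions, not the system's run-time: `run_kernel`, `run_in_batches` and `StreamLengthError` are hypothetical names, streams are modeled as plain lists, and the batch size stands in for the capacity of the stream memory.

```python
class StreamLengthError(Exception):
    """Models the fault raised when input streams differ in length."""


def run_kernel(kernel, input_streams, *constants):
    """Execute the kernel once per element; fault before producing any output."""
    lengths = {len(stream) for stream in input_streams}
    if len(lengths) > 1:
        raise StreamLengthError(f"input streams differ in length: {sorted(lengths)}")
    return [kernel(*elements, *constants) for elements in zip(*input_streams)]


def foo(a, p):
    """Python counterpart of the example kernel: b = a + p for each element."""
    return a + p


def run_in_batches(kernel, stream, batch_size, *constants):
    """Split a stream that exceeds the (assumed) stream memory size into batches."""
    output = []
    for start in range(0, len(stream), batch_size):
        batch = stream[start:start + batch_size]
        output.extend(run_kernel(kernel, [batch], *constants))
    return output


b = run_kernel(foo, [[1.0, 2.0, 3.0]], 0.5)
batched = run_in_batches(foo, list(range(10)), 4, 1)
```

Processing the stream in batches gives the same result as processing it whole, because each output element depends only on the matching input element.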
  • 9. The parameter 'a' is the input stream, 'b' is the output stream and 'p' is a constant parameter.
    The body of the kernel foo is executed in parallel for each input element of 'a', producing a new stream 'b' that contains a set of floats, each larger by p than the corresponding element of 'a'.
    The keyword kernel is a function descriptor indicating that this is a kernel function, not a normal C function.
    Kernel functions are called in the same way as normal C functions; however, the stream arguments must be declared as streams.
    Programmers can distribute several kernels over different PE array chips, so that several kernels run in parallel. This kind of parallelism matches the task parallelism of image intelligent processing.
    Parallel PEs Communication Function:
    The permute operation is used to explicitly exchange values between PEs. It has the form permute(PE-permutation, expression). The permute operation takes as arguments a constant describing the permutation and the value to be permuted. The permutation is an integer in which the nth nibble specifies the index of the PE from which the nth PE gets the value of the permuted variable. For instance, "permute(0x65432107, x)" rotates the value of x in each of the PEs to the left by specifying that PE 7 gets the value from PE 6, and so on.
    System Working Process:
    Two different types of programs are compiled and loaded on the basic image processing parallel system. Both the stream programs and the stream scheduler are written in the C-like language and compiled for the serial processor. On boot, the serial processor reads a small boot program from boot RAM and then executes the full stream programs. The compiler also compiles the kernel programs for the SIMD coprocessor. The stream scheduler loads the compiled kernel programs into the SIMD coprocessor's microcode store.
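The permute operation described on this slide can be modeled as follows. This is a minimal sketch, assuming 8 PEs and assuming that nibble 0 is the least significant hex digit of the permutation constant. That numbering is inferred from the example in the text, where PE 7 reads from PE 6. The list `x` holds hypothetical values, one per PE.

```python
def permute(permutation, values):
    """Return the values held by each PE after the exchange.

    values[n] is the value held by PE n before the exchange. Nibble n of
    `permutation` (counted from the least significant hex digit, an assumption)
    is the index of the PE from which PE n takes its new value.
    """
    return [values[(permutation >> (4 * n)) & 0xF] for n in range(len(values))]


# Hypothetical contents of x in PE 0 .. PE 7.
x = [10, 11, 12, 13, 14, 15, 16, 17]
rotated = permute(0x65432107, x)
unchanged = permute(0x76543210, x)
```

With the constant 0x65432107, PE 0 takes its value from PE 7 and every other PE n takes its value from PE n-1, which is the rotation described in the text.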
    For the extended parallel computer, the programmer needs to divide the application into parallel tasks in advance. The application programmer divides the image processing application into task parallelism and data parallelism according to the three levels of image processing. For the task parallel programs, communication can be realized using MPI tools.
  • 10. CONCLUSION
    Research on parallel computing systems for image intelligent processing is popular. Several aspects of such a system have been discussed in this article. The system has two functions:
    1. The system can be used as an evaluation platform for parallel image processing systems.
    2. The system can be used as a development and debugging platform for image processing applications.
    By analyzing and evaluating several kinds of image applications we can obtain suggestions for improving the parallel computer's architecture, which could lead to a new parallel system for image intelligent processing with higher speed and higher performance.
    FUTURE WORKS
    In Fig. 1, the serial processor is a SISD processor and the coprocessor is a SIMD processor. When we deploy the SISD processor and the SIMD coprocessor on a board or a chip, we get a single computer system. When several of these boards or chips are organized together and another host processor is set up as a management node, we get a complete MIMD processor that supports data parallelism, task parallelism and pipeline parallelism. Image processing will then be greatly accelerated.
    We also need new software technology to support these kinds of parallelism. There are many multi-core processors, and the parallel computer is one kind of multi-core processor. Much research on software components for multi-core processors is under way, such as run-time systems, programming languages and compilers. These remain a challenge in the parallel computing area.