Heterogeneous Computing:
     Challenges and
     Opportunities
     Ashfaq A. Khokhar, Viktor K. Prasanna, Muhammad E. Shaaban,
     and Cho-Li Wang
     University of Southern California




Homogeneous computing, which uses one or more machines of the same type, has provided adequate performance for many applications in the past. Many of these applications had more than one type of embedded parallelism, such as single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD). Most of the current parallel machines are suited only for homogeneous computing. However, numerous applications that have more than one type of embedded parallelism are now being considered for parallel implementation. On the other hand, as the amount of homogeneous parallelism in applications decreases, homogeneous systems cannot offer the desired speedups. To exploit the heterogeneity in computations, researchers are investigating a suite of heterogeneous architectures.
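The speedup limitation can be made concrete with a small sketch: a machine supporting a single parallelism mode accelerates only the matching portion of a program, so the remaining portions bound the overall speedup. The fractions and mode speedup below are illustrative assumptions, not measurements from this article.

```python
# Amdahl-style bound for a single-mode machine: only the fraction of the
# code matching the machine's parallelism type is accelerated.
# All numbers are illustrative assumptions.

def overall_speedup(matching_fraction, mode_speedup):
    """Speedup of the whole program on a machine that accelerates only
    the matching fraction of the code."""
    return 1.0 / ((1.0 - matching_fraction) + matching_fraction / mode_speedup)

# As the homogeneous (matching) fraction shrinks, the benefit collapses,
# even on a very fast machine.
for fraction in (0.95, 0.50, 0.25):
    print(round(overall_speedup(fraction, mode_speedup=100), 2))
```

Even a hundredfold mode speedup yields less than a factor of two overall once only half the code matches the machine, which is the case the article makes for matching each portion to its own machine type.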
Anytime you work with oranges and apples, you'll need a number of schemes to organize total performance. This article surveys the challenges posed by heterogeneous computing and discusses some approaches to opening up its opportunities.

Heterogeneous computing (HC) is the well-orchestrated and coordinated effective use of a suite of diverse high-performance machines (including parallel machines) to provide superspeed processing for computationally demanding tasks with diverse computing needs. An HC system includes heterogeneous machines, high-speed networks, interfaces, operating systems, communication protocols, and programming environments, all combining to produce a positive impact on ease of use and performance. Figure 1 shows an example HC environment.

Heterogeneous computing should be distinguished from network computing or high-performance distributed computing, which have generally come to mean either clusters of workstations or ad hoc connectivity among computers using little more than opportunistic load balancing. HC is a plausible, novel technique for solving computationally intensive problems that have several types of embedded parallelism. HC also helps to reduce design risks by incorporating proven technology and existing designs instead of developing them from scratch. However, several issues and problems arise from employing this technique, which we discuss.

In the past few years, several technical meetings have addressed many of these issues. There is also a growing interest in using this paradigm to solve Grand Challenges problems. Richard Freund has organized the Heterogeneous Processing Workshops held each year at the IEEE International Parallel Processing

0018-9162/93/0600-0018$03.00 © 1993 IEEE
Symposiums. Another related yearly meeting is the IEEE International Symposium on High-Performance Distributed Computing.

Glossary

Analytical benchmarking: A procedure to analyze the relative effectiveness of machines on various computational types.

Code-type profiling: A code-specific function to identify various types of parallelism present in code and to estimate the execution times of each code type.

Cross-machine debuggers: Those available within the heterogeneous computing environment to help debug the application code that executes over multiple machines.

Cross-over overhead: That incurred in transferring data from one machine to another. It also includes data-format-conversion overhead between the two machines.

Cross-parallel compiler: An intelligent compiler that can generate intermediate code executable on different parallel machines.

Heterogeneous computing (HC): A well-orchestrated, coordinated effective use of a suite of diverse high-performance machines (including parallel machines) to provide fast processing for computationally demanding tasks that have diverse computing needs.

Metacomputations: Computations exhibiting coarse-grained heterogeneity in terms of embedded parallelism.

Mixed-mode computations: Computations exhibiting fine-grained heterogeneity in terms of embedded parallelism.

Multiple instruction, multiple data (MIMD): A mode in which code stored in each processor's local memory is executed independently.

Single instruction, multiple data (SIMD): A mode in which all processors execute the same instruction synchronously on data stored in their local memory.

Heterogeneous systems

The quest for higher computational power suitable for a wide range of applications at a reasonable cost has exposed several inherent limitations of homogeneous systems. Replacing such systems with yet more powerful homogeneous systems is not feasible. Moreover, this approach does not improve the versatility of the system. HC offers a novel, cost-effective approach to these problems; instead of replacing existing multiprocessor systems at high cost, HC proposes using existing systems in an integrated environment.

Limitations of homogeneous systems. Conventional homogeneous systems usually use one mode of parallelism in a given machine (like SIMD, MIMD, or vector processing) and thus cannot adequately meet the requirements of applications that require more than one



Figure 1. An example heterogeneous computing environment: user workstations connected over a network to a MasPar MP-2, a Cray Y-MP, a Connection Machine CM-5, a Massively Parallel Processor (MPP), and an Image-Understanding Architecture (IUA).


type of parallelism. As a result, any single type of machine often spends its time executing code for which it is poorly suited. Moreover, many applications need to process information at more than one level concurrently, with different types of parallelism at each level. Image understanding, a Grand Challenges problem, is one such application.

At the lowest level of computer vision, image-processing operations are applied to the raw image. These computations have a massive SIMD-type parallelism. In contrast, the participants in the DARPA Image-Understanding Benchmark exercises observed that high-level image-understanding computations exhibit coarse-grained MIMD-type characteristics. For such applications, users of a conventional multiprocessor system must either settle for degraded performance on the existing hardware or acquire more powerful (and expensive) machines.

Each type of homogeneous system suffers from inherent limitations. For example, vector machines employ interleaved memory with a pipelined arithmetic logic unit, leading to performance in the high millions of floating-point operations per second (Mflops). If the data distribution of an application and the resulting computations cannot exploit these features, the performance degrades severely.

Consider an application code having mixed types of embedded parallelism. Assume that the code when executed on a serial machine spends 100 units of time. When this code is executed on a vector machine, the vector portion of the code is executed rapidly, while other portions of the code still have relatively higher execution times. Similarly, the same code when executed on a suite of heterogeneous machines (so that each portion of the code is executed on its matching machine type) is likely to achieve speedups. Figure 2 illustrates a possible scenario (the numbers are execution times in terms of basic units).

Figure 2. Execution of example code using various systems: the code contains vector, MIMD, SIMD, and special-purpose portions; total time is 100 units on a serial machine, 50 units on a vector machine, and 4 units plus communication overhead on a suite of heterogeneous machines.

Figure 3. User-directed approach: algorithm design, followed by partitioning and mapping of the code onto the suite of machines in HC environments, all supported by a programming environment.

Heterogeneous computing. Heterogeneity in computing systems is not an entirely new concept. Several types of special-purpose processors have been used to provide specific services for improving system throughput. One of the most common is I/O handling. Attaching floating-point processors to host computers is yet another heterogeneous approach to enhance system performance. In high-performance computers, the concept of heterogeneity manifests itself at the instruction level in the form of several types of functional units, such as vector arithmetic pipelines and fast scalar processors. However, current multiprocessor systems remain

mostly homogeneous as far as the type of parallelism supported by them. Such systems have been traditionally classified according to the number of instruction and data streams.

An HC environment must contain the following components:

• a set of heterogeneous machines,
• an intelligent high-speed network connecting all machines, and
• a (user-friendly) programming environment.

HC lets a given system be adapted to a wide range of applications by augmenting it with specific functional or performance capabilities without requiring a complete redesign. Since HC comprises several autonomous computers, overall system fault tolerance and longevity are likely to improve.

Issues

We consider two approaches to using the HC paradigm. The first one analyzes an application to explore embedded heterogeneous parallelism. Researchers must devise new algorithms or modify existing ones to exploit the heterogeneity present in the application. Based on these algorithms, users develop the code to be executed by the machines.

In the second approach, an existing parallel code of the application is taken as input. To run this code in an HC environment, users must profile the types of heterogeneous parallelism embedded in the code. For this purpose, code-type profilers need to be designed. Figures 3 and 4 illustrate these approaches. However, both approaches need strategies for partitioning, mapping, scheduling, and synchronization. New tools and metrics for performance evaluation are also required. Parallel programming environments are needed to orchestrate the effective use of the computing resources.

Algorithm design. Heterogeneous computing opens new opportunities for developing parallel algorithms. In this section, we identify the efforts needed to devise suitable algorithms. The following issues must be considered by the designer:

(1) the types of machines available and their inherent computing characteristics,
(2) alternate solutions to various subproblems of the application, and
(3) the costs of performing the communication over the network.
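These three considerations can be combined into a toy exhaustive search: given per-module execution times on each machine (in practice such numbers would come from code-type profiling and analytical benchmarking) and a per-boundary communication cost, pick the assignment with the smallest total time. The machine names, times, and crossover cost below are invented for illustration.

```python
# Brute-force sketch of choosing a machine per program module while
# accounting for machine characteristics and network communication cost.
# All numbers are invented; real values would come from profiling and
# analytical benchmarking.
from itertools import product

modules = ["m0", "m1", "m2"]
machines = ["CM-5", "Y-MP", "MP-2"]
exec_time = {                     # exec_time[module][machine]
    "m0": {"CM-5": 9, "Y-MP": 2, "MP-2": 7},
    "m1": {"CM-5": 3, "Y-MP": 8, "MP-2": 4},
    "m2": {"CM-5": 6, "Y-MP": 6, "MP-2": 1},
}
CROSSOVER = 0.5                   # assumed cost of each machine change

def total_time(assignment):
    """Sequential execution time of all modules plus crossover overhead."""
    compute = sum(exec_time[m][assignment[m]] for m in modules)
    hops = sum(assignment[a] != assignment[b]
               for a, b in zip(modules, modules[1:]))
    return compute + CROSSOVER * hops

best = min(product(machines, repeat=len(modules)),
           key=lambda combo: total_time(dict(zip(modules, combo))))
print(best)  # ('Y-MP', 'CM-5', 'MP-2')
```

Note that with a larger crossover cost the search would start clustering neighboring modules onto the same machine, which is exactly the trade-off the partitioning and mapping discussion below addresses.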


Computations in HC can be classified into two classes.

Metacomputing. Computations in this class fall into the category of coarse-grained heterogeneity. Instructions belonging to a particular class of parallelism are grouped to form a module; each module is then executed on a suitable parallel machine. Metacomputing refers to heterogeneity at the module level.

Mixed-mode computing. In this fine-grained heterogeneity, almost every alternate parallel instruction belongs to a different class of parallel computation. Programs exhibiting this type of heterogeneity are not suitable for execution on a suite of heterogeneous machines because the communication overhead due to frequent exchange of information between machines can become a bottleneck. However, these programs can be executed efficiently on a single machine, such as PASM (Partitionable SIMD/MIMD), which incorporates heterogeneous modes of computation. Mixed-mode computing refers to heterogeneity at the instruction level.

Figure 4. Compiler-directed approach: code analysis classifies the existing parallel code into vector, MIMD, SIMD, and special-purpose portions, which are then mapped onto the machines through a programming environment.
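The distinction between the two classes can be sketched as a simple rate test: if the parallelism type changes between most adjacent instructions, cross-machine communication would dominate and a single mode-switching machine is the better target; if changes are rare, modules can be shipped to separate machines. The threshold and the instruction traces below are illustrative assumptions, not part of the article's formulation.

```python
# Toy classifier for the metacomputing vs. mixed-mode distinction,
# based on how often the parallelism type switches between adjacent
# instructions. Threshold is an assumed cutoff.

def classify(instruction_types, switch_threshold=0.5):
    """Classify a trace of per-instruction parallelism types."""
    switches = sum(a != b for a, b in zip(instruction_types,
                                          instruction_types[1:]))
    rate = switches / max(len(instruction_types) - 1, 1)
    if rate > switch_threshold:
        return "mixed-mode (single machine)"
    return "metacomputing (machine suite)"

# Coarse-grained trace: one switch in 100 instructions.
print(classify(["SIMD"] * 50 + ["MIMD"] * 50))  # metacomputing (machine suite)
# Fine-grained trace: every other instruction switches type.
print(classify(["SIMD", "MIMD"] * 50))          # mixed-mode (single machine)
```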

Mixed-mode machines can achieve large speedups for fine-grained heterogeneity by using the mixed-mode processing available in a single machine. A mixed-mode machine, for example, can use its mode-switching capability to support SIMD/MIMD parallelism and hardware-barrier synchronization, thus improving its performance over a machine operating in SIMD or MIMD mode only.

Code-type profiling. Fast parallel execution of the code in a heterogeneous computing environment requires identifying and profiling the embedded parallelism. Traditional program profiling involves testing a program assumed to consist of several modules by executing it on suitable test data. The profiler monitors the execution of the program and gathers statistics, including the execution time of each program module. This information is then used to modify the modules to improve the overall execution time.

In HC, profiling is done not only to estimate the code's execution time on a particular machine but also to analyze the code's type. This is achieved by code-type profiling. As introduced by Freund, this code-specific function is an off-line procedure: the statistics to be gathered include the types of parallelism of various modules in the code and the estimated execution time of each module on the machines available in the environment. Code types that can be identified include vectorizable, SIMD/MIMD parallel, scalar, and special purpose (such as fast Fourier transform).

Analytical benchmarking. This test measures how well the available machines perform on a given code type. While code-type profiling identifies the type of code, analytical benchmarking ranks the available machines in terms of their efficiency in executing a given code type. Thus, analytical benchmarking techniques permit researchers to determine the relative effectiveness of a given parallel machine on various types of computation.

This benchmarking is also an off-line process and is more rigorous than previous benchmarking techniques, which simply looked at the overall result of running an entire benchmark code on a processor. Some experimental results obtained by analytical benchmarking show that SIMD machines are well suited for operations such as matrix computations and low-level image processing. MIMD machines, on the other hand, are most efficient when an application can be partitioned into a number of tasks that have limited intercommunication. Note that analytical benchmark results are used in partitioning and mapping.

Partitioning and mapping. Problems that occur in these areas of a homogeneous parallel environment have been widely studied. The partitioning problem can be divided into two subproblems. Parallelism detection determines the parallelism present in a given program. Clustering combines several operations into a program module and thus partitions the application into several modules. These two subproblems can be handled by the user, the compiler, or the machine at runtime.

In HC, parallelism detection is not the only objective; code classification based on the type of parallelism is also required. This is accomplished by code-type profiling, which also poses additional constraints on clustering.

Mapping (allocating) program modules to processors has been addressed by many researchers. Informally, in homogeneous environments, the mapping problem can be defined as assigning program modules to processors so that the total execution time (including the communication costs) is minimized. Several other costs, such as the interference cost, have also been considered. In HC, however, other objectives, such as matching the code type to the machine type, result in additional constraints. If such a mapping has to be performed at runtime for load-balancing purposes (or due to machine failure), the mapping problem becomes more complex due to the overhead associated with the code and data-format conversions. Various approaches to optimal and approximate partitioning and mapping in HC have been studied.

Mapping in HC can be performed conceptually at two levels: system (or macro) and machine (or micro). At the system-level mapping, each module is assigned to one or more machines in the system so that the parallelism embedded in the module matches the machine type. Machine-level mapping assigns portions of the module to individual processors in the machine. The most common goal of the mapping process is to accomplish these assignments such that the overall runtime of the task is minimized.

Chen et al. proposed a heuristic mapping methodology based on the Cluster-M model, which facilitates the design of portable software. Only one algorithm is required for a given application, regardless of the underlying architecture. Various types of parallelism present in the application are identified. In addition, all communication and computation requirements of the application are preserved in an intermediate specification of the code. The architecture of each machine in the environment is modeled in the system representation, which captures the interconnections of the architecture. The four components of this approach are

• an intermediate model to provide an architecture-independent algorithm specification of the application,
• languages to support the specification in the intermediate model (such languages should be machine-independent and allow a certain amount of abstraction of the computations),
• a tool that lets users specify topologies of the machines employed in the HC environment, and
• a mapping module to match the problem specification and the system representation.

Figure 5 illustrates this methodology.

Machine selection. An interesting problem appears in the design of HC environments: How can one find the most appropriate suite of heterogeneous machines for a given collection of application tasks subject to a given constraint, such as cost and execution time? Freund has proposed the Optimal Selection Theory (OST) to choose an optimal configuration of machines for executing an application task on a heterogeneous suite of computers with the assumption that the number of machines available is unlimited. It is also assumed that machines matching the given set of code types are available and that the application code is decomposed into equal-sized modules.

Wang et al.'s Augmented Optimal Selection Theory (AOST) incorporates the performance of code segments on nonoptimal machine choices, assuming that the number of available machines
for each code type is limited. In this approach, the program module most suitable for one type of machine is assigned to another type of machine. In the formulation of OST and AOST, it has been assumed that the execution of all program modules of a given application code is totally ordered in time. In reality, however, different execution interdependencies can exist among program modules. Also, parallelism can be present inside a module, resulting in further decomposition of program modules. Furthermore, the effect of different mappings on different machines available for a program module has not been considered in the formulation of these selection theories.

Figure 5. Cluster-M-based heuristic mapping methodology: a problem-specification tool and a representation of the heterogeneous architecture feed the mapping module.

The Heterogeneous Optimal Selection Theory (HOST) extends AOST in two ways. It incorporates the effect of various mapping techniques available on different machines for executing a program module. Also, the dependencies between the program modules are
specified as a directed graph. Note that OST and AOST assume linear ordering of program modules. In the formulation of HOST, an application code is assumed to consist of subtasks to be executed serially. Each subtask contains a collection of program modules. Each program module is further decomposed into blocks of parallel instructions, called code blocks.

To find an optimal set of machines, we have to assign the program modules to the machines so that

    ∑ᵢ Tᵢ

is minimal, while

    ∑ᵢ Cᵢ ≤ Cmax,

where Tᵢ is the time to execute program module i, Cᵢ is the cost of the machine on which program module i is to be executed, and Cmax is an overall constraint on the cost of the machines. The cost Cᵢ and execution time Tᵢ corresponding to the assignment under consideration can be obtained by using code-type profiling and/or by analyzing the algorithms.

Iqbal presented a selection scheme that finds an assignment of program modules to machines in HC so that the total processing time is minimized, while the total cost of machines employed in the solution does not exceed an upper bound. The scheme can also find a solution to the dual of the above problem, that is, finding a least expensive set of machines to solve a given application subject to a maximal execution time constraint. This scheme is applicable to all of the above selection theories. The accuracy of the scheme, however, depends upon the method used to assign the program modules to the machines. Iqbal also shows that for applications in which the program modules communicate in a restrictive manner, one can find exact algorithms for selecting an optimal set of machines. If, however, the program modules communicate in an arbitrary fashion, the selection problem is NP-complete.

Scheduling. In homogeneous environments, a scheduler assigns each program module to a processor to achieve desired performance in terms of processor utilization and throughput. Designers usually employ three scheduling levels. High-level scheduling, also called job scheduling, selects a subset of all submitted jobs competing for the available resources. Intermediate-level scheduling responds to short-term fluctuations in the system load by temporarily suspending and activating processes to achieve smooth system operation. Low-level scheduling determines the next ready process to be assigned to a processor for a certain duration. Different scheduling policies, such as FIFO, round-robin, shortest-job-first, and shortest-remaining-time, can be employed at each level of scheduling.

While all three levels of scheduling can reside in each machine in an HC environment, a fourth level is needed to perform scheduling at the system level. This scheduler maintains a balanced system-wide workload by monitoring the progress of all program modules. In addition, the scheduler needs to know the different module types and available machine types in the environment, since modules may have to be reassigned when the system configuration changes or overload situations occur. Communication bottlenecks and queueing delays incurred due to the heterogeneity of the hardware add constraints on the scheduler.

Synchronization. This process provides mechanisms to control execution sequencing and to supervise interprocess cooperation. It refers to three distinct but related problems:

• synchronization between the sender and receiver of a message,
• specification and control of the shared activities of cooperating processes, and
• serialization of concurrent accesses to shared objects by multiple processes.
June 1993                                                                                                                         23
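To make the selection formulation above concrete, here is a small brute-force sketch (our own illustration, not part of Iqbal's scheme; the function name and data layout are hypothetical). It enumerates every module-to-machine mapping, discards those whose total machine cost exceeds Cmax, and keeps the one minimizing total execution time — exponential in the number of modules, as expected for an NP-complete problem:

```python
from itertools import product

def select_machines(times, costs, c_max):
    """Brute-force module-to-machine assignment.

    times[i][j] -- execution time Ti of module i on machine j
    costs[j]    -- cost of machine j; an assignment is charged
                   costs[j] once for each module placed on machine j
    c_max       -- overall constraint Cmax on the cost of the machines

    Returns (assignment, total_time), where assignment[i] is the machine
    chosen for module i, or (None, None) if no assignment satisfies
    the cost constraint.
    """
    n, m = len(times), len(costs)
    best, best_time = None, None
    for assign in product(range(m), repeat=n):      # every possible mapping
        if sum(costs[j] for j in assign) > c_max:   # enforce sum of Ci <= Cmax
            continue
        total = sum(times[i][j] for i, j in enumerate(assign))
        if best_time is None or total < best_time:  # minimize sum of Ti
            best, best_time = assign, total
    return best, best_time
```

For example, with a fast but expensive machine and a slow but cheap one, relaxing the cost bound lets the fast machine be used; tightening it forces all modules onto the cheap machine.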
A variety of synchronization methods have been proposed in the past: semaphores, conditional critical regions, monitors, and path expressions, among others. In addition, some multiprocessors include hardware synchronization primitives. In general, synchronization can be implemented by using shared variables or by message-passing.

In heterogeneous computing, the synchronization problem resembles that of distributed systems. In both cases, a global clock and shared memory are absent, and (unpredictable) network delays and a variety of operating systems and programming environments complicate the process.

Several techniques used in distributed systems are again useful for solving HC synchronization problems. Two approaches are available: centralized (one machine is designated as a control node) and distributed (decision-making is distributed across the entire system). The correct choice depends on the topology, reliability, speed, and bandwidth of the network, in addition to the types and number of machines in the environment. However, reducing synchronization overhead is important to achieving large speedups in HC. Due to the possibility of several concurrently operating autonomous machines in the environment, application-code performance in HC is more sensitive to synchronization overheads. Frequent handshaking for synchronization may expend most of the available network bandwidth.

Interconnection requirements. Current local area networks (LANs) are not suitable for HC because higher bandwidth and lower latency networks are needed. The bandwidth of commercially available LANs is limited to about 10 megabits per second. On the other hand, in HC, assuming machines operating at 40 megahertz and 20 million instructions per second with a 32-bit word length, a bandwidth on the order of 1 gigabit/second is required to match the computation and communication speeds.

Even if higher bandwidth networks were available, three main sources of inefficiency would persist in current networks. First, application interfaces incur excessive overhead due to context switching and data copying between the user process and the machine's operating system. Second, each machine must incur the overhead of executing the high-level protocols that ensure reliable communication between program modules. Also, the network interface burdens the machine with interrupt handling and header processing for each packet. This suggests incorporating additional network-interface hardware in each machine.

Nectar is an example of a network backplane for heterogeneous multicomputers. It consists of a high-speed fiber-optic network, large crossbar switches, and powerful network-interface processors. Protocol processing is off-loaded to these interface processors. A networking standard called Hippi (ANSI X3T9.3 High-Performance Parallel Interface) is being implemented for realizing heterogeneous computing environments at various research sites. Hippi is an open standard that defines the physical and logical link layers of a 100-Mbyte/second network.

In HC, hardware modules from various vendors share physical interconnections. Differing communication protocols may make network-management problems complex. The following general approaches for dealing with network heterogeneity have been discussed in the literature:

(1) treat the heterogeneous network as a partitioned network, with each partition employing a uniform set of protocols;
(2) have a single "visible" network management console; and
(3) integrate the heterogeneous management functions at a single management console.

The IEEE Computer Society Technical Committee on Parallel Processing, the Technical Committee on Mass Storage, and several research sites are working together to define interface standards.

Some academic sites

A number of academic sites are developing HC environments and applications (this list is not exhaustive).

Systems and architectures
• Distributed High-Speed Computing (DHSC) project at Pittsburgh Supercomputing Center, University of Pittsburgh
• Image-Understanding Architecture, University of Massachusetts at Amherst
• Mentat, University of Virginia
• Nectar-Based Heterogeneous System, Carnegie Mellon University
• Northeast Parallel Architecture Center (NPAC), Syracuse University
• Partitionable SIMD/MIMD (PASM), Purdue University

Institutes and departments
• Beckman Institute, University of Illinois at Urbana-Champaign
• Department of Biological Sciences, University of California at Los Angeles
• Department of Computer Science, Kent State University
• Department of Computer Science, University of California at San Diego
• Department of Computer and Information Sciences, New Jersey Institute of Technology
• Department of Electrical Engineering-Systems, University of Southern California
• Department of Math and Computer Science, Emory University
• Minnesota Supercomputer Center (MSC), University of Minnesota at Minneapolis
• Supercomputer Computations Institute (SCI), Florida State University
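The interconnection-requirements arithmetic above can be checked in a few lines (the variable names are ours): 20 million instructions per second, each potentially moving a 32-bit word across the network, already demands about 640 megabits per second per machine — the order of 1 gigabit/second the text cites, and roughly 64 times what a 10-megabit/second LAN provides.

```python
MIPS = 20_000_000      # instructions per second per machine
WORD_BITS = 32         # word length in bits
LAN_BPS = 10_000_000   # ~10-megabit/second commercial LAN (circa 1993)

# Worst case: every instruction generates one word of network traffic.
required_bps = MIPS * WORD_BITS      # 640,000,000 bits per second
shortfall = required_bps / LAN_BPS   # how far short the LAN falls (~64x)
```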
Programming environments. A parallel programming environment includes parallel languages, intelligent compilers, parallel debuggers, syntax-directed editors, configuration-management tools, and other programming aids.

In homogeneous computing, intelligent compilers detect parallelism in sequential code and translate it into parallel machine code. Parallel programming languages have been developed to support parallel programming, such as MPL for MasPar machines, and *Lisp and C* for the Connection Machine. In addition, several parallel programming environments and models have been designed, such as Code, Faust, Schedule, and Linda.

HC requires machine-independent and portable parallel programming languages and tools. This requirement creates the need for designing cross-parallel compilers for all machines in the environment, and parallel debuggers for debugging cross-machine code. Several programming models and environments have been developed in the past for heterogeneous computing.8,14-16

The Parallel Virtual Machine (PVM) system,16 evolved over the past three years, consists of software that provides a virtual concurrent computing environment on general-purpose networks of heterogeneous machines. It is composed of a set of user-interface primitives and supporting software that enable concurrent computing on a loosely coupled network of high-performance machines. It can be implemented on a hardware base consisting of different architectures, including single-CPU systems, vector machines, and multiprocessors (see Figure 6).

Figure 6. An overview of the Parallel Virtual Machine system.

Application programs view the PVM system as a general and flexible parallel computing resource that supports shared-memory, message-passing, and hybrid models of computation. A heterogeneous application can be decomposed into several subtasks based on the embedded types of computation and then executed by using PVM subroutines on different matching machines available on the network. The PVM primitives are provided in the form of libraries linked to application programs written in imperative languages. They support process initiation and management, message-passing, synchronization, and other housekeeping facilities.

Support software provided by the PVM system executes on a set of user-specified computing elements on a network, presenting a virtual concurrent computing environment to users.

Performance evaluation. Performance tools are used to summarize the runtime behavior of an application, including analyzing resource use and the cause of any performance bottleneck. Depending on its design, a performance tool can describe program behaviors at many levels of detail. The two most common are the intraprocess and interprocess levels. Intraprocess performance tools, such as the gprof facility on BSD Unix, the HP sampler/3000, and the Mesa Spy, provide information about individual processes.

Performance tools for distributed computing systems concentrate on the interactions between the processes. Integrated performance models that observe the status and the performance events at all levels can be found in the PIE (Programming and Instrumentation Environment) project.17

Designing performance-evaluation tools for distributed computing systems involves collecting, interpreting, and evaluating performance information from application programs, the operating system, the communication network, and other hardware modules employed in the environment. The inherent concurrency in a distributed computing environment, the lack of total ordering of events on different machines, and the nondeterministic nature of the communication delays between the processes make the problem of evaluating performance more complex.

The impact of the code type must be considered. Thus, performance metrics such as processor utilization, speedup, and efficiency are difficult to compute. Indeed, these metrics must be carefully defined to make a reasonable performance evaluation.
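As a simple illustration of why such metrics need careful definition, one hedged convention — our own formulation, not one prescribed by the text — measures speedup against the fastest single machine in the suite, rather than against a nonexistent "sequential" baseline:

```python
def hc_speedup(single_machine_times, hc_time):
    """Speedup of an HC run relative to the best single machine.

    single_machine_times -- run time of the whole application on each
                            machine of the suite used alone
    hc_time              -- run time using the heterogeneous suite
    """
    return min(single_machine_times) / hc_time

def hc_efficiency(single_machine_times, hc_time, n_machines):
    """Speedup normalized by the number of machines employed."""
    return hc_speedup(single_machine_times, hc_time) / n_machines
```

Note that a different (and equally defensible) baseline, such as the slowest machine or the mean, changes the reported numbers substantially — which is precisely the difficulty the text points out.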
Image understanding

Intrinsic parallelism in image processing and the variety of heuristics available for problems in image understanding make computer vision an ideal vehicle for studying heterogeneous computing. From a computational perspective, vision processing is usually organized as follows:

• Early processing of the raw image (often called low-level processing). At this level, the input is an image. The output image is approximately the same size. Convolutions are performed on each pixel in parallel. The data communication among the pixels is local to each pixel.

• Interfacing between low-level and image-understanding problems (often termed intermediate-level processing). The operations performed on each data item can be nonlocal. The communication is also irregular as compared with that of low-level processing.

• Image understanding. By this we mean using the acquired data from the above processing (for example, geometric features such as shape, orientation, and moments) to infer semantic attributes of an image. Processing at this level can be classified as knowledge and/or symbolic processing. Search-based techniques are widely used at this level.

As evident in the preliminary results from the 1988 DARPA Image-Understanding Benchmark,18 each level in computer vision exhibits a different type of parallelism. Therefore, at each level a suitable type of parallel machine must be employed. Corresponding to each of the above classes of problems, a suitable class of architecture was proposed:3

• SIMD machines. Machines in this class are well suited for computations in low-level and in some intermediate-level computer vision problems because of the regular dataflow and iconic operations in these two levels. For example, two-dimensional cellular arrays and mesh-connected computers have been proposed for a large class of geometric and graph-based problems in image processing. Parallel machines such as the MasPar MP-series and the Connection Machine CM-200 fall in this category. Pipelined parallel machines (like the Carnegie Mellon University Warp machine) are also well suited for low- and intermediate-level vision computations.

• Medium-grained MIMD machines. Various intermediate- and high-level vision tasks are computationally intensive with irregular dataflow. Moreover, the size of the input is smaller than the input image size. Parallel systems having a set of powerful processors are suitable for performing computations in intermediate- and high-level vision tasks. The Connection Machine CM-5, Vista, Alliant FX-80, and Sequent Symmetry 81 are some examples.

• Coarse-grained MIMD machines. High-level vision tasks such as image understanding/recognition and symbolic processing employ complex data structures. Many of the proposed algorithms for such problems are nondeterministic, and architectural requirements for these problems demand coarse-grained MIMD machines. Parallel machines such as the Aspex ASP and Vista are well suited for this class of problems.

Another approach is to build machines having multiple computational capabilities embedded in a single system. These architectures consist of several levels. Typically, the lower levels operate in SIMD mode and the higher levels operate in MIMD mode. In the Image-Understanding Architecture,19 the lowest level has bit-serial processors, and the intermediate level consists of digital signal processors. The highest level consists of general-purpose microprocessors operating in MIMD mode.

An example vision task. We present an example vision task and identify the different types of parallelism. We have chosen the DARPA Integrated Image-Understanding Benchmark4 as an example task. The overall task performed by this benchmark is the recognition of an approximately specified two-and-a-half-dimensional "mobile" sculpture in a cluttered environment, given images from intensity and range sensors.

Steps in the benchmark can be identified by the vision-task classifications. First, low-level operations such as connected component labeling and corner extraction are performed. Then, grouping the corners (an intermediate-level vision operation) results in the extraction of candidate rectangles. Finally, partial matching of the candidate rectangles is followed by confirmed matching (a high-level vision task). The results obtained on several different parallel machines were reported at the 1988 Image-Understanding Workshop. Details of the benchmark results can be found in Weems et al.18

As they describe, directly interpreting these results would be unfair, since there were many undefined factors in the benchmark description. However, the benchmark does give pointers to how different machines can be classified with respect to their suitability for performing operations at different levels of vision. Overall, the simulation results show that the (heterogeneous) Image-Understanding Architecture performs better than any single machine considered. These results support the suitability of a heterogeneous environment for computer vision applications.

Heterogeneous computing offers new challenges and opportunities to several research communities. To support this paradigm, the following areas of research must be investigated:

• Designing tools to identify heterogeneous parallelism embedded in applications.
• Studying issues in high-speed networking, including available technologies and specialized hardware for networking.
• Designing communication protocols to reduce the cross-over overheads that occur when different machines communicate in the same environment.
• Developing standards for parallel interfaces between various machines.
• Designing efficient partitioning and mapping strategies to exploit heterogeneous parallelism embedded in applications.
• Designing user interfaces and user-friendly programming environments to program diverse machines in the same environment.
• Developing algorithms for applications with heterogeneous computing requirements.

Indeed, HC provides an opportunity to bring together research from various disciplines of computer science and engineering to develop a feasible approach for applications in the Grand Challenges problem set.

Acknowledgments

We thank Richard Freund and Ashraf Iqbal for many helpful discussions. This research was partly supported by the National Science Foundation under Grant No. IRI-9145810.
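The benchmark steps described above pair each vision level with one of the machine classes proposed earlier. That pairing can be restated as a small lookup (the step names are paraphrased from the text; the dictionary and function are our own sketch, not part of the benchmark):

```python
# Machine class suggested in the text for each vision-processing level.
MACHINE_CLASS = {
    "low": "SIMD",
    "intermediate": "medium-grained MIMD",
    "high": "coarse-grained MIMD",
}

# Steps of the DARPA Integrated Image-Understanding Benchmark,
# tagged with their vision-task level.
BENCHMARK_STEPS = [
    ("connected component labeling", "low"),
    ("corner extraction", "low"),
    ("corner grouping / candidate rectangles", "intermediate"),
    ("partial and confirmed matching", "high"),
]

def assign_steps(steps):
    """Pair each benchmark step with a suitable machine class."""
    return [(name, MACHINE_CLASS[level]) for name, level in steps]
```

A system-level scheduler in an HC environment would refine such a static mapping with the load and availability considerations discussed earlier.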
References

1. R. Freund and D. Conwell, "Superconcurrency: A Form of Distributed Heterogeneous Supercomputing," Supercomputing Review, Oct. 1990, pp. 47-50.

2. Newsletter of the IEEE Computer Society Technical Committee on Parallel Processing (TCPP), Vol. 1, No. 1, Oct. 1992.

3. V.K. Prasanna Kumar, Parallel Algorithms and Architectures for Image Understanding, Academic Press, Boston, 1991.

4. C. Weems et al., "An Integrated Image-Understanding Benchmark: Recognition of a 2-1/2D Mobile," Proc. DARPA Image-Understanding Workshop, Morgan Kaufmann Publishers, San Mateo, Calif., 1988, pp. 111-126.

5. T. Berg and H.J. Siegel, "Instruction Execution Trade-offs for SIMD vs. MIMD vs. Mixed-Mode Parallelism," Proc. Int'l Parallel Processing Symp. (IPPS), IEEE CS Press, Los Alamitos, Calif., Order No. 2167, 1991, pp. 301-308.

6. A. Khokhar et al., "Heterogeneous Supercomputing: Problems and Issues," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 3-12.

7. R. Freund, "Optimal Selection Theory for Superconcurrency," Proc. Supercomputing '89, IEEE CS Press, Los Alamitos, Calif., Order No. M2021 (microfiche), 1989, pp. 13-17.

8. G. Agha and R. Panwar, "An Actor-Based Framework for Heterogeneous Computing Systems," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 35-42.

9. S. Chen et al., "A Selection Theory and Methodology for Heterogeneous Supercomputing," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 3532-02, 1993.

10. M. Wang et al., "Augmenting the Optimal Selection Theory for Superconcurrency," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 13-22.

11. M. Iqbal, "Partitioning Problems for Heterogeneous Computer Systems," tech. report, Dept. of Electrical Engineering-Systems, Univ. of Southern California, Los Angeles, 1993.

14. C. de Castro and S. Yalamanchili, "Partitioning Signal Flow Graphs for Execution on Heterogeneous Signal Processing Architectures," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 81-86.

15. J. Potter, "Heterogeneous Associative Computing," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 3532-02, 1993.

16. V. Sunderam, "PVM: A Framework for Parallel Distributed Computing," Concurrency: Practice and Experience, Vol. 2, No. 4, Dec. 1990, pp. 315-339.

17. Z. Segall and L. Rudolph, "PIE: A Programming and Instrumentation Environment for Parallel Processing," IEEE Software, Vol. 2, No. 6, Nov. 1985, pp. 22-27.

18. C. Weems et al., "Preliminary Results from the DARPA Integrated Image-Understanding Benchmark," Parallel Architectures and Algorithms for Image Understanding, V.K. Prasanna, ed., Academic Press, Boston, 1991, pp. 399-499.

19. D. Shu, J. Nash, and C. Weems, "A Multiple-Level Heterogeneous Architecture for Image Understanding," Proc. Int'l Conf. Pattern Recognition, IEEE CS Press, Los Alamitos, Calif., Vol. 2, Order No. 2063, 1990.

Ashfaq A. Khokhar is a PhD candidate in the Department of Electrical Engineering-Systems at the University of Southern California, Los Angeles. His areas of research include parallel architectures and scalable algorithms, image understanding and parallel processing, VLSI computations, interconnection networks, and heterogeneous computing. Khokhar received the BSc degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1985 and the MS degree in computer

Viktor K. Prasanna's research interests include computer architecture, VLSI computations, and computational aspects of image processing, vision, robotics, and neural networks. Prasanna received the BS degree in electronics engineering from Bangalore University, the MS degree from the School of Automation, Indian Institute of Science, and the PhD in computer science from Pennsylvania State University in 1983. He serves as the symposium chair of the 1994 IEEE International Parallel Processing Symposium and is a subject area editor of the Journal of Parallel and Distributed Computing, IEEE Transactions on Computers, and IEEE Transactions on Signal Processing. He is the founding chair of the IEEE Computer Society Technical Committee on Parallel Processing and is a senior member of the Computer Society.

Muhammad E. Shaaban is a PhD candidate in the Department of Electrical Engineering-Systems, University of Southern California. His areas of research include parallel optical interconnection networks, parallel algorithms for image processing, and heterogeneous computing. Shaaban received the BS and MS degrees in electrical engineering from the University of Petroleum and Minerals, Dhahran, Saudi Arabia, in 1984 and 1986, respectively. He recently served as a session chair at the International Parallel Processing Symposium. He is a student member of the Computer Society.

Cho-Li Wang is a PhD candidate in the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles. His areas of research include computer architectures and algorithms, image understanding and parallel processing, image compression,
                                              engineeringfrom Syracuse University in 1988.              and heterogeneous computing.
12. E. Arnould et al., “The Design of Nectar: He is a student member of the Computer                       Wang received the BS degree in computer
    A Network Backplane for Heterogeneous Society.                                                      science and information engineering from
    Multicomputers,” Proc. Int’l Conf. A r -                                                            National Taiwan University, Taiwan, in 1985
   chitectural Support f o r Programming                                                                and the MS degree in computer engineering
   Languages and Operating Systems (AS-                                                                 from the University of Southern California
   P L O S I l l ) , IEEE CS Press, Los Alami-                                                          in 1990.
   tos, Calif., Order No. M1936 (microfiche),                            Viktor K. Prasanna
   1989, pp. 205-216.                                                    (V.K.Prasanna Kumar)
                                                                         is an associate professor
13. ANSI X3T9.3, “High-Performance Par-                                  in the Department of
    allel Interface: Hippi-PH, Hippi-SC,Hip-                             Electrical Engmeering-
    pi-FP, Hippi-LE, and Hippi-MI,”Work-                                 Systems, University of            Readers can contact Viktor K. Prasanna at
    ing Draft Proposed American National                                 Southern California, Los       the School of Engineering, Department of
    Standard for Information Systems,Amer-                               Angeles. His research          Electrical Engineering-Systems, University
    ican Nat’l Standards Inst., New York,                                interests include paral-       of Southern California, University Park, Los
    Jan.-Apr.. 1991.                                       *       ’ I   le] computation, com-          Angeles, CA 90089-2562.

June 1993                                                                                                                                           21

ease of use and performance. Figure 1 shows an example HC environment.

Anytime you work with oranges and apples, you'll need a number of schemes to organize total performance. This article surveys the challenges posed by heterogeneous computing and discusses some approaches to opening up its opportunities.

Heterogeneous computing should be distinguished from network computing or high-performance distributed computing, which have generally come to mean either clusters of workstations or ad hoc connectivity among computers using little more than opportunistic load-balancing.

HC is a plausible, novel technique for solving computationally intensive problems that have several types of embedded parallelism. HC also helps to reduce design risks by incorporating proven technology and existing designs instead of developing them from scratch. However, several issues and problems arise from employing this technique, which we discuss.

In the past few years, several technical meetings have addressed many of these issues. There is also a growing interest in using this paradigm to solve Grand Challenges problems. Richard Freund has organized the Heterogeneous Processing Workshops held each year at the IEEE International Parallel Processing Symposiums.

0018-9162/93/0600-0018$03.00 © 1993 IEEE
Another related yearly meeting is the IEEE International Symposium on High-Performance Distributed Computing.

Glossary

Analytical benchmarking: A procedure to analyze the relative effectiveness of machines on various computational types.

Code-type profiling: A code-specific function to identify various types of parallelism present in code and to estimate the execution times of each code type.

Cross-machine debuggers: Debuggers available within the heterogeneous computing environment to help debug application code that executes over multiple machines.

Cross-over overhead: The overhead incurred in transferring data from one machine to another. It also includes data-format-conversion overhead between the two machines.

Cross-parallel compiler: An intelligent compiler that can generate intermediate code executable on different parallel machines.

Heterogeneous computing (HC): A well-orchestrated, coordinated, effective use of a suite of diverse high-performance machines (including parallel machines) to provide fast processing for computationally demanding tasks that have diverse computing needs.

Metacomputations: Computations exhibiting coarse-grained heterogeneity in terms of embedded parallelism.

Mixed-mode computations: Computations exhibiting fine-grained heterogeneity in terms of embedded parallelism.

Multiple instruction, multiple data (MIMD): A mode in which code stored in each processor's local memory is executed independently.

Single instruction, multiple data (SIMD): A mode in which all processors execute the same instruction synchronously on data stored in their local memory.

Heterogeneous systems

The quest for higher computational power suitable for a wide range of applications at a reasonable cost has exposed several inherent limitations of homogeneous systems. Replacing such systems with yet more powerful homogeneous systems is not feasible. Moreover, this approach does not improve the versatility of the system. HC offers a novel cost-effective approach to these problems; instead of replacing existing multiprocessor systems at high cost, HC proposes using existing systems in an integrated environment.

Limitations of homogeneous systems. Conventional homogeneous systems usually use one mode of parallelism in a given machine (like SIMD, MIMD, or vector processing) and thus cannot adequately meet the requirements of applications that require more than one type of parallelism.

Figure 1. An example heterogeneous computing environment (the suite shown includes a MasPar MP-2, user workstations, a Cray Y-MP, a Connection Machine CM-5, the Massively Parallel Processor (MPP), and the Image-Understanding Architecture (IUA)).
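The glossary's SIMD and MIMD modes can be contrasted with a toy sketch. This is purely illustrative Python, not how any machine named in this article is programmed; the four-"processor" list and the example instructions are invented.

```python
# Toy contrast of the glossary's SIMD and MIMD execution modes on a
# hypothetical 4-processor machine.

def simd_step(instruction, local_data):
    """SIMD: every processor executes the SAME instruction, synchronously,
    on the data stored in its own local memory."""
    return [instruction(x) for x in local_data]

def mimd_step(programs, local_data):
    """MIMD: each processor independently executes its OWN program, stored
    in its local memory."""
    return [program(x) for program, x in zip(programs, local_data)]

local_data = [1, 2, 3, 4]  # one datum per processor

# SIMD: one instruction (doubling) broadcast to all processors.
simd_result = simd_step(lambda x: 2 * x, local_data)   # [2, 4, 6, 8]

# MIMD: four different programs, one per processor.
programs = [lambda x: x + 1, lambda x: x * x, lambda x: -x, lambda x: x // 2]
mimd_result = mimd_step(programs, local_data)          # [2, 4, -3, 2]
```

The sketch makes the glossary distinction concrete: in SIMD mode the instruction stream is shared and only the data differs per processor, while in MIMD mode both the instruction stream and the data are local to each processor.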
  • 3. of heterogeneous machines (so that each Special portion of the code is executed on its Vector MIMD SlMD ~urnose matching machine type) is likely t o /25 2 5 / y T o t a l time = 100 units achieve speedups. Figure 2 illustrates a possible scenario (the numbers are exe- cution times in terms of basic units). Heterogeneous computing. Hetero- geneity in computing systems is not an entirely new concept. Several types of tal time = 50 units special-purpose processors have been Communication time used to provide specific services for improving system throughput. One of the most common is I/O handling. At- taching floating-point processors t o host 1 1 Total time = 4 units + communication overhead computers is yet another heterogeneous approach t o enhance system perfor- Figure 2. Execution of example code using various systems. mance. In high-performance comput- ers, the concept of heterogeneity mani- fests itself at the instruction level in the type of parallelism. As a result, any the code is executed rapidly, while oth- form of several types of functional units, single type of machine often spends its er portions of the code still have rela- such as vector arithmetic pipelines and time executing code for which it is poor- tively higher execution times. Similarly, fast scalar processors. However, cur- ly suited. Moreover, many applications the same code when executed on a suite rent multiprocessor systems remain need to process information at more than one level concurrently, with differ- ent types of parallelism at each level. Image understanding, a Grand Chal- lenges problem, is one such applica- tion.' At the lowest level of computer vi- sion, image-processing operations are of machines in Algorithm design applied t o the raw image. These compu- HC environments tations have a massive SIMD-type par- t allelism. 
In contrast, the participants in t h e D A R P A Image-Understanding I Partitioning and mapping I Benchmark exercises' observed that high-level image-understanding compu- tations exhibit coarse-grained MIMD- type characteristics. For such appli- cations, users of a conventional multi- processor system must either settle for degraded performance on the existing hardware or acquire more powerful (and expensive) machines. Each type of homogeneous system suffers from inherent limitations. For example, vector machines employ in- terleaved memory with apipelined arith- metic logic unit, leading t o performance in high million floating-point operations per second (Mflops). If the data distri- bution of an application and the result- ing computations cannot exploit these features, the performance degrades se- verely. Consider an application code having mixed types of embedded parallelism. Assume that the code when executed on a serial machine spends 100 units of I Proarammina environment I time. When this code is executed on a vector machine, the vector portion of Figure 3. User-directed approach. 20 COMPUTER
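The execution-time argument behind the Figure 2 example can be sketched in a few lines. The code-type profile and the per-machine speedup factors below are illustrative assumptions; only the 100-unit serial baseline and the "4 units plus communication overhead" result for the matched suite follow the example's numbers.

```python
# Sketch of the Figure 2 argument. The profile split and the speedup
# factors are invented for illustration; missing code types run at
# serial speed on a given machine.

profile = {"vector": 25, "mimd": 25, "simd": 25, "special": 25}  # serial units

machines = {
    "serial machine": {},                      # no acceleration
    "vector machine": {"vector": 25.0},        # only vector code is fast
    "matched suite": {k: 25.0 for k in profile},  # every portion matched
}

def total_time(speedup, comm_overhead=0.0):
    """Total runtime: each portion's serial time divided by the speedup
    the chosen machine achieves on that code type."""
    return sum(t / speedup.get(kind, 1.0) for kind, t in profile.items()) + comm_overhead

print(total_time(machines["serial machine"]))      # 100.0 units
print(total_time(machines["vector machine"]))      # only one portion accelerated
print(total_time(machines["matched suite"], 2.0))  # 4 units + assumed 2-unit overhead
```

The point of the sketch is the last line: once every portion runs on its matching machine type, the residual cost is dominated by communication overhead between machines.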
Heterogeneous computing. Heterogeneity in computing systems is not an entirely new concept. Several types of special-purpose processors have been used to provide specific services for improving system throughput. One of the most common is I/O handling. Attaching floating-point processors to host computers is yet another heterogeneous approach to enhance system performance. In high-performance computers, the concept of heterogeneity manifests itself at the instruction level in the form of several types of functional units, such as vector arithmetic pipelines and fast scalar processors. However, current multiprocessor systems remain mostly homogeneous as far as the type of parallelism supported by them. Such systems have been traditionally classified according to the number of instruction and data streams.

An HC environment must contain the following components:

• a set of heterogeneous machines,
• an intelligent high-speed network connecting all machines, and
• a (user-friendly) programming environment.

HC lets a given system be adapted to a wide range of applications by augmenting it with specific functional or performance capabilities without requiring a complete redesign. Since HC comprises several autonomous computers, overall system fault tolerance and longevity are likely to improve.

Issues

We consider two approaches to using the HC paradigm. The first one analyzes an application to explore embedded heterogeneous parallelism. Researchers must devise new algorithms or modify existing ones to exploit the heterogeneity present in the application. Based on these algorithms, users develop the code to be executed by the machines. In the second approach, an existing parallel code of the application is taken as input. To run this code in an HC environment, users must profile the types of heterogeneous parallelism embedded in the code. For this purpose, code-type profilers need to be designed. Figures 3 and 4 illustrate these approaches. However, both approaches need strategies for partitioning, mapping, scheduling, and synchronization. New tools and metrics for performance evaluation are also required. Parallel programming environments are needed to orchestrate the effective use of the computing resources.

[Figure 3. User-directed approach: algorithm design in HC environments, followed by partitioning and mapping, supported by a programming environment.]

[Figure 4. Compiler-directed approach: code analysis classifies the code into vector, MIMD, SIMD, and special-purpose (SP) types, supported by a programming environment.]

Algorithm design. Heterogeneous computing opens new opportunities for developing parallel algorithms. In this section, we identify the efforts needed to devise suitable algorithms. The following issues must be considered by the designer:

(1) the types of machines available and their inherent computing characteristics,
(2) alternate solutions to various subproblems of the application, and
(3) the costs of performing the communication over the network.

Computations in HC can be classified into two classes:

Metacomputing. Computations in this class fall into the category of coarse-grained heterogeneity. Instructions belonging to a particular class of parallelism are grouped to form a module; each module is then executed on a suitable parallel machine. Metacomputing refers to heterogeneity at the module level.

Mixed-mode computing. In this fine-grained heterogeneity, almost every alternate parallel instruction belongs to a different class of parallel computation. Programs exhibiting this type of heterogeneity are not suitable for execution on a suite of heterogeneous machines because the communication overhead due to frequent exchange of information between machines can become a bottleneck. However, these programs can be executed efficiently on a single machine, such as PASM (Partitionable SIMD/MIMD), which incorporates heterogeneous modes of computation. Mixed-mode computing refers to heterogeneity at the instruction level.
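The metacomputing versus mixed-mode distinction can be illustrated with a toy classifier that collapses a typed instruction trace into modules of the same parallelism class: a few large modules indicate coarse-grained heterogeneity, while a class change at nearly every instruction indicates fine-grained heterogeneity. The traces below are invented.

```python
from itertools import groupby

def to_modules(trace):
    """Collapse runs of same-class instructions into (class, length) modules."""
    return [(kind, len(list(run))) for kind, run in groupby(trace)]

# Coarse-grained: long runs of one class -> few modules, each of which
# could be shipped to a matching machine in the suite (metacomputing).
coarse = ["simd"] * 6 + ["mimd"] * 5 + ["vector"] * 4

# Fine-grained: the class changes almost every instruction -> better
# executed on a single mixed-mode machine such as PASM.
fine = ["simd", "mimd", "simd", "vector", "mimd", "simd"]

print(to_modules(coarse))  # three large modules
print(to_modules(fine))    # one module per instruction
```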
Mixed-mode machines can achieve large speedups for fine-grained heterogeneity by using the mixed-mode processing available in a single machine. A mixed-mode machine, for example, can use its mode-switching capability to support SIMD/MIMD parallelism and hardware-barrier synchronization, thus improving its performance over a machine operating in SIMD or MIMD mode only.

Code-type profiling. Fast parallel execution of the code in a heterogeneous computing environment requires identifying and profiling the embedded parallelism. Traditional program profiling involves testing a program assumed to consist of several modules by executing it on suitable test data. The profiler monitors the execution of the program and gathers statistics, including the execution time of each program module. This information is then used to modify the modules to improve the overall execution time.

In HC, profiling is done not only to estimate the code's execution time on a particular machine but also to analyze the code's type. This is achieved by code-type profiling. As introduced by Freund, this code-specific function is an off-line procedure: the statistics to be gathered include the types of parallelism of various modules in the code and the estimated execution time of each module on the machines available in the environment. Code types that can be identified include vectorizable, SIMD/MIMD parallel, scalar, and special purpose (such as fast Fourier transform).

Analytical benchmarking. This test measures how well the available machines perform on a given code type. While code-type profiling identifies the type of code, analytical benchmarking ranks the available machines in terms of their efficiency in executing a given code type. Thus, analytical benchmarking techniques permit researchers to determine the relative effectiveness of a given parallel machine on various types of computation.

This benchmarking is also an off-line process and is more rigorous than previous benchmarking techniques, which simply looked at the overall result of running an entire benchmark code on a processor. Some experimental results obtained by analytical benchmarking show that SIMD machines are well suited for operations such as matrix computations and low-level image processing. MIMD machines, on the other hand, are most efficient when an application can be partitioned into a number of tasks that have limited intercommunication. Note that analytical benchmark results are used in partitioning and mapping.

Partitioning and mapping. Problems that occur in these areas of a homogeneous parallel environment have been widely studied. The partitioning problem can be divided into two subproblems. Parallelism detection determines the parallelism present in a given program. Clustering combines several operations into a program module and thus partitions the application into several modules. These two subproblems can be handled by the user, the compiler, or the machine at runtime. In HC, parallelism detection is not the only objective; code classification based on the type of parallelism is also required. This is accomplished by code-type profiling, which also poses additional constraints on clustering.

Mapping (allocating) program modules to processors has been addressed by many researchers. Informally, in homogeneous environments, the mapping problem can be defined as assigning program modules to processors so that the total execution time (including the communication costs) is minimized. Several other costs, such as the interference cost, have also been considered. In HC, however, other objectives, such as matching the code type to the machine type, result in additional constraints. If such a mapping has to be performed at runtime for load-balancing purposes (or due to machine failure), the mapping problem becomes more complex due to the overhead associated with the code and data-format conversions. Various approaches to optimal and approximate partitioning and mapping in HC have been studied.

Mapping in HC can be performed conceptually at two levels: system (or macro) and machine (or micro). At the system level, each module is assigned to one or more machines in the system so that the parallelism embedded in the module matches the machine type. Machine-level mapping assigns portions of the module to individual processors in the machine. The most common goal of the mapping process is to accomplish these assignments such that the overall runtime of the task is minimized.

Chen et al. proposed a heuristic mapping methodology based on the Cluster-M model, which facilitates the design of portable software. Only one algorithm is required for a given application, regardless of the underlying architecture. Various types of parallelism present in the application are identified. In addition, all communication and computation requirements of the application are preserved in an intermediate specification of the code. The architecture of each machine in the environment is modeled in the system representation, which captures the interconnections of the architecture. The four components of this approach are

• an intermediate model to provide an architecture-independent algorithm specification of the application,
• languages to support the specification in the intermediate model (such languages should be machine-independent and allow a certain amount of abstraction of the computations),
• a tool that lets users specify topologies of the machines employed in the HC environment, and
• a mapping module to match the problem specification and the system representation.

Figure 5 illustrates this methodology.

Machine selection. An interesting problem appears in the design of HC environments: How can one find the most appropriate suite of heterogeneous machines for a given collection of application tasks subject to a given constraint, such as cost and execution time? Freund has proposed the Optimal Selection Theory (OST) to choose an optimal configuration of machines for executing an application task on a heterogeneous suite of computers, with the assumption that the number of machines available is unlimited. It is also assumed that machines matching the given set of code types are available and that the application code is decomposed into equal-sized modules.
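A minimal sketch of how code-type profiling and analytical benchmarking combine in OST-style selection, under the stated assumption that a matching machine is available for every code type. The machine names, code types, and all numbers below are invented for illustration; this is not Freund's formulation, only its greedy core.

```python
# analytical benchmark: estimated speedup of each machine on each code
# type (missing entries mean no better than serial). All values invented.
benchmark = {
    "vector machine": {"vectorizable": 20.0, "scalar": 1.5},
    "simd machine":   {"simd": 30.0, "vectorizable": 5.0},
    "mimd machine":   {"mimd": 15.0, "scalar": 2.0},
}

# code-type profile: (module name, code type, estimated serial time)
modules = [("m1", "vectorizable", 40.0), ("m2", "simd", 90.0), ("m3", "mimd", 30.0)]

def assign(modules, benchmark):
    """Map each module to the machine minimizing its estimated runtime."""
    mapping = {}
    for name, kind, serial in modules:
        best = min(benchmark, key=lambda m: serial / benchmark[m].get(kind, 1.0))
        mapping[name] = (best, serial / benchmark[best].get(kind, 1.0))
    return mapping

for name, (machine, t) in assign(modules, benchmark).items():
    print(name, "->", machine, f"{t:.1f} units")
```

With unlimited machines each module simply goes to its fastest match; the refinements discussed next (AOST and HOST) drop exactly this independence assumption.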
Wang et al.'s Augmented Optimal Selection Theory (AOST) incorporates the performance of code segments on nonoptimal machine choices, assuming that the number of available machines for each code type is limited. In this approach, the program module most suitable for one type of machine may be assigned to another type of machine. In the formulation of OST and AOST, it has been assumed that the execution of all program modules of a given application code is totally ordered in time. In reality, however, different execution interdependencies can exist among program modules. Also, parallelism can be present inside a module, resulting in further decomposition of program modules. Furthermore, the effect of different mappings on different machines available for a program module has not been considered in the formulation of these selection theories.

The Heterogeneous Optimal Selection Theory (HOST) extends AOST in two ways. It incorporates the effect of various mapping techniques available on different machines for executing a program module. Also, the dependencies between the program modules are specified as a directed graph. Note that OST and AOST assume linear ordering of program modules. In the formulation of HOST, an application code is assumed to consist of subtasks to be executed serially. Each subtask contains a collection of program modules. Each program module is further decomposed into blocks of parallel instructions, called code blocks.

[Figure 5. Cluster-M-based heuristic mapping methodology, linking a problem-specification tool with the heterogeneous architecture representation.]

To find an optimal set of machines, we have to assign the program modules to the machines so that

    ΣTi is minimal, while ΣCi ≤ Cmax,

where Ti is the time to execute program module i, Ci is the cost of the machine on which program module i is to be executed, and Cmax is an overall constraint on the cost of the machines. The cost Ci and execution time Ti corresponding to the assignment under consideration can be obtained by using code-type profiling and/or by analyzing the algorithms.

Iqbal presented a selection scheme that finds an assignment of program modules to machines in HC so that the total processing time is minimized, while the total cost of machines employed in the solution does not exceed an upper bound. The scheme can also find a solution to the dual of the above problem, that is, finding a least expensive set of machines to solve a given application subject to a maximal execution time constraint. This scheme is applicable to all of the above selection theories. The accuracy of the scheme, however, depends upon the method used to assign the program modules to the machines. Iqbal also shows that for applications in which the program modules communicate in a restrictive manner, one can find exact algorithms for selecting an optimal set of machines. If, however, the program modules communicate in an arbitrary fashion, the selection problem is NP-complete.

Scheduling. In homogeneous environments, a scheduler assigns each program module to a processor to achieve desired performance in terms of processor utilization and throughput. Designers usually employ three scheduling levels. High-level scheduling, also called job scheduling, selects a subset of all submitted jobs competing for the available resources. Intermediate-level scheduling responds to short-term fluctuations in the system load by temporarily suspending and activating processes to achieve smooth system operation. Low-level scheduling determines the next ready process to be assigned to a processor for a certain duration. Different scheduling policies, such as FIFO, round-robin, shortest-job-first, and shortest-remaining-time, can be employed at each level of scheduling.

While all three levels of scheduling can reside in each machine in an HC environment, a fourth level is needed to perform scheduling at the system level. This scheduler maintains a balanced system-wide workload by monitoring the progress of all program modules. In addition, the scheduler needs to know the different module types and available machine types in the environment, since modules may have to be reassigned when the system configuration changes or overload situations occur. Communication bottlenecks and queueing delays incurred due to the heterogeneity of the hardware add constraints on the scheduler.

Synchronization. This process provides mechanisms to control execution sequencing and to supervise interprocess cooperation. It refers to three distinct but related problems:

• synchronization between the sender and receiver of a message,
• specification and control of the shared activities of cooperating processes, and
• serialization of concurrent accesses to shared objects by multiple processes.
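Two of the three problems above can be illustrated with a toy shared-memory sketch on a single machine (in an HC environment the same roles are played by message-passing, since a global clock and shared memory are absent). All names and values are invented.

```python
import threading
import queue

msgs = queue.Queue()             # sender/receiver synchronization
counter = 0
counter_lock = threading.Lock()  # serializes access to a shared object

def sender():
    # Producing five messages; each put() pairs with a get() below.
    for i in range(5):
        msgs.put(i)

def receiver():
    global counter
    for _ in range(5):
        item = msgs.get()        # blocks until the matching put() occurs
        with counter_lock:       # serialize concurrent updates
            counter += item

threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 0+1+2+3+4 = 10
```

The queue realizes sender/receiver synchronization and the lock realizes serialization; the third problem, coordinating the shared activities of cooperating processes, is what the centralized and distributed schemes discussed next address at system scale.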
A variety of synchronization methods have been proposed in the past: semaphores, conditional critical regions, monitors, and path expressions, among others. In addition, some multiprocessors include hardware synchronization primitives. In general, synchronization can be implemented by using shared variables or by message-passing.

In heterogeneous computing, the synchronization problem resembles that of distributed systems. In both cases, a global clock and shared memory are absent, and (unpredictable) network delays and a variety of operating systems and programming environments complicate the process.

Several techniques used in distributed systems are again useful for solving HC synchronization problems. Two approaches are available: centralized (one machine is designated as a control node) and distributed (decision-making is distributed across the entire system). The correct choice depends on the topology, reliability, speed, and bandwidth of the network, in addition to the types and number of machines in the environment. However, reducing synchronization overhead is important to achieving large speedups in HC. Due to the possibility of several concurrently operating autonomous machines in the environment, application-code performance in HC is more sensitive to synchronization overheads. Frequent hand-shaking for synchronization may expend most of the available network bandwidth.

Some academic sites

A number of academic sites are developing HC environments and applications (this list is not exhaustive).

Systems and architectures
• Distributed High-speed Computing (DHSC) project at Pittsburgh Supercomputing Center, University of Pittsburgh
• Image-Understanding Architecture, University of Massachusetts at Amherst
• Mentat, University of Virginia
• Nectar-Based Heterogeneous System, Carnegie Mellon University
• Northeast Parallel Architecture Center (NPAC), Syracuse University
• Partitionable SIMD/MIMD (PASM), Purdue University

Institutes and departments
• Beckman Institute, University of Illinois at Urbana-Champaign
• Department of Biological Sciences, University of California at Los Angeles
• Department of Computer Science, Kent State University
• Department of Computer Science, University of California at San Diego
• Department of Computer and Information Sciences, New Jersey Institute of Technology
• Department of Electrical Engineering-Systems, University of Southern California
• Department of Math and Computer Science, Emory University
• Minnesota Supercomputer Center (MSC), University of Minnesota at Minneapolis
• Supercomputer Computations Institute (SCI), Florida State University

Interconnection requirements. Current local area networks (LANs) are not suitable for HC because higher bandwidth and lower latency networks are needed. The bandwidth of commercially available LANs is limited to about 10 megabits per second. On the other hand, in HC, assuming machines operating at 40 megahertz and 20 million instructions per second with a 32-bit word length, a bandwidth on the order of 1 gigabit/second is required to match the computation and communication speeds.

Even if higher bandwidth networks were available, three main sources of inefficiency would persist in current networks. First, application interfaces incur excessive overhead due to context switching and data copying between the user process and the machine's operating system. Second, each machine must incur the overhead of executing the high-level protocols that ensure reliable communication between program modules. Also, the network interface burdens the machine with interrupt handling and header processing for each packet. This suggests incorporating additional network-interface hardware in each machine.

Nectar is an example of a network backplane for heterogeneous multicomputers. It consists of a high-speed fiber-optic network, large crossbar switches, and powerful network-interface processors. Protocol processing is off-loaded to these interface processors. A networking standard called Hippi (ANSI X3T9.3 High-Performance Parallel Interface) is being implemented for realizing heterogeneous computing environments at various research sites. Hippi is an open standard that defines the physical and logical link layers of a 100-Mbyte/second network.

In HC, hardware modules from various vendors share physical interconnections. Differing communication protocols may make network-management problems complex. The following general approaches for dealing with network heterogeneity have been discussed in the literature:

(1) treat the heterogeneous network as a partitioned network, with each partition employing a uniform set of protocols;
(2) have a single "visible" network management console; and
(3) integrate the heterogeneous management functions at a single management console.

The IEEE Computer Society Technical Committee on Parallel Processing, the Technical Committee on Mass Storage, and several research sites are working together to define interface standards.
Programming environments. A parallel programming environment includes parallel languages, intelligent compilers, parallel debuggers, syntax-directed editors, configuration-management tools, and other programming aids.

In homogeneous computing, intelligent compilers detect parallelism in sequential code and translate it into parallel machine code. Parallel programming languages have been developed to support parallel programming, such as MPL for MasPar machines, and Lisp and C for the Connection Machine. In addition, several parallel programming environments and models have been designed, such as Code, Faust, Schedule, and Linda.

HC requires machine-independent and portable parallel programming languages and tools. This requirement creates the need for designing cross-parallel compilers for all machines in the environment, and parallel debuggers for debugging cross-machine code. Several programming models and environments have been developed in the past for heterogeneous computing.

The Parallel Virtual Machine (PVM) system,16 evolved over the past three years, consists of software that provides a virtual concurrent computing environment on general-purpose networks of heterogeneous machines. It is composed of a set of user-interface primitives and supporting software that enable concurrent computing on a loosely coupled network of high-performance machines. It can be implemented on a hardware base consisting of different architectures, including single-CPU systems, vector machines, and multiprocessors (see Figure 6).

[Figure 6. An overview of the Parallel Virtual Machine system.]

Application programs view the PVM system as a general and flexible parallel computing resource that supports shared memory, message-passing, and hybrid models of computation. A heterogeneous application can be decomposed into several subtasks based on the embedded types of computation and then executed by using PVM subroutines on different matching machines available on the network. The PVM primitives are provided in the form of libraries linked to application programs written in imperative languages. They support process initiation and management, message-passing, synchronization, and other housekeeping facilities. Support software provided by the PVM system executes on a set of user-specified computing elements on a network, presenting a virtual concurrent computing environment to users.

Performance evaluation. Performance tools are used to summarize the runtime behavior of an application, including analyzing resource use and the cause of any performance bottleneck. Depending on its design, a performance tool can describe program behaviors at many levels of detail. The two most common are the intraprocess and interprocess levels. Intraprocess performance tools, such as the gprof facility on BSD Unix, the HP sampler/3000, and the Mesa Spy, provide information about individual processes.

Performance tools for distributed computing systems concentrate on the interactions between the processes. Integrated performance models that observe the status and the performance events at all levels can be found in the PIE (Programming and Instrumentation Environment) project.17

Designing performance-evaluation tools for distributed computing systems involves collecting, interpreting, and evaluating performance information from application programs, the operating system, the communication network, and other hardware modules employed in the environment. The inherent concurrency in a distributed computing environment, the lack of total ordering of events on different machines, and the nondeterministic nature of the communication delays between the processes make the problem of evaluating performance more complex.

The impact of the code type must be considered. Thus, performance metrics such as processor utilization, speedup, and efficiency are difficult to compute. Indeed, these metrics must be carefully defined to make a reasonable performance evaluation.
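To make the difficulty concrete, here are the classical definitions of speedup and efficiency; the processor count p that efficiency divides by has no single natural value for a suite of unlike machines. The runtimes and machine sizes below are invented.

```python
# Classical metric definitions (not HC-specific):
#   speedup    = T_serial / T_parallel
#   efficiency = speedup / p          (p = number of processors)

def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, num_procs):
    return speedup(t_serial, t_parallel) / num_procs

# Homogeneous case: 100 time units serially vs. 8 units on 16 processors.
print(speedup(100.0, 8.0))         # 12.5
print(efficiency(100.0, 8.0, 16))  # 0.78125

# HC case: what is p for a suite containing one SIMD array with
# thousands of bit-serial PEs plus one small MIMD machine? Any single
# number misrepresents the suite, which is why these metrics must be
# redefined for HC.
```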
Image understanding

Intrinsic parallelism in image processing and the variety of heuristics available for problems in image understanding make computer vision an ideal vehicle for studying heterogeneous computing. From a computational perspective, vision processing is usually organized as follows:

Early processing of the raw image (often called low-level processing). At this level, the input is an image. The output image is approximately the same size. Convolutions are performed on each pixel in parallel. The data communication among the pixels is local to each pixel.

Interfacing between low-level and image-understanding problems (often termed intermediate-level processing). The operations performed on each data item can be nonlocal. The communication is also irregular as compared with that of low-level processing.

Image understanding. By this we mean using the acquired data from the above processing (for example, geometric features such as shape, orientation, and moments) to infer semantic attributes of an image. Processing at this level can be classified as knowledge and/or symbolic processing. Search-based techniques are widely used at this level.

As evident in the preliminary results from the 1988 DARPA Image-Understanding Benchmark,18 each level in computer vision exhibits a different type of parallelism. Therefore, at each level a suitable type of parallel machine must be employed. Corresponding to each of the above classes of problems, a suitable class of architecture was proposed:

SIMD machines. Machines in this class are well suited for computations in low-level and in some intermediate-level computer vision problems because of the regular dataflow and iconic operations in these two levels. For example, two-dimensional cellular arrays and mesh-connected computers have been proposed for a large class of geometric and graph-based problems in image processing. Parallel machines such as the MasPar MP-series and the Connection Machine CM-200 fall in this category. Pipelined parallel machines (like the Carnegie Mellon University Warp machine) are also well suited for low- and intermediate-level vision computations.

Medium-grained MIMD machines. Various intermediate- and high-level vision tasks are computationally intensive with irregular dataflow. Moreover, the size of the input is smaller than the input image size. Parallel systems having a set of powerful processors are suitable for performing computations in intermediate- and high-level vision tasks. The Connection Machine CM-5, Vista, Alliant FX-80, and Sequent Symmetry 81 are some examples.

Coarse-grained MIMD machines. High-level vision tasks such as image understanding/recognition and symbolic processing employ complex data structures. Many of the proposed algorithms for such problems are nondeterministic, and architectural requirements for these problems demand coarse-grained MIMD machines. Parallel machines such as the Aspex ASP and Vista are well suited for this class of problems.

Another approach is to build machines having multiple computational capabilities embedded in a single system. These architectures consist of several levels. Typically, the lower levels operate in SIMD mode and the higher levels operate in MIMD mode. In the Image-Understanding Architecture,19 the lowest level has bit-serial processors, and the intermediate level consists of digital signal processors. The highest level consists of general-purpose microprocessors operating in MIMD mode.

An example vision task. We present an example vision task and identify the different types of parallelism. We have chosen the DARPA Integrated Image-Understanding Benchmark4 as an example task. The overall task performed by this benchmark is the recognition of an approximately specified two-and-a-half-dimensional "mobile" sculpture in a cluttered environment, given images from intensity and range sensors.

Steps in the benchmark can be identified by the vision-task classifications. First, low-level operations such as connected component labeling and corner extraction are performed. Then, grouping the corners (an intermediate-level vision operation) results in the extraction of candidate rectangles. Finally, partial matching of the candidate rectangles is followed by confirmed matching (a high-level vision task). The results obtained on several different parallel machines were reported at the 1988 Image-Understanding Workshop. Details of the benchmark results can be found in Weems et al.18

As they describe, directly interpreting these results would be unfair, since there were many undefined factors in the benchmark description. However, the benchmark does give pointers to how different machines can be classified with respect to their suitability for performing operations at different levels of vision. Overall, the simulation results show that the (heterogeneous) Image-Understanding Architecture performs better than any single machine considered. These results support the suitability of a heterogeneous environment for computer vision applications.

Heterogeneous computing offers new challenges and opportunities to several research communities. To support this paradigm, the following areas of research must be investigated:

• Designing tools to identify heterogeneous parallelism embedded in applications.
• Studying issues in high-speed networking, including available technologies and specialized hardware for networking.
• Designing communication protocols to reduce the cross-over overheads that occur when different machines communicate in the same environment.
• Developing standards for parallel interfaces between various machines.
• Designing efficient partitioning and mapping strategies to exploit heterogeneous parallelism embedded in applications.
• Designing user interfaces and user-friendly programming environments to program diverse machines in the same environment.
• Developing algorithms for applications with heterogeneous computing requirements.

Indeed, HC provides an opportunity to bring together research from various disciplines of computer science and engineering to develop a feasible approach for applications in the Grand Challenges problem set.

Acknowledgments

We thank Richard Freund and Ashraf Iqbal for many helpful discussions. This research was partly supported by the National Science Foundation under Grant No. IRI-9145810.

References

1. R. Freund and D. Conwell, "Superconcurrency: A Form of Distributed Heterogeneous Supercomputing," Supercomputing Review, Oct. 1990, pp. 47-50.
2. Newsletter of the IEEE Computer Society Technical Committee on Parallel Processing (TCPP), Vol. 1, No. 1, Oct. 1992.
3. V.K. Prasanna Kumar, Parallel Algorithms and Architectures for Image Understanding, Academic Press, Boston, 1991.
4. C. Weems et al., "An Integrated Image-Understanding Benchmark: Recognition of a 2-1/2D Mobile," Proc. DARPA Image-Understanding Workshop, Morgan Kaufmann Publishers, San Mateo, Calif., 1988, pp. 111-126.
5. T. Berg and H.J. Siegel, "Instruction Execution Trade-offs for SIMD vs. MIMD vs. Mixed-Mode Parallelism," Proc. Int'l Parallel Processing Symp. (IPPS), IEEE CS Press, Los Alamitos, Calif., Order No. 2167, 1991, pp. 301-308.
6. A. Khokhar et al., "Heterogeneous Supercomputing: Problems and Issues," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 3-12.
7. R. Freund, "Optimal Selection Theory for Superconcurrency," Proc. Supercomputing 89, IEEE CS Press, Los Alamitos, Calif., Order No. M2021 (microfiche), 1989, pp. 13-17.
8. G. Agha and R. Panwar, "An Actor-Based Framework for Heterogeneous Computing Systems," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 35-42.
9. S. Chen et al., "A Selection Theory and Methodology for Heterogeneous Supercomputing," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 3532-02, 1993.
10. M. Wang et al., "Augmenting the Optimal Selection Theory for Superconcurrency," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 13-22.
11. M. Iqbal, "Partitioning Problems for Heterogeneous Computer Systems," tech. report, Dept. of Electrical Engineering-Systems, Univ. of Southern California, Los Angeles, 1993.
12. E. Arnould et al., "The Design of Nectar: A Network Backplane for Heterogeneous Multicomputers," Proc. Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS III), IEEE CS Press, Los Alamitos, Calif., Order No. M1936 (microfiche), 1989, pp. 205-216.
13.
14. C. de Castro and S. Yalamanchili, "Partitioning Signal Flow Graphs for Execution on Heterogeneous Signal Processing Architectures," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 2702, 1992, pp. 81-86.
15. J. Potter, "Heterogeneous Associative Computing," Proc. Workshop on Heterogeneous Processing, IEEE CS Press, Los Alamitos, Calif., Order No. 3532-02, 1993.
16. V. Sunderam, "PVM: A Framework for Parallel Distributed Computing," Concurrency: Practice and Experience, Vol. 2, No. 4, Dec. 1990, pp. 315-339.
17. Z. Segall and L. Rudolph, "PIE: A Programming and Instrumentation Environment for Parallel Processing," IEEE Software, Vol. 2, No. 6, Nov. 1985, pp. 22-27.
18. C. Weems et al., "Preliminary Results from the DARPA Integrated Image-Understanding Benchmark," Parallel Architectures and Algorithms for Image Understanding, V.K. Prasanna, ed., Academic Press, Boston, 1991, pp. 399-499.
19. D. Shu, J. Nash, and C. Weems, "A Multiple-Level Heterogeneous Architecture for Image Understanding," Proc. Int'l Conf. Pattern Recognition, IEEE CS Press, Los Alamitos, Calif., Vol. 2, Order No. 2063, 1990.

Ashfaq A. Khokhar is a PhD candidate in the Department of Electrical Engineering-Systems at the University of Southern California, Los Angeles. His areas of research include parallel architectures and scalable algorithms, image understanding and parallel processing, VLSI computations, interconnection networks, and heterogeneous computing. Khokhar received the BSc degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1985 and the MS degree in computer engineering from Syracuse University in 1988. He is a student member of the Computer Society.

Viktor K. Prasanna (V.K. Prasanna Kumar) is an associate professor at the University of Southern California. His research interests include computer architecture, VLSI computations, and computational aspects of image processing, vision, robotics, and neural networks. Prasanna received the BS degree in electronics engineering from Bangalore University, the MS degree from the School of Automation, Indian Institute of Science, and the PhD in computer science from Pennsylvania State University in 1983. He serves as the symposium chair of the 1994 IEEE International Parallel Processing Symposium and is a subject area editor of the Journal of Parallel and Distributed Computing, IEEE Transactions on Computers, and IEEE Transactions on Signal Processing. He is the founding chair of the IEEE Computer Society Technical Committee on Parallel Processing and a senior member of the Computer Society.

Muhammad E. Shaaban is a PhD candidate in the Department of Electrical Engineering-Systems, University of Southern California. His areas of research include parallel computing, optical interconnection networks, parallel algorithms for image processing, and heterogeneous computing. Shaaban received the BS and MS degrees in electrical engineering from the University of Petroleum and Minerals, Dhahran, Saudi Arabia, in 1984 and 1986, respectively. He recently served as a session chair at the International Parallel Processing Symposium. He is a student member of the Computer Society.

Cho-Li Wang is a PhD candidate in the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles. His areas of research include computer architectures and algorithms, image understanding and parallel processing, image compression, and heterogeneous computing. Wang received the BS degree in computer science and information engineering from National Taiwan University, Taiwan, in 1985 and the MS degree in computer engineering from the University of Southern California in 1990.
ANSI X3T9.3, “High-Performance Par- in the Department of allel Interface: Hippi-PH, Hippi-SC,Hip- Electrical Engmeering- pi-FP, Hippi-LE, and Hippi-MI,”Work- Systems, University of Readers can contact Viktor K. Prasanna at ing Draft Proposed American National Southern California, Los the School of Engineering, Department of Standard for Information Systems,Amer- Angeles. His research Electrical Engineering-Systems, University ican Nat’l Standards Inst., New York, interests include paral- of Southern California, University Park, Los Jan.-Apr.. 1991. * ’ I le] computation, com- Angeles, CA 90089-2562. June 1993 21