On the Choice of Models of Computation for Writing Executable Specifications of System Level Designs

Ivan Jeukens (footnote 1)   Marius Strum (footnote 2)
ijeukens@lme.usp.br strum@lme.usp.br
Microelectronics Laboratory
Department of Electronics Engineering
Polytechnic School University of São Paulo
Abstract
System level designs are typically heterogeneous, thus combining different technologies. In order to
create executable specifications at such a level, a hardware description language, a programming
language, or a combination of both is used. However, the resulting description may not be efficient
because the ability of each language to capture the system's features is limited by its semantics. This
paper presents a methodology for analyzing the efficiency of a set of models of computation to build
executable specifications. We defined a set of "behavioral primitives" and evaluated how efficiently
they are captured by different models of computation. A debugger/profiler tool was developed. The
analysis of the data produced by our tool allows us to derive criteria to choose the most adequate model
of computation for each "primitive". The analysis of the "behavioral primitive" basic block is detailed
in order to illustrate the method.
1. Introduction
As the complexity of integrated systems grows, the need for design methodologies that allow
designers to realize such systems under constrained resources is apparent. An important aspect of those
methodologies is the abstraction level of the initial model of the system. This model has the purpose of
capturing an informal specification, usually written in natural language, representing characteristics like
functionality, timing and structure. The obtained specification can normally be simulated or executed,
and the results analyzed.
The selection of a description language for pure embedded software or pure hardware design is
already well defined. For hardware, behavioral VHDL or C are common choices. For embedded
software, C is normally used. In certain cases, application specific languages like Matlab may be used.
When the abstraction is raised to hardware/software codesign level, the choice of a language
becomes more difficult due to intrinsic differences between hardware and software. Among the possible
solutions, two alternatives were studied: either combining in the same environment a hardware
description language and a software programming language or using one language and extending its
semantics. For instance, [1] presents the cosimulation of a system described in Verilog and C, while
[13] uses C++ classes to add the semantics required for hardware design.
At the system level, deciding how to create an executable specification is an even harder task,
since this level is mainly characterized by not having a particular architecture defined, although some
parts of the system may be already determined, and by having parts belonging to different application
domains. The design may involve analog hardware, digital hardware that is mainly control oriented,
data dominated digital hardware, optical devices, mechanical parts, real time software, etc.
The idea of combining a set of available programming languages was applied in [2] to a
mechatronic system. Matlab is used to model the mechanical part and SDL is used for the initial
specification of the electronic part, which is later refined to a specification combining C and VHDL.
The approach of extending the semantics of a programming language has been studied in several recent
works [7][10][13].
Another viewpoint for the task of creating an executable specification is from the model of
computation (MoC) being used. A model of computation is a set of rules that govern the interaction
between components of a specification. Several models of computation are available [3]. The semantics
of a programming language is based on one or more models of computation.
Footnote 1: Graduate student with a CAPES scholarship
Footnote 2: Associate Professor
In this work we describe a methodology that may be used to solve the problem of choosing
models of computation for creating efficient executable specifications. The methodology is based on:
1. defining a set of behavioral primitives;
2. generating toy examples in order to represent each primitive under
different models of computation;
3. analyzing the characteristics of the executions;
4. deriving conclusions related to the efficiency of the pairs
primitive/MoC.
From the results of this methodology, we can establish guidelines and metrics for aiding the
designer to write executable specifications. The next section presents some background information
concerning the models of computation. Section 3 briefly describes a tool created for analyzing
information collected from the execution of specifications. Section 4 illustrates the methodology
through the analysis of the "basic block" behavioral primitive. Finally, section 5 presents our
conclusions and future work.
2. Background
The Ptolemy II [3][12] framework was chosen for the development of this work. Besides the
public availability of the binary and source codes and its good documentation and support, Ptolemy II is
a convenient choice for our work because it implements several models of computation and supports
the addition of other models. A specification in Ptolemy II is composed of several components, also
called actors, that interact by exchanging data encapsulated inside objects called tokens, through
connections between input/output ports. The framework has a type system with several different types
of tokens, such as integer, double, fixed point, object, etc, and supports the addition of new types. It is
also possible to write data polymorphic actors. Hierarchy is supported by the use of container actors.
The models of computation used in this work are:
• Synchronous Data Flow (SDF) [8]: dataflow model where each actor consumes and produces a
fixed number of tokens each time it fires. Those numbers are determined by the designer
when the specification is built. The size of each FIFO buffer is bounded and known at
compile time. The specification can be statically scheduled;
• Dynamic Data Flow (DDF, footnote 3) [9]: dataflow model that fires an actor when one of its firing
rules is met. A firing rule specifies a pattern of tokens on the actor's input ports. For a
deterministic execution, no two firing rules may be satisfied simultaneously. This condition can
be satisfied by implementing blocking reads. In general, this model requires dynamic
scheduling;
• Process Network (PN) [6][11]: based on concurrent processes communicating through
unidirectional FIFO buffers. Each actor is associated with a process. The read operation on
an empty FIFO blocks the respective actor ensuring a deterministic execution. In general, the
size of each FIFO buffer cannot be bounded, which can lead to a premature termination of
the execution;
• Discrete Event (DE) [3]: based on the processing of events generated by actors. An event is
composed of a token, a time stamp and a priority value. Events are kept by the scheduler in
a queue, ordered according to the time stamp and the priority value, and processed in
increasing order. An actor is fired when it is the destination of the next event to be processed.
The priority value is associated with each actor and used to ensure a deterministic execution
when events with equal time stamp values are present for different actors;
• Communicating Sequential Processes (CSP) [3][5]: based on concurrent processes. Each actor
is associated with a process. Rendezvous communication is employed: both sender and
receiver actors must be ready for exchanging data, otherwise the ready one must wait for the
other. The model also supports nondeterministic rendezvous through the use of guarded
communication;
• Synchronous Reactive (SR, footnote 4) [4]: based on the synchrony hypothesis: for a set of inputs, the
system reacts by producing outputs instantaneously. The connections are unbuffered, thus
each connection carries the same event (if one is present) throughout an instant. An output may be in one
of three states: unknown, absent or present. Once an output goes from unknown to a known
(absent or present) state, it may no longer change. The actors are divided into
two classes: strict actors, which require that all inputs are known in order to fire, and nonstrict
actors, which may be fired several times, independent of the state of the inputs. A strict actor
fires only once in an iteration.
Footnote 3: This MoC was added since it is not available in the current version of the Ptolemy II tool.
Footnote 4: This MoC was added since it is not available in the current version of the Ptolemy II tool.
3. Debugger/Profiler
A debugger/profiler tool was developed in order to aid the analysis of behavioral primitives. It
is based on trace data collected during the execution of the specification. Two different types of
information are collected: the production and consumption of a token and the state of an actor, which
can be running, ready or blocked (footnote 5).
The first purpose of the tool is for debugging a specification. Each token production or
consumption defines a new instant, a value used for ordering the trace data. The tool can show the state
of the execution at a particular instant, i.e., the state of each actor and on which connection a token was
being produced or consumed. A modification may be implemented to show under certain MoCs the
concurrent behavior of the execution. It is also possible to inspect all or a subset of the data collected.
The second use of the tool is for profiling the execution. Based on the trace data, it is possible
to compute a number of statistics, such as the percentage of instants that an actor was in a particular state,
the number of tokens at a connection in each instant, and the maximum and minimum interval (in instants)
between consumption and production of tokens. Those statistics can be displayed in graphs and tables.
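As a rough illustration of the first statistic, the following Python sketch computes the percentage of instants that each actor spent in each state. The trace format (a list of (instant, actor, state) records) and the function name state_percentages are assumptions made for this example; they are not the data model of the tool itself.

```python
from collections import Counter, defaultdict


def state_percentages(trace):
    """Return {actor: {state: percentage of recorded instants}}.

    `trace` is a hypothetical list of (instant, actor, state) records,
    with one record per actor per instant.
    """
    counts = defaultdict(Counter)
    for _instant, actor, state in trace:
        counts[actor][state] += 1

    result = {}
    for actor, states in counts.items():
        total = sum(states.values())
        result[actor] = {s: 100.0 * n / total for s, n in states.items()}
    return result


# Hypothetical trace: four instants recorded for a single actor.
trace = [
    (0, "Ramp", "running"),
    (1, "Ramp", "blocked"),
    (2, "Ramp", "blocked"),
    (3, "Ramp", "ready"),
]
stats = state_percentages(trace)
```

The other statistics (tokens per connection, intervals between production and consumption) could be derived from the same kind of trace by grouping on the connection instead of the actor.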
Other uses of the tool include comparing the efficiency of different schedulers of a model
of computation and improving the efficiency of a specification. Figure 1 shows the interface of the tool.
Figure 1 - Analyzer tool screen dump
4. The Analysis Methodology
4.1 Behavioral Primitives
A behavioral primitive is a characteristic that a designer needs to represent in order to capture
part of a system specification. Programming language constructs such as basic block, fixed length
iteration, conditional execution, data dependent iteration, procedure call and recursion are primitives.
Behaviors like synchronization, preemption and concurrency are also primitives.
4.2 Basic Block
We will now illustrate the methodology for the "basic block" primitive, i.e., a sequential
execution of statements without iteration constructs, branching constructs, function calls or preemption.
Footnote 5: In certain domains, read blocked or write blocked.

4.2.1 Toy Example
We adopted the butterfly curve calculation as the toy example for studying this primitive. The
right hand side of figure 2 shows a procedure that implements this computation.
Figure 2 - The butterfly curve procedure and actor diagram. The actors inside the dashed box
implement the procedure.
4.2.2 The specification
The first decision to make when creating the executable specification is the mapping of a
procedure statement to an actor, i.e., what is the granularity of each actor's function. Although a
solution would be to create one actor that implements the procedure, this straightforward approach is
not well suited, since representing a primitive only inside an actor does not expose its characteristics to
the model of computation. Therefore, the employed map associates each statement with a different
actor. The left hand side of figure 2 shows the obtained topology.
Two characteristics of the specification should be noted. First, it employs only domain
polymorphic actors, i.e., actors that can be used without any modification in several different MoCs. In
this situation, using a different MoC does not require creating a new specification, or it may ease the
process of translation. The only necessary modification to the butterfly example was under the discrete
event model. Since this MoC works by processing events, the source actor (Ramp) has to schedule itself
for future executions. Thus, either the source actor must be changed or a triggering actor
(Clock) must be added.
The second characteristic is that although we are interested in studying the execution of
sequential statements, depending on which MoC is used, actors may be running concurrently. This is
unavoidable because concurrency is a characteristic defined by a MoC. The chosen granularity for the
butterfly example exposes the statements to concurrent execution. Thus, the inherent concurrency of
that particular butterfly's implementation will be exploited, which is different from writing a concurrent
specification (algorithm).
4.2.3 Execution Analysis
4.2.3.1 SDF
All actors consume and produce one token each time they fire. The reason for this example
being homogeneous is that the communication between actors corresponds to the writing and reading of
temporary variables in the butterfly procedure. Hence, the observed number of tokens at each FIFO
buffer never exceeded one. The obtained schedule fired each actor once respecting a topological order.
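The homogeneous execution described above can be sketched in Python as follows. The actor wiring in EDGES is reconstructed from the statements of the butterfly procedure and is an assumption of this example, as are the helper names topological_order and run_iteration; the Plotter actor is omitted. The sketch fires every actor once in a topological order and records the largest FIFO occupancy seen.

```python
import math
from collections import deque

# (source, destination, destination port); wiring assumed from the procedure.
EDGES = [
    ("Ramp", "Scale1", 0), ("Scale1", "Sine", 0),
    ("Sine", "Mult1", 0), ("Sine", "Mult1", 1),
    ("Mult1", "Mult2", 0), ("Mult1", "Mult2", 1),
    ("Mult2", "Mult3", 0), ("Sine", "Mult3", 1),
    ("Ramp", "Scale2", 0), ("Scale2", "Cos2", 0), ("Cos2", "Scale3", 0),
    ("Ramp", "Cos1", 0), ("Cos1", "Exp", 0),
    ("Mult3", "Add", 0), ("Exp", "Add", 1), ("Scale3", "Add", 2),
    ("Add", "PtoR", 0), ("Ramp", "PtoR", 1),
]

# One function per actor; each call consumes one token per input port.
ACTORS = {
    "Scale1": lambda a: a / 12.0,
    "Sine": math.sin,
    "Mult1": lambda a, b: a * b,
    "Mult2": lambda a, b: a * b,
    "Mult3": lambda a, b: a * b,
    "Scale2": lambda a: a * 4.0,
    "Cos2": math.cos,
    "Scale3": lambda a: -2.0 * a,
    "Cos1": math.cos,
    "Exp": math.exp,
    "Add": lambda a, b, c: a + b + c,
    "PtoR": lambda r, theta: (r * math.cos(theta), r * math.sin(theta)),
}


def topological_order(edges):
    """Kahn's algorithm over the actor graph."""
    indegree, successors = {}, {}
    for src, dst, _port in edges:
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1
        successors.setdefault(src, []).append(dst)
    ready = deque(a for a, d in indegree.items() if d == 0)
    order = []
    while ready:
        actor = ready.popleft()
        order.append(actor)
        for nxt in successors.get(actor, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order


def run_iteration(theta, schedule):
    """Fire every actor once; return PtoR's output and the largest FIFO size."""
    fifos = {edge: deque() for edge in EDGES}
    max_tokens = 0
    result = None
    for actor in schedule:
        inputs = sorted((k for k in fifos if k[1] == actor), key=lambda k: k[2])
        args = [fifos[k].popleft() for k in inputs]
        value = theta if actor == "Ramp" else ACTORS[actor](*args)
        if actor == "PtoR":
            result = value
        for k in fifos:
            if k[0] == actor:
                fifos[k].append(value)
                max_tokens = max(max_tokens, len(fifos[k]))
    return result, max_tokens


SCHEDULE = topological_order(EDGES)
point, max_tokens = run_iteration(1.0, SCHEDULE)
```

Because every actor produces and consumes exactly one token per firing, one pass over the schedule leaves all FIFOs empty again, so the same schedule can be repeated for each new Ramp value.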
Figure 2 (right hand side) - The butterfly curve procedure:

    #include <math.h>

    /* Computes one point (x, y) of the butterfly curve for the angle input.
       The trailing comment names the actor that implements each statement. */
    void butterfly(double input, double *x, double *y) {
        double t1, t2, t3;
        t1 = input * 1/12.0;      // Scale1
        t1 = sin(t1);             // Sine
        t2 = t1 * t1;             // Mult1
        t2 = t2 * t2;             // Mult2
        t3 = t2 * t1;             // Mult3
        t1 = input * 4.0;         // Scale2
        t1 = cos(t1);             // Cos2
        t1 = -2.0 * t1;           // Scale3
        t2 = cos(input);          // Cos1
        t2 = exp(t2);             // Exp
        t3 = t3 + t2 + t1;        // Add
        *x = t3 * cos(input);     // PtoR
        *y = t3 * sin(input);     // PtoR
    }

Figure 2 (left hand side) - Actors in the diagram: Ramp, Scale1, Sine, Mult1, Mult2, Mult3, Scale2, Cos2, Scale3, Cos1, Exp, Add, PtoR, Plotter.

4.2.3.2 DDF
The execution was very similar to the SDF one. This happened because of the classification of
an actor as deferrable (footnote 6), as can be illustrated by the Ramp actor: once it fires, it is classified as
deferrable until the actor PtoR fires. Removing the deferrable classification from the scheduler, thus
using a purely data-driven one, enables the successive firing of actor Ramp and its successors. The
result is a pipelined execution, where the maximum number of tokens was observed at the Ramp/PtoR
connection. After a certain amount of time, this number varies from 7 to 8, which is related to the
longest path from actor Ramp to actor PtoR.
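The relation to the longest path can be checked with a small Python sketch. The SUCCESSORS table is a reconstruction of the actor diagram from the statements of the butterfly procedure, so it is an assumption of this example, as is the helper name longest_path. While a value travels along the longest path to PtoR, tokens sent directly from Ramp to PtoR wait on the short connection, so the length of that path gives a plausible reading of the 7 to 8 tokens observed.

```python
from functools import lru_cache

# Assumed actor graph (Plotter omitted), derived from the butterfly procedure.
SUCCESSORS = {
    "Ramp": ["Scale1", "Scale2", "Cos1", "PtoR"],
    "Scale1": ["Sine"],
    "Sine": ["Mult1", "Mult3"],
    "Mult1": ["Mult2"],
    "Mult2": ["Mult3"],
    "Mult3": ["Add"],
    "Scale2": ["Cos2"],
    "Cos2": ["Scale3"],
    "Scale3": ["Add"],
    "Cos1": ["Exp"],
    "Exp": ["Add"],
    "Add": ["PtoR"],
    "PtoR": [],
}


@lru_cache(maxsize=None)
def longest_path(src, dst):
    """Number of connections on the longest path from src to dst (-1 if none)."""
    if src == dst:
        return 0
    best = -1
    for nxt in SUCCESSORS[src]:
        sub = longest_path(nxt, dst)
        if sub >= 0:
            best = max(best, sub + 1)
    return best


depth = longest_path("Ramp", "PtoR")
```

In this reconstruction the longest path is Ramp, Scale1, Sine, Mult1, Mult2, Mult3, Add, PtoR, which spans seven connections.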
4.2.3.3 PN
The execution under this MoC was also guided by data dependencies. A difference with
respect to the two previous ones is that the actors are not fired atomically, since this MoC is based on a
set of concurrent processes. The connection Ramp/PtoR was again the impediment to
temporal parallelism. Unless initialized with a different value, the initial capacity of each FIFO is set to
one, and only modified when an artificial deadlock (footnote 7) happens. For the sake of exploration, raising that
value from one to five caused the percentage of instants in which the Ramp actor was write blocked to drop
from 43 to 8.5. If this value were raised towards infinity, the obtained execution would be
similar to the one obtained with the purely data-driven DDF scheduler.
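The effect of the FIFO capacity can be illustrated with a minimal Python sketch of a process network, using one thread per actor and bounded queues. The three-actor chain, the None end-of-stream marker and the function name run_network are assumptions of this example; unlike the PN scheduler discussed here, the sketch never enlarges a FIFO on an artificial deadlock.

```python
import queue
import threading


def run_network(capacity, count=20):
    """Run Ramp -> Scale -> Sink as three threads joined by bounded FIFOs."""
    a = queue.Queue(maxsize=capacity)
    b = queue.Queue(maxsize=capacity)
    received = []

    def ramp():
        for n in range(count):
            a.put(n)              # blocks while the FIFO is full (write blocked)
        a.put(None)               # end-of-stream marker (assumed convention)

    def scale():
        while True:
            token = a.get()       # blocks while the FIFO is empty (read blocked)
            if token is None:
                b.put(None)
                return
            b.put(token * 4.0)

    def sink():
        while True:
            token = b.get()
            if token is None:
                return
            received.append(token)

    threads = [threading.Thread(target=f) for f in (ramp, scale, sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


out_small = run_network(capacity=1)
out_large = run_network(capacity=5)
```

Both runs deliver the same token sequence to the sink. The capacity changes only how often a writer has to wait, which is the quantity that the write blocked percentage measures.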
4.2.3.4 DE
The sequence of firings followed a topological order of the actor graph. This order is
generated by the priority assignment computation used to resolve the simultaneous event problem,
mentioned in section 2. Once again, the maximum number of tokens in a connection was one.
Observing the butterfly's execution behavior, the following question may be raised: is it
necessary to process events from a zero delay execution through the scheduler's queue, since in that
case what determines the execution sequence is the priority of each actor and the presence of events? In
general, it is reasonable to assume that delay actors will be used. Therefore, a more interesting question
is whether any advantage can be gained when several zero delay actors are used. In seeking an answer, we
developed the following modification to the DE scheduler.
Figure 3 - The left hand side shows a simplified version of the regular DE scheduler algorithm and the
right hand side the modified version.
The modified algorithm relies on the fact that an actor can only produce events with a time
stamp greater than or equal to the current time. Future events are processed through the scheduler's queue, as
in the regular case, but zero delay events are sent directly to the destination actor. The topological order
ensures the correct execution. Table 1 shows comparative results for both algorithms.
Footnote 6: A deferrable actor is one that, although one of its firing rules is met, is prevented from
firing because all actors connected to its output ports already have enough data on those connections.
Footnote 7: The PN scheduler also blocks a process when there is no more room in the FIFO buffer. An
artificial deadlock happens when the system is deadlocked and some actors are write blocked.
Algorithm S1:
1 - If the event queue is empty or the stop time is reached, goto 9.
2 - Get the next event from the queue.
3 - Set the current time to the timestamp of the event.
4 - Determine the actor that is the destination of the event and make the token available.
5 - If there are no simultaneous events to the destination actor, goto 7.
6 - Remove all simultaneous events to the destination actor from the queue and make the tokens
    available.
7 - Fire the actor until it has consumed all input tokens.
8 - Goto 1.
9 - End.
Algorithm S2:
1 - If the event queue is empty or the stop time is reached, goto 9.
2 - Get the next event from the queue.
3 - Set the current time to the timestamp of the event.
4 - Determine the actor that is the destination of the event and make the token available.
5 - Remove from the queue all events with the timestamp equal to the current time, making the
    tokens available to the respective destination actors.
6 - Respecting a topological order, fire each actor that has tokens on its inputs until it has
    consumed all tokens. Send the produced future events to the scheduler's queue and put the
    current time events at the destination actor's ports.
7 - If there are actors with tokens on their input ports, goto 6.
8 - Goto 1.
9 - End.
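The two algorithms above can be compared with a simplified Python sketch that counts how many events are pushed to the scheduler's queue. The benchmark (a Clock actor that re-triggers itself with a delay of one time unit and drives a chain of three zero delay actors), the tables SUCCESSORS, DELAY and PRIORITY, and the function names run_s1 and run_s2 are all hypothetical; tokens carry no data, and an actor is assumed to consume all its inputs in a single firing.

```python
import heapq

# Hypothetical benchmark: Clock -> A -> B -> C, with Clock feeding itself.
SUCCESSORS = {"Clock": ["Clock", "A"], "A": ["B"], "B": ["C"], "C": []}
DELAY = {"Clock": 1, "A": 0, "B": 0, "C": 0}
PRIORITY = {"Clock": 0, "A": 1, "B": 2, "C": 3}   # topological rank


def run_s1(stop_time):
    """Regular scheduler: every produced event goes through the queue."""
    events = [(0, PRIORITY["Clock"], "Clock")]
    pushed = 1
    while events and events[0][0] <= stop_time:
        time, _, actor = heapq.heappop(events)
        # Steps 5-6: collect simultaneous events for the same actor.
        while events and events[0][0] == time and events[0][2] == actor:
            heapq.heappop(events)
        # Step 7: fire the actor; each output becomes a queued event.
        for nxt in SUCCESSORS[actor]:
            heapq.heappush(events, (time + DELAY[actor], PRIORITY[nxt], nxt))
            pushed += 1
    return pushed


def run_s2(stop_time):
    """Modified scheduler: zero delay events bypass the queue."""
    events = [(0, PRIORITY["Clock"], "Clock")]
    pushed = 1
    while events and events[0][0] <= stop_time:
        time = events[0][0]
        # Step 5: remove every event stamped with the current time.
        pending = set()
        while events and events[0][0] == time:
            pending.add(heapq.heappop(events)[2])
        # Steps 6-7: fire in topological order until no inputs remain.
        while pending:
            actor = min(pending, key=PRIORITY.get)
            pending.remove(actor)
            for nxt in SUCCESSORS[actor]:
                if DELAY[actor] == 0:
                    pending.add(nxt)          # delivered directly to the port
                else:
                    heapq.heappush(
                        events, (time + DELAY[actor], PRIORITY[nxt], nxt))
                    pushed += 1
    return pushed


events_s1 = run_s1(stop_time=10)
events_s2 = run_s2(stop_time=10)
```

On this toy benchmark the modified scheduler queues fewer events than the regular one, because only the events produced by Clock carry a future time stamp. The sketch says nothing about execution time, which depends on the actual queue implementation.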

Table 1 - Results for the DE schedulers using three benchmarks. The columns show (from left to
right): number of delay actors used, number of zero delay actors used, number of events sent to the
queue using algorithm S1, number of events sent to the queue using algorithm S2, execution time in ms
using algorithm S1, execution time in ms using algorithm S2.

Delay   Zero Delay   Events S1   Events S2   Time S1 (ms)   Time S2 (ms)
6       28           27131       8662        5272           4666
1       14           20021       1001        4468           3150
3       3            9884        1945        5317           4620
4.2.3.5 CSP
The execution was successfully carried out. Compared to the PN model, the percentage of
instants that actors were blocked increased, with a consequent decrease in readiness. This can be
explained by the synchronous communication and illustrated by actor Scale3. After it starts a write, it is
blocked waiting for actor Add to read the token. Actor Add also has to read tokens from actors Mult3
and Exp. Therefore, actor Scale3 will be write blocked for a duration that depends on how the code of
actor Add is written. Such a situation does not happen under PN, since the communication is
asynchronous.
This example has a potential deadlock situation, related to how an actor's function is specified
and to the way the topology is created (footnote 8). Actor Ramp sends a token to four other actors: Scale1, Scale2,
Cos1 and PtoR. The Ptolemy execution engine will loop through the connections of actor Ramp,
sending the respective data. The order in which those connections are accessed will depend on how the
example was built. If the connection from Ramp to PtoR is created before the connection to Cos1, the
example will deadlock because actor Ramp will be write blocked on the connection to PtoR, actor PtoR
will be read blocked on the connection to actor Add, and actor Add will never execute since the branch
from actor Cos1 will not produce any token.
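The deadlock described above amounts to a cycle in a wait-for relation between actors, which the following Python sketch detects. The dictionaries are hand-written snapshots of which actor each blocked actor is waiting for; they follow the scenario in the text but are assumptions of this example, as is the helper name find_wait_cycle.

```python
def find_wait_cycle(waits_for, start):
    """Follow the wait-for chain from `start`; return the cycle if one closes."""
    path = []
    actor = start
    while actor is not None and actor not in path:
        path.append(actor)
        actor = waits_for.get(actor)
    if actor is None:
        return None                     # the chain ends at an unblocked actor
    return path[path.index(actor):]


# Snapshot when Ramp offers its token to PtoR before Cos1.
blocked_order = {
    "Ramp": "PtoR",   # write blocked: PtoR is not ready to read from Ramp
    "PtoR": "Add",    # read blocked on the connection from Add
    "Add": "Exp",     # read blocked on the branch Cos1 -> Exp
    "Exp": "Cos1",    # read blocked on Cos1
    "Cos1": "Ramp",   # read blocked: Ramp has not reached this connection
}

# Hypothetical snapshot when Ramp serves Cos1 first: Cos1 is not waiting.
safe_order = {"PtoR": "Add", "Add": "Exp", "Exp": "Cos1"}

cycle = find_wait_cycle(blocked_order, "Ramp")
no_cycle = find_wait_cycle(safe_order, "PtoR")
```

With rendezvous communication every link of the cycle is a blocked actor, so no participant can make progress; reordering the connections of Ramp breaks the cycle at Cos1.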
4.2.3.6 SR
The butterfly example is only composed of strict actors. Therefore, the resulting execution
sequence fired each actor once and respected a topological order.
4.3 Discussion
The basic block primitive was successfully captured by all six MoCs. From the results
discussed in section 4.2.3 we may derive the following general observations:
• dataflow models (DDF, PN): the bounds on the maximum size of FIFO buffers varied when
modifying execution parameters or schedulers. This variation is related to the granularity of
the actor. Changing the topology would have altered those bounds. Therefore, the decision of
how to partition a behavior or a system into different blocks will have an impact on the
buffering requirements and/or on the activity of each block, among other possible
parameters;
• SR: The specification using the synchronous reactive model was trivial. However, the
butterfly example made no use of actors that produce several tokens on a firing. Those actors
could be easily represented in all other MoCs, but for SR it would be a problem, since each
actor can produce only one token per instant;
• CSP: aside from changing the actor's function or the example topology, guarded
communication could also be employed in solving the deadlock problem, since it enables an
actor to wait for data from several different connections.
Although only one toy example was used to capture the basic block behavioral primitive, other
toy examples may be developed. They would only become relevant if other observations of the pair
primitive/MoC could be made. Therefore, in studying a primitive, more than one toy example may be
used. Also, one toy example may have more than one possible specification under the same MoC.
Footnote 8: This particular example was pointed out by John Davis II.
To illustrate the situation of a toy example with more than one specification, another primitive
will be briefly discussed: the fixed length iteration. The left hand side of figure 4 shows the code for a
matrix multiplication (footnote 9), the toy example. The right hand side of figure 4 shows a first
specification for this example.
The generation of the values for variables k, j and i is done respectively by actors Modulo,
Divide1 and Divide2. The actors CT and Data represent the input matrices. The actor AccumAdd
produces the sum of the last N values.
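One way the loop indices could be derived from a single Ramp counter is sketched below in Python. The text does not give the parameters of the Modulo, Divide1 and Divide2 actors, so the formulas in indices are assumptions chosen to reproduce the loop nest, and the input matrices are arbitrary test data.

```python
N = 8  # matrix size used for the toy example


def indices(n, size=N):
    """Map the Ramp count n to (i, j, k); the formulas are assumed."""
    k = n % size                   # Modulo
    j = (n // size) % size         # Divide1
    i = n // (size * size)         # Divide2
    return i, j, k


def matmul_stream(data, ct, size=N):
    """Multiply by streaming one (i, j, k) triple per Ramp token."""
    res = [[0] * size for _ in range(size)]
    for n in range(size ** 3):
        i, j, k = indices(n, size)
        res[i][j] += data[i][k] * ct[k][j]    # Mult followed by AccumAdd
    return res


def matmul_loops(data, ct, size=N):
    """Reference: the triple loop of the matrix multiplication code."""
    res = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                res[i][j] += data[i][k] * ct[k][j]
    return res


# Arbitrary test matrices.
data = [[r + c for c in range(N)] for r in range(N)]
ct = [[r * c for c in range(N)] for r in range(N)]
streamed = matmul_stream(data, ct)
```

Every Ramp token yields a value for all three indices, even though i changes only once every N * N tokens and j once every N tokens.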
Figure 4 - A matrix multiplication code and its specification topology.
It was intended that this specification could be used without modification in all six MoCs, as
was done with the butterfly example. Therefore, the restriction that each actor should produce and
consume one token was applied.
The analysis of the execution for this specification produced the same observations as for the
basic block primitive, since both have similar specifications (a similar topology without feedback, only
one token consumed/produced). Although successful, this specification generates redundant tokens, as
can be illustrated by actors Modulo and Divide2: the latter should only generate a new token after N^2
tokens from actor Modulo. Therefore, more efficient specifications were created by exploiting
particular characteristics of the employed MoC. Figure 5 illustrates a modification when using SDF.
Figure 5 - A new specification for the matrix multiplication under SDF. The number near each actor
connection is a sample rate. The Sequencer actor generates a sequence of values from 0 to
N - 1.
Different sample rates were used in order to avoid redundant tokens. The iterative behavior is
then achieved by firing one actor more times than another. For the DDF, PN and CSP MoCs, the
use of blocking reads can eliminate redundant tokens. Under DE, events are sent only when necessary.
The same behavior is achieved under SR by making outputs absent.
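How sample rates determine the number of firings of each actor under SDF can be sketched in Python by solving the balance equations. The rates in EDGES below are hypothetical, picked for N = 8; they are not read from Figure 5. The function name repetitions is likewise an assumption of this example.

```python
from fractions import Fraction
from math import gcd


def repetitions(edges):
    """Solve q[src] * produced == q[dst] * consumed for a connected graph.

    `edges` is a list of (src, produced, dst, consumed). Returns the smallest
    positive integer firing counts, or raises ValueError if the rates are
    inconsistent.
    """
    q = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * prod / cons
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * cons / prod
                changed = True
            elif src in q and dst in q and q[src] * prod != q[dst] * cons:
                raise ValueError("inconsistent sample rates")

    # Scale the rational solution to the smallest integer one.
    scale = 1
    for value in q.values():
        scale = scale * value.denominator // gcd(scale, value.denominator)
    counts = {actor: int(value * scale) for actor, value in q.items()}
    common = 0
    for count in counts.values():
        common = gcd(common, count)
    return {actor: count // common for actor, count in counts.items()}


# Hypothetical rates: AccumAdd sums 8 products, Res collects 64 sums.
EDGES = [
    ("Mult", 1, "AccumAdd", 8),
    ("AccumAdd", 1, "Res", 64),
]
reps = repetitions(EDGES)
```

With these assumed rates, one iteration of the schedule fires Mult once per scalar product term, AccumAdd once per result element and Res once per matrix, so no redundant tokens are needed to express the iteration.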
5. Conclusion
This paper presented a methodology that may be used to solve the problem of creating
efficient system level executable specifications when faced with different models of computation. The
concept of a behavioral primitive was defined. In order to illustrate the methodology, the butterfly curve
toy example was adopted to describe and analyze the primitive "basic block of computation" under six
different models of computation. A second primitive was briefly presented.
Future work will include the analysis of other primitives by creating different toy examples.
Available executable specifications will also be considered for enumerating behavioral primitives. It
should be noted that the results regarding the efficiency of the description of a primitive can be applied
to other system level design methods, since they are based on models of computation and some of these
models are shared by different frameworks.
The ultimate goal of this work is to generate a library of behavioral primitives, characterized
by their modeling efficiency under different models of computation. Starting from an initial
specification of the system, this library can be used as a repository of information to produce an
efficient executable specification. Starting from an executable specification, the library can aid the
generation of a more efficient one.
Footnote 9: The actual code and the created specifications always used N = 8.

Figure 4 (left hand side) - The matrix multiplication code:

    for(int i = 0; i < N; i++) {
        for(int j = 0; j < N; j++) {
            for(int k = 0; k < N; k++) {
                res[i][j] += data[i][k] * ct[k][j];
            }
        }
    }

Figure 4 (right hand side) - Actors in the diagram: Ramp, Modulo, Divide1, Divide2, CT, Data, Mult, AccumAdd, Res. Port labels: R, C. Index signals: i, j, k.

Figure 5 - Actors in the diagram: Sequencer1, Sequencer2, Sequencer3, CT, Data, Mult, AccumAdd, Res. Port labels: R, C. The connections are annotated with sample rates of 1, 8 and 64.
References
[1] D. Becker, R. K. Singh, S. G. Tell, "An Engineering Environment for HW/SW CoSimulation",
Proceedings of the Design Automation Conference, 1992.
[2] P. Coste, et al, "Multilanguage Systems Codesign", Proceedings of the International Workshop on
Hardware-Software Codesign, 1999.
[3] J. Davis II, et al, "Ptolemy II: Heterogeneous Concurrent Modeling and Design in Java", Technical
Report UCB/ERL No. M99/63, Dept. EECS, University of California, Berkeley, December 15, 1999.
[4] S. A. Edwards, "The Specification and Execution of Heterogeneous Synchronous Reactive
Systems", PhD thesis, Dept. EECS, University of California, Berkeley, May, 1997.
[5] C. A. R. Hoare, "Communicating Sequential Processes", Communications of the ACM, Vol. 21, No.
8, August 1978.
[6] G. Kahn, "The Semantics of A Simple Language for Parallel Programming", Information
Processing, August, 1974.
[7] L. Lavagno, E. Sentovich, "ECL: A Specification Environment for System-Level Design",
Proceedings of the Design Automation Conference, 1999.
[8] E. A. Lee, D. G. Messerschmitt, "Synchronous data flow", Proceedings of the IEEE, Vol. 75, No. 9,
September 1987.
[9] E. A. Lee, T. M. Parks, "Dataflow Process Networks", Proceedings of the IEEE, Vol. 83, No. 5,
May 1995.
[10] J. Martin, et al, "Modeling Reactive Systems in Java", ACM Transactions on Design Automation of
Electronic Systems, Vol 3, No. 4, October 1998.
[11] T. M. Parks, "Bounded Scheduling of Process Networks", PhD Dissertation, Dept. EECS,
University of California, Berkeley, December 1995.
[12] Ptolemy project home page. See http://ptolemy.eecs.berkeley.edu.
[13] System C User's Guide, Version 1.0. See http://www.systemc.org.