SlideShare uma empresa Scribd logo
1 de 140
A Survey on Domain-Specific Languages for Machine.pdf
A Survey on Domain-Specific Languages for Machine
Learning
August 3, 2017
CISC 603-50- R-2017/Summer - Theory of Computation
Student: Dileep Sharma
Instructor: Majid Shaalan
Contents
1 Statement 2
2 Abstract 2
3 Introduction 2
3.1 Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
3.2 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 3
3.3 Domain Specific Language . . . . . . . . . . . . . . . . . . . . . . . .
. . . . 4
4 DSL Feature Model 4
4.1 Language Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 4
4.2 Transformation Feature . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 6
4.3 DSL Tool Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 6
4.4 DSL Process Features . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . 9
5 Languages Surveyed 9
5.1 OptiML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
5.2 ScalOps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
5.3 Scala . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
5.4 PIG LATIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
5.5 Breukervl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
5.6 Possibility of survey of other language . . . . . . . . . . . . . . . .
. . . . . 11
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. 11
6 Reference 12
1
1 Statement
The purpose of this paper is to identify, describe and design
Domain Specific Language(
DSL ) applicable to Machine learning world in big data space,
that can make process more
faster and efficient.
2 Abstract
In last couple of decades, the data we have at our disposal have
increased tremendously
because of technology advance. This technology advance has
helped us in capturing, storing,
analogizing and visualizing data and that has lead to big data.
We need better algorithm
to read and analyze these big and complex diastases. Machine
Learning is turning out to be
the most effective way of analyzing these datasets and
predicting future behavior. To better
analyze these datasets with Machine Learning we need enhanced
computational power, that
can be obtained using parallel processing using GPUs. Machine
Learning algorithms needs
to be adapted and optimized to specific applications. However,
programming these devices
to run efficiently and correctly is difficult, error-prone, and
results in software that is harder
to read and maintain. This paper is primarily concern about
Domain Specific language that
can help us in writing Machine Algorithms in efficient way to
analyze Big Data.
3 Introduction
Technological advance in recent past has caused a data
revolution. This high volume of data
is called big data.Every second, smartphones, tablets,cars,
websites, and systems generate
a massive amount of data, and users and software engineers
have access to a subset of that
data to perform their activities.
3.1 Big Data
Apart for large amount of data, Big data also accounts for
complex data, known as vari-
ety.Big Data also created new challenges in data management.
Traditional ways of data
storage and analysis do not scale well to this amount of data,
which can reach hundreds
of terabytes or more, and new approaches are being developed
to address these issues Big
data is basically defined by 5V’s:
• Volume
– Refers to amount of data
– Big Data doesnt sample
2
– Big Data observes and tracks what happens
• Velocity
– Speed of data processing
– Speed of data generation
– Big Data is often available in real-time
• Variety
– Number of types of data
• Variability
– Inconsistency of data
• Veracity
– Quality of data
3.2 Machine Learning
Machine learning is turning out to be one of the most advanced
technique to process and
make inferences from Big Data. Machine Learning is widely
used to discover identify trends,
patterns, suggest actions, and optimize output. There are still a
lot of challenges in using
Machine Learning to solve big data problems, such as memory
and time issues. To resolve
these issues, we can use GPUs for parallel processing and
scatter data across different
machines. There are basically two kind of Machine Learning:
• Supervised
– All data is labeled
– You have both Input variable and Output variable
– Use an algorithm to learn the mapping function from the input
to the output
• Unsupervised
– All data is unlabeled
– You only have input data and no corresponding output
variables
– Algorithm try to find pattern in input data
3
3.3 Domain Specific Language
In model-driven engineering, a Domain-Specific Language
(DSL) is a specialized language,
which, combined to a transformation function, serves to raise
the abstraction level of soft-
ware and ease software development. The Machine Learning
Implementation can be made
better by using techniques such as Domain-Specific. DSL solves
problem in a single domain
while General Purpose Languages(GSL) solves problems in a
couple of domains. DSLs
facilitate results to be expressed in the idiom and at the level of
abstraction of the prob-
lem domain Language.DSLs offer pre-defined abstractions to
represent concepts from the
application domain. This representation may be more clear and
intuitive. Moreover, DSL
compilers may optimize the code written for the specific
domain, and they can perform
error detection more efficiently. Lastly, DSLs may have more
specific tool support that help
software engineers increase their productivity. These languages
are easier to learn. There
can be three kind of DSL languages:
• Markup language
• Specification Language
• Programming Language
4 DSL Feature Model
DSl Feature model covers languages, transformation, tooling,
and process aspects 1. Lan-
guage and transformation are mandatory features because they
are parts of the DSL defini-
tion. Tool is also mandatory because it serves to automate
transformation from a domain,
the problem space, down to lower abstraction levels, the
solution space. Process is optional
because it can be undefined or implicit.
4.1 Language Features
There are two language features called as2:
• Abstract Syntax
– Characterizes elements of a domain and their relationships
without implementa-
tion consideration
• Concrete Syntax
– Representation of a DSL in a human usable form
4
Figure 1: Roots of DSL Feature Language
Figure 2: Language Feature
5
Figure 3: Root of the Transformation Features
4.2 Transformation Feature
Transformation feature ensures the correspondence from the
problem to the solution, takes
into account the problem-to-solution element mapping, and all
design, implementation,
platform and architecture decisions. Transformation has to
answer to three questions3:
How to specify transformation4? What are the assets expected
from the transformation5?
How to realize the transformation to produce the expected
assets? 6.
4.3 DSL Tool Features
There are basically three kind of tool features namely Respect
of Abstraction, Assistance
and Quality Factor/reftool. The purpose of abstraction is to
reduce software description.
Abstraction can be intrusive pr seamless. Assistance aims at
guiding the DSL tool user dur-
ing definition and transformation of domain data. Assistance is
adaptive when assistance
changes in function of the context of usage and its Static when
it does not changes. Process
guidance guide the user at a process step or at the process
workflow level and Checking is
mandatory in the feature model because a DSL tool must ensure
consistency and complete-
ness of domain data. DSL checking can be realized on the fly or
on user action. Quality
Factor covers non-functional aspects of DSL Tool.
6
Figure 4: Specification Features
Figure 5: Target Asset Features
7
Figure 6: Operational Transformation Features
Figure 7: DSL Tool Features
8
Figure 8: DSL Process Features
4.4 DSL Process Features
A DSL process defines how development projects with DSL
must be executed. This part of
the feature model addresses the Domain-Specific Software
Development (DSSD). It can be
of three types: Work Definition, Role and Guidance/refProcess.
5 Languages Surveyed
5.1 OptiML
• Highly expressive language textual programming language
built on top of Scala
• OptiML provides the link between ML applications and
heterogeneous parallel hard-
ware
• OptiML code outperformed explicitly parallelized MATLAB
code on heterogeneous
system
• Require no knowledge of the underlying embedding
implementation
• No explicit parallelization
• No explicit code for the lower level programming models
• Statically typed language
9
• Declarative language
• Transative language
• Support Vector, matrix, Graph operations
• Does not supports Distributed and Cloud computing feature
5.2 ScalOps
• It enables algorithms to run on cloud
• Textual programming language
• A declarative language
• Statically typed
• Supports vector, matrix,and graph operations in both parallel
and cloud computing
environment
• Transative language
5.3 Scala
• It is concise
• Has broad applicability
• Texual programming language
5.4 PIG LATIN
• Textual programming language
• Imperative language
• Dynamically typed
• Translational language
• Support Vector, matrix operations
• Does not support graph operations
• Supports Distributed and Cloud computing feature
• Supports parallel operation
10
5.5 Breukervl
• Modeling Graphical language
• Descriptive language
• Does not Support Vector, matrix and graph operations
• Does not supports Distributed and Cloud computing and
parallel computing feature
5.6 Possibility of survey of other language
As time permits, I would take to take my analysis further and
survey other Domain Specific
Languages.
5.7 Conclusions
Because of size and complexity of datasets machine learning
algorithms are facing a lot of
software and hardware challenges. We need to move towards
parallel and heterogeneous
hardware systems. A lot of DSL are developed to resolve these
issues. These languages pro-
vide parallel execution of code, distributed computation and
cloud computation. Following
points applies to most of the DSL languages:
• Most of the DSL language are programming languages
• Most of the languages are complied
• Modeling languages are all graphical and have a descriptive
model
• Most of them support
– Parallelism
– Distribution
– Cloud
11
6 Reference
References
[1] “A Survey on Domain-Specific Languages for Machine
Learning in Big Data” 2016 IEEE
International Conference on Software Science, Technology and
Engineering (SWSTE).
[2] “OptiML: An Implicitly Parallel Domain-Specific Language
for Machine Learning” 28th
International Conference on Machine Learning, Bellevue, WA,
USA, 2011
[3] “Domain Specific Languages for Machine Learning” GPCE
2016.
[4] “Towards Model-Driven Engineering for Big Data Analytics
– An Exploratory Analysis
of Domain-Specific Languages for Machine Learning” 2014
47th Hawaii International
Conference on System Sciences.
[5] “An overview of big data opportunities, applications and
tools. In Intelligent Systems
and Computer” Vision (ISCV), 2015 (pp. 1-6). IEEE
[6] “ DSL Classification”
12
StatementAbstractIntroductionBig DataMachine
LearningDomain Specific LanguageDSL Feature
ModelLanguage FeaturesTransformation Feature DSL Tool
FeaturesDSL Process FeaturesLanguages
SurveyedOptiMLScalOpsScalaPIG LATINBreukervl Possibility
of survey of other languageConclusionsReference
Design Guidelines for Domain Specific Languages.pdf
Design Guidelines for Domain Specific Languages
Gabor Karsai
Institute for Software
Integrated Systems
Vanderbilt University
Nashville, USA
Holger Krahn
Software Engineering Group
Department of Computer
Science
RWTH Aachen, Germany
Claas Pinkernell
Software Engineering Group
Department of Computer
Science
RWTH Aachen, Germany
Bernhard Rumpe
Software Engineering Group
Department of Computer
Science
RWTH Aachen, Germany
Martin Schindler
Software Engineering Group
Department of Computer
Science
RWTH Aachen, Germany
Steven Völkel
Software Engineering Group
Department of Computer
Science
RWTH Aachen, Germany
ABSTRACT
Designing a new domain specific language is as any other
complex task sometimes error-prone and usually time con-
suming, especially if the language shall be of high-quality
and comfortably usable. Existing tool support focuses on
the simplification of technical aspects but lacks support for
an enforcement of principles for a good language design. In
this paper we investigate guidelines that are useful for de-
signing domain specific languages, largely based on our ex-
perience in developing languages as well as relying on ex-
isting guidelines on general purpose (GPLs) and modeling
languages. We defined guidelines to support a DSL devel-
oper to achieve better quality of the language design and a
better acceptance among its users.
1. INTRODUCTION
Designing a new language that allows us to model new
technical properties in a simpler and easier way, describe or
implement solutions, or to describe the problem resp. re-
quirements in a more concise way is one of the core chal-
lenges of computer science. The creation of a new language
is a time consuming task, needs experience and is thus usu-
ally carried out by specialized language engineers. Nowa-
days, the need for new languages for various growing do-
mains is strongly increasing. Fortunately, also more sophis-
ticated tools exist that allow software engineers to define a
new language with a reasonable effort. As a result, an in-
creasing number of DSLs (Domain Specific Languages) are
designed to enhance the productivity of developers within
specific domains. However, these languages often fit only
to a rather specific domain problem and are neither of the
quality that they can be used by many people nor flexible
enough to be easily adapted for related domains.
During the last years, we developed the frameworks Mon-
tiCore [13] and GME [2] which support the definition of
domain specific languages. Using these frameworks we de-
signed several DSLs for a variety of domains, e.g., a textual
version of UML/P notations [17] and a language based on
function nets in the automotive domain [5]. We experienced
that the design of a new DSL is a difficult task because dif-
ferent people have a varying perception of what a “good”
language actually is.
This of course also depends on the taste of the developer
respectively the users, but there are a number of generally
acceptable guidelines that assist in language development,
making it more a systematic, methodological task and less
an intellectual ad-hoc challenge. In this paper we summa-
rize, categorize, and amend existing guidelines as well as
add our new ones assuming that they improve design and
usability of future DSLs.
In the following we present general guidelines to be consid-
ered for both textual and graphical DSLs with main focus is
on the former. The guidelines are discussed sometimes using
examples from well-known programming languages or math-
ematics, because these languages are known best. Depend-
ing on the concrete language and the domain these guidelines
have to be weighted differently as there might be different
purposes, complexity, and number of users of the resulting
language. For example, for a rather simple configuration
language used in only one project a timely realization is usu-
ally more important than the optimization of its usability.
Therefore, guidelines must be sometimes ignored, altered, or
enforced. Especially quality-assurance guidelines can result
in an increased amount of work.
While we generally focus in our work on DSLs that are
specifically dedicated to modeling aspects of (software) sys-
tems, we believe that these guidelines generally hold for any
DSL that embeds a certain degree of complexity.
1.1 Literature on Language Design
For programming languages, design guidelines have been
intensively discussed since the early 70s. Hoare [8] intro-
duced simplicity, security, fast translation, efficient object
code, and readability as general criteria for the design of
good languages. Furthermore, Wirth [22] discussed sev-
eral guidelines for the design of languages and correspond-
ing compilers. The rationale behind most of the guidelines
and hints of both articles can be accepted as still valid to-
day, but the technical constraints have changed dramati-
cally since the 70s. First of all, computer power has in-
creased significantly. Therefore, speed and space problems
have become less important. Furthermore, due to sophis-
ticated tools (e.g., parser generators) the implementation
of accompanying tools is often not a necessary part of the
language development any more. Of course, both articles
concentrate on programming languages and do not consider
the greater variety of domain specific languages.
More recently, authors have also discussed the design of
domain specific modeling languages. General principles for
modeling language design were introduced in [14]. These
include simplicity, uniqueness, consistency, and scalability,
on which we will rely later. However, the authors did not
discuss how these higher level principles can be achieved.
In [12] certain aspects of the DSL development are explained
and some guidelines are introduced. More practical guide-
lines for implementing DSLs are given in [10]. These focus
on how to identify the necessary language constructs to gen-
erate full code from models. The authors explain how to
provide tool support with the MetaEdit+ environment. [20]
explains 12 lessons learned from DSL experiments that can
help to improve a DSL. Although more detailed discussions
on explicit guidelines are missing, these lessons embed doc-
umented empirical evidence – a documentation that many
other discussions, including ours do not have. In [16] the
authors introduce a toolset which supports the definition
of DSLs by checking their consistency with respect to sev-
eral objectives. Language designers can select properties of
their DSL to be developed and the system automatically
derives other design decisions in order to gain a consistent
language definition. However, the introduced criteria cover
only a subset of the decisions to be made and hence, cannot
serve as the only criteria for good language design. Quite
the contrary, to our experience many design guidelines can-
not be translated in automatic measures and thus cannot be
checked by a tool.
1.2 Categories of DSL Design Guidelines
The various design guidelines we will discuss below, can
be organized into several categories. Essentially, these guide-
lines describe techniques that are useful at different activities
of the language development process, which range from the
domain analysis to questions of how to realize the DSL to the
development of an abstract and a concrete syntax including
the definition of context conditions. An alignment of guide-
lines with the language development activities and the de-
veloped artifacts has the advantage that a language designer
can concentrate on the respective subset of the guidelines at
each activity. This should help identifying and realizing the
desired guidelines. Therefore, we decided for a development
phase oriented classification and identified the following cat-
egories:
Language Purpose discusses design guidelines for the early
activities of the language development process.
Language Realization introduces guidelines which discuss
how to implement the language.
Language Content contains guidelines which focus on the
elements of a language.
Concrete Syntax concentrates on design guidelines for the
readable (external) representation of a language.
Abstract Syntax concentrates on design guidelines for the
internal representation of a language.
For each of these categories we will discuss the design
guidelines we found useful. Please be aware that the subse-
quently discussed guidelines sometimes are in conflict with
each other and the language developer sometimes has to bal-
ance them accordingly. Additionally, semantics is explicitly
not listed as a separate step as it should be part of the entire
development process and therefore has an influence on all of
the categories above.
2. DSL DESIGN GUIDELINES
2.1 Language Purpose
Language design is not only influenced by the question of
what it needs to describe, but equally important what to do
with the language. Therefore, one of the first activities in
language design is to analyze the aim of the language.
Guideline 1: “Identify language uses early.” The language
defined will be used for at least one task. Most common
uses are: documentation of knowledge (only) and code ge-
neration. However, there are a lot more forms of usage:
definition or generation of tests, formal verification, auto-
matic analysis of various kinds, configuration of the system
at deployment- or run-time, and last but increasingly im-
portant, simulation.
An early identification of the language uses have strong in-
fluence on the concepts the language will allow to offer. Code
generation for example is not generally feasible when the
language embeds concepts of underspecification (e.g., non-
deterministic Statecharts). Even if everything is designed to
be executable, there are big differences regarding the over-
head necessary to run certain kinds of models. If efficient
execution on a small target machine is necessary (e.g., mo-
bile or car control device) then high-level concepts must be
designed for optimized code generations. For simulation and
validation of requirements however, efficiency plays a minor
role.
Guideline 2: “Ask questions.” Once the uses of a language
have been identified it is helpful to embed these forms of
language uses into the overall software development process.
People/roles have to be identified that develop, review, and
deploy the involved programs and models. The following
questions are helpful for determining the necessary decisions:
Who is going to model in the DSL? Who is going to review
the models? When? Who is using the models for which
purpose?
Based thereon, the question after whether the language is
too complex or captures all the necessary domain elements
can be revisited. In particular, appropriate tutorials for the
DSL users in their respective development process should
now be prepared.
Guideline 3: “Make your language consistent.” DSLs are
typically designed for a specific purpose. Therefore, each
feature of a language should contribute to this purpose, oth-
erwise it should be omitted. As an illustrative example we
consider a platform independent modeling language. In this
language, all features should be platform independent as
well. This design principle was already discussed in [14].
2.2 Language Realization
When starting to define a new language, there are several
options on how to realize it. One can implement the DSL
from scratch or reuse and extend or reduce an existing lan-
guage, one can use a graphical or a textual representation,
and so on. We have identified general hints which have to
be taken into account for these decisions.
Guideline 4: “Decide carefully whether to use graphical or
textual realization.” Nowadays, it is common to use tools
supporting the design of graphical DSLs such as the Eclipse
Modeling Framework (EMF) or MetaEdit+. On the other
hand, there exist sophisticated tools and frameworks like
MontiCore or xText for text-based modeling. As described
in [6], there are a number of advantages and disadvantages
for both approaches. Textual representations for example
usually have the advantage of faster development and are
platform and tool independent whereas graphical models
provide a better overview and ease the understanding of
models. Therefore, advantages and disadvantages have to
be weighted and matched against end users’ preferences in
order to make a substantiated decision for one of the real-
izations. From this point on, a more informed decision can
be made for a concrete tool to realize the language based
on their particular features and the intended use of the lan-
guage. Comparisons can be found in [21] or [3].
Guideline 5: “Compose existing languages where possible.”
The development of a new language and an accompanying
toolset is a labor-intensive task. However, it is often the
case that existing languages can be reused, sometimes even
without adaptation. A good example for language reuse
is OCL: it can be embedded in other languages in order
to define constraints on elements expressed in the hosting
language.
The most general and useful form of language reuse is thus
the unchanged embedding of an existing language into an-
other language. A more sophisticated approach is to have
predefined holes in a host language, such that the defini-
tion of a new language basically consists of a composition
of different languages. For textual languages this composi-
tional style of language definitions is well understood and
supported by sophisticated tools such as [11] which also as-
sists the composition of appropriate tools.
However, according to the seamlessness principle [14], the
concepts of the languages to be composed need to fit to-
gether. In the UML, the object oriented paradigm under-
lies both class diagrams and Statecharts which therefore fit
well together. Additionally, when composing languages care
must be exercised to avoid confusion: similar constructs with
different semantics should be avoided.
Guideline 6: “Reuse existing language definitions.” If the
language cannot be simply composed from some given lan-
guage parts, e.g., by language embedding as proposed in
guideline 5, it is still a good idea to reuse existing language
definitions as much as possible. In [18] more possible real-
ization strategies, such as language extension or language
specialization are analyzed. This means, taking the defini-
tion of a language as a starter to develop a new one is better
than creating a language from scratch. Both the concrete
and the abstract syntax will benefit from this form of reuse.
The new language then might retain a look-and-feel of the
original, thus allowing the user to easily identify familiar
notations. Looking at the abstract syntax of existing lan-
guages, one can identify “language pattern” (quite similar
to design pattern), which are good guidelines for language
design. For example, expressions, primary expressions, or
statements have quite a common pattern in all languages.
Only if there is no existing language/notation or the disad-
vantages do not allow using the strategies mentioned above,
a standalone realization should be considered. The websites
of parser generators like Antlr [1] or Atlantic Zoo [19] are a
good starting point for reusing language definitions.
Guideline 7: “Reuse existing type systems.” A DSL used
for software development often comprises and even extends
either a property language such as OCL or an implementa-
tion language such as Java. As described in [8], the design
of a type system for such a language is one of the hardest
tasks because of the complex correlations of name spaces,
generic types, type conversions, and polymorphism.
Furthermore, an unconventional type system would be
hard for users to adopt as well. Therefore, a language de-
signer should reuse existing type systems to improve com-
prehensibility and to avoid errors that are caused by misin-
terpretations in an implementation. Furthermore, it is far
more economical to use an existing type system, than devel-
oping a new one as this is a labor intensive and error-prone
task. A well-documented object-oriented type system can
be tailored to the needs of the DSL or even an implemented
reusable type system can be used (e.g. [4]).
2.3 Language Content
One main activity in language development is the task of
defining the different elements of the language. Obviously,
we cannot define in general which elements should be part
of a language as this typically depends on the intended use.
However, the decisions can be guided by some basic hints
we propose in this Section.
Guideline 8: “Reflect only the necessary domain concepts.”
Any language shall capture a certain set of domain artifacts.
These domain artifacts and their essential properties need
to be reflected appropriately in the language in a way that
the language user is able to express all necessary domain
concepts. To ensure this, it is helpful to define a few models
early to show how such a reflection would look like. These
models are a good basis for feedback from domain experts
which helps the developer to validate the language definition
against the domain. However, when designing a language
not all domain concepts need to be reflected, but only those
that contribute to the tasks the language shall be used for.
Guideline 9: “Keep it simple.” Simplicity is a well known
criterion which enhances the understandability of a language
[8, 14, 22]. The demand for simplicity has several rea-
sons. First, introducing a new language in a domain pro-
duces work in developing new tools and adapting existing
processes. If the language itself is complex, it is usually
harder to understand and thus raises the barrier of intro-
ducing the language. Second, even when such a language is
successfully introduced in a domain, unnecessary complexity
still minimizes the benefit the language should have yielded.
Therefore, simplicity is one of the main targets in designing
languages. The following more detailed Guidelines 10, 11,
and 12 will show how to achieve simplicity.
Guideline 10: “Avoid unnecessary generality.” Usually, a
domain has a finite collection of concepts that should be
reflected in the language design. Statements like “maybe
we can generalize or parameterize this concept for future
changes in the domain” should be avoided as they unneces-
sarily complicate the language and hinder a quick and suc-
cessful introduction of the DSL in the domain. Therefore,
this guideline can also be defined as “design only what is
necessary”.
Guideline 11: “Limit the number of language elements.” A
language which has several hundreds of elements is obviously
hard to understand. One approach to limit the number of
elements in a language for complex domains is to design
sublanguages which cover different aspects of the systems.
This concept is, e.g., employed by the UML: different kinds
of diagrams are used for special purposes such as structure,
behavior, or deployment. Each of them has its own notation
with a limited number of concepts.
A further possibility to limit the number of elements of
a language is to use libraries that contain more elaborated
concepts based on the concepts of the basic language and
that can be reused in other models. Elements which were
previously defined as part of the language itself can then
be moved to a model in the library (compare, e.g., I/O in
Pascal vs. C++). Furthermore, users can extend a library
by their own definitions and thus, can add more and more
functionality without changing the language structure itself.
Therefore, introducing a library leads to a flexible, extensi-
ble, and extensive language that nevertheless is kept simple.
On the other hand, a language capable of library import
and definition of those elements must have a number of ap-
propriate concepts embedded to enable this (e.g., method
and class definitions, modularity, interfaces - whatever this
means in the DSL under construction). This principle has
successfully been applied in GPL design where the languages
are usually small compared to their huge standard libraries.
Guideline 12: “Avoid conceptual redundancy.” Redun-
dancy is a constant source of problems. Having several con-
cepts at hand to describe the same fact allows users to model
it differently. The case of conceptual richness in C++ shows
that coding guidelines then usually forbid a number of con-
cepts. E.g., the concept of classes and structs is nearly iden-
tical, the main difference is the default access of members
which is public for structs and private for classes. There-
fore, classes and structs can be used interchangeably within
C++ whereas the slight difference might be easily forgot-
ten. So, it should be generally avoided to add redundant
concepts to a language.
Guideline 13: “Avoid inefficient language elements.” One
main target of domain specific modeling is to raise the level
of abstraction. Therefore, the main artifacts users deal with
are the input models and not the generated code. On the
other hand, the generated code is necessary to run the final
system and more important, the generated code determines
significant properties of the system such as efficiency. Hence,
the language developer should try to generate efficient code.
Furthermore, efficiency of a model should be transparent
to the language user and therefore should only depend on
the model itself and not on specific elements used within
the model. Elements which would lead to inefficient code
should be avoided already during language design so that
only the language user is able to introduce inefficiency [8].
For example, in Java there is no operator to get all instances
of one class as this would increase memory usage and oper-
ating time significantly. However, this functionality can be
implemented by a Java user if needed.
2.4 Concrete Syntax
Concrete syntax has to be chosen well in order to have an
understandable, well structured language. Thus, we con-
centrate on the concrete syntax first and will deal with the
abstract syntax later.
Guideline 14: “Adopt existing notations domain experts
use.” As [20] says, it is generally useful to adopt what-
ever formal notation the domain experts already have, rather
than inventing a new one.
Computer experts and especially language designers are
usually very practiced in learning new languages. On the
contrary, domain experts often use a language for a longer
time and do not want to learn a new concrete syntax es-
pecially when they already have a notation for a certain
problem. As already mentioned, it is often the case that the
introduction of a DSL makes new tools and modified pro-
cesses necessary. Inventing a new concrete syntax for given
concepts would raise the barrier for domain experts. Thus,
existing notations should be adopted as much as possible.
E.g., queries within the database domain should be defined
with SQL instead of inventing a new query language. Even
if queries are only part of a new language to be defined SQL
could be embedded within the new language.
In case a suitable notation does not already exist, the new
language should be adopted as close as possible to other
existing notations within the domain or to other common
used languages. A good example for commonly accepted
languages are mathematical notations like arithmetical ex-
pressions [8].
Guideline 15: “Use descriptive notations.” A descriptive
notation supports both learnability and comprehensibility
of a language especially when reusing frequently-used terms
and symbols of domain or general knowledge. To avoid mis-
interpretation it is highly important to maintain the seman-
tics of these reused elements. For instance, the sign “+”
usually stands for addition or at least something seman-
tically similar to that whereas commas or semicolons are
interpreted as separators. This applies to keywords with
a widely-accepted meaning as well. Furthermore, keywords
should be easily identifiable. It is helpful to restrict the num-
ber of keywords to a few memorizable ones and of course, to
have a keyword-sensitive editor.
A good example for a descriptive notation is the way how
special character like Greek letters are expressed in Latex.
Instead of using a Unicode-notation each letter can be ex-
pressed by its name (alpha for α, beta for β, and so on).
Guideline 16: “Make elements distinguishable.” Easily dis-
tinguishable representations of language elements are a ba-
sic requirement to support understandability. In graphical
DSLs, different model elements should have representations
that exhibit enough syntactic differences to be easily dis-
tinguishable. Different colors as the only criteria may be
counterproductive, e.g., when printed in black and white. In
textual languages usually keywords are used in order to sep-
arate kinds of elements. These keywords have to be placed
in appropriate positions of the concrete syntax, as other-
wise readers need to start backtracking when “parsing” the
text [8, 22]. The absence of keywords is often based on effi-
ciency for the writer. But this is a very weak reason because
models are much more often read than written and therefore
to be designed from a readers point of view.
Guideline 17: “Use syntactic sugar appropriately.” Lan-
guages typically offer syntactic sugar, i.e., elements which do
not contribute to the expressiveness of the language. Syn-
tactic sugar mainly serves to improve readability, but to
some extent also helps the parser to parse effectively. Key-
words chosen wisely help to make text readable. Generally,
if an efficient parser cannot be implemented, the language
is probably also hard to understand for human readers.
However, an overuse of the addition of syntactic sugar dis-
tracts, because verbosity hinders to see the important con-
tent directly. Furthermore, it should be kept in mind that
several forms of syntactic sugar for one concept may hinder
communication as different persons might prefer different
elements for expressing the same idea.
Nevertheless the introduction of syntactic sugar can also
improve a language, e.g., the enhanced for-statement in Java
5 is widely accepted although it is conceptually redundant to
a common for-statement. This is a conflict to guideline 12,
but the frequency of occurrence of common for-statements in
Java legitimates a more effective alternative of this notation.
Guideline 18: “Permit comments.” Comments on model
elements are essential for explaining design decisions made
for other developers. This makes models more understand-
able and simplifies or even enables collaborative work. So
a widely accepted standard form of grouped comments, like
/* ... */, and line comments, like // ... for textual
languages or text boxes and tooltips for graphical languages
should be embedded.
Furthermore, specially structured comments can be used
for further documentation purposes as generating HTML-
pages like Javadoc. In [8] it is mentioned that the “purpose
of a programming language is to assist in the documenta-
tion of programs”. Therefore we recommend that every DSL
should allow a user to generally comment at various parts
of the model. If desired, the language may even contain the
definition of a comment structure directly, thus enforcing a
certain style of documentation.
Guideline 19: “Provide organizational structures for mod-
els.” Especially for complex systems the separation of mod-
els in separate artifacts (files) is inevitable but often not
enough as the number of files would lead to an overflowed
model directory. Therefore, it is desirable to allow users
to arrange their models in hierarchies, e.g., using a pack-
age mechanism similar to Java and store them in various
directories.
As a consequence, the language should provide concepts
to define references between different files. Most commonly
“import” is used to refer to another name space. Imports
make elements defined in other DSL artifacts visible, while
direct references to elements in other files usually are ex-
pressed by qualified names like “package.File.name”. Some-
times one form of import isn’t enough and various relations
apply which have to be reflected in the concrete syntax of
the language.
Guideline 20: “Balance compactness and comprehensibil-
ity.” As stated above, usually a document is written only
once but read many times. Therefore, the comprehensibility
of a notation is very important, without too much verbosity.
On the other hand, the compactness of a language is still
a worthwhile and important target in order to achieve ef-
fectiveness and productivity while writing in the language.
Hence a short notation is more preferable for frequently used
elements rather than for rarely used elements.
Guideline 21: “Use the same style everywhere.” DSLs are
typically developed for a clearly defined task or viewpoint.
Therefore, it is often necessary to use several languages to
specify all aspects of a system. In order to increase under-
standability the same look-and-feel should be used for all
sublanguages and especially for the elements within a lan-
guage. In this way the user can obtain some kind of intuition
for a new language due to his knowledge of other ones. For
instance, it is hardly intuitive if curly braces are used for
combining elements in one language and parentheses in an-
other. Additionally, a general style can also assist the user in
identifying language elements, e.g., if every keyword consists
of one word and is written in lower case letters.
A conflicting example is the embedment of OCL. One the
one hand it is possible to adapt the OCL syntax to the
enclosing language to provide the same syntactic style in
both languages. On the other hand different OCL styles
impede the comprehensibility of OCL, what endorses the
use of a standard OCL syntax.
Guideline 22: “Identify usage conventions.” Preferably
not every single aspect should be defined within the language
definition itself to keep it simple and comprehensible (see
guideline 11). Furthermore, besides syntactic correctness it
is too rigid to enforce a certain layout directly by the tools.
Instead, usage conventions can be used which describe more
detailed regulations that can, but need not be enforced.
In general, usage conventions can be used to raise the level
of comprehensibility and maintainability of a language. The
decision, whether something goes as a usage convention or
within a language definition is not always clear. So, usage
conventions must be defined in parallel to the concrete syn-
tax of the language itself. Typical usage conventions include
notation of identifiers (uppercase/lowercase), order of ele-
ments (e.g. attributes before methods), or extent and form
of comments. A good example for code conventions for a
programming language can be found in [9].
2.5 Abstract Syntax
Guideline 23: “Align abstract and concrete syntax.” Given
the concrete syntax, the abstract syntax and especially its
structure should follow closely to the concrete syntax to ease
automated processing, internal transformations and also pre-
sentation (pretty printing) of the model.
In order to align abstract and concrete syntax three main
principles apply: First, elements that differ in the concrete
syntax need to have different abstract notations. Second,
elements that have a similar meaning can be internally rep-
resented by reusing concepts of the abstract syntax (usually
through subclassing). This is more a semantics-based deci-
sion than a structurally based decision. Third, the abstract
notation should not depend on the context an element is
used in but only on the element itself. A pretty bad exam-
ple for context-dependent notations is the use of “=” as as-
signment in OCL-statements (let-construct) and as equality
in OCL-expressions. Here, the semantics obviously differs
whilst the syntax is equal.
Furthermore, the use of a transformation engine usually
also requires an understanding of the internal structure of a
language, which is related to the abstract syntax. Therefore,
the user to some extent is exposed to the internal structure
of the language and hence needs an alignment between his
concrete representations and the abstract syntax, where the
transformations operate on.
Alignment of both versions of syntax and the seamlessness
principle discussed in [14] assures that it is possible to map
abstractions from a problem space to concrete realizations
in the solution space. For a domain specific language the
domain is then reflected as directly as possible without much
bias, e.g., of implementation or executability considerations.
Guideline 24: “Prefer layout which does not affect trans-
lation from concrete to abstract syntax.” A good layout of
a model can be used to simplify the understanding for a hu-
man reader and is often used to structure the model. Nev-
ertheless, a layout should be preferred which does not have
any impact on the meaning of the model, and thus, does not
affect the translation of the concrete to the abstract syntax
and the semantics. As an example, this is the case for com-
puter languages where the program structure is achieved by
indentation. From a practical point of view, line separators,
tabs, and spaces are often treated differently depending on
editors and platforms and are usually difficult to distinguish
by a human reader. If these elements gain a meaning, de-
velopers have to be much more cautious and a collaborative
development requires more effort. For graphical languages
a well-known bad example is the twelve o’clock semantics in
Stateflow [7] where the order of the placement of transitions
can change the behavior of the Statechart. To simplify the
usage of DSLs, we recommend that the layout of programs
doesn’t affect their semantics.
Guideline 25: “Enable modularity.” Nowadays, systems
are very complex and thus, hard to understand in their en-
tirety. One main technique to tackle complexity is modu-
larization [15] which leads to a managerial, flexible, compre-
hensible, and understandable infrastructure. Furthermore,
modularization is a prerequisite for incremental code gener-
ation which in turn can lead to a significant improvement
of productivity. Therefore, the language should provide a
means to decompose systems into small pieces that can be
separately defined by the language users, e.g., by providing
language elements which can be used in order to reference
artifacts in other files.
Guideline 26: “Introduce interfaces.” Interfaces in pro-
gramming languages provide means for a modular develop-
ment of parts of the system. This is especially important
for complex systems as developers may define interfaces be-
tween their parts to be able to exchange one implementa-
tion of an interface with another which significantly increases
flexibility. Furthermore, the introduction of interfaces is a
common technique for information hiding: developers are
able to change parts of their models and can be sure that
these changes do not affect other parts of the system when
the interface does not change. Therefore, we recommend
that a DSL should provide an interface concept similar to
the interfaces of known programming languages.
One example of interfaces are visibility modifiers in Java.
They provide a means to restrict the access to members in
a simple way. Another common example are ports, e.g., in
composite structure diagrams, which explicitly define inter-
action points and specify services they provide or need, thus
declaring a more detailed interface of a part of a system.
3. DISCUSSION
In the previous sections we introduced and categorized a
bundle of guidelines dedicated to different language artifacts
and development phases. Some of them already contained
notes on relationships with other guidelines and trade-offs
between them, and some of them briefly discussed their im-
portance in different project settings. However, the follow-
ing more detailed discussion shall help to identify possible
conflicting guidelines and their reasons and gives hints on
decision criteria.
The most contradicting point is reuse of existing artifacts
versus the implementation of a language from scratch (cf.
No. 5, 6, and 7). The main reason for the reuse of a lan-
guage or a type system is that it can significantly decrease
development time. Furthermore, existing languages often
provide at least an initial level of quality. Thus, some of the
guidelines, e.g., guidelines which target at consistency (e.g.,
No. 21) or claim modularity (e.g., No. 25), are met auto-
matically. However, reusing existing languages can hinder
flexibility and agility as an adaption may be hard to realize
if not impossible. The same ideas apply to an improvement
of the reused language itself (e.g., to meet guidelines which
were not respected by the original language): the implemen-
tation of a single guideline may require a significant change
of the language. Another important point is that this ap-
proach may influence the satisfiability of other guidelines.
One example is No. 14 which suggests the reuse of exist-
ing notations of the domain. In case there are no languages
which are similar to these notations, this guideline and lan-
guage reuse are obviously contradicting. Furthermore, com-
bining several existing languages may introduce conceptual
inconsistencies, such as different styles or different underly-
ing type systems which have to be translated into each other
(cf., No. 5).
Implementing a new language from scratch in turn permits
a high degree of freedom, agility, and flexibility. In this
case, some guidelines can be realized more easily than in the
case of reuse. However, these advantages are not for free:
designing concrete and abstract syntax, context conditions,
and a type system are time- and cost-intensive task. To
summarize, a decision whether to reuse existing languages
or to implement a new one is one of the most important and
critical decisions to be made.
Another important point which was already mentioned
in the introduction is that some of the presented guidelines
have to be weighted according to the project settings, to the
form of use, etc. One example is the expected size of the
languages instances. Some DSLs serve as configuration lan-
guages and thus, typical instances consist of a small amount
of lines only. Other DSLs are used to describe complex sys-
tems leading to huge instances. In the former case guidelines
which target at compositionality or claim references between
files (e.g., No. 19 and 25) have nearly no validity whereas in
the latter example these guidelines are of high importance.
However, not only the expected size of the instances can in-
fluence the weight of guidelines. Another important aspect
is the intended usage of the language. Sometimes DSLs are
not executable; they are designed for documentation only.
In these cases, the guideline which demands to avoid inef-
ficient elements in the language (No. 13) is of course not
meaningful. However, for languages which are translated
into running code, this is of high importance.
A last point we want to discuss here are the costs induced
by applying the guidelines. Some of them can be imple-
mented easily and straightforward (e.g., distinguishability
of elements or permitting comments, No. 16 and 18) whilst
others require a significant amount of work (e.g., introduc-
tion of references between files including appropriate reso-
lution mechanisms and symbol tables, No. 19). Of course,
especially guidelines whose implementation is cost intensive
have to be matched against project settings as described
above. For small DSLs such guidelines should be ignored in-
stead as the cost will often not amortize the improvements.
However, from our experiences DSLs are often subject to
changes. While growing these guidelines become more and
more important. The main problem which emerges in these
cases is that adding new things to a grown language (e.g.,
modularity) is typically more difficult and time-consuming
than it would have been at the beginning. Therefore, ana-
lyzing the domain and usage scenarios as described in Guide-
lines 1 and 2 can prevent those unnecessary costs.
4. CONCLUSION
In this paper 26 guidelines have been discussed that should
be considered while developing domain specific languages.
To our experience this set of guidelines is a good basis for
developing a language. For space reasons, we restricted our-
selves to guidelines for designing the language itself. Other
guidelines are needed for successfully integrating DSLs in
a software development process, deploying it to new users,
and evolving the syntax and existing models in a coherent
way.
In general, a guideline should not be followed closely, but
many of them are proposals as to what a language designer
should consider during development. Some of the guidelines
have to be discussed in certain domains, because they might
not have the same relevance and as discussed many guide-
lines contradict each other and the language developer has
to balance them appropriately.
But generally, the consideration of explicitly formulated
guidelines is improving language design. We also think that
it is worthwhile to develop much more detailed sets of con-
crete instructions for particular DSLs. We currently focus
on textual languages in the spirit of Java.
Although we have compiled this list from literature and
our own experience, we are sure that this list is not com-
plete and has to be extended constantly. In addition, guide-
lines might change during time as developers gather more
experience, tools become more elaborate, and taste changes.
Maybe some guidelines are not relevant anymore in a few
years, as some guidelines from the 1970’s are less important
today.
Acknowledgment: The work presented in this paper is
partly undertaken in the MODELPLEX project. MOD-
ELPLEX is a project co-funded by the European Commis-
sion under the “Information Society Technologies”Sixth Frame-
work Programme (2002-2006). Information included in this
document reflects only the authors’ views. The European
Community is not liable for any use that may be made of
the information contained herein.
5. REFERENCES
[1] Antlr Website www.antlr.org.
[2] GME Website
http://www.isis.vanderbilt.edu/projects/gme/.
[3] T. Goldschmidt, S. Becker, and A. Uhl. Classification
of concrete textual syntax mapping approaches. In
ECMDA-FA, pages 169–184, 2008.
[4] J. Gough. Compiling for the .NET Common Language
Runtime (CLR). Prentice Hall, November 2001.
[5] H. Grönniger, J. Hartmann, H. Krahn, S. Kriebel, and
B. Rumpe. View-based modeling of function nets. In
Proceedings of the Object-oriented Modelling of
Embedded Real-Time Systems (OMER4) Workshop,
Paderborn,, October 2007.
[6] H. Grönniger, H. Krahn, B. Rumpe, M. Schindler, and
S. Völkel. Textbased Modeling. In 4th International
Workshop on Software Language Engineering, 2007.
[7] G. Hamon and J. Rushby. An operational semantics
for stateflow. In Fundamental Approaches to Software
Engineering: 7th International Conference (FASE),
volume 2984 of Lecture Notes in Computer Science,
pages 229–243, Barcelona, Spain, March 2004.
Springer-Verlag.
[8] C. A. R. Hoare. Hints on programming language
design. Technical report, Stanford University,
Stanford, CA, USA, 1973.
[9] Java Code Conventions
http://java.sun.com/docs/codeconv/.
[10] S. Kelly and J.-P. Tolvanen. Domain-Specific
Modeling: Enabling Full Code Generation. Wiley,
2008.
[11] H. Krahn, B. Rumpe, and S. Völkel. Monticore:
Modular development of textual domain specific
languages. In Proceedings of Tools Europe, 2008.
[12] M. Mernik, J. Heering, and A. M. Sloane. When and
how to develop domain-specific languages. Technical
Report SEN-E0309, Centrum voor Wiskunde en
Informatica, Amsterdam, 2005.
[13] MontiCore Website http://www.monticore.de.
[14] R. Paige, J. Ostroff, and P. Brooke. Principles for
Modeling Language Design. Technical Report
CS-1999-08, York University, December 1999.
[15] D. L. Parnas. On the criteria to be used in
decomposing systems into modules. Commun. ACM,
15(12):1053–1058, 1972.
[16] P. Pfahler and U. Kastens. Language Design and
Implementation by Selection. In Proc. 1st
ACM-SIGPLAN Workshop on
Domain-Specific-Languages, DSL ’97, pages 97–108,
Paris, France, January 1997. Technical Report,
University of Illinois at Urbana-Champaign.
[17] B. Rumpe. Modellierung mit UML. Springer, Berlin,
May 2004.
[18] D. Spinellis. Notable Design Patterns for Domain
Specific Languages. Journal of Systems and Software,
56(1):91–99, Feb. 2001.
[19] The Atlantic Zoo Website
http://www.eclipse.org/gmt/am3/zoos/atlanticZoo/.
[20] D. Wile. Lessons learned from real DSL experiments.
Science of Computer Programming, 51(3):265–290,
June 2004.
[21] D. S. Wile. Supporting the DSL Spectrum. Computing
and Information Technology, 4:263–287, 2001.
[22] N. Wirth. On the Design of Programming Languages.
In IFIP Congress, pages 386–393, 1974.
Evolving Domain Specific Languages for Project Summary.pdf
Evolving Domain Specific Languages
Project Summary
We believe that the use of Domain Specific Languages (DSLs)
can significantly improve the ability of
domain experts to use computers effectively in their work.
However the cost of developing and deploying
DSLs is significant, and this has severely limited their
deployment.
Intellectual Merit: Like other software, DSLs have a well-
defined life-cycle. SQL is a “mature” DSL: its
design is specified by international standards, and its stand-
alone implementations are carefully crafted by
hand to be stable, robust and to operate with high performance
in specific execution environments. The
DSLs that we have designed are experimental: their designs
vary week-by-week as researchers try out new
ideas, their implementations must be malleable, and efficiency
and the quality of the error messages are less
important. The simplest way to implement an experimental DSL
is by embedding it as a library of functions
within an existing programming language. These two
approaches have both advantages and disadvantages,
and the point in the life cycle of a particular DSL seems to
favor one approach over another.
The life cycle of a DSL has three stages, and the needs of the
three stages vary. By recognizing
this fact, we can design processes and tools that can
dramatically change the balance point between the
two choices, thus allowing the cheaper embedded approach to
be viable longer. In addition, our research
in meta-programming systems seems highly applicable to the
opposite approach of making stand-alone
implementations cheaper to build, so that they can be applicable
earlier in the lifecycle. Finally, experience
with some of the tools we have constructed in our meta-
programming research suggest that stand-alone
implementations can be extracted from embedded
implementations in a semi automated way, further blending
the distinction between the two approaches. This bodes well for
lessening the cost of deploying a DSL since
its suggests several strategies for evolving an embedded
implementation to a stand-alone one in a cost effective
way. The proposed research would investigate applying meta-
programming tools to DSL evolution, in several
different dimensions:
• Extending the capabilities of host languages for embedded
implementations.
• Exploring automated techniques for converting embedded
implementations to stand-alone implemen-
tations.
• Automating the generation of stand-alone implementations.
Broader Impact: The opportunity costs of being unable to utilize
computers effectively has a large cost:
projects that are never attempted, or whose scope is
significantly reduced, because the cost of developing
software is too high to contemplate, or too complex to write.
DSLs can play a large role in mitigating these
costs, but only if the costs of DSL implementation can be
lowered. The research proposed here will make
significant advances in this direction.
As a graduate-only institution, OGI is in a unique position to
promote and advance the introduction of
new ideas into the work force. Our students are by and large
working students, uniquely motivated by the
problems that they have experienced. Our close connections
with industry means that our students acutely
aware of the problems of constructing complex software. OGI
has a tradition teaching research oriented
topics in advance graduate level courses, and in building and
maintaining widely distributed tools.
1
1 Overview
Computers are useful only if there are people to program them.
In an ideal world, the end-user, who is
an expert in the domain of the problem that needs to be solved,
would be able to directly command the
computer to do what is required. Unfortunately, even the best of
modern general-purpose programming
languages can be used effectively only by experts in computing.
As computers become ever more pervasive,
and the problem domains in which they are applied ever more
diverse, the proportion of users who have
sufficient computing knowledge dwindles.
There is growing recognition that domain-specific languages
(DSLs) are an effective way for end-users
to program computers without becoming software engineers.
Although DSLs are attractive, they can be
difficult and expensive to design and implement. DSL creators
must combine expert knowledge of the
domain, programming skills necessary for building solutions,
ability to make appropriate generalizations
over the domain, and programming language design and
implementation skills. This makes the current cost
of introducing a new DSL prohibitively high. We propose two
interrelated research paths that we believe
can dramatically lower this cost.
First, at OGI, we have studied scores of DSLs, and we have
designed and developed a number of our
own. We have discovered that DSL’s (like many other systems)
have a life cycle. This life cycle has three
stages, and the needs of the three stages vary. By recognizing
this fact, we can design processes and tools
that can dramatically increase reuse amongst DSL
implementations, and ease the transition of a DSL from
earlier stages to later stages.
Second, our study of typeful meta-programming systems has
helped us design a new generation of tools
for manipulating programs as data in a way that is aware of the
semantic properties of the programs being
manipulated. Applying these tools to DSL design and
implementation should significantly ease the burden
of the DSL implementer. The aims of the research proposed
include:
• Lowering the cost of DSL development. DSL’s are effective
only if they exist. The current cost
for developing a DSL can be quite high. Lowering the cost for
deploying a new DSL could be an
effective means of increasing their use.
• Apply meta-programming techniques to DSL development.
Much of our experience in building
meta-programming systems is directly applicable to DSL
design. We’d like to reuse this work in a new
and exciting domain. Many of the tools we have built, such as
automatic binding time analyses as a
way of automatically staging programs, and the meta-
programming language Ωmega seem applicable
to the process of evolving DSL’s.
• Develop instructional materials on DSL development. Without
making significant efforts to
educate others on the building of DSL implementations their
construction will remain out of reach to
most software developers.
2 Introduction to DSLs
Domain-Specific Languages (DSLs) are a promising approach to
making computers accessible to end-users
without the need for conventional programming. Perhaps the
most widely used DSL is the formula language
used inside Microsoft Excel to define how the content of a cell
on a spreadsheet is derived from other cells.
Here, the domain is calculation; the particular problems solved
are expressing that one cell should be the
sum, or the average, of some other cells. People who write such
formulae think of themselves as solving
problems in finance, or surveying, or engineering; we know that
they are in fact writing computer programs,
but the power of DSLs is that from the point of view of the
domain expert, the program has disappeared.
Other widely used DSLs include SQL, a DSL for extracting
relevant information from a relational
database without explicitly specifying the physical layout or
access path for the data; Yacc, a DSL for
1
describing parsers by the grammar to be parsed; and HTML, a
DSL for defining the appearance of a
document without specifying explicit layout algorithms. As
computers are applied to more and more aspects
of our society, we believe that the ability of such DSLs to
“make the program disappear” will be vital. DSLs
provide many advantages over general-purpose languages. To
illustrate this consider the query language
SQL.
• A SQL program provides a common language that allows
different database users to share precisely
the description of the data they are interested in.
• A SQL program relieves the programmer from needing to
know about various aspects of the database,
such as the file structures that store the persistent data
(sequential file, or a balanced tree?), how to
open and process the different file formats (random or
sequential access?), how to design and manipulate
the ephemeral data structures needed to process the query (a
linked list, an array, a hash table?), and
what algorithm to use (a nested join, a merge join, or a hash
join?).
• A SQL program encodes constraints. It restricts the
complexity of the programs that the user can write.
A SQL program is guaranteed to terminate, and is bounded by
polynomial rather than exponential
time behavior.
• A SQL program can take advantage of domain specific
properties. All commercial SQL processors use
an extensive query-optimization phase that attempts to
aggressively minimize the resources used to
answer the query.
• A SQL program is a rapid prototype. A SQL query is easy to
write and lets the query writer experiment
with different solutions. If the results from a query do not
answer the question the query writer had
in mind, he is free to throw the original query away and try
another without excessive losses in time
or resources.
• A SQL query can integrate with a number of existing contexts
or environments. SQL is a front end to
a number of different query engines.
• A SQL query is an efficient division of labor. Programming
language experts designed the SQL lan-
guage, algorithm experts designed the algorithms used, database
experts designed the query optimizers,
and domain experts ask the queries.
• A SQL query adopts an existing notation (from relational
algebra) and allows database users to program
queries effectively with little or no programming experience.
Advantages like these are not unique to SQL, but are general
properties of all domain specific languages.
While providing significant advantages, the DSL approach
comes with a cost. Realizing a DSL requires both
a design and an implementation. Such implementations can be
large, expensive to produce, conceptually
complex, and hard to scale. An implementation for even a
simple language often does not scale as the
language evolves to meet newer demands. Because of this,
domain experts, and even most programmers, are
not comfortable taking on the tasks of language design and
implementation. Engineering issues, like these,
have kept DSLs from being widely as used as they ought to be.
2.1 Domain-Specific Language Evolution
Although past research has produced methodologies and tools
that aid in various stages of language design
and implementation process, the level of automation available
for building DSLs is low. Moreover, DSLs
are seldom static. Initially, the domain may not be well-
understood, so a period of experimental design is
required while ideas are developed. Constraints on the
implementation tend to become stronger over time,
as the experimental DSL is moved into production. In addition,
the problem domain itself can change over
the intended lifetime of the DSL. Hence, the design and
implementation of a DSL must be evolved over time.
2
DSL implementations are not only complex and difficult to
build, but they are also difficult to evolve.
Implementation techniques that are most cost-effective at early
stages in the DSL lifecycle may be less appro-
priate at later stages, and vice-versa. Embedding the DSL as a
library within an existing host programming
language may work well for an initial implementation. Syntactic
extensions, perhaps implemented with
macros or a pre-processor, might be added later. Such
implementations are quick and easier to build (than
non-embedded ones), but inherit all the limitations, as well as
the capabilities, of the host language and its
implementation. More mature versions of the DSL may benefit
from a stand-alone implementation, but this
is typically expensive to build, as it must handle all the tasks of
a conventional language translator, such
as parsing, semantic analysis, optimization, and code
generation. Current technologies provide no smooth
transition path from embedded to stand-alone implementations.
It is a goal of this proposal to identify and
smooth this transition from embedded DSL to stand-alone DSL.
2.2 Supporting the DSL Lifecycle
The life cycle of a DSL may be thought of as having three
stages, each calling for different implementation
approaches.
Infancy. In this stage, language design ideas are in flux and
domain abstractions are not well understood.
While some domains have existing conceptual frameworks and
notations that can be used for problem
specification, many domains must be analyzed and studied to
identify the key abstractions. At this stage,
modeling flexibility is at a premium, performance is less
important, and embedded implementations are
likely to be cost-effective.
In an embedded approach, each domain concept is realized
directly as a host-language construct: domain
operators are host-language procedures, domain types are host-
language user-defined data types, etc. Thus,
creating or modifying a DSL is relatively cheap, provided a
suitably powerful host language is used. Our
experience with scores of DSLs has shown that successful
embedding of a DSL is made much easier by the
use of higher-order functions. Thus, languages such as O’Caml,
Haskell, or Lisp have made suitable host
languages [16, 14, 12].
Embedding may be thought of as rapid prototyping. Even if the
domain ultimately requires generating
code for a specialized target environment, the embedded
implementation can be used for modeling and
simulation. Many language features needed by a typical DSL
(e.g., support for procedural abstraction) will
already exist in the host language; moreover, it is
straightforward to integrate code from multiple DSLs if
they share the same host implementation.
Adolescence. In this stage, domain abstractions stabilize and
become better-understood as the language
begins to be exercised by end-users. Implementation flexibility
is still necessary. Actual use uncovers the
limits of the implementation, which leads to new features,
abstractions, and implementation techniques.
In this stage users begin to develop domain-specific logics for
their DSLs. Such logics describe how the
features of the DSL interact. They allow the DSL programmers
to calculate equivalences, and to reason about
the behavior of each solution at a high level of abstraction.
Domain-specific logics also enable domain-specific
optimizations, and efficient implementations.
While embedded languages remain important at this stage, their
limitations begin to outweigh their
advantages. As more users exercise the system, the need for
specialized language syntax, type discipline,
and error reporting become more apparent. These are difficult
or impossible to provide in a conventional
host language. Because most host languages lack the ability to
reflect on their own code – i.e., to analyze
it as well as execute it – they make it very difficult to enhance
an embedded DSL implementation with
validation or optimization passes.
Few host language implementations provide much flexibility
about the target environment in which they
3
run, though this may be essential in domains involving
embedded systems or requiring close integration with
legacy code. Currently, these demands often force implementers
to switch to a stand-alone implementation
at this stage, but improvements to host languages (i.e. foreign
function interfaces) may enable embedded
implementation to remain a viable choice longer in the
adolescence stage.
Maturity. In this stage, domain abstractions are well-
understood, and the DSL design is fixed. Features
that enhance usability – such as type systems, debuggers, and
programming environments – become impor-
tant as the DSL gains adherents. Performance of the DSL
implementation itself may become a problem as
DSL program size and complexity increase.
A stand-alone implementation for a DSL can have its own
syntax and type system appropriate for just
that domain. The DSL can be “restricted” to enforce constraints
on what can be expressed, and it can
have its own optimizer that relies on domain-specific knowledge
so that performance bottlenecks can be
addressed. If investing in a stand-alone DSL implementation
ever makes sense, it will be at this phase, when
the language has become relatively stable and user tools are
important. Automated construction tools for
interpreters and compilers can make this process cheaper; while
many such tools exist, some important ones
are still missing. It is still desirable to re-use any existing
embedded or stand-alone implementation as far
as possible; again, improvements in embedded host language
design may help this goal.
2.3 Example: Maturation of the Hawk DSL
At OGI we have developed a number of DSLs, including MSL
[35, 13], a DSL for specifying radar message
formats and invariants; Stratego [33], a language for describing
transformations on computer programs;
and Hawk [16], a DSL for describing computer chip
microarchitecture. As an example of DSL maturation
we will consider the Hawk DSL, developed by Prof. John
Launchbury at OGI. Hawk is used to describe
computer microarchitectures at the level of abstraction used in
an architecture textbook, i.e., in terms of
register files, memories, delays, pipeline elements and reorder
buffers. A Hawk program specifies a particular
microarchitecture; it can be executed to simulate the
architecture; it can be transformed, say to increase
throughput, and it can be analyzed for various properties. Hawk
has been used to build simulators for several
pipelined architectures, as well as superscalar architectures.
The infant Hawk implementation was embedded in Haskell. An
embedded approach was used to help
determine the most appropriate abstractions. Hawk language
constructs describe circuits as a set of equations
describing data flow relationships. The equations were
implemented as higher-order functions in Haskell.
Through experimentation we found very successful abstractions
such as transactions and bypasses that
succinctly capture complex functionality at the level of detail
needed to understand microarchitectures.
As Hawk reached adolescence, we began to run into the
characteristic limitations of the embedded
approach.
• Syntax. We found a need for a language construct to express
mutual recursion at the monadic
level [10], something not available in Haskell. Such a construct
would have been very useful to describe
cyclic architectures with feedback loops using monadic
descriptions.
• Analysis. Because Hawk represents its abstractions as Haskell
functions (rather than data structures)
we found it impossible to implement some useful analyses on
Hawk programs. Performance analysis
of pipelines requires knowing the length of the pipeline, and
correctness analysis of feedback loops
requires knowing that every feedback loop is delayed at least
once by the use of a latch. Neither of
these things could be discovered in the embedded Hawk
implementation.
• Error messages. Error messages in Hawk are expressed in
terms of the Haskell implementation rather
than the abstractions they represent. This, coupled with
extensive use of Haskell’s class mechanism,
makes error messages from the type checker nearly
incomprehensible to end-users.
4
• Interoperation. Hawk could benefit by interoperation or
integration with existing compilations of
microarchitecture units, e.g., memories, written in other
languages such as VHDL, or by interoperating
with existing vector-based simulation tools. However, Haskell’s
interoperability facilities are limited.
• Performance. Although Hawk’s expressiveness and
verification ability are unparalleled, its simulation
performance (in comparison with competing simulation
technology such as SimpleScalar [1]) leaves
much to be desired.
A stand-alone implementation for Hawk could address these
problems directly, but a stand-alone im-
plementation also requires an order of magnitude more work.
Either we need richer host languages that
solve some of these problems, delaying the need for a stand-
alone implementation, or we need better tools
for building stand-alone implementations.
3 Tools for DSL Design – Typeful Meta-Programming Systems
Meta-programming systems can play an important role in both
building stand-alone implementations and
enriching host languages. Meta-programming is the study of
programs that generate and/or manipulate
other programs. While our work on meta-programming systems
was not explicitly motivated by the issue
of DSL design and evolution, meta-programming systems seem
to provide many of the capabilities that are
needed to implement DSLs in a direct and efficient manner.
Their ability to manipulate programs as data
objects also allows them to support the evolution of existing
embedded implementations into stand-alone
ones. Our interests in the area of meta-programming fall
roughly into two broad categories:
1. Homogeneous meta-programming. In homogeneous meta-
programming languages (MetaML [24, 32])
the object-language and the meta-language are the same. The
representation of programs (both syn-
tactic, and semantic) are chosen by the language designer and
cannot be observed by meta-programs.
This relieves the meta-programmer from the responsibility to
represent object-programs, and allows the
meta-language itself to ensure that important properties (such as
static typing) of object-languages are
automatically enforced in all meta-programs. Our work in
homogeneous meta-programming languages
was directed primarily at staged programming languages. Staged
programs proceed in stages. Each
stage “writes” a program that is executed in the next stage.
Staging can be exploited to build efficient
implementations whenever constants or program invariants only
become available after compile-time.
2. Heterogeneous meta-programming. In heterogeneous meta-
programming, the meta- and the object-
language are not (necessarily) the same. We are particularly
interested in open heterogeneous meta-
programming languages. Such languages equip the programmer
with the tools to specify both the
syntactic representation and semantics of arbitrary object-
languages. The ability of the meta-language
to ensure object-language types and properties in a
heterogeneous system is considerably complicated
by the difference between the meta- and object-languages. If the
type system of the object-language
differs significantly from the type system of the meta-language,
we cannot use the single type system
approach used in homogeneous meta-programming. Instead, we
must rely on an extension mechanism,
that allows the embedding of new (object-language) type
systems inside the type system of the meta-
language.
Our work on two previous NSF supported projects Type Safe
Program Generators (CCR-9625462,
10/96-03/00), and Heterogeneous Meta Programming Systems
(CCR-0098126 10/01-06/04) has given us the
opportunity to think deeply on the issues of language
representation and manipulation. We have reported
on the design[32], use[24, 25], semantics[30], type systems[3,
18, 29], implementation[17, 26, 4], and open
problems[27] of meta-programming systems. With the help of
his colleagues the proposer has built and
distributed two different meta-programming systems:
MetaML[24] and Template Haskell[26].
Recently, we completed the preliminary design for Ωmega, an
open heterogeneous meta-programming
language. Ωmega gives special support to the description of new
object-languages. One of the unique
5
features of Ωmega is its ability to define object-language
representations that use Ωmega’s meta-level types
to guarantee programmer-specified properties of object-
language syntax. Ωmega is also a homogeneous
meta-programming system that inherits MetaML’s ability to
write staged programs. This combination of
heterogeneous and homogeneous features provides the tools
needed for implementation and evolution of
domain specific languages.
4 Example: A language of geometric regions.
In this section, we shall apply Ωmega to a small example as a
way of presenting our ideas for evolving domain
specific languages. We define a small DSL that describes
geometric objects. This DSL has its own syntax
and semantics. It includes a simple type system in which the
type of an object specifies the measuring units
(i.e. inches, centimeters, pixels) used to describe it. The type
system ensures that as we combine objects,
units remain consistent. There are several important points that
this example makes and addresses.
• We shall illustrate the difference between an embedded and
stand-alone implementation. The embedded
DSL is implemented by combinators that directly manipulate the
denotations of the DSL expressions.
These combinators are typed in a way that forces the
programmer to combine them in a way that
preserves the typing of DSL expressions. This is an important
advantage of using the embedding
approach for many DSL’s.
• The embedded Region language is easy to implement but hard
to manipulate. A region program
is implemented as an Ωmega function (or closure), and its
internal structure cannot be observed.
This denies the DSL implementer the opportunity of writing
potentially useful meta-programs that
manipulate and optimize DSL programs.
• Representing a DSL program as a piece of data recovers the
ability to manipulate the program, but has
two drawbacks. First, we lose the ability to encode the type
system of the DSL in the type system of
the host language. Second, implementing the DSL in the host
language now introduces several layers
of interpretive overhead.
• A stand-alone implementation usually involves representing
the object-language as a data structure
and writing separate type checking and code generation
functionality. Code generation usually targets
a language different from the language used to build the stand-
alone implementation. This is done
mostly for efficiency or interoperability/compatibility reasons.
• Reusing the work invested in an embedded implementation in
a stand-alone implementation is hard
because of the vastly different strategies employed.
Ωmega has several unique features that we believe allows us to
overcome all of the weakness described
in the bullets. The example illustrates a potential transition
pathway, from an embedded implementation
of the DSL to a stand-alone implementation, with many reuse
possibilities. We shall also demonstrate and
emphasize the importance of a meta-language that can transform
the DSL expressions into another language
to support efficient and effective implementation, thus
automating some of the steps in the evolution.
4.1 The Region DSL
The Region DSL provides a notation to describe subsets of the
two-dimensional plane, which we shall call
Regions. Regions are specified in a planar coordinate system
using magnitudes that can be measured in
different kinds of units (centimeters, pixels, inches, and so on).
The Region language is typed. Its type
system is designed to prevent combining regions defined with
mismatched measuring units. Once defined, a
region can be drawn or displayed. The region language is very
simple, but rich enough that we can illustrate
the points made in the previous section.
6
u ∈ U ::= cm | pix units
d ∈ D ::= n u magnitudes
r ∈ R ::= univ | empty | circle d | rect d d | trans (d, d) r regions
| r1 ∪ r2 | r1 ∩ r2 | conv u1 u2 r
� n u : u
� empty : R u � univ : R u
� d : u
� (circle d) : R u
� d1 : u � d2 : u
� (rect d1 d2) : R u
� r1 : R � r2 : R u
� r1 ∪ r2 : R u
� r1 : R � r2 : R u
� r1 ∩ r2 : R u
� r : R u1
� (conv u1 u2 r) : R u2
� d1 : u � d2 : u � r : R u
� trans (d1, d2) r : R u
Figure 1: The language of geometric regions: syntax and type
system.
In Figure 1 we give the syntax and static semantics (typing
rules) for the Region DSL. There are three
syntactic categories: units, magnitudes (i.e. 10 cm, or 2 pix) and
regions. There are four primitive regions:
empty is the empty region; univ is the full plane; (circle d) is a
circular region with center at the origin of the
coordinate system and radius specified by d; (rect d1 d2) is a
rectangular region centered at the origin with
length d1 and height d2. Regions can be combined using “set-
operations:” union (r1 ∪ r2) and intersection
(r1 ∩ r2). Translation (trans (d1, d2) r) translates a region by an
amount specified by a pair of magnitudes
(d1, d2). Finally, a region of type u1 can be coerced to a region
of type u2 using the explicit conversion
expression (conv u1 u2 r).
An embedded region DSL. In Figure 2 we implement regions as
an embedded DSL using Ωmega as
the host language. The meaning of a region is a characteristic
function: for each coordinate in the plain,
the meaning function should answer whether that points lies
inside the region. In the embedding, regions
are defined in terms of host language functions (notice the type
synonym for Region as a function type).
Magnitudes are defined in terms of an algebraic datatype (Mag).
The embedded implementation is just a
series of definitions, one for each of the DSL’s operators.
This embedding uses the type system of the host language to
implement the type system of the Region
language. The types of the region combinators are
parameterized by the type of the unit used to construct
it. For example a region of type (Region Centimeter) must have
been constructed using centimeters as
units. This requires a certain amount of overloading in the
implementation. Consider the definition of the
basic region for a circle.
circle :: Mag u -> Region u
circle r =  x y -> leq (plus (square x)(square y)) (square r)
A point (x,y) lies within the circular region of radius r, if and
only if x2 + y2 ≤ r2. We need some way
of overloading leq, plus and square to work on different
representations of magnitudes. There are several
ways one could do this, but we choose to use two of Ωmega’s
unique features (kind extension and equality
qualified types) because we will find them useful later on.
We use kind extension to define a new kind Unit. Kinds[2, 11,
19] are similar to types in that, while
types classify values, kinds classify types. We indicate this by
the classifies relation (::). For example: 5 ::
Int :: *0 . We say 5 is classified by Int, and Int is classified by
*0 (star-zero). The kind *0 classifies
all the types that classify values that can be computed (like Int,
Char, pairs, functions etc). The kind
definition also introduces two new types Pixel and Centimeter
that are classified by Unit (Pixel::Unit,
Centimeter::Unit). We then define a new type constructor
Mag::Unit -> *0 which is parameterized by
types of kind Unit. This allows us to define data- constructor
functions Pix and Cm that inject Int values
into (Mag Pixel) values, and Float values into (Mag Centimeter)
values respectively. Consider the clause
7
-- Units and Magnitudes
kind Unit = Pixel | Centimeter
data Mag u = Pix Int where u = Pixel
| Cm Float where u = Centimeter
-- Regions as Characteristic Function
type Region u = Mag u -> Mag u -> Bool
-- basic regions
circle::Mag u -> Region u
circle r =
x y -> leq (plus (square x) (square y))
(square r)
rect::Mag u -> Mag u -> Region u
rect w h =
x y -> between (neg w) (plus x x) w &&
between (neg h) (plus y y) h
univ::Region u
univ = x y -> True
empty :: Region u
empty = x y -> False
-- region combinators
trans::(Mag u,Mag u) -> Region u -> Region u
trans (a,b) r =
x y -> r (minus x a) (minus y b)
convert::(Mag v -> Mag u)-> Region u -> Region v
convert t r = x y -> r (t x) (t y)
intersect::Region u -> Region u -> Region u
intersect r1 r2 = x y -> r1 x y && r2 x y
union::Region u -> Region u -> Region u
union r1 r2 =  x y -> r1 x y || r2 x y
plus :: Mag u -> Mag u -> Mag u
plus (Pix a) (Pix b) = Pix (intAdd a b)
plus (Cm a) (Cm b) = Cm (floatAdd a b)
Figure 2: An embedded implementation of regions with
combinators.
Pix Int where u = Pixel The use of equality qualification in the
where clause gives the constructor
function Pix the qualified type (forall (u:Unit).(u = Pixel)=>Int
-> Mag u). On one hand whenever
we apply the function Pix we are required to establish that
(u=Pixel). On the other hand, whenever we
pattern match against a value constructed by (Pix n) we can
assume that (u=Pixel). These two features
allow us to define type constructors whose type parameters
encode information about the internal structure
of the values they classify. For example, a value of type (Mag
Pixel) must have been constructed by the
Pix constructor function. This ability will play an important role
later on in the example.
We can now see how leq, plus and square are overloaded.
Consider the first clause of the definition of
plus in Figure 2. To type check this clause we must be able to
derive:
{Pix a :: Mag u, Pix b :: Mag u, } � (Pix(intAdd a b)) :: Mag u.
Because Pix has an equality qualified
type Pix::(u=Pixel)=>Int -> Mag u we can derive a type for a
and b plus an equality between u and
Pixel. {(u = Pixel), a :: Int, b :: Int, } � (Pix(intAdd a b)) ::
Mag u. We can derive the type Mag u for
the expression (Pix (intAdd a b)) provided (intAdd a b)::Int and
we can show (u=Pixel), but this is
precisely the guarantee we obtained from the pattern (Pix a), so
we can derive the required type.
An important feature of this embedding is that the type-system
of the host language ensures type-correct
combination of regions, i.e., it disallows the mixing of units of
magnitude that the type system of the regions
language is designed to prevent. For example, the expression
union (circle (Cm 2)) (circle (Pix 4))
is simply rejected by Ωmega’s type checker as ill-typed.
Syntax as data. The essential feature of the Region embedding
is that it directly manipulates the
denotations of regions as function-values. The disadvantage of
this kind of embedding is that Region programs
cannot be syntactically analyzed since they are just functions.
We cannot even write a pretty-printer for
Region expressions. In this section we present an approach to
implementing the Regions DSL that will allow
us to ensure that only well-typed regions are combined, and still
let us analyze Region expressions’ syntax.
Usually, object-languages such as Region expressions are
represented as algebraic data-types. However,
a näıve data-type representation would not allow us to statically
enforce the invariant that only regions with
the same type of units are combined. In Figure 3, we define a
new type constructor RegExp which encodes
Region expressions. The important thing to note is that RegExp
is a type constructor whose argument type
8
data RegExp u
= Univ
| Empty
| Circle (Mag u)
| Rect (Mag u) (Mag u)
| Union (RegExp u) (RegExp u)
| Inter (RegExp u) (RegExp u)
| Trans (Mag u, Mag u) (RegExp u)
| forall v. Convert (CoordTrans v u) (RegExp v)
evalTrans :: CoordTrans u v -> Mag v -> Mag u
evalTrans CM2PX = pxTOcm
evalTrans PX2CM = cmTOpx
eval :: RegExp u -> Region u
eval Univ = univ
eval Empty = empty
eval (Circle r) = circle r
eval (Rect w h) = rect w h
eval (Union a b) = union (eval a) (eval b)
eval (Inter a b) = intersect (eval a) (eval b)
eval (Trans xy r) = trans xy (eval r)
eval (Convert trans r) =
convert (evalTrans trans) (eval r)
Figure 3: data structure embedding.
indicates the unit of magnitude for the region it represents. In
the previous section, we defined the Mag type
with equality qualified units because we wanted to parameterize
Region expressions by a unit type. As in
the embedded combination approach, this parameterized data
constructor approach also requires arguments
of constructor functions like Union, to have matching units. In a
similar fashion a syntactic representation
of unit transformers is also defined:
data CoordTrans u v = CM2PX where u = Centimeter, v = Pixel
| PX2CM where u = Pixel, v = Centimeter
This uses the equality qualified types in the same way they were
used in the type constructor Mag. The
where clauses force the constructors CM2PX and PX2CM to
have the type CoordTrans Centimeter Pixel
and CoordTrans Pixel Centimeter, respectively. The values of
CoordTrans are used to encode the Trans
regions: given a region of some unit type v, and a unit
transformer CoordTrans v u, we can obtain a region
of unit type u.
At first glance, the technique presented here does not seem
much different from a standard data-type
representation of syntax available in functional languages such
as ML or Haskell. This is because the type
system of the Regions DSL is quite primitive. Expressing the
typing invariants on the region expressions
is not very difficult. Embedding richer typing invariants in
algebraic syntax becomes quite hard without
using equality qualified types or other mechanisms. Using this
mechanism we have accumulated considerable
experience in expressing significantly more complex object-
language typing invariants (e.g., the simply typed
λ-calculus, security type systems[22, 34], modal types[8, 9],
and so on) [20].
The next step in developing a stand-alone implementation of the
Region DSL is to define a function
mapping Region expressions to their denotations (i.e., the
appropriate characteristic functions). The type
of eval is most interesting – it maps (RegExp u) to (Region u).
This type ensures that the meaning of
regions preserves magnitude unit types. The function evalTrans
maps a CoordTrans u v to a function that
converts between magnitudes of type u and v. Moreover, from
the definition of eval, we can see that each
clause of eval simply reuses the embedding combinators to give
meanings to region expressions. This style of
definition provides a syntactic and semantic connection between
embedded and stand-alone implementation
of regions.
Using domain specific optimizations. A benefit of representing
Region programs as data is the
ability to improve their efficiency by applying domain specific,
meaning-preserving transformations. Figure 4
shows several equivalences of Region programs that we can use
to improve their efficiency. It is a simple
step to code up the algebra of equivalences in Ωmega. Some
sample equivalences are found in Figure 5.
The function normalize transforms its argument by applying
these equivalences to regions. It does so
by replacing all of the region’s constructors with smart
constructors in a bottom-up fashion. Each smart
constructor applies some domain specific equivalences for that
constructor. The figure shows only one smart
constructor, but the others are similarly defined.
9
univ ∪ r = univ empty ∩ r = empty empty ∪ r = r = univ ∩ r (r1
∪ r2) ∩ (r1 ∪ r3) = r1 ∪ (r2 ∩ r3)
conv τ1 τ2 (conv τ2 τ1 r) = r conv τ1 τ2 (r1 ∪ r2) = conv τ1 τ2
r1 ∪ trans τ1 τ2 r2
trans δ (trans δ
′
r) = trans (δ + δ
′
) r trans δ (r1 ∪ r2) = trans δ r1 ∪ trans δ r2
Figure 4: Some algebraic identities on region expressions.
normalize :: RegExp u -> RegExp u
normalize (Union a b)
= mkUnion (normalize a) (normalize b)
normalize (Inter a b)
= mkInter (normalize a) (normalize b)
normalize (Trans xy r)
= mkTrans xy (normalize r)
normalize (Convert t r)
= mkConvert t (normalize r)
normalize t = t
mkUnion :: RegExp u -> RegExp u -> RegExp u
mkUnion a Univ = Univ
mkUnion Univ b = Univ
mkUnion a Empty = a
mkUnion Empty b = b
mkUnion a (Inter x y)
= Inter (mkUnion a x) (mkUnion a y)
mkUnion a b = Union a b
Figure 5: The expression normalize r puts r into a normal form.
Removing interpretive overhead. Using the syntax as data
approach injects a layer of interpretive
overhead into the implementation. Executing a Region program
now involves a call to the eval function. This
call must walk over the structure of the data representing the
Region program. In the embedded approach
this extra overhead was not present. To remove this
overhead[31] we employ the staging capabilities of
Ωmega.
There are four staging annotations in Ωmega. Bracket [| |],
escape $( ), lift (lift ) and run
(run ). Brackets introduce a new code template, and specify that
the expression inside the brackets should
be generated as a program for potential execution in the next
stage. Within brackets, escape specifies a
hole within a template. The escaped expression is executed
(resulting in a piece of code), and the resultant
code is spliced into that hole. The escape annotation may only
appear inside code brackets. Escape is the
mechanism used for building larger pieces of code from smaller
pieces. The lift annotation evaluates an
expression of primitive type to a literal value and builds some
trivial code that returns that value. The final
annotation is run. It evaluates its argument – the result should
then be a piece of code, and this code is
then run. The run annotation is how Ωmega transitions from one
stage to the next.
The staged version appears in Figure 6. The staged interpreter is
implemented by the function comp
which is very similar in structure to the unstaged interpreter
eval. The transition from eval to comp is
accomplished in a few simple steps:
1. Stage the Mag datatype so that it contains (Code Int) instead
of Int and (Code Float) instead of
Float. This allows us to know the units of a magnitude at
“compile time”, even though we will not
know its value until “run time”. The definition of CodeMag
accomplishes this staging.
2. Stage the primitive functions on Mag values to work with this
new type. We follow the convention that
the staged versions of helper functions are given the same name
but suffixed with an “S”. Each of these
staged functions is derived from the original in a systematic
way. For example, the pixel versions of
the original and staged version of the circle combinators are as
follows:
circle :: Mag u -> Region u
circle (Pix r) = (x y -> ((x * x) + (y * y)) <= (r * r))
circleS :: Mag u -> CodeMag u -> CodeMag u -> Code Bool
circleS (Pix r) =  (CodePx x) (CodePx y) -> [| (($x * $x) + ($y
* $y)) <= $(lift (r*r)) |]
10
-- staged magnitude datatype
data CodeMag u
= CodePix (Code Int) where u = Pixel
| CodeCm (Code Float) where u = Centimeter
-- compiled region type synonym
type CReg u =
CodeMag u -> CodeMag u -> Code Bool
-- helper function
compTrans :: CoordTrans u v ->
CodeMag v -> CodeMag u
compTrans CM2PX (CodePix i) =
CodeCm [| rem $i $(lift scale) |]
compTrans PX2CM (CodeCm i) =
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning
Survey of Domain-Specific Languages for Machine Learning

Mais conteúdo relacionado

Semelhante a Survey of Domain-Specific Languages for Machine Learning

[EN] PLC programs development guidelines
[EN] PLC programs development guidelines[EN] PLC programs development guidelines
[EN] PLC programs development guidelinesItris Automation Square
 
ModelTalk - When Everything is a Domain Specific Language
ModelTalk - When Everything is a Domain Specific LanguageModelTalk - When Everything is a Domain Specific Language
ModelTalk - When Everything is a Domain Specific LanguageAtzmon Hen-Tov
 
HadoopDB a major step towards a dead end
HadoopDB a major step towards a dead endHadoopDB a major step towards a dead end
HadoopDB a major step towards a dead endthkoch
 
Publicidad Por Internet
Publicidad Por InternetPublicidad Por Internet
Publicidad Por InternetCarlos Chacon
 
Build your own Language - Why and How?
Build your own Language - Why and How?Build your own Language - Why and How?
Build your own Language - Why and How?Markus Voelter
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...Srivatsan Ramanujam
 
Doppl Development Introduction
Doppl Development IntroductionDoppl Development Introduction
Doppl Development IntroductionDiego Perini
 
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...Precisely
 
Domain specific modelling (DSM)
Domain specific modelling (DSM)Domain specific modelling (DSM)
Domain specific modelling (DSM)PG Scholar
 
Ppdg Robust File Replication
Ppdg Robust File ReplicationPpdg Robust File Replication
Ppdg Robust File Replicationguru100
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
A novel data type architecture support for programming languages
A novel data type architecture support for programming languagesA novel data type architecture support for programming languages
A novel data type architecture support for programming languagesijpla
 
The Design, Evolution and Use of KernelF
The Design, Evolution and Use of KernelFThe Design, Evolution and Use of KernelF
The Design, Evolution and Use of KernelFMarkus Voelter
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilDatabricks
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET Journal
 

Semelhante a Survey of Domain-Specific Languages for Machine Learning (20)

[EN] PLC programs development guidelines
[EN] PLC programs development guidelines[EN] PLC programs development guidelines
[EN] PLC programs development guidelines
 
ModelTalk - When Everything is a Domain Specific Language
ModelTalk - When Everything is a Domain Specific LanguageModelTalk - When Everything is a Domain Specific Language
ModelTalk - When Everything is a Domain Specific Language
 
HadoopDB a major step towards a dead end
HadoopDB a major step towards a dead endHadoopDB a major step towards a dead end
HadoopDB a major step towards a dead end
 
Publicidad Por Internet
Publicidad Por InternetPublicidad Por Internet
Publicidad Por Internet
 
Build your own Language - Why and How?
Build your own Language - Why and How?Build your own Language - Why and How?
Build your own Language - Why and How?
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
Ameya_Kasbekar_Resume
Ameya_Kasbekar_ResumeAmeya_Kasbekar_Resume
Ameya_Kasbekar_Resume
 
SE.pdf
SE.pdfSE.pdf
SE.pdf
 
Doppl Development Introduction
Doppl Development IntroductionDoppl Development Introduction
Doppl Development Introduction
 
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
IMS to DB2 Migration: How a Fortune 500 Company Made the Move in Record Time ...
 
Domain specific modelling (DSM)
Domain specific modelling (DSM)Domain specific modelling (DSM)
Domain specific modelling (DSM)
 
Ppdg Robust File Replication
Ppdg Robust File ReplicationPpdg Robust File Replication
Ppdg Robust File Replication
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
A novel data type architecture support for programming languages
A novel data type architecture support for programming languagesA novel data type architecture support for programming languages
A novel data type architecture support for programming languages
 
The Design, Evolution and Use of KernelF
The Design, Evolution and Use of KernelFThe Design, Evolution and Use of KernelF
The Design, Evolution and Use of KernelF
 
Madhu_Resume
Madhu_ResumeMadhu_Resume
Madhu_Resume
 
Metamorphic Domain-Specific Languages
Metamorphic Domain-Specific LanguagesMetamorphic Domain-Specific Languages
Metamorphic Domain-Specific Languages
 
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobilNLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
NLP-Focused Applied ML at Scale for Global Fleet Analytics at ExxonMobil
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 

Mais de bartholomeocoombs

CompetencyAnalyze how human resource standards and practices.docx
CompetencyAnalyze how human resource standards and practices.docxCompetencyAnalyze how human resource standards and practices.docx
CompetencyAnalyze how human resource standards and practices.docxbartholomeocoombs
 
CompetencyAnalyze financial statements to assess performance.docx
CompetencyAnalyze financial statements to assess performance.docxCompetencyAnalyze financial statements to assess performance.docx
CompetencyAnalyze financial statements to assess performance.docxbartholomeocoombs
 
CompetencyAnalyze ethical and legal dilemmas that healthcare.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare.docxCompetencyAnalyze ethical and legal dilemmas that healthcare.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare.docxbartholomeocoombs
 
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docxCompetencyAnalyze ethical and legal dilemmas that healthcare wor.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docxbartholomeocoombs
 
CompetencyAnalyze collaboration tools to support organizatio.docx
CompetencyAnalyze collaboration tools to support organizatio.docxCompetencyAnalyze collaboration tools to support organizatio.docx
CompetencyAnalyze collaboration tools to support organizatio.docxbartholomeocoombs
 
Competency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docxCompetency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docxbartholomeocoombs
 
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docx
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docxCompetency 6 Enagage with Communities and Organizations (3 hrs) (1 .docx
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docxbartholomeocoombs
 
Competency 2 Examine the organizational behavior within busines.docx
Competency 2 Examine the organizational behavior within busines.docxCompetency 2 Examine the organizational behavior within busines.docx
Competency 2 Examine the organizational behavior within busines.docxbartholomeocoombs
 
CompetenciesEvaluate the challenges and benefits of employ.docx
CompetenciesEvaluate the challenges and benefits of employ.docxCompetenciesEvaluate the challenges and benefits of employ.docx
CompetenciesEvaluate the challenges and benefits of employ.docxbartholomeocoombs
 
CompetenciesDescribe the supply chain management principle.docx
CompetenciesDescribe the supply chain management principle.docxCompetenciesDescribe the supply chain management principle.docx
CompetenciesDescribe the supply chain management principle.docxbartholomeocoombs
 
CompetenciesABCDF1.1 Create oral, written, or visual .docx
CompetenciesABCDF1.1 Create oral, written, or visual .docxCompetenciesABCDF1.1 Create oral, written, or visual .docx
CompetenciesABCDF1.1 Create oral, written, or visual .docxbartholomeocoombs
 
COMPETENCIES734.3.4 Healthcare Utilization and Finance.docx
COMPETENCIES734.3.4  Healthcare Utilization and Finance.docxCOMPETENCIES734.3.4  Healthcare Utilization and Finance.docx
COMPETENCIES734.3.4 Healthcare Utilization and Finance.docxbartholomeocoombs
 
Competencies and KnowledgeWhat competencies were you able to dev.docx
Competencies and KnowledgeWhat competencies were you able to dev.docxCompetencies and KnowledgeWhat competencies were you able to dev.docx
Competencies and KnowledgeWhat competencies were you able to dev.docxbartholomeocoombs
 
Competencies and KnowledgeThis assignment has 2 parts.docx
Competencies and KnowledgeThis assignment has 2 parts.docxCompetencies and KnowledgeThis assignment has 2 parts.docx
Competencies and KnowledgeThis assignment has 2 parts.docxbartholomeocoombs
 
Competencies and KnowledgeThis assignment has 2 partsWhat.docx
Competencies and KnowledgeThis assignment has 2 partsWhat.docxCompetencies and KnowledgeThis assignment has 2 partsWhat.docx
Competencies and KnowledgeThis assignment has 2 partsWhat.docxbartholomeocoombs
 
Competences, Learning Theories and MOOCsRecent Developments.docx
Competences, Learning Theories and MOOCsRecent Developments.docxCompetences, Learning Theories and MOOCsRecent Developments.docx
Competences, Learning Theories and MOOCsRecent Developments.docxbartholomeocoombs
 
Compensation  & Benefits Class 700 words with referencesA stra.docx
Compensation  & Benefits Class 700 words with referencesA stra.docxCompensation  & Benefits Class 700 words with referencesA stra.docx
Compensation  & Benefits Class 700 words with referencesA stra.docxbartholomeocoombs
 
Compensation, Benefits, Reward & Recognition Plan for V..docx
Compensation, Benefits, Reward & Recognition Plan for V..docxCompensation, Benefits, Reward & Recognition Plan for V..docx
Compensation, Benefits, Reward & Recognition Plan for V..docxbartholomeocoombs
 
Compete the following tablesTheoryKey figuresKey concepts o.docx
Compete the following tablesTheoryKey figuresKey concepts o.docxCompete the following tablesTheoryKey figuresKey concepts o.docx
Compete the following tablesTheoryKey figuresKey concepts o.docxbartholomeocoombs
 
Compensation Strategy for Knowledge WorkersTo prepare for this a.docx
Compensation Strategy for Knowledge WorkersTo prepare for this a.docxCompensation Strategy for Knowledge WorkersTo prepare for this a.docx
Compensation Strategy for Knowledge WorkersTo prepare for this a.docxbartholomeocoombs
 

Mais de bartholomeocoombs (20)

CompetencyAnalyze how human resource standards and practices.docx
CompetencyAnalyze how human resource standards and practices.docxCompetencyAnalyze how human resource standards and practices.docx
CompetencyAnalyze how human resource standards and practices.docx
 
CompetencyAnalyze financial statements to assess performance.docx
CompetencyAnalyze financial statements to assess performance.docxCompetencyAnalyze financial statements to assess performance.docx
CompetencyAnalyze financial statements to assess performance.docx
 
CompetencyAnalyze ethical and legal dilemmas that healthcare.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare.docxCompetencyAnalyze ethical and legal dilemmas that healthcare.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare.docx
 
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docxCompetencyAnalyze ethical and legal dilemmas that healthcare wor.docx
CompetencyAnalyze ethical and legal dilemmas that healthcare wor.docx
 
CompetencyAnalyze collaboration tools to support organizatio.docx
CompetencyAnalyze collaboration tools to support organizatio.docxCompetencyAnalyze collaboration tools to support organizatio.docx
CompetencyAnalyze collaboration tools to support organizatio.docx
 
Competency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docxCompetency Checklist and Professional Development Resources .docx
Competency Checklist and Professional Development Resources .docx
 
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docx
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docxCompetency 6 Enagage with Communities and Organizations (3 hrs) (1 .docx
Competency 6 Enagage with Communities and Organizations (3 hrs) (1 .docx
 
Competency 2 Examine the organizational behavior within busines.docx
Competency 2 Examine the organizational behavior within busines.docxCompetency 2 Examine the organizational behavior within busines.docx
Competency 2 Examine the organizational behavior within busines.docx
 
CompetenciesEvaluate the challenges and benefits of employ.docx
CompetenciesEvaluate the challenges and benefits of employ.docxCompetenciesEvaluate the challenges and benefits of employ.docx
CompetenciesEvaluate the challenges and benefits of employ.docx
 
CompetenciesDescribe the supply chain management principle.docx
CompetenciesDescribe the supply chain management principle.docxCompetenciesDescribe the supply chain management principle.docx
CompetenciesDescribe the supply chain management principle.docx
 
CompetenciesABCDF1.1 Create oral, written, or visual .docx
CompetenciesABCDF1.1 Create oral, written, or visual .docxCompetenciesABCDF1.1 Create oral, written, or visual .docx
CompetenciesABCDF1.1 Create oral, written, or visual .docx
 
COMPETENCIES734.3.4 Healthcare Utilization and Finance.docx
COMPETENCIES734.3.4  Healthcare Utilization and Finance.docxCOMPETENCIES734.3.4  Healthcare Utilization and Finance.docx
COMPETENCIES734.3.4 Healthcare Utilization and Finance.docx
 
Competencies and KnowledgeWhat competencies were you able to dev.docx
Competencies and KnowledgeWhat competencies were you able to dev.docxCompetencies and KnowledgeWhat competencies were you able to dev.docx
Competencies and KnowledgeWhat competencies were you able to dev.docx
 
Competencies and KnowledgeThis assignment has 2 parts.docx
Competencies and KnowledgeThis assignment has 2 parts.docxCompetencies and KnowledgeThis assignment has 2 parts.docx
Competencies and KnowledgeThis assignment has 2 parts.docx
 
Competencies and KnowledgeThis assignment has 2 partsWhat.docx
Competencies and KnowledgeThis assignment has 2 partsWhat.docxCompetencies and KnowledgeThis assignment has 2 partsWhat.docx
Competencies and KnowledgeThis assignment has 2 partsWhat.docx
 
Competences, Learning Theories and MOOCsRecent Developments.docx
Competences, Learning Theories and MOOCsRecent Developments.docxCompetences, Learning Theories and MOOCsRecent Developments.docx
Competences, Learning Theories and MOOCsRecent Developments.docx
 
Compensation  & Benefits Class 700 words with referencesA stra.docx
Compensation  & Benefits Class 700 words with referencesA stra.docxCompensation  & Benefits Class 700 words with referencesA stra.docx
Compensation  & Benefits Class 700 words with referencesA stra.docx
 
Compensation, Benefits, Reward & Recognition Plan for V..docx
Compensation, Benefits, Reward & Recognition Plan for V..docxCompensation, Benefits, Reward & Recognition Plan for V..docx
Compensation, Benefits, Reward & Recognition Plan for V..docx
 
Compete the following tablesTheoryKey figuresKey concepts o.docx
Compete the following tablesTheoryKey figuresKey concepts o.docxCompete the following tablesTheoryKey figuresKey concepts o.docx
Compete the following tablesTheoryKey figuresKey concepts o.docx
 
Compensation Strategy for Knowledge WorkersTo prepare for this a.docx
Compensation Strategy for Knowledge WorkersTo prepare for this a.docxCompensation Strategy for Knowledge WorkersTo prepare for this a.docx
Compensation Strategy for Knowledge WorkersTo prepare for this a.docx
 

Último

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 

Último (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 

Survey of Domain-Specific Languages for Machine Learning

  • 1. A Survey on Domain-Specific Languages for Machine.pdf A Survey on Domain-Specific Languages for Machine Learning August 3, 2017 CISC 603-50- R-2017/Summer - Theory of Computation Student: Dileep Sharma Instructor: Majid Shaalan Contents 1 Statement 2 2 Abstract 2 3 Introduction 2 3.1 Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3.2 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3.3 Domain Specific Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
  • 2. 4 DSL Feature Model 4 4.1 Language Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4.2 Transformation Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.3 DSL Tool Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 4.4 DSL Process Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Languages Surveyed 9 5.1 OptiML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5.2 ScalOps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.3 Scala . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.4 PIG LATIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.5 Breukervl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 5.6 Possibility of survey of other language . . . . . . . . . . . . . . . . . . . . . 11 5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  • 3. . 11 6 Reference 12 1 1 Statement The purpose of this paper is to identify, describe and design Domain Specific Language( DSL ) applicable to Machine learning world in big data space, that can make process more faster and efficient. 2 Abstract In last couple of decades, the data we have at our disposal have increased tremendously because of technology advance. This technology advance has helped us in capturing, storing, analogizing and visualizing data and that has lead to big data. We need better algorithm to read and analyze these big and complex diastases. Machine Learning is turning out to be the most effective way of analyzing these datasets and predicting future behavior. To better analyze these datasets with Machine Learning we need enhanced
  • 4. computational power, that can be obtained using parallel processing using GPUs. Machine Learning algorithms needs to be adapted and optimized to specific applications. However, programming these devices to run efficiently and correctly is difficult, error-prone, and results in software that is harder to read and maintain. This paper is primarily concern about Domain Specific language that can help us in writing Machine Algorithms in efficient way to analyze Big Data. 3 Introduction Technological advance in recent past has caused a data revolution. This high volume of data is called big data.Every second, smartphones, tablets,cars, websites, and systems generate a massive amount of data, and users and software engineers have access to a subset of that data to perform their activities. 3.1 Big Data Apart for large amount of data, Big data also accounts for complex data, known as vari- ety.Big Data also created new challenges in data management.
  • 5. Traditional ways of data storage and analysis do not scale well to this amount of data, which can reach hundreds of terabytes or more, and new approaches are being developed to address these issues Big data is basically defined by 5V’s: • Volume – Refers to amount of data – Big Data doesnt sample 2 – Big Data observes and tracks what happens • Velocity – Speed of data processing – Speed of data generation – Big Data is often available in real-time • Variety – Number of types of data • Variability
  • 6. – Inconsistency of data • Veracity – Quality of data 3.2 Machine Learning Machine learning is turning out to be one of the most advanced technique to process and make inferences from Big Data. Machine Learning is widely used to discover identify trends, patterns, suggest actions, and optimize output. There are still a lot of challenges in using Machine Learning to solve big data problems, such as memory and time issues. To resolve these issues, we can use GPUs for parallel processing and scatter data across different machines. There are basically two kind of Machine Learning: • Supervised – All data is labeled – You have both Input variable and Output variable – Use an algorithm to learn the mapping function from the input to the output • Unsupervised
  • 7. – All data is unlabeled – You only have input data and no corresponding output variables – Algorithm try to find pattern in input data 3 3.3 Domain Specific Language In model-driven engineering, a Domain-Specific Language (DSL) is a specialized language, which, combined to a transformation function, serves to raise the abstraction level of soft- ware and ease software development. The Machine Learning Implementation can be made better by using techniques such as Domain-Specific. DSL solves problem in a single domain while General Purpose Languages(GSL) solves problems in a couple of domains. DSLs facilitate results to be expressed in the idiom and at the level of abstraction of the prob- lem domain Language.DSLs offer pre-defined abstractions to represent concepts from the application domain. This representation may be more clear and intuitive. Moreover, DSL
  • 8. compilers may optimize the code written for the specific domain, and they can perform error detection more efficiently. Lastly, DSLs may have more specific tool support that help software engineers increase their productivity. These languages are easier to learn. There can be three kind of DSL languages: • Markup language • Specification Language • Programming Language 4 DSL Feature Model DSl Feature model covers languages, transformation, tooling, and process aspects 1. Lan- guage and transformation are mandatory features because they are parts of the DSL defini- tion. Tool is also mandatory because it serves to automate transformation from a domain, the problem space, down to lower abstraction levels, the solution space. Process is optional because it can be undefined or implicit. 4.1 Language Features
  • 9. There are two language features called as2: • Abstract Syntax – Characterizes elements of a domain and their relationships without implementa- tion consideration • Concrete Syntax – Representation of a DSL in a human usable form 4 Figure 1: Roots of DSL Feature Language Figure 2: Language Feature 5 Figure 3: Root of the Transformation Features 4.2 Transformation Feature Transformation feature ensures the correspondence from the problem to the solution, takes into account the problem-to-solution element mapping, and all design, implementation, platform and architecture decisions. Transformation has to
  • 10. answer to three questions3: How to specify transformation4? What are the assets expected from the transformation5? How to realize the transformation to produce the expected assets? 6. 4.3 DSL Tool Features There are basically three kind of tool features namely Respect of Abstraction, Assistance and Quality Factor/reftool. The purpose of abstraction is to reduce software description. Abstraction can be intrusive pr seamless. Assistance aims at guiding the DSL tool user dur- ing definition and transformation of domain data. Assistance is adaptive when assistance changes in function of the context of usage and its Static when it does not changes. Process guidance guide the user at a process step or at the process workflow level and Checking is mandatory in the feature model because a DSL tool must ensure consistency and complete- ness of domain data. DSL checking can be realized on the fly or on user action. Quality Factor covers non-functional aspects of DSL Tool.
  • 11. 6 Figure 4: Specification Features Figure 5: Target Asset Features 7 Figure 6: Operational Transformation Features Figure 7: DSL Tool Features 8 Figure 8: DSL Process Features 4.4 DSL Process Features A DSL process defines how development projects with DSL must be executed. This part of the feature model addresses the Domain-Specific Software Development (DSSD). It can be of three types: Work Definition, Role and Guidance/refProcess. 5 Languages Surveyed 5.1 OptiML
  • 12. • Highly expressive language textual programming language built on top of Scala • OptiML provides the link between ML applications and heterogeneous parallel hard- ware • OptiML code outperformed explicitly parallelized MATLAB code on heterogeneous system • Require no knowledge of the underlying embedding implementation • No explicit parallelization • No explicit code for the lower level programming models • Statically typed language 9 • Declarative language • Transative language • Support Vector, matrix, Graph operations • Does not supports Distributed and Cloud computing feature 5.2 ScalOps • It enables algorithms to run on cloud
  • 13. • Textual programming language • A declarative language • Statically typed • Supports vector, matrix,and graph operations in both parallel and cloud computing environment • Transative language 5.3 Scala • It is concise • Has broad applicability • Texual programming language 5.4 PIG LATIN • Textual programming language • Imperative language • Dynamically typed • Translational language • Support Vector, matrix operations • Does not support graph operations • Supports Distributed and Cloud computing feature
  • 14. • Supports parallel operation 10 5.5 Breukervl • Modeling Graphical language • Descriptive language • Does not Support Vector, matrix and graph operations • Does not supports Distributed and Cloud computing and parallel computing feature 5.6 Possibility of survey of other language As time permits, I would take to take my analysis further and survey other Domain Specific Languages. 5.7 Conclusions Because of size and complexity of datasets machine learning algorithms are facing a lot of software and hardware challenges. We need to move towards parallel and heterogeneous hardware systems. A lot of DSL are developed to resolve these issues. These languages pro- vide parallel execution of code, distributed computation and
  • 15. cloud computation. Following points applies to most of the DSL languages: • Most of the DSL language are programming languages • Most of the languages are complied • Modeling languages are all graphical and have a descriptive model • Most of them support – Parallelism – Distribution – Cloud 11 6 Reference References [1] “A Survey on Domain-Specific Languages for Machine Learning in Big Data” 2016 IEEE International Conference on Software Science, Technology and Engineering (SWSTE). [2] “OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning” 28th
  • 16. International Conference on Machine Learning, Bellevue, WA, USA, 2011 [3] “Domain Specific Languages for Machine Learning” GPCE 2016. [4] “Towards Model-Driven Engineering for Big Data Analytics – An Exploratory Analysis of Domain-Specific Languages for Machine Learning” 2014 47th Hawaii International Conference on System Sciences. [5] “An overview of big data opportunities, applications and tools. In Intelligent Systems and Computer” Vision (ISCV), 2015 (pp. 1-6). IEEE [6] “ DSL Classification” 12 StatementAbstractIntroductionBig DataMachine LearningDomain Specific LanguageDSL Feature ModelLanguage FeaturesTransformation Feature DSL Tool FeaturesDSL Process FeaturesLanguages SurveyedOptiMLScalOpsScalaPIG LATINBreukervl Possibility of survey of other languageConclusionsReference Design Guidelines for Domain Specific Languages.pdf Design Guidelines for Domain Specific Languages Gabor Karsai Institute for Software
  • 17. Integrated Systems Vanderbilt University Nashville, USA Holger Krahn Software Engineering Group Department of Computer Science RWTH Aachen, Germany Claas Pinkernell Software Engineering Group Department of Computer Science RWTH Aachen, Germany Bernhard Rumpe Software Engineering Group Department of Computer Science RWTH Aachen, Germany Martin Schindler Software Engineering Group Department of Computer Science RWTH Aachen, Germany
  • 18. Steven Völkel Software Engineering Group Department of Computer Science RWTH Aachen, Germany ABSTRACT Designing a new domain specific language is as any other complex task sometimes error-prone and usually time con- suming, especially if the language shall be of high-quality and comfortably usable. Existing tool support focuses on the simplification of technical aspects but lacks support for an enforcement of principles for a good language design. In this paper we investigate guidelines that are useful for de- signing domain specific languages, largely based on our ex- perience in developing languages as well as relying on ex- isting guidelines on general purpose (GPLs) and modeling languages. We defined guidelines to support a DSL devel- oper to achieve better quality of the language design and a better acceptance among its users. 1. INTRODUCTION Designing a new language that allows us to model new technical properties in a simpler and easier way, describe or implement solutions, or to describe the problem resp. re- quirements in a more concise way is one of the core chal- lenges of computer science. The creation of a new language is a time consuming task, needs experience and is thus usu- ally carried out by specialized language engineers. Nowa- days, the need for new languages for various growing do- mains is strongly increasing. Fortunately, also more sophis- ticated tools exist that allow software engineers to define a
  • 19. new language with a reasonable effort. As a result, an in- creasing number of DSLs (Domain Specific Languages) are designed to enhance the productivity of developers within specific domains. However, these languages often fit only to a rather specific domain problem and are neither of the quality that they can be used by many people nor flexible enough to be easily adapted for related domains. During the last years, we developed the frameworks Mon- tiCore [13] and GME [2] which support the definition of domain specific languages. Using these frameworks we de- signed several DSLs for a variety of domains, e.g., a textual version of UML/P notations [17] and a language based on function nets in the automotive domain [5]. We experienced that the design of a new DSL is a difficult task because dif- ferent people have a varying perception of what a “good” language actually is. This of course also depends on the taste of the developer respectively the users, but there are a number of generally acceptable guidelines that assist in language development, making it more a systematic, methodological task and less an intellectual ad-hoc challenge. In this paper we summa- rize, categorize, and amend existing guidelines as well as add our new ones assuming that they improve design and usability of future DSLs. In the following we present general guidelines to be consid- ered for both textual and graphical DSLs with main focus is on the former. The guidelines are discussed sometimes using examples from well-known programming languages or math- ematics, because these languages are known best. Depend- ing on the concrete language and the domain these guidelines have to be weighted differently as there might be different purposes, complexity, and number of users of the resulting language. For example, for a rather simple configuration
  • 20. language used in only one project a timely realization is usu- ally more important than the optimization of its usability. Therefore, guidelines must be sometimes ignored, altered, or enforced. Especially quality-assurance guidelines can result in an increased amount of work. While we generally focus in our work on DSLs that are specifically dedicated to modeling aspects of (software) sys- tems, we believe that these guidelines generally hold for any DSL that embeds a certain degree of complexity. 1.1 Literature on Language Design For programming languages, design guidelines have been intensively discussed since the early 70s. Hoare [8] intro- duced simplicity, security, fast translation, efficient object code, and readability as general criteria for the design of good languages. Furthermore, Wirth [22] discussed sev- eral guidelines for the design of languages and correspond- ing compilers. The rationale behind most of the guidelines and hints of both articles can be accepted as still valid to- day, but the technical constraints have changed dramati- cally since the 70s. First of all, computer power has in- creased significantly. Therefore, speed and space problems have become less important. Furthermore, due to sophis- ticated tools (e.g., parser generators) the implementation of accompanying tools is often not a necessary part of the language development any more. Of course, both articles concentrate on programming languages and do not consider the greater variety of domain specific languages. More recently, authors have also discussed the design of domain specific modeling languages. General principles for
  • 21. modeling language design were introduced in [14]. These include simplicity, uniqueness, consistency, and scalability, on which we will rely later. However, the authors did not discuss how these higher level principles can be achieved. In [12] certain aspects of the DSL development are explained and some guidelines are introduced. More practical guide- lines for implementing DSLs are given in [10]. These focus on how to identify the necessary language constructs to gen- erate full code from models. The authors explain how to provide tool support with the MetaEdit+ environment. [20] explains 12 lessons learned from DSL experiments that can help to improve a DSL. Although more detailed discussions on explicit guidelines are missing, these lessons embed doc- umented empirical evidence – a documentation that many other discussions, including ours do not have. In [16] the authors introduce a toolset which supports the definition of DSLs by checking their consistency with respect to sev- eral objectives. Language designers can select properties of their DSL to be developed and the system automatically derives other design decisions in order to gain a consistent language definition. However, the introduced criteria cover only a subset of the decisions to be made and hence, cannot serve as the only criteria for good language design. Quite the contrary, to our experience many design guidelines can- not be translated in automatic measures and thus cannot be checked by a tool. 1.2 Categories of DSL Design Guidelines The various design guidelines we will discuss below, can be organized into several categories. Essentially, these guide- lines describe techniques that are useful at different activities of the language development process, which range from the domain analysis to questions of how to realize the DSL to the development of an abstract and a concrete syntax including the definition of context conditions. An alignment of guide-
  • 22. lines with the language development activities and the de- veloped artifacts has the advantage that a language designer can concentrate on the respective subset of the guidelines at each activity. This should help identifying and realizing the desired guidelines. Therefore, we decided for a development phase oriented classification and identified the following cat- egories: Language Purpose discusses design guidelines for the early activities of the language development process. Language Realization introduces guidelines which discuss how to implement the language. Language Content contains guidelines which focus on the elements of a language. Concrete Syntax concentrates on design guidelines for the readable (external) representation of a language. Abstract Syntax concentrates on design guidelines for the internal representation of a language. For each of these categories we will discuss the design guidelines we found useful. Please be aware that the subse- quently discussed guidelines sometimes are in conflict with each other and the language developer sometimes has to bal- ance them accordingly. Additionally, semantics is explicitly not listed as a separate step as it should be part of the entire development process and therefore has an influence on all of the categories above. 2. DSL DESIGN GUIDELINES 2.1 Language Purpose
  • 23. Language design is not only influenced by the question of what it needs to describe, but equally important what to do with the language. Therefore, one of the first activities in language design is to analyze the aim of the language. Guideline 1: “Identify language uses early.” The language defined will be used for at least one task. Most common uses are: documentation of knowledge (only) and code ge- neration. However, there are a lot more forms of usage: definition or generation of tests, formal verification, auto- matic analysis of various kinds, configuration of the system at deployment- or run-time, and last but increasingly im- portant, simulation. An early identification of the language uses have strong in- fluence on the concepts the language will allow to offer. Code generation for example is not generally feasible when the language embeds concepts of underspecification (e.g., non- deterministic Statecharts). Even if everything is designed to be executable, there are big differences regarding the over- head necessary to run certain kinds of models. If efficient execution on a small target machine is necessary (e.g., mo- bile or car control device) then high-level concepts must be designed for optimized code generations. For simulation and validation of requirements however, efficiency plays a minor role. Guideline 2: “Ask questions.” Once the uses of a language have been identified it is helpful to embed these forms of language uses into the overall software development process. People/roles have to be identified that develop, review, and deploy the involved programs and models. The following questions are helpful for determining the necessary decisions: Who is going to model in the DSL? Who is going to review the models? When? Who is using the models for which
  • 24. purpose? Based thereon, the question after whether the language is too complex or captures all the necessary domain elements can be revisited. In particular, appropriate tutorials for the DSL users in their respective development process should now be prepared. Guideline 3: “Make your language consistent.” DSLs are typically designed for a specific purpose. Therefore, each feature of a language should contribute to this purpose, oth- erwise it should be omitted. As an illustrative example we consider a platform independent modeling language. In this language, all features should be platform independent as well. This design principle was already discussed in [14]. 2.2 Language Realization When starting to define a new language, there are several options on how to realize it. One can implement the DSL from scratch or reuse and extend or reduce an existing lan- guage, one can use a graphical or a textual representation, and so on. We have identified general hints which have to be taken into account for these decisions. Guideline 4: “Decide carefully whether to use graphical or textual realization.” Nowadays, it is common to use tools supporting the design of graphical DSLs such as the Eclipse Modeling Framework (EMF) or MetaEdit+. On the other hand, there exist sophisticated tools and frameworks like MontiCore or xText for text-based modeling. As described in [6], there are a number of advantages and disadvantages for both approaches. Textual representations for example usually have the advantage of faster development and are
  • 25. platform and tool independent whereas graphical models provide a better overview and ease the understanding of models. Therefore, advantages and disadvantages have to be weighted and matched against end users’ preferences in order to make a substantiated decision for one of the real- izations. From this point on, a more informed decision can be made for a concrete tool to realize the language based on their particular features and the intended use of the lan- guage. Comparisons can be found in [21] or [3]. Guideline 5: “Compose existing languages where possible.” The development of a new language and an accompanying toolset is a labor-intensive task. However, it is often the case that existing languages can be reused, sometimes even without adaptation. A good example for language reuse is OCL: it can be embedded in other languages in order to define constraints on elements expressed in the hosting language. The most general and useful form of language reuse is thus the unchanged embedding of an existing language into an- other language. A more sophisticated approach is to have predefined holes in a host language, such that the defini- tion of a new language basically consists of a composition of different languages. For textual languages this composi- tional style of language definitions is well understood and supported by sophisticated tools such as [11] which also as- sists the composition of appropriate tools. However, according to the seamlessness principle [14], the concepts of the languages to be composed need to fit to- gether. In the UML, the object oriented paradigm under- lies both class diagrams and Statecharts which therefore fit well together. Additionally, when composing languages care must be exercised to avoid confusion: similar constructs with different semantics should be avoided.
  • 26. Guideline 6: “Reuse existing language definitions.” If the language cannot be simply composed from some given lan- guage parts, e.g., by language embedding as proposed in guideline 5, it is still a good idea to reuse existing language definitions as much as possible. In [18] more possible real- ization strategies, such as language extension or language specialization are analyzed. This means, taking the defini- tion of a language as a starter to develop a new one is better than creating a language from scratch. Both the concrete and the abstract syntax will benefit from this form of reuse. The new language then might retain a look-and-feel of the original, thus allowing the user to easily identify familiar notations. Looking at the abstract syntax of existing lan- guages, one can identify “language pattern” (quite similar to design pattern), which are good guidelines for language design. For example, expressions, primary expressions, or statements have quite a common pattern in all languages. Only if there is no existing language/notation or the disad- vantages do not allow using the strategies mentioned above, a standalone realization should be considered. The websites of parser generators like Antlr [1] or Atlantic Zoo [19] are a good starting point for reusing language definitions. Guideline 7: “Reuse existing type systems.” A DSL used for software development often comprises and even extends either a property language such as OCL or an implementa- tion language such as Java. As described in [8], the design of a type system for such a language is one of the hardest tasks because of the complex correlations of name spaces, generic types, type conversions, and polymorphism. Furthermore, an unconventional type system would be hard for users to adopt as well. Therefore, a language de- signer should reuse existing type systems to improve com-
  • 27. prehensibility and to avoid errors that are caused by misin- terpretations in an implementation. Furthermore, it is far more economical to use an existing type system, than devel- oping a new one as this is a labor intensive and error-prone task. A well-documented object-oriented type system can be tailored to the needs of the DSL or even an implemented reusable type system can be used (e.g. [4]). 2.3 Language Content One main activity in language development is the task of defining the different elements of the language. Obviously, we cannot define in general which elements should be part of a language as this typically depends on the intended use. However, the decisions can be guided by some basic hints we propose in this Section. Guideline 8: “Reflect only the necessary domain concepts.” Any language shall capture a certain set of domain artifacts. These domain artifacts and their essential properties need to be reflected appropriately in the language in a way that the language user is able to express all necessary domain concepts. To ensure this, it is helpful to define a few models early to show how such a reflection would look like. These models are a good basis for feedback from domain experts which helps the developer to validate the language definition against the domain. However, when designing a language not all domain concepts need to be reflected, but only those that contribute to the tasks the language shall be used for. Guideline 9: “Keep it simple.” Simplicity is a well known criterion which enhances the understandability of a language [8, 14, 22]. The demand for simplicity has several rea- sons. First, introducing a new language in a domain pro- duces work in developing new tools and adapting existing processes. If the language itself is complex, it is usually harder to understand and thus raises the barrier of intro-
  • 28. ducing the language. Second, even when such a language is successfully introduced in a domain, unnecessary complexity still minimizes the benefit the language should have yielded. Therefore, simplicity is one of the main targets in designing languages. The following more detailed Guidelines 10, 11, and 12 will show how to achieve simplicity. Guideline 10: “Avoid unnecessary generality.” Usually, a domain has a finite collection of concepts that should be reflected in the language design. Statements like “maybe we can generalize or parameterize this concept for future changes in the domain” should be avoided as they unneces- sarily complicate the language and hinder a quick and suc- cessful introduction of the DSL in the domain. Therefore, this guideline can also be defined as “design only what is necessary”. Guideline 11: “Limit the number of language elements.” A language which has several hundreds of elements is obviously hard to understand. One approach to limit the number of elements in a language for complex domains is to design sublanguages which cover different aspects of the systems. This concept is, e.g., employed by the UML: different kinds of diagrams are used for special purposes such as structure, behavior, or deployment. Each of them has its own notation with a limited number of concepts. A further possibility to limit the number of elements of a language is to use libraries that contain more elaborated concepts based on the concepts of the basic language and that can be reused in other models. Elements which were previously defined as part of the language itself can then be moved to a model in the library (compare, e.g., I/O in
  • 29. Pascal vs. C++). Furthermore, users can extend a library by their own definitions and thus, can add more and more functionality without changing the language structure itself. Therefore, introducing a library leads to a flexible, extensi- ble, and extensive language that nevertheless is kept simple. On the other hand, a language capable of library import and definition of those elements must have a number of ap- propriate concepts embedded to enable this (e.g., method and class definitions, modularity, interfaces - whatever this means in the DSL under construction). This principle has successfully been applied in GPL design where the languages are usually small compared to their huge standard libraries. Guideline 12: “Avoid conceptual redundancy.” Redun- dancy is a constant source of problems. Having several con- cepts at hand to describe the same fact allows users to model it differently. The case of conceptual richness in C++ shows that coding guidelines then usually forbid a number of con- cepts. E.g., the concept of classes and structs is nearly iden- tical, the main difference is the default access of members which is public for structs and private for classes. There- fore, classes and structs can be used interchangeably within C++ whereas the slight difference might be easily forgot- ten. So, it should be generally avoided to add redundant concepts to a language. Guideline 13: “Avoid inefficient language elements.” One main target of domain specific modeling is to raise the level of abstraction. Therefore, the main artifacts users deal with are the input models and not the generated code. On the other hand, the generated code is necessary to run the final system and more important, the generated code determines significant properties of the system such as efficiency. Hence, the language developer should try to generate efficient code. Furthermore, efficiency of a model should be transparent
  • 30. to the language user and therefore should only depend on the model itself and not on specific elements used within the model. Elements which would lead to inefficient code should be avoided already during language design so that only the language user is able to introduce inefficiency [8]. For example, in Java there is no operator to get all instances of one class as this would increase memory usage and oper- ating time significantly. However, this functionality can be implemented by a Java user if needed. 2.4 Concrete Syntax Concrete syntax has to be chosen well in order to have an understandable, well structured language. Thus, we con- centrate on the concrete syntax first and will deal with the abstract syntax later. Guideline 14: “Adopt existing notations domain experts use.” As [20] says, it is generally useful to adopt what- ever formal notation the domain experts already have, rather than inventing a new one. Computer experts and especially language designers are usually very practiced in learning new languages. On the contrary, domain experts often use a language for a longer time and do not want to learn a new concrete syntax es- pecially when they already have a notation for a certain problem. As already mentioned, it is often the case that the introduction of a DSL makes new tools and modified pro- cesses necessary. Inventing a new concrete syntax for given concepts would raise the barrier for domain experts. Thus, existing notations should be adopted as much as possible. E.g., queries within the database domain should be defined with SQL instead of inventing a new query language. Even if queries are only part of a new language to be defined SQL could be embedded within the new language.
  • 31. In case a suitable notation does not already exist, the new language should be adopted as close as possible to other existing notations within the domain or to other common used languages. A good example for commonly accepted languages are mathematical notations like arithmetical ex- pressions [8]. Guideline 15: “Use descriptive notations.” A descriptive notation supports both learnability and comprehensibility of a language especially when reusing frequently-used terms and symbols of domain or general knowledge. To avoid mis- interpretation it is highly important to maintain the seman- tics of these reused elements. For instance, the sign “+” usually stands for addition or at least something seman- tically similar to that whereas commas or semicolons are interpreted as separators. This applies to keywords with a widely-accepted meaning as well. Furthermore, keywords should be easily identifiable. It is helpful to restrict the num- ber of keywords to a few memorizable ones and of course, to have a keyword-sensitive editor. A good example for a descriptive notation is the way how special character like Greek letters are expressed in Latex. Instead of using a Unicode-notation each letter can be ex- pressed by its name (alpha for α, beta for β, and so on). Guideline 16: “Make elements distinguishable.” Easily dis- tinguishable representations of language elements are a ba- sic requirement to support understandability. In graphical DSLs, different model elements should have representations that exhibit enough syntactic differences to be easily dis- tinguishable. Different colors as the only criteria may be counterproductive, e.g., when printed in black and white. In textual languages usually keywords are used in order to sep- arate kinds of elements. These keywords have to be placed in appropriate positions of the concrete syntax, as other-
  • 32. wise readers need to start backtracking when “parsing” the text [8, 22]. The absence of keywords is often based on effi- ciency for the writer. But this is a very weak reason because models are much more often read than written and therefore to be designed from a readers point of view. Guideline 17: “Use syntactic sugar appropriately.” Lan- guages typically offer syntactic sugar, i.e., elements which do not contribute to the expressiveness of the language. Syn- tactic sugar mainly serves to improve readability, but to some extent also helps the parser to parse effectively. Key- words chosen wisely help to make text readable. Generally, if an efficient parser cannot be implemented, the language is probably also hard to understand for human readers. However, an overuse of the addition of syntactic sugar dis- tracts, because verbosity hinders to see the important con- tent directly. Furthermore, it should be kept in mind that several forms of syntactic sugar for one concept may hinder communication as different persons might prefer different elements for expressing the same idea. Nevertheless the introduction of syntactic sugar can also improve a language, e.g., the enhanced for-statement in Java 5 is widely accepted although it is conceptually redundant to a common for-statement. This is a conflict to guideline 12, but the frequency of occurrence of common for-statements in Java legitimates a more effective alternative of this notation. Guideline 18: “Permit comments.” Comments on model elements are essential for explaining design decisions made for other developers. This makes models more understand- able and simplifies or even enables collaborative work. So a widely accepted standard form of grouped comments, like
  • 33. /* ... */, and line comments, like // ... for textual languages or text boxes and tooltips for graphical languages should be embedded. Furthermore, specially structured comments can be used for further documentation purposes as generating HTML- pages like Javadoc. In [8] it is mentioned that the “purpose of a programming language is to assist in the documenta- tion of programs”. Therefore we recommend that every DSL should allow a user to generally comment at various parts of the model. If desired, the language may even contain the definition of a comment structure directly, thus enforcing a certain style of documentation. Guideline 19: “Provide organizational structures for mod- els.” Especially for complex systems the separation of mod- els in separate artifacts (files) is inevitable but often not enough as the number of files would lead to an overflowed model directory. Therefore, it is desirable to allow users to arrange their models in hierarchies, e.g., using a pack- age mechanism similar to Java and store them in various directories. As a consequence, the language should provide concepts to define references between different files. Most commonly “import” is used to refer to another name space. Imports make elements defined in other DSL artifacts visible, while direct references to elements in other files usually are ex- pressed by qualified names like “package.File.name”. Some- times one form of import isn’t enough and various relations apply which have to be reflected in the concrete syntax of the language. Guideline 20: “Balance compactness and comprehensibil- ity.” As stated above, usually a document is written only once but read many times. Therefore, the comprehensibility
  • 34. of a notation is very important, without too much verbosity. On the other hand, the compactness of a language is still a worthwhile and important target in order to achieve ef- fectiveness and productivity while writing in the language. Hence a short notation is more preferable for frequently used elements rather than for rarely used elements. Guideline 21: “Use the same style everywhere.” DSLs are typically developed for a clearly defined task or viewpoint. Therefore, it is often necessary to use several languages to specify all aspects of a system. In order to increase under- standability the same look-and-feel should be used for all sublanguages and especially for the elements within a lan- guage. In this way the user can obtain some kind of intuition for a new language due to his knowledge of other ones. For instance, it is hardly intuitive if curly braces are used for combining elements in one language and parentheses in an- other. Additionally, a general style can also assist the user in identifying language elements, e.g., if every keyword consists of one word and is written in lower case letters. A conflicting example is the embedment of OCL. One the one hand it is possible to adapt the OCL syntax to the enclosing language to provide the same syntactic style in both languages. On the other hand different OCL styles impede the comprehensibility of OCL, what endorses the use of a standard OCL syntax. Guideline 22: “Identify usage conventions.” Preferably not every single aspect should be defined within the language definition itself to keep it simple and comprehensible (see guideline 11). Furthermore, besides syntactic correctness it is too rigid to enforce a certain layout directly by the tools. Instead, usage conventions can be used which describe more detailed regulations that can, but need not be enforced.
  • 35. In general, usage conventions can be used to raise the level of comprehensibility and maintainability of a language. The decision, whether something goes as a usage convention or within a language definition is not always clear. So, usage conventions must be defined in parallel to the concrete syn- tax of the language itself. Typical usage conventions include notation of identifiers (uppercase/lowercase), order of ele- ments (e.g. attributes before methods), or extent and form of comments. A good example for code conventions for a programming language can be found in [9]. 2.5 Abstract Syntax Guideline 23: “Align abstract and concrete syntax.” Given the concrete syntax, the abstract syntax and especially its structure should follow closely to the concrete syntax to ease automated processing, internal transformations and also pre- sentation (pretty printing) of the model. In order to align abstract and concrete syntax three main principles apply: First, elements that differ in the concrete syntax need to have different abstract notations. Second, elements that have a similar meaning can be internally rep- resented by reusing concepts of the abstract syntax (usually through subclassing). This is more a semantics-based deci- sion than a structurally based decision. Third, the abstract notation should not depend on the context an element is used in but only on the element itself. A pretty bad exam- ple for context-dependent notations is the use of “=” as as- signment in OCL-statements (let-construct) and as equality in OCL-expressions. Here, the semantics obviously differs whilst the syntax is equal. Furthermore, the use of a transformation engine usually also requires an understanding of the internal structure of a
  • 36. language, which is related to the abstract syntax. Therefore, the user to some extent is exposed to the internal structure of the language and hence needs an alignment between his concrete representations and the abstract syntax, where the transformations operate on. Alignment of both versions of syntax and the seamlessness principle discussed in [14] assures that it is possible to map abstractions from a problem space to concrete realizations in the solution space. For a domain specific language the domain is then reflected as directly as possible without much bias, e.g., of implementation or executability considerations. Guideline 24: “Prefer layout which does not affect trans- lation from concrete to abstract syntax.” A good layout of a model can be used to simplify the understanding for a hu- man reader and is often used to structure the model. Nev- ertheless, a layout should be preferred which does not have any impact on the meaning of the model, and thus, does not affect the translation of the concrete to the abstract syntax and the semantics. As an example, this is the case for com- puter languages where the program structure is achieved by indentation. From a practical point of view, line separators, tabs, and spaces are often treated differently depending on editors and platforms and are usually difficult to distinguish by a human reader. If these elements gain a meaning, de- velopers have to be much more cautious and a collaborative development requires more effort. For graphical languages a well-known bad example is the twelve o’clock semantics in Stateflow [7] where the order of the placement of transitions can change the behavior of the Statechart. To simplify the usage of DSLs, we recommend that the layout of programs doesn’t affect their semantics.
  • 37. Guideline 25: “Enable modularity.” Nowadays, systems are very complex and thus, hard to understand in their en- tirety. One main technique to tackle complexity is modu- larization [15] which leads to a managerial, flexible, compre- hensible, and understandable infrastructure. Furthermore, modularization is a prerequisite for incremental code gener- ation which in turn can lead to a significant improvement of productivity. Therefore, the language should provide a means to decompose systems into small pieces that can be separately defined by the language users, e.g., by providing language elements which can be used in order to reference artifacts in other files. Guideline 26: “Introduce interfaces.” Interfaces in pro- gramming languages provide means for a modular develop- ment of parts of the system. This is especially important for complex systems as developers may define interfaces be- tween their parts to be able to exchange one implementa- tion of an interface with another which significantly increases flexibility. Furthermore, the introduction of interfaces is a common technique for information hiding: developers are able to change parts of their models and can be sure that these changes do not affect other parts of the system when the interface does not change. Therefore, we recommend that a DSL should provide an interface concept similar to the interfaces of known programming languages. One example of interfaces are visibility modifiers in Java. They provide a means to restrict the access to members in a simple way. Another common example are ports, e.g., in composite structure diagrams, which explicitly define inter- action points and specify services they provide or need, thus declaring a more detailed interface of a part of a system. 3. DISCUSSION
  • 38. In the previous sections we introduced and categorized a bundle of guidelines dedicated to different language artifacts and development phases. Some of them already contained notes on relationships with other guidelines and trade-offs between them, and some of them briefly discussed their im- portance in different project settings. However, the follow- ing more detailed discussion shall help to identify possible conflicting guidelines and their reasons and gives hints on decision criteria. The most contradicting point is reuse of existing artifacts versus the implementation of a language from scratch (cf. No. 5, 6, and 7). The main reason for the reuse of a lan- guage or a type system is that it can significantly decrease development time. Furthermore, existing languages often provide at least an initial level of quality. Thus, some of the guidelines, e.g., guidelines which target at consistency (e.g., No. 21) or claim modularity (e.g., No. 25), are met auto- matically. However, reusing existing languages can hinder flexibility and agility as an adaption may be hard to realize if not impossible. The same ideas apply to an improvement of the reused language itself (e.g., to meet guidelines which were not respected by the original language): the implemen- tation of a single guideline may require a significant change of the language. Another important point is that this ap- proach may influence the satisfiability of other guidelines. One example is No. 14 which suggests the reuse of exist- ing notations of the domain. In case there are no languages which are similar to these notations, this guideline and lan- guage reuse are obviously contradicting. Furthermore, com- bining several existing languages may introduce conceptual inconsistencies, such as different styles or different underly- ing type systems which have to be translated into each other (cf., No. 5).
  • 39. Implementing a new language from scratch in turn permits a high degree of freedom, agility, and flexibility. In this case, some guidelines can be realized more easily than in the case of reuse. However, these advantages are not for free: designing concrete and abstract syntax, context conditions, and a type system are time- and cost-intensive task. To summarize, a decision whether to reuse existing languages or to implement a new one is one of the most important and critical decisions to be made. Another important point which was already mentioned in the introduction is that some of the presented guidelines have to be weighted according to the project settings, to the form of use, etc. One example is the expected size of the languages instances. Some DSLs serve as configuration lan- guages and thus, typical instances consist of a small amount of lines only. Other DSLs are used to describe complex sys- tems leading to huge instances. In the former case guidelines which target at compositionality or claim references between files (e.g., No. 19 and 25) have nearly no validity whereas in the latter example these guidelines are of high importance. However, not only the expected size of the instances can in- fluence the weight of guidelines. Another important aspect is the intended usage of the language. Sometimes DSLs are not executable; they are designed for documentation only. In these cases, the guideline which demands to avoid inef- ficient elements in the language (No. 13) is of course not meaningful. However, for languages which are translated into running code, this is of high importance. A last point we want to discuss here are the costs induced by applying the guidelines. Some of them can be imple- mented easily and straightforward (e.g., distinguishability
  • 40. of elements or permitting comments, No. 16 and 18) whilst others require a significant amount of work (e.g., introduc- tion of references between files including appropriate reso- lution mechanisms and symbol tables, No. 19). Of course, especially guidelines whose implementation is cost intensive have to be matched against project settings as described above. For small DSLs such guidelines should be ignored in- stead as the cost will often not amortize the improvements. However, from our experiences DSLs are often subject to changes. While growing these guidelines become more and more important. The main problem which emerges in these cases is that adding new things to a grown language (e.g., modularity) is typically more difficult and time-consuming than it would have been at the beginning. Therefore, ana- lyzing the domain and usage scenarios as described in Guide- lines 1 and 2 can prevent those unnecessary costs. 4. CONCLUSION In this paper 26 guidelines have been discussed that should be considered while developing domain specific languages. To our experience this set of guidelines is a good basis for developing a language. For space reasons, we restricted our- selves to guidelines for designing the language itself. Other guidelines are needed for successfully integrating DSLs in a software development process, deploying it to new users, and evolving the syntax and existing models in a coherent way. In general, a guideline should not be followed closely, but many of them are proposals as to what a language designer should consider during development. Some of the guidelines have to be discussed in certain domains, because they might not have the same relevance and as discussed many guide- lines contradict each other and the language developer has to balance them appropriately.
  • 41. But generally, the consideration of explicitly formulated guidelines is improving language design. We also think that it is worthwhile to develop much more detailed sets of con- crete instructions for particular DSLs. We currently focus on textual languages in the spirit of Java. Although we have compiled this list from literature and our own experience, we are sure that this list is not com- plete and has to be extended constantly. In addition, guide- lines might change during time as developers gather more experience, tools become more elaborate, and taste changes. Maybe some guidelines are not relevant anymore in a few years, as some guidelines from the 1970’s are less important today. Acknowledgment: The work presented in this paper is partly undertaken in the MODELPLEX project. MOD- ELPLEX is a project co-funded by the European Commis- sion under the “Information Society Technologies”Sixth Frame- work Programme (2002-2006). Information included in this document reflects only the authors’ views. The European Community is not liable for any use that may be made of the information contained herein. 5. REFERENCES [1] Antlr Website www.antlr.org. [2] GME Website http://www.isis.vanderbilt.edu/projects/gme/. [3] T. Goldschmidt, S. Becker, and A. Uhl. Classification of concrete textual syntax mapping approaches. In ECMDA-FA, pages 169–184, 2008. [4] J. Gough. Compiling for the .NET Common Language
  • 42. Runtime (CLR). Prentice Hall, November 2001. [5] H. Grönniger, J. Hartmann, H. Krahn, S. Kriebel, and B. Rumpe. View-based modeling of function nets. In Proceedings of the Object-oriented Modelling of Embedded Real-Time Systems (OMER4) Workshop, Paderborn,, October 2007. [6] H. Grönniger, H. Krahn, B. Rumpe, M. Schindler, and S. Völkel. Textbased Modeling. In 4th International Workshop on Software Language Engineering, 2007. [7] G. Hamon and J. Rushby. An operational semantics for stateflow. In Fundamental Approaches to Software Engineering: 7th International Conference (FASE), volume 2984 of Lecture Notes in Computer Science, pages 229–243, Barcelona, Spain, March 2004. Springer-Verlag. [8] C. A. R. Hoare. Hints on programming language design. Technical report, Stanford University, Stanford, CA, USA, 1973. [9] Java Code Conventions http://java.sun.com/docs/codeconv/. [10] S. Kelly and J.-P. Tolvanen. Domain-Specific Modeling: Enabling Full Code Generation. Wiley, 2008. [11] H. Krahn, B. Rumpe, and S. Völkel. Monticore: Modular development of textual domain specific languages. In Proceedings of Tools Europe, 2008. [12] M. Mernik, J. Heering, and A. M. Sloane. When and how to develop domain-specific languages. Technical
  • 43. Report SEN-E0309, Centrum voor Wiskunde en Informatica, Amsterdam, 2005. [13] MontiCore Website http://www.monticore.de. [14] R. Paige, J. Ostroff, and P. Brooke. Principles for Modeling Language Design. Technical Report CS-1999-08, York University, December 1999. [15] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Commun. ACM, 15(12):1053–1058, 1972. [16] P. Pfahler and U. Kastens. Language Design and Implementation by Selection. In Proc. 1st ACM-SIGPLAN Workshop on Domain-Specific-Languages, DSL ’97, pages 97–108, Paris, France, January 1997. Technical Report, University of Illinois at Urbana-Champaign. [17] B. Rumpe. Modellierung mit UML. Springer, Berlin, May 2004. [18] D. Spinellis. Notable Design Patterns for Domain Specific Languages. Journal of Systems and Software, 56(1):91–99, Feb. 2001. [19] The Atlantic Zoo Website http://www.eclipse.org/gmt/am3/zoos/atlanticZoo/. [20] D. Wile. Lessons learned from real DSL experiments. Science of Computer Programming, 51(3):265–290, June 2004. [21] D. S. Wile. Supporting the DSL Spectrum. Computing and Information Technology, 4:263–287, 2001.
  • 44. [22] N. Wirth. On the Design of Programming Languages. In IFIP Congress, pages 386–393, 1974. Evolving Domain Specific Languages for Project Summary.pdf Evolving Domain Specific Languages Project Summary We believe that the use of Domain Specific Languages (DSLs) can significantly improve the ability of domain experts to use computers effectively in their work. However the cost of developing and deploying DSLs is significant, and this has severely limited their deployment. Intellectual Merit: Like other software, DSLs have a well- defined life-cycle. SQL is a “mature” DSL: its design is specified by international standards, and its stand- alone implementations are carefully crafted by hand to be stable, robust and to operate with high performance in specific execution environments. The DSLs that we have designed are experimental: their designs vary week-by-week as researchers try out new ideas, their implementations must be malleable, and efficiency and the quality of the error messages are less important. The simplest way to implement an experimental DSL is by embedding it as a library of functions within an existing programming language. These two approaches have both advantages and disadvantages, and the point in the life cycle of a particular DSL seems to favor one approach over another.
  • 45. The life cycle of a DSL has three stages, and the needs of the three stages vary. By recognizing this fact, we can design processes and tools that can dramatically change the balance point between the two choices, thus allowing the cheaper embedded approach to be viable longer. In addition, our research in meta-programming systems seems highly applicable to the opposite approach of making stand-alone implementations cheaper to build, so that they can be applicable earlier in the lifecycle. Finally, experience with some of the tools we have constructed in our meta- programming research suggest that stand-alone implementations can be extracted from embedded implementations in a semi automated way, further blending the distinction between the two approaches. This bodes well for lessening the cost of deploying a DSL since its suggests several strategies for evolving an embedded implementation to a stand-alone one in a cost effective way. The proposed research would investigate applying meta- programming tools to DSL evolution, in several different dimensions: • Extending the capabilities of host languages for embedded implementations. • Exploring automated techniques for converting embedded implementations to stand-alone implemen- tations. • Automating the generation of stand-alone implementations. Broader Impact: The opportunity costs of being unable to utilize computers effectively has a large cost: projects that are never attempted, or whose scope is significantly reduced, because the cost of developing software is too high to contemplate, or too complex to write.
  • 46. DSLs can play a large role in mitigating these costs, but only if the costs of DSL implementation can be lowered. The research proposed here will make significant advances in this direction. As a graduate-only institution, OGI is in a unique position to promote and advance the introduction of new ideas into the work force. Our students are by and large working students, uniquely motivated by the problems that they have experienced. Our close connections with industry means that our students acutely aware of the problems of constructing complex software. OGI has a tradition teaching research oriented topics in advance graduate level courses, and in building and maintaining widely distributed tools. 1 1 Overview Computers are useful only if there are people to program them. In an ideal world, the end-user, who is an expert in the domain of the problem that needs to be solved, would be able to directly command the computer to do what is required. Unfortunately, even the best of modern general-purpose programming languages can be used effectively only by experts in computing. As computers become ever more pervasive, and the problem domains in which they are applied ever more diverse, the proportion of users who have sufficient computing knowledge dwindles. There is growing recognition that domain-specific languages (DSLs) are an effective way for end-users
  • 47. to program computers without becoming software engineers. Although DSLs are attractive, they can be difficult and expensive to design and implement. DSL creators must combine expert knowledge of the domain, programming skills necessary for building solutions, ability to make appropriate generalizations over the domain, and programming language design and implementation skills. This makes the current cost of introducing a new DSL prohibitively high. We propose two interrelated research paths that we believe can dramatically lower this cost. First, at OGI, we have studied scores of DSLs, and we have designed and developed a number of our own. We have discovered that DSL’s (like many other systems) have a life cycle. This life cycle has three stages, and the needs of the three stages vary. By recognizing this fact, we can design processes and tools that can dramatically increase reuse amongst DSL implementations, and ease the transition of a DSL from earlier stages to later stages. Second, our study of typeful meta-programming systems has helped us design a new generation of tools for manipulating programs as data in a way that is aware of the semantic properties of the programs being manipulated. Applying these tools to DSL design and implementation should significantly ease the burden of the DSL implementer. The aims of the research proposed include: • Lowering the cost of DSL development. DSL’s are effective only if they exist. The current cost for developing a DSL can be quite high. Lowering the cost for deploying a new DSL could be an effective means of increasing their use.
  • 48. • Apply meta-programming techniques to DSL development. Much of our experience in building meta-programming systems is directly applicable to DSL design. We’d like to reuse this work in a new and exciting domain. Many of the tools we have built, such as automatic binding time analyses as a way of automatically staging programs, and the meta- programming language Ωmega seem applicable to the process of evolving DSL’s. • Develop instructional materials on DSL development. Without making significant efforts to educate others on the building of DSL implementations their construction will remain out of reach to most software developers. 2 Introduction to DSLs Domain-Specific Languages (DSLs) are a promising approach to making computers accessible to end-users without the need for conventional programming. Perhaps the most widely used DSL is the formula language used inside Microsoft Excel to define how the content of a cell on a spreadsheet is derived from other cells. Here, the domain is calculation; the particular problems solved are expressing that one cell should be the sum, or the average, of some other cells. People who write such formulae think of themselves as solving problems in finance, or surveying, or engineering; we know that they are in fact writing computer programs, but the power of DSLs is that from the point of view of the domain expert, the program has disappeared. Other widely used DSLs include SQL, a DSL for extracting relevant information from a relational
  • 49. database without explicitly specifying the physical layout or access path for the data; Yacc, a DSL for 1 describing parsers by the grammar to be parsed; and HTML, a DSL for defining the appearance of a document without specifying explicit layout algorithms. As computers are applied to more and more aspects of our society, we believe that the ability of such DSLs to “make the program disappear” will be vital. DSLs provide many advantages over general-purpose languages. To illustrate this consider the query language SQL. • A SQL program provides a common language that allows different database users to share precisely the description of the data they are interested in. • A SQL program relieves the programmer from needing to know about various aspects of the database, such as the file structures that store the persistent data (sequential file, or a balanced tree?), how to open and process the different file formats (random or sequential access?), how to design and manipulate the ephemeral data structures needed to process the query (a linked list, an array, a hash table?), and what algorithm to use (a nested join, a merge join, or a hash join?). • A SQL program encodes constraints. It restricts the complexity of the programs that the user can write. A SQL program is guaranteed to terminate, and is bounded by polynomial rather than exponential
  • 50. time behavior. • A SQL program can take advantage of domain specific properties. All commercial SQL processors use an extensive query-optimization phase that attempts to aggressively minimize the resources used to answer the query. • A SQL program is a rapid prototype. A SQL query is easy to write and lets the query writer experiment with different solutions. If the results from a query do not answer the question the query writer had in mind, he is free to throw the original query away and try another without excessive losses in time or resources. • A SQL query can integrate with a number of existing contexts or environments. SQL is a front end to a number of different query engines. • A SQL query is an efficient division of labor. Programming language experts designed the SQL lan- guage, algorithm experts designed the algorithms used, database experts designed the query optimizers, and domain experts ask the queries. • A SQL query adopts an existing notation (from relational algebra) and allows database users to program queries effectively with little or no programming experience. Advantages like these are not unique to SQL, but are general properties of all domain specific languages. While providing significant advantages, the DSL approach comes with a cost. Realizing a DSL requires both a design and an implementation. Such implementations can be large, expensive to produce, conceptually
  • 51. complex, and hard to scale. An implementation for even a simple language often does not scale as the language evolves to meet newer demands. Because of this, domain experts, and even most programmers, are not comfortable taking on the tasks of language design and implementation. Engineering issues, like these, have kept DSLs from being widely as used as they ought to be. 2.1 Domain-Specific Language Evolution Although past research has produced methodologies and tools that aid in various stages of language design and implementation process, the level of automation available for building DSLs is low. Moreover, DSLs are seldom static. Initially, the domain may not be well- understood, so a period of experimental design is required while ideas are developed. Constraints on the implementation tend to become stronger over time, as the experimental DSL is moved into production. In addition, the problem domain itself can change over the intended lifetime of the DSL. Hence, the design and implementation of a DSL must be evolved over time. 2 DSL implementations are not only complex and difficult to build, but they are also difficult to evolve. Implementation techniques that are most cost-effective at early stages in the DSL lifecycle may be less appro- priate at later stages, and vice-versa. Embedding the DSL as a library within an existing host programming language may work well for an initial implementation. Syntactic extensions, perhaps implemented with macros or a pre-processor, might be added later. Such
  • 52. implementations are quick and easier to build (than non-embedded ones), but inherit all the limitations, as well as the capabilities, of the host language and its implementation. More mature versions of the DSL may benefit from a stand-alone implementation, but this is typically expensive to build, as it must handle all the tasks of a conventional language translator, such as parsing, semantic analysis, optimization, and code generation. Current technologies provide no smooth transition path from embedded to stand-alone implementations. It is a goal of this proposal to identify and smooth this transition from embedded DSL to stand-alone DSL. 2.2 Supporting the DSL Lifecycle The life cycle of a DSL may be thought of as having three stages, each calling for different implementation approaches. Infancy. In this stage, language design ideas are in flux and domain abstractions are not well understood. While some domains have existing conceptual frameworks and notations that can be used for problem specification, many domains must be analyzed and studied to identify the key abstractions. At this stage, modeling flexibility is at a premium, performance is less important, and embedded implementations are likely to be cost-effective. In an embedded approach, each domain concept is realized directly as a host-language construct: domain operators are host-language procedures, domain types are host- language user-defined data types, etc. Thus, creating or modifying a DSL is relatively cheap, provided a suitably powerful host language is used. Our experience with scores of DSLs has shown that successful
  • 53. embedding of a DSL is made much easier by the use of higher-order functions. Thus, languages such as O’Caml, Haskell, or Lisp have made suitable host languages [16, 14, 12]. Embedding may be thought of as rapid prototyping. Even if the domain ultimately requires generating code for a specialized target environment, the embedded implementation can be used for modeling and simulation. Many language features needed by a typical DSL (e.g., support for procedural abstraction) will already exist in the host language; moreover, it is straightforward to integrate code from multiple DSLs if they share the same host implementation. Adolescence. In this stage, domain abstractions stabilize and become better-understood as the language begins to be exercised by end-users. Implementation flexibility is still necessary. Actual use uncovers the limits of the implementation, which leads to new features, abstractions, and implementation techniques. In this stage users begin to develop domain-specific logics for their DSLs. Such logics describe how the features of the DSL interact. They allow the DSL programmers to calculate equivalences, and to reason about the behavior of each solution at a high level of abstraction. Domain-specific logics also enable domain-specific optimizations, and efficient implementations. While embedded languages remain important at this stage, their limitations begin to outweigh their advantages. As more users exercise the system, the need for specialized language syntax, type discipline, and error reporting become more apparent. These are difficult or impossible to provide in a conventional
  • 54. host language. Because most host languages lack the ability to reflect on their own code – i.e., to analyze it as well as execute it – they make it very difficult to enhance an embedded DSL implementation with validation or optimization passes. Few host language implementations provide much flexibility about the target environment in which they 3 run, though this may be essential in domains involving embedded systems or requiring close integration with legacy code. Currently, these demands often force implementers to switch to a stand-alone implementation at this stage, but improvements to host languages (i.e. foreign function interfaces) may enable embedded implementation to remain a viable choice longer in the adolescence stage. Maturity. In this stage, domain abstractions are well- understood, and the DSL design is fixed. Features that enhance usability – such as type systems, debuggers, and programming environments – become impor- tant as the DSL gains adherents. Performance of the DSL implementation itself may become a problem as DSL program size and complexity increase. A stand-alone implementation for a DSL can have its own syntax and type system appropriate for just that domain. The DSL can be “restricted” to enforce constraints on what can be expressed, and it can have its own optimizer that relies on domain-specific knowledge so that performance bottlenecks can be
  • 55. addressed. If investing in a stand-alone DSL implementation ever makes sense, it will be at this phase, when the language has become relatively stable and user tools are important. Automated construction tools for interpreters and compilers can make this process cheaper; while many such tools exist, some important ones are still missing. It is still desirable to re-use any existing embedded or stand-alone implementation as far as possible; again, improvements in embedded host language design may help this goal. 2.3 Example: Maturation of the Hawk DSL At OGI we have developed a number of DSLs, including MSL [35, 13], a DSL for specifying radar message formats and invariants; Stratego [33], a language for describing transformations on computer programs; and Hawk [16], a DSL for describing computer chip microarchitecture. As an example of DSL maturation we will consider the Hawk DSL, developed by Prof. John Launchbury at OGI. Hawk is used to describe computer microarchitectures at the level of abstraction used in an architecture textbook, i.e., in terms of register files, memories, delays, pipeline elements and reorder buffers. A Hawk program specifies a particular microarchitecture; it can be executed to simulate the architecture; it can be transformed, say to increase throughput, and it can be analyzed for various properties. Hawk has been used to build simulators for several pipelined architectures, as well as superscalar architectures. The infant Hawk implementation was embedded in Haskell. An embedded approach was used to help determine the most appropriate abstractions. Hawk language constructs describe circuits as a set of equations describing data flow relationships. The equations were
  • 56. implemented as higher-order functions in Haskell. Through experimentation we found very successful abstractions such as transactions and bypasses that succinctly capture complex functionality at the level of detail needed to understand microarchitectures. As Hawk reached adolescence, we began to run into the characteristic limitations of the embedded approach. • Syntax. We found a need for a language construct to express mutual recursion at the monadic level [10], something not available in Haskell. Such a construct would have been very useful to describe cyclic architectures with feedback loops using monadic descriptions. • Analysis. Because Hawk represents its abstractions as Haskell functions (rather than data structures) we found it impossible to implement some useful analyses on Hawk programs. Performance analysis of pipelines requires knowing the length of the pipeline, and correctness analysis of feedback loops requires knowing that every feedback loop is delayed at least once by the use of a latch. Neither of these things could be discovered in the embedded Hawk implementation. • Error messages. Error messages in Hawk are expressed in terms of the Haskell implementation rather than the abstractions they represent. This, coupled with extensive use of Haskell’s class mechanism, makes error messages from the type checker nearly incomprehensible to end-users. 4
  • 57. • Interoperation. Hawk could benefit by interoperation or integration with existing compilations of microarchitecture units, e.g., memories, written in other languages such as VHDL, or by interoperating with existing vector-based simulation tools. However, Haskell’s interoperability facilities are limited. • Performance. Although Hawk’s expressiveness and verification ability are unparalleled, its simulation performance (in comparison with competing simulation technology such as SimpleScalar [1]) leaves much to be desired. A stand-alone implementation for Hawk could address these problems directly, but a stand-alone im- plementation also requires an order of magnitude more work. Either we need richer host languages that solve some of these problems, delaying the need for a stand- alone implementation, or we need better tools for building stand-alone implementations. 3 Tools for DSL Design – Typeful Meta-Programming Systems Meta-programming systems can play an important role in both building stand-alone implementations and enriching host languages. Meta-programming is the study of programs that generate and/or manipulate other programs. While our work on meta-programming systems was not explicitly motivated by the issue of DSL design and evolution, meta-programming systems seem to provide many of the capabilities that are needed to implement DSLs in a direct and efficient manner. Their ability to manipulate programs as data
  • 58. objects also allows them to support the evolution of existing embedded implementations into stand-alone ones. Our interests in the area of meta-programming fall roughly into two broad categories: 1. Homogeneous meta-programming. In homogeneous meta- programming languages (MetaML [24, 32]) the object-language and the meta-language are the same. The representation of programs (both syn- tactic, and semantic) are chosen by the language designer and cannot be observed by meta-programs. This relieves the meta-programmer from the responsibility to represent object-programs, and allows the meta-language itself to ensure that important properties (such as static typing) of object-languages are automatically enforced in all meta-programs. Our work in homogeneous meta-programming languages was directed primarily at staged programming languages. Staged programs proceed in stages. Each stage “writes” a program that is executed in the next stage. Staging can be exploited to build efficient implementations whenever constants or program invariants only become available after compile-time. 2. Heterogeneous meta-programming. In heterogeneous meta- programming, the meta- and the object- language are not (necessarily) the same. We are particularly interested in open heterogeneous meta- programming languages. Such languages equip the programmer with the tools to specify both the syntactic representation and semantics of arbitrary object- languages. The ability of the meta-language to ensure object-language types and properties in a heterogeneous system is considerably complicated by the difference between the meta- and object-languages. If the type system of the object-language
  • 59. differs significantly from the type system of the meta-language, we cannot use the single type system approach used in homogeneous meta-programming. Instead, we must rely on an extension mechanism, that allows the embedding of new (object-language) type systems inside the type system of the meta- language. Our work on two previous NSF supported projects Type Safe Program Generators (CCR-9625462, 10/96-03/00), and Heterogeneous Meta Programming Systems (CCR-0098126 10/01-06/04) has given us the opportunity to think deeply on the issues of language representation and manipulation. We have reported on the design[32], use[24, 25], semantics[30], type systems[3, 18, 29], implementation[17, 26, 4], and open problems[27] of meta-programming systems. With the help of his colleagues the proposer has built and distributed two different meta-programming systems: MetaML[24] and Template Haskell[26]. Recently, we completed the preliminary design for Ωmega, an open heterogeneous meta-programming language. Ωmega gives special support to the description of new object-languages. One of the unique 5 features of Ωmega is its ability to define object-language representations that use Ωmega’s meta-level types to guarantee programmer-specified properties of object- language syntax. Ωmega is also a homogeneous meta-programming system that inherits MetaML’s ability to write staged programs. This combination of
  • 60. heterogeneous and homogeneous features provides the tools needed for implementation and evolution of domain specific languages. 4 Example: A language of geometric regions. In this section, we shall apply Ωmega to a small example as a way of presenting our ideas for evolving domain specific languages. We define a small DSL that describes geometric objects. This DSL has its own syntax and semantics. It includes a simple type system in which the type of an object specifies the measuring units (i.e. inches, centimeters, pixels) used to describe it. The type system ensures that as we combine objects, units remain consistent. There are several important points that this example makes and addresses. • We shall illustrate the difference between an embedded and stand-alone implementation. The embedded DSL is implemented by combinators that directly manipulate the denotations of the DSL expressions. These combinators are typed in a way that forces the programmer to combine them in a way that preserves the typing of DSL expressions. This is an important advantage of using the embedding approach for many DSL’s. • The embedded Region language is easy to implement but hard to manipulate. A region program is implemented as an Ωmega function (or closure), and its internal structure cannot be observed. This denies the DSL implementer the opportunity of writing potentially useful meta-programs that manipulate and optimize DSL programs. • Representing a DSL program as a piece of data recovers the
  • 61. ability to manipulate the program, but has two drawbacks. First, we lose the ability to encode the type system of the DSL in the type system of the host language. Second, implementing the DSL in the host language now introduces several layers of interpretive overhead. • A stand-alone implementation usually involves representing the object-language as a data structure and writing separate type checking and code generation functionality. Code generation usually targets a language different from the language used to build the stand- alone implementation. This is done mostly for efficiency or interoperability/compatibility reasons. • Reusing the work invested in an embedded implementation in a stand-alone implementation is hard because of the vastly different strategies employed. Ωmega has several unique features that we believe allows us to overcome all of the weakness described in the bullets. The example illustrates a potential transition pathway, from an embedded implementation of the DSL to a stand-alone implementation, with many reuse possibilities. We shall also demonstrate and emphasize the importance of a meta-language that can transform the DSL expressions into another language to support efficient and effective implementation, thus automating some of the steps in the evolution. 4.1 The Region DSL The Region DSL provides a notation to describe subsets of the two-dimensional plane, which we shall call Regions. Regions are specified in a planar coordinate system using magnitudes that can be measured in
  • 62. different kinds of units (centimeters, pixels, inches, and so on). The Region language is typed. Its type system is designed to prevent combining regions defined with mismatched measuring units. Once defined, a region can be drawn or displayed. The region language is very simple, but rich enough that we can illustrate the points made in the previous section. 6 u ∈ U ::= cm | pix units d ∈ D ::= n u magnitudes r ∈ R ::= univ | empty | circle d | rect d d | trans (d, d) r regions | r1 ∪ r2 | r1 ∩ r2 | conv u1 u2 r � n u : u � empty : R u � univ : R u � d : u � (circle d) : R u � d1 : u � d2 : u � (rect d1 d2) : R u � r1 : R � r2 : R u � r1 ∪ r2 : R u � r1 : R � r2 : R u � r1 ∩ r2 : R u � r : R u1 � (conv u1 u2 r) : R u2
  • 63. � d1 : u � d2 : u � r : R u � trans (d1, d2) r : R u Figure 1: The language of geometric regions: syntax and type system. In Figure 1 we give the syntax and static semantics (typing rules) for the Region DSL. There are three syntactic categories: units, magnitudes (i.e. 10 cm, or 2 pix) and regions. There are four primitive regions: empty is the empty region; univ is the full plane; (circle d) is a circular region with center at the origin of the coordinate system and radius specified by d; (rect d1 d2) is a rectangular region centered at the origin with length d1 and height d2. Regions can be combined using “set- operations:” union (r1 ∪ r2) and intersection (r1 ∩ r2). Translation (trans (d1, d2) r) translates a region by an amount specified by a pair of magnitudes (d1, d2). Finally, a region of type u1 can be coerced to a region of type u2 using the explicit conversion expression (conv u1 u2 r). An embedded region DSL. In Figure 2 we implement regions as an embedded DSL using Ωmega as the host language. The meaning of a region is a characteristic function: for each coordinate in the plain, the meaning function should answer whether that points lies inside the region. In the embedding, regions are defined in terms of host language functions (notice the type synonym for Region as a function type). Magnitudes are defined in terms of an algebraic datatype (Mag). The embedded implementation is just a series of definitions, one for each of the DSL’s operators. This embedding uses the type system of the host language to implement the type system of the Region
  • 64. language. The types of the region combinators are parameterized by the type of the unit used to construct it. For example a region of type (Region Centimeter) must have been constructed using centimeters as units. This requires a certain amount of overloading in the implementation. Consider the definition of the basic region for a circle. circle :: Mag u -> Region u circle r = x y -> leq (plus (square x)(square y)) (square r) A point (x,y) lies within the circular region of radius r, if and only if x2 + y2 ≤ r2. We need some way of overloading leq, plus and square to work on different representations of magnitudes. There are several ways one could do this, but we choose to use two of Ωmega’s unique features (kind extension and equality qualified types) because we will find them useful later on. We use kind extension to define a new kind Unit. Kinds[2, 11, 19] are similar to types in that, while types classify values, kinds classify types. We indicate this by the classifies relation (::). For example: 5 :: Int :: *0 . We say 5 is classified by Int, and Int is classified by *0 (star-zero). The kind *0 classifies all the types that classify values that can be computed (like Int, Char, pairs, functions etc). The kind definition also introduces two new types Pixel and Centimeter that are classified by Unit (Pixel::Unit, Centimeter::Unit). We then define a new type constructor Mag::Unit -> *0 which is parameterized by types of kind Unit. This allows us to define data- constructor functions Pix and Cm that inject Int values into (Mag Pixel) values, and Float values into (Mag Centimeter) values respectively. Consider the clause
  • 65. 7 -- Units and Magnitudes kind Unit = Pixel | Centimeter data Mag u = Pix Int where u = Pixel | Cm Float where u = Centimeter -- Regions as Characteristic Function type Region u = Mag u -> Mag u -> Bool -- basic regions circle::Mag u -> Region u circle r = x y -> leq (plus (square x) (square y)) (square r) rect::Mag u -> Mag u -> Region u rect w h = x y -> between (neg w) (plus x x) w && between (neg h) (plus y y) h univ::Region u
  • 66. univ = x y -> True empty :: Region u empty = x y -> False -- region combinators trans::(Mag u,Mag u) -> Region u -> Region u trans (a,b) r = x y -> r (minus x a) (minus y b) convert::(Mag v -> Mag u)-> Region u -> Region v convert t r = x y -> r (t x) (t y) intersect::Region u -> Region u -> Region u intersect r1 r2 = x y -> r1 x y && r2 x y union::Region u -> Region u -> Region u union r1 r2 = x y -> r1 x y || r2 x y plus :: Mag u -> Mag u -> Mag u plus (Pix a) (Pix b) = Pix (intAdd a b) plus (Cm a) (Cm b) = Cm (floatAdd a b) Figure 2: An embedded implementation of regions with combinators. Pix Int where u = Pixel The use of equality qualification in the
  • 67. where clause gives the constructor function Pix the qualified type (forall (u:Unit).(u = Pixel)=>Int -> Mag u). On one hand whenever we apply the function Pix we are required to establish that (u=Pixel). On the other hand, whenever we pattern match against a value constructed by (Pix n) we can assume that (u=Pixel). These two features allow us to define type constructors whose type parameters encode information about the internal structure of the values they classify. For example, a value of type (Mag Pixel) must have been constructed by the Pix constructor function. This ability will play an important role later on in the example. We can now see how leq, plus and square are overloaded. Consider the first clause of the definition of plus in Figure 2. To type check this clause we must be able to derive: {Pix a :: Mag u, Pix b :: Mag u, } � (Pix(intAdd a b)) :: Mag u. Because Pix has an equality qualified type Pix::(u=Pixel)=>Int -> Mag u we can derive a type for a and b plus an equality between u and Pixel. {(u = Pixel), a :: Int, b :: Int, } � (Pix(intAdd a b)) :: Mag u. We can derive the type Mag u for the expression (Pix (intAdd a b)) provided (intAdd a b)::Int and we can show (u=Pixel), but this is precisely the guarantee we obtained from the pattern (Pix a), so we can derive the required type. An important feature of this embedding is that the type-system of the host language ensures type-correct combination of regions, i.e., it disallows the mixing of units of magnitude that the type system of the regions language is designed to prevent. For example, the expression union (circle (Cm 2)) (circle (Pix 4)) is simply rejected by Ωmega’s type checker as ill-typed.
  • 68. Syntax as data. The essential feature of the Region embedding is that it directly manipulates the denotations of regions as function-values. The disadvantage of this kind of embedding is that Region programs cannot be syntactically analyzed since they are just functions. We cannot even write a pretty-printer for Region expressions. In this section we present an approach to implementing the Regions DSL that will allow us to ensure that only well-typed regions are combined, and still let us analyze Region expressions’ syntax. Usually, object-languages such as Region expressions are represented as algebraic data-types. However, a näıve data-type representation would not allow us to statically enforce the invariant that only regions with the same type of units are combined. In Figure 3, we define a new type constructor RegExp which encodes Region expressions. The important thing to note is that RegExp is a type constructor whose argument type 8 data RegExp u = Univ | Empty | Circle (Mag u) | Rect (Mag u) (Mag u) | Union (RegExp u) (RegExp u)
  • 69. | Inter (RegExp u) (RegExp u) | Trans (Mag u, Mag u) (RegExp u) | forall v. Convert (CoordTrans v u) (RegExp v) evalTrans :: CoordTrans u v -> Mag v -> Mag u evalTrans CM2PX = pxTOcm evalTrans PX2CM = cmTOpx eval :: RegExp u -> Region u eval Univ = univ eval Empty = empty eval (Circle r) = circle r eval (Rect w h) = rect w h eval (Union a b) = union (eval a) (eval b) eval (Inter a b) = intersect (eval a) (eval b) eval (Trans xy r) = trans xy (eval r) eval (Convert trans r) = convert (evalTrans trans) (eval r) Figure 3: data structure embedding. indicates the unit of magnitude for the region it represents. In
  • 70. the previous section, we defined the Mag type with equality qualified units because we wanted to parameterize Region expressions by a unit type. As in the embedded combination approach, this parameterized data constructor approach also requires arguments of constructor functions like Union, to have matching units. In a similar fashion a syntactic representation of unit transformers is also defined: data CoordTrans u v = CM2PX where u = Centimeter, v = Pixel | PX2CM where u = Pixel, v = Centimeter This uses the equality qualified types in the same way they were used in the type constructor Mag. The where clauses force the constructors CM2PX and PX2CM to have the type CoordTrans Centimeter Pixel and CoordTrans Pixel Centimeter, respectively. The values of CoordTrans are used to encode the Trans regions: given a region of some unit type v, and a unit transformer CoordTrans v u, we can obtain a region of unit type u. At first glance, the technique presented here does not seem much different from a standard data-type representation of syntax available in functional languages such as ML or Haskell. This is because the type system of the Regions DSL is quite primitive. Expressing the typing invariants on the region expressions is not very difficult. Embedding richer typing invariants in algebraic syntax becomes quite hard without using equality qualified types or other mechanisms. Using this mechanism we have accumulated considerable experience in expressing significantly more complex object- language typing invariants (e.g., the simply typed λ-calculus, security type systems[22, 34], modal types[8, 9],
  • 71. and so on) [20]. The next step in developing a stand-alone implementation of the Region DSL is to define a function mapping Region expressions to their denotations (i.e., the appropriate characteristic functions). The type of eval is most interesting – it maps (RegExp u) to (Region u). This type ensures that the meaning of regions preserves magnitude unit types. The function evalTrans maps a CoordTrans u v to a function that converts between magnitudes of type u and v. Moreover, from the definition of eval, we can see that each clause of eval simply reuses the embedding combinators to give meanings to region expressions. This style of definition provides a syntactic and semantic connection between embedded and stand-alone implementation of regions. Using domain specific optimizations. A benefit of representing Region programs as data is the ability to improve their efficiency by applying domain specific, meaning-preserving transformations. Figure 4 shows several equivalences of Region programs that we can use to improve their efficiency. It is a simple step to code up the algebra of equivalences in Ωmega. Some sample equivalences are found in Figure 5. The function normalize transforms its argument by applying these equivalences to regions. It does so by replacing all of the region’s constructors with smart constructors in a bottom-up fashion. Each smart constructor applies some domain specific equivalences for that constructor. The figure shows only one smart constructor, but the others are similarly defined. 9
  • 72. univ ∪ r = univ empty ∩ r = empty empty ∪ r = r = univ ∩ r (r1 ∪ r2) ∩ (r1 ∪ r3) = r1 ∪ (r2 ∩ r3) conv τ1 τ2 (conv τ2 τ1 r) = r conv τ1 τ2 (r1 ∪ r2) = conv τ1 τ2 r1 ∪ trans τ1 τ2 r2 trans δ (trans δ ′ r) = trans (δ + δ ′ ) r trans δ (r1 ∪ r2) = trans δ r1 ∪ trans δ r2 Figure 4: Some algebraic identities on region expressions. normalize :: RegExp u -> RegExp u normalize (Union a b) = mkUnion (normalize a) (normalize b) normalize (Inter a b) = mkInter (normalize a) (normalize b) normalize (Trans xy r) = mkTrans xy (normalize r) normalize (Convert t r) = mkConvert t (normalize r) normalize t = t
  • 73. mkUnion :: RegExp u -> RegExp u -> RegExp u mkUnion a Univ = Univ mkUnion Univ b = Univ mkUnion a Empty = a mkUnion Empty b = b mkUnion a (Inter x y) = Inter (mkUnion a x) (mkUnion a y) mkUnion a b = Union a b Figure 5: The expression normalize r puts r into a normal form. Removing interpretive overhead. Using the syntax as data approach injects a layer of interpretive overhead into the implementation. Executing a Region program now involves a call to the eval function. This call must walk over the structure of the data representing the Region program. In the embedded approach this extra overhead was not present. To remove this overhead[31] we employ the staging capabilities of Ωmega. There are four staging annotations in Ωmega. Bracket [| |], escape $( ), lift (lift ) and run (run ). Brackets introduce a new code template, and specify that the expression inside the brackets should be generated as a program for potential execution in the next stage. Within brackets, escape specifies a hole within a template. The escaped expression is executed (resulting in a piece of code), and the resultant
  • 74. code is spliced into that hole. The escape annotation may only appear inside code brackets. Escape is the mechanism used for building larger pieces of code from smaller pieces. The lift annotation evaluates an expression of primitive type to a literal value and builds some trivial code that returns that value. The final annotation is run. It evaluates its argument – the result should then be a piece of code, and this code is then run. The run annotation is how Ωmega transitions from one stage to the next. The staged version appears in Figure 6. The staged interpreter is implemented by the function comp which is very similar in structure to the unstaged interpreter eval. The transition from eval to comp is accomplished in a few simple steps: 1. Stage the Mag datatype so that it contains (Code Int) instead of Int and (Code Float) instead of Float. This allows us to know the units of a magnitude at “compile time”, even though we will not know its value until “run time”. The definition of CodeMag accomplishes this staging. 2. Stage the primitive functions on Mag values to work with this new type. We follow the convention that the staged versions of helper functions are given the same name but suffixed with an “S”. Each of these staged functions is derived from the original in a systematic way. For example, the pixel versions of the original and staged version of the circle combinators are as follows: circle :: Mag u -> Region u circle (Pix r) = (x y -> ((x * x) + (y * y)) <= (r * r))
  • 75. circleS :: Mag u -> CodeMag u -> CodeMag u -> Code Bool circleS (Pix r) = (CodePx x) (CodePx y) -> [| (($x * $x) + ($y * $y)) <= $(lift (r*r)) |] 10 -- staged magnitude datatype data CodeMag u = CodePix (Code Int) where u = Pixel | CodeCm (Code Float) where u = Centimeter -- compiled region type synonym type CReg u = CodeMag u -> CodeMag u -> Code Bool -- helper function compTrans :: CoordTrans u v -> CodeMag v -> CodeMag u compTrans CM2PX (CodePix i) = CodeCm [| rem $i $(lift scale) |] compTrans PX2CM (CodeCm i) =