May 7-11, 2003: Giddings, M. C. and Long, J. “Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time”. Poster session presented with Dr. M. C. Giddings, of the University of North Carolina, Chapel Hill, at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory.
Applying a new software development paradigm to biology
1. Cover Page
Applying a New
Software Development
Paradigm to Biology
Authors: M. C. Giddings and Jeffrey G. Long (jefflong@aol.com)
Date: May 7, 2003
Forum: Poster session presented at the Genome Informatics Conference, sponsored
by Cold Spring Harbor Laboratory.
Contents
Page 1: Abstract
Pages 2‐20: Slides (but no text) for presentation
License
This work is licensed under the Creative Commons Attribution‐NonCommercial
3.0 Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by‐nc/3.0/ or send a letter to Creative
Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Uploaded June 26, 2011
2. Genome Informatics Long
Preference: Oral presentation
APPLYING A NEW SOFTWARE DEVELOPMENT PARADIGM TO
BIOLOGY
M.C. Giddings, University of North Carolina; J. Long
Rules are typically hard-coded into software applications, and the
maintenance of these rules as they change, due to updated domain
knowledge or user requirements, results in a significant time and cost
expenditure. Subject experts must communicate the rules they wish to
see automated to programmers who often are not experts in the subject
matter of the application; much can be lost in the translation. As this
process continues through time, software systems become large and
unwieldy, such that no one involved in a project can comprehend or
manage it as a whole. There have been numerous initiatives directed at
solving these problems, but the solutions have been only partially useful
because the problems they address are actually secondary and
symptomatic rather than primary.
The premise of Ultra-Structure theory is that these issues can be
addressed by removing most rules and all knowledge of the world from
software and instead representing them the same way we represent data,
i.e. as tables in a relational database. This approach combines key
features of the normally disparate areas of management information
systems, expert systems, and simulations, borrowing the strengths of
each and potentially eliminating some of the known problems of each.
Ultra-Structure has been applied to a variety of rule-based systems, and
we are investigating its utility for biology. In particular, we’ve been
building a multi-function prototype that can be used to store, in an
integrated and manageable way, laboratory results, simulations, and
general biological knowledge pertaining to microbial genomics and
proteomics research efforts. Based on results thus far we believe the
approach warrants further investigation. The presentation is intended to
introduce Ultra-Structure theory, discuss the prototype biological system
being developed, and generate discussion with our peers about the
benefits and pitfalls of this approach.
3. Applying a New Software Development
Paradigm to Biology: Developing applications that handle
complexity and stand the test of time
Morgan Giddings and Jeff Long
Genome Informatics Conference
giddings@unc.edu, jefflong@aol.com
4. Fundamental Hypothesis of Notational
Engineering
Many problems in government, science, business, the
arts, and engineering exist solely because of the way
we currently represent them. These problems present
an apparent “complexity barrier” and cannot be
resolved with more computing power or more money.
Their resolution requires a new abstraction, which
becomes the basis of a notational revolution and
solves a whole class of previously-intractable
problems.
2 May 2003
5. A New Notational System Often
Requires a Change of Paradigm
A way of looking at a subject
An example, pattern, archetype, or model
A set of unconscious assumptions we have
about a subject
6. Current Paradigm Assumption 1
Computer applications are defined in terms of
algorithms and data
Algorithms are the rules which are used to manipulate
the data; data and rules are distinct
The model for this is the abacus
When using computer systems, algorithms are
implemented as software
But all knowledge should be stored in a formal
(executable), “public”, and readily updateable format
7. Current Paradigm Assumption 2
Software can be designed using the same approaches
as other engineering fields
– e.g. civil, electrical, or aeronautical engineering, using the
“waterfall” development methodology
– but it’s not the same: in addition to being complex, software
and the requirements it supports are dynamic and change
greatly over short periods of time
A new design approach is required that can handle
both complexity and changing requirements
8. Current Paradigm Assumption 3
Subject experts can communicate their requirements to
programmers
– but their expertise took many years to acquire
– their own understanding will evolve
But subject experts must see working prototypes, not
paper representations (e.g. flowcharts, OO diagrams),
in order to truly understand what they will be getting
Subject experts must be able to directly and
continuously update an application’s rules as needed
9. Ultra-Structure Addresses These Issues
Remove 99% of all rules from the software
Represent them in a standard If/Then form
(multiple ‘Ifs’, multiple ‘Thens’)
Represent them as records of data within a
very small set of tables
Distinction between rules and data largely
disappears!
10. We Need a More Insightful Way to Look at
Complex Systems and Processes
observables surface structure
generates
rules middle structure
constrains
groups of rules deep structure
11. The Ruleform Hypothesis
Complex system structures are created by not-necessarily
complex processes; and these processes are created by the
animation of competency rules. Competency rules can be
grouped into a small number of classes whose form is
prescribed by "ruleforms". While the competency rules of a
system change over time, the ruleforms remain constant. A
well-designed collection of ruleforms can anticipate all logically
possible competency rules that might apply to the system, and
constitutes the deep structure of the system.
12. How are Rules Best Represented?
Statement of rules and device for executing them can
be different; need not be software for both
Rules can be reformulated into a canonical form of “If a
and b and c... then consider x and y and z”
Thousands or millions of rules can be grouped into 10-50
ruleforms (classes of rules) based on their syntax
and semantics
These ruleforms can be implemented as tables in a
RDBMS and managed easily by standard RDBMS
tools; the application essentially becomes an Expert
System using a RDBMS
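As a rough illustration of a ruleform implemented as a table, the sketch below stores rules of the canonical form “If a and b then consider x and y” as rows in an in-memory SQLite database. The table name `protocol_rule`, its columns, and the sample rows are all hypothetical, invented for this example; they are not taken from the authors' prototype.

```python
import sqlite3


def build_rule_db():
    """Create an in-memory database holding one hypothetical ruleform."""
    conn = sqlite3.connect(":memory:")
    # One ruleform = one table. The columns (the form) stay fixed;
    # the rows (the competency rules) are expected to change over time.
    conn.execute(
        "CREATE TABLE protocol_rule ("
        " if_sample_type TEXT NOT NULL,"   # factor a
        " if_assay TEXT NOT NULL,"         # factor b
        " then_instrument TEXT NOT NULL,"  # consideration x
        " then_step TEXT NOT NULL)"        # consideration y
    )
    # Illustrative rows only (assumed values, not real lab protocols).
    conn.executemany(
        "INSERT INTO protocol_rule VALUES (?, ?, ?, ?)",
        [
            ("protein", "mass_spec", "spectrometer", "digest"),
            ("protein", "mass_spec", "spectrometer", "ionize"),
            ("rna", "sequencing", "sequencer", "reverse_transcribe"),
        ],
    )
    return conn


def consider(conn, sample_type, assay):
    """Return every (instrument, step) whose 'If' factors all match."""
    rows = conn.execute(
        "SELECT then_instrument, then_step FROM protocol_rule"
        " WHERE if_sample_type = ? AND if_assay = ? ORDER BY rowid",
        (sample_type, assay),
    )
    return rows.fetchall()
```

Adding a rule is then an `INSERT` into `protocol_rule` rather than a code change, which is the property the slide attributes to ruleforms.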
13. What is the Design Process?
Design proceeds by iterative prototype with
monthly feedback from users; small
prototypes can easily evolve to any
necessary level of complexity
Basic design process is to:
– define what exists (existential rules)
– define relations between these (network &
authorization rules)
– define processes (protocol & meta-protocol rules)
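The three design steps above can be pictured as three layers of rule data, each layer referring only to what the previous one defined. The sketch below is one possible reading, not the authors' schema: the tuple layouts, the sample entities, and the `check_rules` helper are assumptions made for illustration, and authorization and meta-protocol rules are left out for brevity.

```python
# Step 1: existential rules say what exists.
EXISTENTIAL = {"ribosome", "mRNA", "polypeptide"}

# Step 2: network rules relate things that exist: (subject, relation, object).
NETWORK = [
    ("ribosome", "binds", "mRNA"),
    ("ribosome", "produces", "polypeptide"),
]

# Step 3: protocol rules order relations into a process:
# (process, step number, subject, relation, object).
PROTOCOL = [
    ("translation", 1, "ribosome", "binds", "mRNA"),
    ("translation", 2, "ribosome", "produces", "polypeptide"),
]


def check_rules(existential, network, protocol):
    """Return a list of problems where a layer refers to something undefined."""
    problems = []
    for subject, relation, obj in network:
        for name in (subject, obj):
            if name not in existential:
                problems.append(f"network rule mentions unknown entity: {name}")
    known_relations = set(network)
    for process, step, subject, relation, obj in protocol:
        if (subject, relation, obj) not in known_relations:
            problems.append(f"{process} step {step} uses an undefined relation")
    return problems
```

Because every layer is plain data, a consistency check like this can run each time a subject expert edits a rule, without touching the checking code.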
14. Ultra-Structure Benefits
Software size is reduced by 2+ orders of magnitude
– simpler to create, manage, understand, test, document, and
teach
– remaining software has no knowledge of the world; it provides
basic control logic that knows what tables to check in what
order, how to resolve conflicts, etc.
The development team is very small (e.g. <10 people)
and is therefore much more manageable than a large
team of dozens or hundreds of developers, and it does
a better job by any metric
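To make “control logic that knows what tables to check in what order, how to resolve conflicts” more concrete, here is a minimal sketch of such a procedure. The conflict policy (the most specific matching rule wins, with ties going to the earlier row), the `"*"` wildcard, and the sample rule tables are assumptions chosen for illustration; the source does not say which policy Ultra-Structure applications actually use.

```python
WILDCARD = "*"


def matches(rule_if, facts):
    """A rule matches when every non-wildcard 'If' factor equals the fact."""
    return all(v == WILDCARD or facts.get(k) == v for k, v in rule_if.items())


def specificity(rule_if):
    """Count the non-wildcard factors; used to resolve conflicts."""
    return sum(1 for v in rule_if.values() if v != WILDCARD)


def animate(tables, check_order, facts):
    """Check tables in the given order and return (table name, 'Then' part).

    Within the first table that has any match, the most specific rule wins;
    on a tie, the rule listed first wins (max() keeps the first maximum).
    """
    for name in check_order:
        candidates = [r for r in tables[name] if matches(r["if"], facts)]
        if candidates:
            best = max(candidates, key=lambda r: specificity(r["if"]))
            return name, best["then"]
    return None, None


# Hypothetical rule data: all world knowledge lives here, not in the code.
TABLES = {
    "exceptions": [
        {"if": {"sample": "rna", "lab": "*"}, "then": {"store_at_c": -80}},
        {"if": {"sample": "rna", "lab": "A"}, "then": {"store_at_c": -70}},
    ],
    "defaults": [
        {"if": {"sample": "*", "lab": "*"}, "then": {"store_at_c": 4}},
    ],
}
CHECK_ORDER = ["exceptions", "defaults"]
```

Note that `animate` never mentions samples, labs, or temperatures; changing the behavior means editing `TABLES` or `CHECK_ORDER`, not the function.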
15. Ultra-Structure Benefits (cont’d)
Most knowledge is externalized and is in a
form anyone can see and understand
Subject experts can enter, change, and
otherwise manage rules (knowledge) directly,
without going to programmers for assistance
Knowledge is actionable not only by subject
experts (e.g. as an encyclopedia) but also by
the computer, for reasoning, simulations,
decision support, etc.
16. Ultra-Structure Benefits (cont’d)
Programmers do not need to know or
understand all rules, just enough to determine
the classes of rules and the proper animation
procedures
Serious prototyping becomes feasible;
communication with users improves
Testing & QA can be far more rigorous
Documentation can be more complete
17. Early Prototype of Biology Model
An integrated prototype has been developed to:
– simulate simple RNA->polypeptide process
– store and analyze laboratory results
– store general biological and chemical knowledge
– compare simulated and actual lab results
– track sources of knowledge
Key conceptual components of model include:
– BioEntities (chemical elements and compounds, biological
objects such as amino acids and RNA, lab techs)
– BioEvents (activities engaged in by BioEntities)
– resources (people, books, lab equipment that provided
information used in model)
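The slides do not show the prototype's tables, so the following is only a sketch of how an RNA->polypeptide simulation might keep its biological knowledge in rule rows. `CODON_RULES` holds a small subset of the standard genetic code; the row layout (codon, amino acid, role) and the `translate` function are assumptions for illustration.

```python
# Rule rows: (if_codon, then_amino_acid, role). A subset of the standard
# genetic code; the start and stop roles are data, not code.
CODON_RULES = [
    ("AUG", "Met", "start"),
    ("UUU", "Phe", ""),
    ("GGC", "Gly", ""),
    ("GCU", "Ala", ""),
    ("AAA", "Lys", ""),
    ("UAA", "", "stop"),
    ("UAG", "", "stop"),
    ("UGA", "", "stop"),
]


def translate(rna, rules):
    """Read codons from the first start codon until a stop codon."""
    amino_for = {codon: amino for codon, amino, _ in rules}
    role_of = {codon: role for codon, _, role in rules}
    # Positions of every start codon (per the rules) present in the RNA.
    starts = [rna.find(c) for c, r in role_of.items() if r == "start" and c in rna]
    if not starts:
        return []
    peptide = []
    for i in range(min(starts), len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon not in role_of:
            raise KeyError(f"no rule for codon {codon}")
        if role_of[codon] == "stop":
            break
        peptide.append(amino_for[codon])
    return peptide
```

For example, `translate("AUGUUUGGCUAA", CODON_RULES)` yields `["Met", "Phe", "Gly"]`. Covering more of the genetic code, or a variant code, would mean adding or editing rows rather than changing `translate`.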
20. Hopefully, this model can be
generalized (The CoRE Hypothesis)
We can create “Competency Rule Engines”, or CoREs, consisting
of <50 ruleforms, that are sufficient to represent all rules found
among systems sharing broad family resemblances, e.g. all
corporations. Their definitive deep structure will be permanent,
unchanging, and robust for all members of the family, whose
differences in manifest structures and behaviors will be
represented entirely as differences in competency rules. The
animation procedures for each engine will be relatively simple
compared to current applications, requiring less than 100,000
lines of code in a third generation language.
21. References
Long, J., and Denning, D., “Ultra-Structure: A design theory for complex
systems and processes.” In Communications of the ACM (January 1995)
Long, J., “A new notation for representing business and other rules.” In
Long, J. (guest editor), Semiotica Special Issue on Notational
Engineering, Volume 125-1/3 (1999)
Long, J., “How could the notation be the limitation?” In Long, J. (guest
editor), Semiotica Special Issue on Notational Engineering, Volume
125-1/3 (1999)
Long, J., “Automated Identification of Sensitive Information in Documents
Using Ultra-Structure.” In Proceedings of the 20th Annual ASEM
Conference, American Society for Engineering Management (October
1999)