SlideShare uma empresa Scribd logo
1 de 32
Baixar para ler offline
Anchor Modeling

                                                   O. Regardt
                                      Teracom, Kakn¨stornet, 102 52 Stockholm
                                                   a

                                                  L. R¨nnb¨ck
                                                      o   a
                                 re!sight, Kungsholms Strand 179, 112 48 Stockholm

                                   M. Bergholtz, P. Johannesson, P. Wohed
                                         DSV, Stockholms Universitet, Kista




Abstract
Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The main
reason for this state of affairs is that the environment of a data warehouse is in constant change, while the
warehouse itself needs to provide a stable and consistent interface to information spanning extended periods
of time. In this paper, we propose a modeling technique, called Anchor Modeling, that offers non-destructive
extensibility mechanisms, thereby enabling robust and flexible management of changes. A key benefit of
Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications,
to the data warehouse. This ensures that existing data warehouse applications will remain unaffected by the
evolution of the data warehouse, i.e. existing views and functions will not have to be modified as a result
of changes in the warehouse model. We provide a formal and technology independent definition of anchor
models and show how anchor models can be realised as relational databases. We also investigate performance
issues and report on a number of lab experiments, which show that under certain conditions anchor databases
perform substantially better than databases constructed using traditional modeling techniques.
Keywords: anchor modeling, database modeling, normalization, 6NF, data warehousing, agile development,
temporal databases, table elimination



1. Introduction

    Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The
main reason for this state of affairs is that the environment of a data warehouse is in constant change, while
the warehouse itself needs to provide a stable and consistent interface to information spanning extended
periods of time. Sources that deliver data to the warehouse change continuously over time and sometimes
dramatically. The information retrieval needs, such as analytical and reporting needs, also change. In order
to address these challenges, data models of warehouses have to be modular, flexible, and track changes in
the handled information [16]. However, many existing warehouses suffer from having a model that does not
fulfill these requirements. One third of implemented warehouses have at some point, usually within the first
four years, changed their architecture, and less than a third quotes their warehouses as being a success [24].
    In this paper, we propose a modeling technique, called Anchor Modeling, that offers non-destructive
extensibility mechanisms, thereby enabling robust and flexible representations of changes. A positive
consequence of this is that all previous versions of a schema will be available at any given time as parts of


    Email addresses: olle.regardt@teracom.se (O. Regardt), lars.ronnback@resight.se (L. R¨nnb¨ck), maria@dsv.su.se
                                                                                         o   a
(M. Bergholtz), pajo@dsv.su.se (P. Johannesson), petia@dsv.su.se (P. Wohed)

Preprint submitted to DKE                                                                         February 27, 2010
the complete schema [2]. Using Anchor Modeling results in data models in which only small changes are
needed when large changes occur in the environment, like adding or switching a source system or analytical
tool. This reduced need for redesign extends the longevity of a data warehouse, reduces the implementation
time, and simplifies maintenance [23].
    A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions,
not modifications, to the data model. Applications thereby remain unaffected by the evolution of the data
model and thus do not have to be modified [21]. Furthermore, evolution through extensions rather than
modifications results in modularity, making it possible to decompose data models into small, stable and
manageable components. This modularity is also of great value in agile development where short iterations
are required. It is possible to construct an initial model with a small number of agreed upon business terms,
which later can be seamlessly extended into a final model. This way of working can improve on the state of
the art in data warehouse design, where close to half of current data warehouse projects are either behind
schedule or over budget [24, 4], partly due to having a too large initial project scope.
    An anchor model that is realised as a relational database schema will have a high degree of normalization,
provide reuse of data and offer the ability to store historical data. The high decomposition of the database
stems from the fact that attributes become separate tables in the schema. This differs significantly from the
currently popular approach of using de-normalized multidimensional models [15] for data warehouses. Even
though the origin of Anchor Modeling were the requirements found in such environments, it is a generic
modeling approach, also suitable for other types of systems.
    The paper is organized as follows. Section 2 defines Anchor Modeling in a technology independent way
and proposes a naming convention, Section 3 introduces a running example, Section 5 shows how an anchor
model can be realised as a relational database, Section 4 suggests a number of Anchor Modeling guidelines,
and physical implementation is described in Section 6. Section 7 investigates performance issues of anchor
databases and introduces a number of conditions that influence performance. In Section 8 advantages
of Anchor Modeling are discussed, Section 9 contrasts Anchor Modeling to alternative approaches in the
literature, and Section 10 concludes the paper and gives directions for further research.

2. Basic Notions of Anchor Modeling

    In this section, we introduce the basic notions of Anchor Modeling by first explaining them informally
and then giving formal definitions using the relational model. The basic building blocks in Anchor Modeling
are anchors, knots, attributes, and ties. A meta model for Anchor Modeling in UML can be seen in Figure 1.
Definition 2.1 (Identities). Let I be an infinite set of symbols, which are used as identities.
Definition 2.2 (Data type). Let D be a data type. The domain of D is a set of data values.
Definition 2.3 (Time type). Let T be a time type. The domain of T is a set of time values.


2.1. Anchors
    An anchor represents a set of entities, such as a set of actors or events. See Figure 2.
Definition 2.4 (Anchor). An anchor A is a string. An extension of an anchor is a subset of I.
  An example of an anchor is AC Actor, see Figure 2. An example of an extension of AC Actor is {#4711,
#4712, #4713}, a set of identities.

2.2. Knots
    A knot is used to represent a fixed set of entities that do not change over time. While anchors are used
to represent arbitrary entities, knots are used to manage properties that are shared by many instances of
some anchor. A typical example of a knot is GEN Gender, see Figure 3, which includes two instances, ‘Male’
and ‘Female’. This property, gender, is shared by many instances of the AC Actor anchor, thus using a knot
minimizes redundancy. Rather than repeating the strings a single bit per instance is sufficient.
                                                       2
ANCHOR
                                                                               1                    1
        DATA TYPE                   range                                                                            type
    1                  1
                                                1..*                                                                                         0..*
              1                                         domain               0..*        KNOT
                                   ATTRIBUTE                                                                        TIE     1 consists_of            ANCHOR ROLE
                                                                                    1                                                       2..*
                                                              range                            1
          range
                                                                                        type
                                                       0..*                              0..*
range
                                      KNOTTED ATTRIBUTE                             KNOT ROLE       consists_of KNOTTED TIE
                                                                                                    1..*      1

    0..*
   STATIC         HISTORIZED   KNOTTED STATIC KNOTTED HISTORIZED                                      KNOTTED       KNOTTED        HISTORIZED TIE       STATIC TIE
  ATTRIBUTE        ATTRIBUTE     ATTRIBUTE        ATTRIBUTE                                        HISTORIZED TIE   STATIC TIE

                                                                      0..*      time range           0..*                                     0..*
              0..*     0..*
                                                                              1 TIME TYPE
                                                                                           1
                                       time range                             1              time range
                                                                                               1                    time range


                                  Figure 1: A meta model for Anchor Modeling in UML.

                                                                             AC_Actor

                                      Figure 2: An anchor is shown as a filled square.

                                                                       GEN_Gender

                                     Figure 3: A knot is shown as an outlined square.


Definition 2.5 (Knot). A knot K is a string. A knot has a domain, which is I. A knot has a range, which
is a data type D. An extension of a knot K with range D is a bijective relation over I × D.
   An example of a knot is GEN Gender, see Figure 3. Its domain is I and its range is the data type string.
An example of an extension of this knot is the set of identity-value pairs { #0, ‘Male’ , #1, ‘Female’ }.

2.3. Attributes
    Attributes are used to represent properties of anchors. We distinguish between four kinds of attributes:
static, historized, knotted static, and knotted historized, see Figure 4. A static attribute is used to represent
arbitrary properties of entities (anchors), where it is not needed to keep the history of changes to the attribute
values. A historized attribute is used when changes of the attribute values need to be recorded. A value is
considered valid until it is replaced by one with a later time. Valid time [6] is hence represented as an open
interval with an explicitly specified beginning. The interval is implicitly closed when an instance with a later
valid time is added for the same anchor instance. A knotted static attribute is used to represent relationships
between anchors and knots, i.e. to relate an anchor to properties that can take on only a fixed, typically
small, number of values. Finally a knotted historized attribute is used when the relation to a value in the
knot is not stable but may change over time.
Definition 2.6 (Static Attribute). A static attribute BS is a string. A static attribute BS has an anchor A
for domain and a data type D for range. An extension of a static attribute BS is a relation over I × D.
   An example of a static attribute is ST LOC Stage Location, see Figure 4, with domain ST Stage and
the data type date as range. An example of an extension of this static attribute is the set of identity-value
pairs { #55, ‘Maiden Lane’ , #56, ‘Drury Lane’ }.
Definition 2.7 (Historized Attribute). A historized attribute BH is a string. A historized attribute BH
has an anchor A for domain, a data type D for range, and a time type T as time range. An extension of a
historized attribute BH is a relation over I × D × T.
                                                                              3
ST_LOC_Stage_Location                          ST_NAM_Stage_Name

                                   (a)                                         (b)

                                                                      AC_PLV_Actor_ProfessionalLevel
                       AC_GEN_Actor_Gender


                       GEN_Gender                                     PLV_ProfessionalLevel
                                   (c)                                         (d)

Figure 4: A static attribute (a) is shown as an outlined circle and a historized attribute (b) is shown as a
circle with a double outline. A knotted static attribute (c) is shown as an outlined circle and a knotted
historized attribute (d) is shown as a circle with a double outline, both of which must reference a knot.


    An example of a historized attribute is ST NAM Stage Name, see Figure 4, with the domain ST Stage,
the data type string as range, and the time type date as time range. An example of an extension of a
historized attribute is the set of identity-value-time point triples { #55, ‘The Globe Theatre’, 1599-01-01 ,
 #55, ‘Shakespeare’s Globe’, 1997-01-01 , #56, ‘Cockpit’, 1609-01-01 }.
Definition 2.8 (Knotted Static Attribute). A knotted static attribute BKS is a string. A knotted static
attribute BKS has an anchor A for domain and a knot K for range. An extension of a knotted static
attribute BKS is a relation over I × I.

   An example of a knotted static attribute is AC GEN Actor Gender, see Figure 4, with domain AC Actor
and range GEN Gender. An example of an extension of a knotted static attribute is the set of identity-identity
pairs { #4711, #0 , #4712, #1 }.
Definition 2.9 (Knotted Historized Attribute). A knotted historized attribute BKH is a string. A knotted
historized attribute BKH has an anchor A for domain, a knot K for range, and a time type T for time range.
An extension of a knotted historized attribute BKH is a relation over I × I × T.
   An example of a knotted historized attribute is AC PLV Actor ProfessionalLevel, see Figure 4, with
domain AC Actor, range PLV ProfessionalLevel, and the time type date for time range. An example of
an extension of a knotted historized attribute is the set of identity-identity-time point triples { #4711, #4,
1999-04-21 , #4711, #5, 2003-08-21 , #4712, #3, 1999-04-21 }.

2.4. Ties
    A tie represents associations between two or more anchor entities and optional knot entities. Similarly to
attributes, ties come in four variants, static, historized, knotted static, and knotted historized. See Figure 5.
As the same entity may appear more than once in a tie a occurrences need to be qualified using the concept
of roles.
Definition 2.10 (Anchor Role). An anchor role is a string. Every anchor role has a type, which is an
anchor.
   An example of an anchor role is atLocation, with type ST Stage.
Definition 2.11 (Knot Role). A knot role is a string. Every knot role has a type, which is a knot.
   An example of a knot role is having, with type PAT ParentalType.

Definition 2.12 (Static Tie). A static tie TS is a set of anchor roles. An instance tS of a static tie
TS = {R1 , . . . , Rn } is a set of pairs Ri , vi , i = 1, . . . , n, where vi ∈ I and n ≥ 2. An extension of a static
tie TS is a set of instances of TS .

                                                          4
PE_wasHeld_ST_atLocation                           ST_atLocation_PR_isPlaying

                                      (a)                                              (b)


                          AC_parent_AC_child_PAT_having                      AC_part_PR_in_RAT_got



                          PAT_ParentalType                                   RAT_Rating
                                      (c)                                              (d)

Figure 5: A static tie (a) is shown as a filled diamond and a historized tie (b) is shown as a filled diamond
with an extra outline. A knotted static tie (c) is shown as a filled diamond and a knotted historized tie (d) is
shown as a filled diamond with an extra outline, both of which must reference at least one knot.


    An example of a static tie is PE wasHeld ST atLocation = {wasHeld, atLocation}, see Figure 5, where
the type of wasHeld is PE Performance and the type of atLocation is ST Stage. An example of an extension
of a static tie is a set of sets of role-identity pairs {{ wasHeld, #911 , atLocation, #55 }, { wasHeld, #912 ,
 atLocation, #55 }, { wasHeld, #913 , atLocation, #56 }}.
Definition 2.13 (Historized Tie). A historized tie TH is a set of anchor roles and a time type T. An
instance tH of a historized tie TH = {R1 , . . . , Rn , T} is a set of pairs Ri , vi , i = 1, . . . , n and a time point
p, where vi ∈ I, p ∈ T, and n ≥ 2. An extension of a historized tie TH is a set of instances of TH .

    An example of a historized tie is ST atLocation PR isPlaying = {atLocation, isPlaying, date}, see
Figure 5, where the type of atLocation is ST Stage and the type of isPlaying is PR Program. An example
of an extension of a historized tie is a set of sets of role-identity pairs and a time point {{ atLocation,
#55 , isPlaying, #17 , 2003-12-13}, { atLocation, #55 , isPlaying, #23 , 2004-04-01}, { atLocation, #56 ,
 isPlaying, #17 , 2003-12-31}}.

Definition 2.14 (Knotted Static Tie). A knotted static tie TKS is a set of anchor and knot roles. An
instance tKS of a static tie TKS = {R1 , . . . , Rn , S1 , . . . , Sm } is a set of pairs Ri , vi , i = 1, . . . , n and Sj , wj ,
j = 1, . . . , m, where vi ∈ I, wj ∈ I, n ≥ 2, and m ≥ 1. An extension of a knotted static tie TKS is a set of
instances of TKS .
   An example of a knotted static tie is AC parent AC child PAT having = {parent, child, having}, see
Figure 5, where the type of parent and child is AC Actor and the type of having is PAT ParentalType. An
example of an extension of a knotted static tie is a set of sets of role-identity pairs {{ parent, #4711 , child,
#4713 , having, #1 }, { parent, #4712 , child, #4713 , having, #0 }}.
Definition 2.15 (Knotted Historized Tie). A knotted historized tie TKH is a set of anchor and knot roles
and a time type T. An instance tKH of a historized tie TKH = {R1 , . . . , Rn , S1 , . . . , Sm , T} is a set of pairs
 Ri , vi , i = 1, . . . , n, Sj , wj , j = 1, . . . , m, and a time point p, where vi ∈ I, wj ∈ I, p ∈ T, n ≥ 2, and
m ≥ 1. An extension of a knotted historized tie TKH is a set of instances of TKH .
   An example of a knotted historized tie is AC part PR in RAT got = {part, in, got, date}, see Figure 5,
where the type of part is AC Actor, the type of in is PR Program and the type of got is RAT Rating. An
example of an extension of a knotted historized tie is a set of sets of role-identity pairs and a time point
{{ part, #4711 , in, #17 , got, #8 , 2005-03-12}, { part, #4711 , in, #17 , got, #10 , 2008-08-29},
{ part, #4712 , in, #17 , got, #10 , 2006-10-17}}.
Definition 2.16 (Identifier). Let T be a (static, historized, knotted, or knotted historized) tie. An identifier
for T is a subset of T. Further, if T is a historized or knotted historized tie, where T is the time type in T,
every identifier for T must contain T.
                                                                5
An identifier is similar to a key in relational databases, i.e. it should be a minimal set of roles that
uniquely identifies instances of a tie.
Definition 2.17 (Anchor Schema). An anchor schema is a 13-tuple A, K, BS , BH , BKS , BKH , RA , RK , TS ,
TH , TKS , TKH , I , where A is a set of anchors, K is a set of knots, BS is a set of static attributes, BH is a set
of historized attributes, BKS is a set of knotted static attributes, BKH is a set of knotted historized attributes,
RA is a set of anchor roles, RK is a set of knot roles, TS is a set of static ties, TH is a set of historized ties,
TKS is a set of knotted static ties, TKH is a set of knotted historized ties, and I is a set of identifiers. The
following must hold for an anchor schema:
  (i)   for   every   attribute B ∈ BS ∪ BH ∪ BKS ∪ BKH , domain(B) ∈ A
 (ii)   for   every   attribute B ∈ BKS ∪ BKH , range(B) ∈ K
(iii)   for   every   anchor role RA ∈ RA , type(RA ) ∈ A
(iv)    for   every   knot role RK ∈ RK , type(RK ) ∈ K
 (v)    for   every   tie T in TS ∪ TH ∪ TKS ∪ TKH , T ⊆ RA ∪ RK
Definition 2.18 (Anchor Model). Let A = A, K, BS , BH , BKS , BKH , RA , RK , TS , TH , TKS , TKH , I be an
anchor schema. An anchor model for A is a quadruple extA , extK , extB , extT , where extA is a function
from anchors to extensions of anchors, extK is a function from knots to extensions of knots, extB is a
function from attributes to extensions of attributes, and extT is a function from ties to extensions of ties.
Let proji be the i th projection map. An anchor model must fulfil the following conditions:
  (i)   for every attribute B ∈ BS ∪ BH ∪ BKS ∪ BKH , proj1 (extB (B)) ⊆ extA (domain(B))
 (ii)   for every attribute B ∈ BKS ∪ BKH , proj2 (extB (B)) ⊆ proj1 (extK (range(B)))
(iii)   for every tie T and anchor role RA ∈ T , {v | RA , v ∈ extT (T )} ⊆ extA (type(RA ))
(iv)    for every tie T and knot role RK ∈ T , {v | RK , v ∈ extT (T )} ⊆ proj1 (extK (type(RK )))
 (v)    every extension of a tie T ∈ TS ∪ TH ∪ TKS ∪ TKH shall respect the identifiers in I, where an extension
        extT (T ) of a tie respects an identifier I ∈ I if and only if it holds that whenever two instances t1 and
        t2 of extT (T ) agree on all roles in I then t1 = t2 .

2.5. Naming Convention
    Entities in an anchor model can be named in any manner, but having a naming convention outlining how
names should be constructed increases understandability and simplifies working with a model. If the same
convention is used consistently over many installations the induced familiarity will speed up recognition by
several orders of magnitude. A good naming convention should fulfil a number of criteria, some of which
may conflict with others. Names should be short, but long enough be intelligible. They should be unique,
but have parts in common for related entities. They should be unambiguous, without too many rules having
to be introduced. Further, in order to be used in many different representations, the less “exotic” characters
they contain the better.

Definition 2.19 (Naming functions). The following are naming functions, further defined in Figure 6:
  (i)   NI : A ∪ K → string is a total injection
 (ii)   NIA : BS ∪ BH ∪ BKS ∪ BKH ∪ TS ∪ TH ∪ TKS ∪ TKH → string is a total injection
(iii)   NIK : BKS ∪ BKH ∪ TKS ∪ TKH → string is a total injection
(iv)    ND : BS ∪ BH ∪ K → string is a total injection
 (v)    NT : BH ∪ BKH ∪ TH ∪ TKH → string is a total injection

    The convention suggested in the table in Figure 6 use only regular letters, numbers, and the underscore
character. This ensures that the same names can be used in a variety of representations, such as in a relational
database, in XML, or in a programming language. Names consist of two parts, a mnemonic followed by a
descriptor. From the mnemonic, both the type of entity and relations to other entities can be deduced, and it
also serves as a unique identifier for an entity, which can be used as a shorter alias for the full name. While

                                                         6
Parts
                NT (T )    ::=   f : MT , dl, vf
                NIK (T )   ::=   f : MK, dl, id, dl, f : rK
                NIA (T )   ::=   f : MA, dl, id, dl, f : rA
                NT (B)     ::=   f : MB, dl, vf
                ND (B)     ::=   f: B
                NIK (B)    ::=   f : MK, dl, id
                NIA (B)    ::=   f : MA, dl, id
                ND (K)     ::=   f: K
                NI (K)     ::=   f : MK, dl, id
                NI (A)     ::=   f : MA, dl, id
                                                          Entities
                T          ::=   f : MT
                B          ::=   f : MB, dl, f : DB
                K          ::=   f : MK, dl, f : DK
                A          ::=   f : MA, dl, f : DA
                                                        Descriptors
                DB         ::= f : DA, dl, uwords
                DK         ::= uwords
                DA         ::= uwords
                                                        Mnemonics
                MT         ::=   f : MA, dl, f : rA , (dl, f : MA, dl, f : rA )+, (dl, f : MK, dl, f : rK )*
                MB         ::=   f : MA, dl, (upper | number){3}
                MK         ::=   (upper | number){3}
                MA         ::=   (upper | number){2}
                                                           Roles
                rK         ::= lwords
                rA         ::= lwords
                                                        Primitives
                vf         ::=   ‘ValidFrom’
                id         ::=   ‘ID’
                dl         ::=   ‘’
                lwords     ::=   (lower)+, (uword)?
                uwords     ::=   (upper, (lower | number)*)+
                number     ::=   [0-9]
                upper      ::=   [A-Z]
                lower      ::=   [a-z]

Figure 6: A suggested naming convention described using Extended Backus-Naur Form and a context
sensitive function f that fetches already defined strings while respecting the connections between entities in
an anchor model.


the mnemonic helps with the overview it does little for understanding what a model actually represents. The
descriptors, on the other hand, should be intelligible enough to give this understanding.
    The naming convention will yield unique but not unambiguous names for entities, i. e. the order of
strings in a tie may change. Further, the data type parts are named identically to the entity in which they
are contained and foreign keys keep the same name as the key they are referencing.



                                                              7
3. Running Example

    The scenario in this example is based on a business arranging stage performances. It is an extension
of the example discussed in [14]. An anchor schema for this example is shown in Figure 7. The business
entities are the stages on which performances are held, programs defining the actual performance, i e what is
performed on the stage, and actors who carry out the performances. Stages have two attributes, a name and
a location. The name may change over time, but not the location. Programs just have name, which describes
the act. If a program changes name it is considered to be a different act. Actors have a name, but since this
may be a pseudonym it could change over time. They also have a gender, which we assume does not change.
The business also keeps track of an actor’s professional level, which changes over time as the actor gains
experience. Actors get a rating for each program they have performed. If they perform consistently better
or worse, the rating changes to reflect this. Stages are playing exactly one program at any given time and
they change program every now and then. To simplify casting the parenthood between actors is needed so
that it is possible to determine if an actor is the mother or father of another actor. A performance is an
event bound in space and time that involves a stage on which a certain program is performed by one or more
actors. It is uniquely identified by when and where it was held. The managers of the business also need to
know how large the audience was at every performance and the total revenue gained from it.
    Four anchors PE Performance, ST Stage, PR Program and AC Actor capture the present entities.
Attributes such as PR NAM Program Name and PE DAT Performance Date capture properties of those
entities. Some attributes, e.g. AC NAM Actor Name and ST NAM Stage Name are historized, to capture
the fact that they are subject to changes. The fact that an actor has a gender, which is one of two values, is
captured through the knot GEN Gender and a knotted attribute called AC GEN Actor Gender. Similarly,
since the business also keeps tracks of the professional level of actors, the knot PLV ProfessionalLevel and
the knotted attribute AC PLV Actor ProfessionalLevel are introduced, which in addition is historized to
capture attribute value changes.
    Furthermore, the relationships between the anchors are captured through ties. In the example the
following ties are introduced: PE in AC wasCast, PE wasHeld ST atLocation, PE at PR wasPlayed, ST at-
Location PR isPlaying, AC part PR in RAT got, and AC parent AC child PAT having to capture the ex-
isting binary relationships.
    The historized tie ST atLocation PR isPlaying is used to capture the fact that stages change programs.
The tie AC part PR in RAT got is knotted to show that actors get ratings on the programs they are playing
and historized for capturing the changes in these ratings.
    A small black circle on a tie edge in Figure 7 indicates that the connected anchor is part of the primary
key for the tie, while a white circle indicates that it is not.


4. Guidelines for Designing Anchor Models

    Anchor Modeling has been used in a number of industrial projects in the insurance business and retail
business, and for the most part in data warehouse environments. Based on this experience and a number of
well-known modeling problems such as reification of relationships [11], power types [8], update-anomalies
in databases due to functional dependencies between too many attributes modeled within one entity in
conceptual schemas [30], the following guidelines have been formulated. These provide a deeper understanding
of the approach as well as support the building of anchor models in a way that gives desired properties.

4.1. Modeling Core Entities and Transactions
    Core entities in the domain of interest should be represented as anchors in an anchor model. A well-known
problem in conceptual modeling is to determine whether a transaction should be modeled as a relationship or
as an entity [11]. In anchor models, the question is formulated as determining whether a transaction should
be modeled as an anchor or as a tie. When a transaction has some property, like PE DAT Performance Date
in Figure 7, it should be modeled as an anchor. It can be modeled as a tie only if the transaction has no
properties.

                                                      8
PLV_ProfessionalLevel
                                                                                                 GEN_Gender

                           AC_PLV_Actor_ProfessionalLevel
                                                                                        AC_GEN_Actor_Gender
                     PAT_ParentalType

                                                                                              AC_NAM_Actor_Name
              AC_parent_AC_child_PAT_having
                                                                              AC_Actor

          PE_in_AC_wasCast

                                        RAT_Rating                            AC_part_PR_in_RAT_got




                   PE_at_PR_wasPlayed                                         PR_Program



              PE_Performance                                                   ST_atLocation_PR_isPlaying




              PE_wasHeld_ST_atLocation                                        ST_Stage



                               ST_NAM_Stage_Name                                    ST_LOC_Stage_Location



                             Anchor   Knot   (Knotted) Static Attribute         (Knotted) Static Tie

                                             (Knotted) Historized Attribute     (Knotted) Historized Tie



               Figure 7: An example anchor model illustrating different modeling concepts.


Guideline 1: Use anchors for modeling core entities and transactions. A transaction can only be modeled
as a tie if it has no properties.

4.2. Modeling Attribute Values
    Historized attributes are used when versioning of attribute values are of importance. A data warehouse,
for instance, is not solely built to integrate data but also to keep a history of changes that have taken place.
In anchor models, historized attributes take care of versioning by coupling versioning/history information to
an attribute value.
Guideline 2a: Use a historized attribute if versioning of attribute values are of importance, otherwise use a
static attribute.
   A knot represents a type with a fixed set of instances that do not change over time. Conceptually a
knot can be thought of as a an anchor with exactly one static attribute combined into a single entity. In
many respects, knots are similar to the concept of power types [8], which are types with a fixed and small

                                                            9
set of instances representing categories of allowed values. In Figure 4 the anchor AC Actor gets its gender
attribute via a knotted static attribute, AC GEN Actor Gender, rather than storing the actual gender value
(i.e. the string ‘Male’ or ‘Female’) of an actor directly in a static attribute. The advantage of using knots is
reuse, as attributes can be specified through references to a knot identifier instead of a value. The latter is
undesirable because of the redundancy it introduces, i.e. long attribute values have to be repeated resulting
in increased storage requirements and update anomalies.
Guideline 2b: Use a knotted attribute if attribute values represent categories or can take on only a fixed
small set of values, otherwise use a regular attribute.
   Guidelines 2a and 2b may be combined, i.e. when both attribute values represent categories and the
versioning of these categories are of importance.
Guideline 2c: Use a knotted historized attribute if attribute values represent categories or a fixed small set
of values and the versioning of these are of importance.

4.3. Modeling Relationships
    A static tie is used for relationships that do not change over time. For example, the actors who took part
in a certain performance will never change. Typically, anchors modeling transactions or events statically relate
to other anchors, i.e. a transaction or event happens at a particular time and does not change afterwards. A
historized tie is used for relationships that change over time. For example, the program played at a specific
stage will change over time. At any point in time, exactly one relationship will be valid.
Guideline 3a: Use a historized tie if a relationship may change over time, otherwise use a static tie.
   A knotted tie is used for relationships where the instances fall within certain categories. For example, if
we have two anchors, AC Actor and PR Program, a relationship between the two may be categorized as
‘Good’, ‘Bad’ or ‘Mediocre’ indicating how well the actor performed the program. A knot RAT Rating is
used to model such categories.
Guideline 3b: Use a knotted tie if the instances of a relationship belong to certain categories, otherwise use
a regular tie.
    Guidelines 3a and 3b can also be combined in the case when the category a relationship belongs to may
change over time.
Guideline 3c: Use a knotted historized tie if the instances of a relationship belong to certain categories and
the relationship may change over time.

4.4. Modeling Changing States
    To be able to manage states of instances in the extensions of anchors, attributes, and ties in an anchor
model, knots have to be used that model the different states. For example, the example in Section 3 could be
extended to handle the situations when an actor resigns, when an actor stops having a professional level due
to inactivity, or when a stage is not playing any program at all. The proper way to handle such situations in
an anchor model is to introduce a knot encoding values of the type ‘valid’ or ‘not valid’. Since historized
attributes and ties only model the valid time of a relationship they cannot capture the fact that an attribute
value or relationship is no longer valid.
Guideline 4a: Use a knotted historized attribute if an entity has changing states.
    For example the knot ACT Active with the extension { #0, ‘Resigned’ , #1, ‘Active’ } and a knotted
historized attribute AC ACT Actor Active can be used to determine whether an actor is still active or not.
Guideline 4b: Use an additional knotted historized attribute if an attribute has changing states, otherwise
use a single attribute.
   For example the knot PLE ProfessionalLevelExpired with the extension { #0, ‘Expired’ , #1, ‘Valid’ }
and a knotted historized attribute AC PLE Actor ProfessionalLevelExpired can be used to determine whether
an actor’s professional level has expired or not.


                                                      10
Guideline 4c: Use a knotted historized tie if a relationship has changing states, otherwise use a (knotted)
static tie.
   For example the knot PST PlayingStatus with the extension { #0, ‘Started’ , #1, ‘Stopped’ } and a
knotted historized tie ST atLocation PR isPlaying PST currentStatus can be used to determine the durations
with which programs are playing at the stages.

4.5. Modeling Large Relationships
    A lot of the advantages of Anchor Modeling such as no partial attributes [? ] (which translate to no
null-values in a relational database), no update anomalies in relational databases based on Anchor Modeling
etc. relies on a high degree of decomposition, where attributes are modelled as entities of their own and
relationships are made as small as possible.
    For any tie having three or more roles, it may in some cases be necessary to model it as several smaller ties.
Three of the ties in the example PE in AC wasCast, PE wasHeld ST atLocation, and PE at PR wasPlayed,
see Section 3, could have been modeled as a single tie PE in AC wasCast ST atLocation PR wasPlayed.
However, this is not possible if the instances may be only partially complete, i.e. lacking information for
some of the roles. By breaking this relationship into three ties, the information about which actors partook
in a certain performance can be added independently and at different times from where it was held and what
was played.
Guideline 5a: Use several ties, if partially complete instances may result from a large relationship.
    Another tie AC part PR in RAT got modeling the rating that an actor has got for playing a part in a cer-
tain program could be extended to having two different kinds of ratings: artistic performance and technical per-
formance, both of which could be modeled as references to one and the same knot RAT Rating with the exten-
sion { #1, ‘Bad’ , #2, ‘Mediocre’ #2, ‘Good’ }. The new tie AC part PR in RAT artistic RAT technical
having the extension {{ part, #4711 , in, #17 , artistic, #2 , technical, #2 , 2005-03-12}, { part, #4711 ,
 in, #17 , artistic, #3 , technical, #2 , 2008-08-29}}}, suffers from the modeling counterpart of update
anomalies [30], e.g. we will not know to which of the two ratings (artistic or technical) the historization
date refers. The situation occurs in historized ties when more than one role is not in the identifier for the
tie. If this situation occurs it is better to model the relationship as several ties for which those that are
still historized have exactly one role outside the identifier. In the example given this would correspond to
breaking up the tie into two ties AC part PR in RAT artistic and AC part PR in RAT technical, for which
there can be no doubt as to what has changed between versions.
Guideline 5b: Include in a (knotted) historized tie at most one role outside the identifier.
    For a knotted historized tie having one or more knot roles, it is sometimes desirable to remodel the tie
such that it only contains anchor roles. This can be achieved by the introduction of a new anchor on which
the knots instead appear through knotted attributes and a role for that anchor that replaces the knot roles
in the tie.
    Consider a situation in which we have a knotted historized tie of high cardinality with all anchor roles and all
but one knot role in its identifier. For every new version that is added only the historization part of the identifier
change and the rest of it will be duplicated, and if that remainder is long it leads to unnecessary duplication of
data. For example, let an extension of the tie be {{ RA1 , #1 , . . . , RAn , #n , RK , #21 , 2010-02-25 10:52:21},
{ RA1 , #1 , . . . , RAn , #n , RK , #22 , 2010-02-25 10:52:22}}. A remodeling to a static tie is possible. The
new tie has {{ RA1 , #1 , . . . , RAn , #n , RA new , #538 }}, the introduced anchor has {#538}, and the intro-
duced knotted historized attribute has { #538, #21, 2010-02-25 10:52:21 , #538, #22, 2010-02-25 10:52:22 }
as extensions. The new model duplicates less data and makes the retrieval of the history over changes in the
knotted identities (and values if joined) optional, but it also puts the values “farther away” from the tie.
Whether or not to remodel will depend on the amount of duplication and how fast this duplication will occur
by the speed with which new versions arrive.
Guideline 5c: Replace the knot roles in a knotted historized tie with an anchor role and introduce its type
as a new anchor with corresponding knotted historized attributes if it has one or more knot roles outside the
identifier and the introduction of new versions leads to excessive duplication of data.

                                                        11
5. From Anchor Model to Relational Database

    An anchor schema can be transformed into a relational database schema through the translation rules
given below. For each construct in the anchor model there exists a translation rule that maps the construct
into a relational database construct such as relation schema (table definition), relation (table), column, row
etc. The names used in the relational database are derived from the naming convention in Section 2.5.
The highly decomposed relational database schemas that result from anchor models facilitate traceability
through metadata, capturing information such as creator, source, and time of creation. Although important,
metadata is not discussed further since its use does not differ from that of other modeling techniques.

5.1. Anchors
   An anchor A is mapped to a relation scheme A(S), where S = NI (A). The domain of S is I. S is the
primary key of the relation scheme A(S).

                                                AC Actor
                                                AC ID (PK)
                                                #4711
                                                #4712
                                                #4713
                                                integer

                           Figure 8: Example rows from the anchor AC Actor.

   For example the anchor AC Actor with integer as domain, see Figure 8, corresponds to the relation
scheme AC Actor(AC ID).

5.2. Knots
   A knot K with range D is mapped to a relation scheme K(I, V ), where I = NI (K) and V = ND (K). The
domain of I is I, and the domain of V is D. I is the primary key.

                                         GEN Gender
                                         GEN ID (PK)       GEN Gender
                                         #0                ‘Male’
                                         #1                ‘Female’
                                         bit               string

                           Figure 9: Example rows from the knot GEN Gender.

   For example the knot GEN Gender with string as range and an extension over bit × string, see
Figure 9, corresponds to the relation scheme GEN Gender(GEN ID, GEN Gender).

5.3. Static Attributes
   A static attribute BS with domain A and range D is mapped to a relation scheme B(I, V ), where
I = NIA (A) and V = ND (B). The domain of I is I and the domain of V is D. The column I is a primary
key of B and a foreign key with respect to the relation scheme A(S).
   For example the attribute PE DAT Performance Date with domain PE Performance, datetime as
range, and an extension over integer × datetime, see Figure 10, corresponds to the relation scheme
PE DAT Performance Date(PE ID, PE DAT Performance Date).




                                                      12
PE DAT Performance Date
                                 PE ID (PK, FK)    PE DAT Performance Date
                                 #911              1994-12-10 20:00
                                 #912              1994-12-15 21:30
                                 #913              1994-12-22 19:30
                                 integer           datetime

             Figure 10: Example rows from the static attribute PE DAT Performance Date.


5.4. Historized Attributes
   A historized attribute BH with domain A, range D and time range T is mapped to a relation scheme
B(I, V, P ), where I = NIA (A), V = ND (B) and P = NT (B). The domain of I is I, the domain of V is D
and the domain of P is T. The columns I and P together constitute a primary key of the relation scheme B.
The column I is a foreign key with respect to the relation scheme A(S).

                      AC NAM Actor Name
                      AC ID (PK, FK)    AC NAM Actor Name        AC NAM ValidFrom (PK)
                      #4711             ‘Arfle B. Gloop’          1972-08-20
                      #4711             ‘Voider Foobar’          1999-09-19
                      #4712             ‘Nathalie Roba’          1989-09-21
                      integer           string                   date

              Figure 11: Example rows from the historized attribute AC NAM Actor Name.

   For example the historized attribute AC NAM Actor Name with AC Actor as domain, date as time
range, and an extension over integer × string × date, see Figure 11, corresponds to the relation scheme
AC NAM Actor Name(AC ID, AC NAM Actor Name, AC NAM ValidFrom).

5.5. Knotted Static Attributes
   A knotted static attribute BKS with domain A and range K is mapped to a relation scheme B(I1 , I2 ),
where I1 = NIA (A) and I2 = NIK (K). The domain of I1 is I and the domain of I2 is I. The column I1 is a
primary key of B and a foreign key with respect to the relation scheme A(S). The column I2 is a foreign key
with respect to the relation scheme K(I, V ).

                                        AC GEN Actor Gender
                                        AC ID (PK, FK)     GEN ID (FK)
                                        #4711              #0
                                        #4712              #1
                                        #4713              #1
                                        integer            bit

           Figure 12: Example rows from the knotted static attribute AC GEN Actor Gender.

    For example the knotted static attribute AC GEN Actor Gender with AC Actor as domain, GEN Gender
as range, and an extension over integer×bit, see Figure 12 corresponds to the relation scheme AC GEN Ac-
tor Gender(AC ID, GEN ID).

5.6. Knotted Historized Attributes
   A knotted historized attribute BKH with domain A, range K, and time range T is mapped to a relation
scheme B(I1 , I2 , P ), where I1 = NIA (A), I2 = NIK (K) and P = NT (B). The domain of I1 is I, the domain

                                                      13
of I2 is I and the domain of P is T. The columns I1 and P together constitute a primary key of the relation
scheme B. The column I1 is a foreign key with respect to the relation scheme A(S). The column I2 is a
foreign key with respect to the relation scheme K(I, V ).

                             AC PLV Actor ProfessionalLevel
                             AC ID (PK, FK)      PLV ID (FK)       AC PLV ValidFrom (PK)
                             #4711               #4                1999-04-21
                             #4711               #5                2003-08-21
                             #4712               #3                1999-04-21
                             integer             tinyint           date

     Figure 13: Example rows from the knotted historized attribute AC PLV Actor ProfessionalLevel.

   For example the knotted historized attribute AC PLV Actor ProfessionalLevel with AC Actor as domain,
PLV ProfessionalLevel as range, date as time range, and an extension over integer×tinyint×date, see Fig-
ure 13, corresponds to the relation scheme AC PLV Actor ProfessionalLevel(AC ID, PLV ID, AC PLV Valid-
From).

5.7. Static Ties
   A static tie TS = {R1 , . . . , Rn }, n ≥ 2, is mapped to a relation scheme T (M1 , . . . , Mn ) using an order-
preserving map, such that Ri is mapped to Mi given a total order on TS . Every column Mi is a foreign key
with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ).
The primary key of the relation scheme T is I where I is the identifier of the static tie TS .

                                     PE in AC wasCast
                                     PE ID in (PK, FK)     AC ID wasCast (PK, FK)
                                     #911                  #4711
                                     #912                  #4711
                                     #913                  #4712
                                     integer               integer

                       Figure 14: Example rows from the static tie PE in AC wasCast.

   For example the static tie PE in AC wasCast = {in, wasCast} having the type of in as the anchor
PE Performance, the type of wasCast as the anchor PE Performance, and an extension over integer ×
integer, see Figure 14, corresponds to the relation scheme PE in AC wasCast(PE ID in, AC ID wasCast),
with the primary key PE ID in, AC ID wasCast.

5.8. Historized Ties
    A historized tie TH = {R1 , . . . , Rn , P }, n ≥ 2 is mapped to a relation scheme T (M1 , . . . , Mn , P ) using an
order-preserving map, such that Ri is mapped to Mi given a total order on TH . Every column Mi is a foreign
key with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri , Mi = NIA (Ai ),
and P is a column with the domain T. The primary key of the relation scheme T is I where I is the identifier
of the historized tie TH .
    For example the historized tie ST atLocation PR isPlaying = {atLocation, isPlaying, date} having the
type of atLocation as the anchor ST Stage, the type of isPlaying as the anchor PR Program, and an extension
over integer × integer × date, see Figure 15, corresponds to the relation scheme ST atLocation PR is-
Playing(ST ID atLocation, PR ID isPlaying, ST atLocation PR isPlaying ValidFrom), with the primary
key ST ID atLocation, ST atLocation PR isPlaying ValidFrom.



                                                           14
ST atLocation PR isPlaying
          ST ID atLocation (PK, FK)      PR ID isPlaying (FK)   ST atLocation PR isPlaying ValidFrom (PK)
          #55                            #17                    2003-12-13
          #55                            #23                    2004-04-01
          #56                            #17                    2003-12-31
          integer                        integer                date

                Figure 15: Example rows from the historized tie ST atLocation PR isPlaying.


5.9. Knotted Static Ties
    A knotted static tie TKS = {R1 , . . . , Rn , S1 , . . . , Sm }, n ≥ 2 and m ≥ 1, is mapped to a relation scheme
T (M1 , . . . , Mn , N1 , . . . , Nm ) using an order-preserving map, such that Ri is mapped to Mi and Sj is mapped
to Nj given a total order on TKS . Every column Mi is a foreign key with respect to the relation scheme
Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ). Every column Nj is a foreign key with
respect to the relation scheme Kj (I, V ), where Kj is the type of the knot role Sj and Nj = NIK (Kj ). The
primary key of the relation scheme T is I where I is the identifier of the knotted static tie TKS .

                     AC parent AC child PAT having
                     AC ID parent (PK, FK)     AC ID child (PK, FK)    PAT ID having (PK, FK)
                     #4711                     #4713                   #1
                     #4712                     #4713                   #0
                     #4712                     #4714                   #0
                     integer                   integer                 bit

          Figure 16: Example rows from the knotted static tie AC parent AC child PAT having.

   For example the knotted static tie AC parent AC child PAT having = {parent, child, having} having
the type of parent and child as the anchor AC Actor, the type of having as the knot PAT ParentalType,
and an extension over integer × integer × bit, see Figure 16, corresponds to the relation scheme AC par-
ent AC child PAT having(AC ID parent, AC ID child, PAT ID having), with the primary key AC ID parent,
AC ID child, PAT ID having.

5.10. Knotted Historized Ties
   A knotted historized tie TKH = {R1 , . . . , Rn , S1 , . . . , Sm , P }, n ≥ 2 and m ≥ 1, is mapped to a relation
scheme T (M1 , . . . , Mn , N1 , . . . , Nm ) using an order-preserving map, such that Ri is mapped to Mi and Sj is
mapped to Nj given a total order on TKH . Every column Mi is a foreign key with respect to the relation
scheme Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ). Every column Nj is a foreign
key with respect to the relation scheme Kj (I, V ), where Kj is the type of the knot role Sj and Nj = NIK (Kj ).
The column P has the domain T. The primary key of the relation scheme T is I where I is the identifier of
the knotted historized tie TKH .

      AC part PR in RAT got
      AC ID part (PK, FK)      PR ID in (PK, FK)    RAT ID got (FK)    AC part PR in RAT got ValidFrom (PK)
      #4711                    #17                  #8                 2005-03-12
      #4711                    #17                  #10                2008-08-29
      #4712                    #17                  #10                2006-10-17
      integer                  integer              tinyint            date

              Figure 17: Example rows from the knotted historized tie AC part PR in RAT got.



                                                          15
For example the knotted historized tie AC part PR in RAT got = {part, in, got, date} having the type of
part as the anchor AC Actor, the type of in as the anchor PR Program, the type of got as the knot RAT Rating,
and an extension over integer × integer × tinyint × date, see Figure 17, corresponds to the relation
scheme AC part PR in RAT got(AC ID part, PR ID in, RAT ID got, AC part PR in RAT got ValidFrom),
with the primary key AC ID part, PR ID in, AC part PR in RAT got ValidFrom.

6. Physical Implementation

    In this section the physical implementation of an anchor database is discussed.

6.1. Indexes in an Anchor Database
    Indexes in general, and clustered indexes in particular, are used to delimit the range of a table scan or
remove the need for a table scan altogether as well as removing or delimiting the need to sort searched-for-data.
A clustered index is an index with a sort order equal to the physical sort order on disc of the table to which
the clustered index refers. In this section we will discuss what indexes on the various constructs in an anchor
database will look like and why it is important to have these indexes from a performance point of view. See
Figure 21 to see how some indexes are created for different table types in an anchor database.

6.1.1. Indexes on Anchor and Knot Tables
    Since both anchor and knot tables are referenced from a number of other tables (attribute and tie tables),
a unique clustered index should be created on the primary key in each anchor and knot table to ensure that
the number of records that have to be scanned in the queried tables will not be too large, e.g. the number of
time-costly disc-accesses will be minimal. During query execution a unique index over the values in a knot
table helps to improve performance when a condition is set on the column. Searching can then stop once a
match has been found.

6.1.2. Indexes on Attribute Tables
    An attribute table contains a foreign key against an anchor table and optionally information on historization.
A composite unique clustered index should be created on the foreign key column in ascending order together
with the historization column in descending order1 . This means that on the physical media where the table
data is stored the latest version for any given identity comes first. This will ensure more efficient retrieval of
data since a good query optimizer will realize that a sort operation is unnecessary. The optimizer will also
utilize the created clustered indexes when joining attribute tables with their anchor table, ensuring that no
sorting has to be made since both the anchor and attribute tables are already physically ordered on disc.
This also means that the accesses to the storage media can be done in large sequential chunks, provided that
data is not fragmented on the storage media itself.

6.1.3. Indexes on Tie Tables
    Tie tables contain two or more foreign keys against an anchor table, one or more foreign keys towards
knot tables, and optionally information on historization. Differently from anchor, knot and attribute tables,
the indexes on tie tables are not given. A tie table should always have a unique clustered index over the
primary key, however the order in which the columns appear can not be unambiguously determined from the
identifier of the tie. An order has to be selected that fits the need best, perhaps according to what type of
queries will be the most common. In some cases an additional index with another ordering of the columns
may be necessary.
    In our running example in Section 3 one historized knotted tie AC part PR in RAT got models the
current rating that an actor has got for performing a certain program, together with the time at which this

   1 Note that the indexing order of the columns is of less importance. Database engines read data in large chunks into memory

and reading forwards or backwards in memory is equally effective. We do, however, in order to be consistent, use the same
ordering in all of our indexes.

                                                             16
rating became relevant. The unique clustered index of the corresponding tie table is composed of the foreign
keys against the anchor tables in ascending order, together with the historization column in descending order.
This index will ensure that we do not have to scan the entire AC Actor table for each row of the entire
PR Program table in order to find the values that we are looking for.

6.1.4. Partitioning of Tables
    In addition to indexes, access to the tables in the anchor database may be improved through partitioning.
For example, if the number of actors and programs are very large and you mostly access the ones with
the highest rating (see the tie table above), you can further improve performance by partitioning the tie
table based on the rating. Partitioning, if the database engines support it, will split the underlying data on
the physical media into as many chunks as the cardinality of the knot. This allows for having frequently
accessed partitions on faster discs and less data to scan during query execution. The two best candidates for
partitioning in an anchor database are knot values and the time type used for historization (i.e. we assume
that users tend to access certain values of knots more frequently than other values, as well as certain dates
being of more or less importance).

6.2. Loading Practices
    When loading data into an anchor database a zero update strategy is used. This means that only insert
statements are allowed and that data is always added, never updated. Delete statements are allowed only
when applied to remove erroneous data. A complete history is thereby stored for accurate information [20].
If you have non-persistent entities or relations in your model you should model these with a state knot. That
way you will be keeping a history of when they ceased to exist. You must keep this strategy in mind when
building your model.
    A zero update strategy together with properly maintained metadata also alleviates the need for transactions.
Through metadata it is always possible to find the rows that belong to a certain batch, came from a specific
source system, were loaded by a particular account, or were added between some given dates. Instead of
having a fault recovery system in place every time data is added, it can be prepared and used only after an
error is found. Since nothing has been updated, the introduced errors are in the form of rows, and these can
easily be found. Scripts can be made that allow for removal of erroneous rows, regardless of when they were
added. Data will never change once it is in place. Its integrity is guaranteed.
    A disadvantage that is avoided by not using updates is their higher cost in terms of performance when
compared to insert statements. By not doing updates there is also no row locking on existing rows, a fact
that lessens the impact on read performance when writing is done in the database.

6.3. Views and Functions
    Due to the large number of tables and the added complexity from handling historical data, an abstraction
layer in the form of views and functions is added to simplify querying in an anchor database. It de-normalizes
the anchor database and retrieves data from a given temporal perspective. There are three different types
of views and functions for each anchor table corresponding to the most common use cases when querying
data: latest view, point-in-time function, and interval function [26, 20], all of which are based on an abstract
complete view. Views and functions for tie tables are created in a way analogous with those for anchor tables.
See the right side of Figure 21 for a couple of examples of how these views and functions are created.

6.3.1. Complete View
    The complete view of an anchor table is a de-normalization of it and its associated attribute tables. It is
constructed by left outer joining the anchor table with all its associated attribute tables.

6.3.2. Latest View
    The latest view of an anchor table is a view based on the complete view, where only the latest values for
historized attributes are included. In order to find the latest version a sub-select is added that ensures that
the time point used for historization is the latest one for each identity. See Figure 18.

                                                      17
select * from lST Stage
                 ST ID      ST NAM Stage Name         ST NAM ValidFrom   ST LOC Stage Location
                 #55        ‘Shakespeare’s Globe’     1997-01-01         ‘Maiden Lane’
                 #56        ‘Cockpit’                 1609-01-01         ‘Drury Lane’
                 integer    string                    date               string

            Figure 18: Example rows from the latest view lST Stage for the anchor ST Stage.


6.3.3. Point-in-time Function
    The point-in-time function is a function for an anchor table with a time point as an argument returning
a data set. It is based on the complete view where for each attribute only its latest value before or at the
given time point is included. A sub-select is added that ensures that the historization time is the latest one
earlier than or on the given time point for each identity. See Figure 19.

                 select * from pST Stage(‘1608-01-01’) where ST NAM Stage Name is not null
                 ST ID      ST NAM Stage Name         ST NAM ValidFrom   ST LOC Stage Location
                 #55        ‘The Globe Theatre’       1599-01-01         ‘Maiden Lane’
                 integer    string                    date               string

Figure 19: Example rows from the point-in-time function pST Stage given the time point ‘1608-01-01’ for
the anchor ST Stage.

    The absence of a row for the stage on Drury Lane (see the previous Figure 18) in the returned data set of
Figure 19 indicates that no data is present for it (which in this case depends on the fact that the theatre was
not yet built in 1608). Without the condition on the column ST NAM Stage Name a row with a null value
in that column would have been shown.

6.3.4. Interval Function
    The interval function is a function for an anchor table taking two time points as arguments and returning
a data set. It is based on the complete view where for each attribute only values between the given time
points are included. Here the sub-select must ensure that the time point used for historization lies within the
two provided time points. See Figure 20.

                 select * from dST Stage(‘1598-01-01’, ‘1998-01-01’)
                 ST ID      ST NAM Stage Name         ST NAM ValidFrom   ST LOC Stage Location
                 #55        ‘The Globe Theatre’       1599-01-01         ‘Maiden Lane’
                 #55        ‘Shakespeare’s Globe’     1997-01-01         ‘Maiden Lane’
                 #56        ‘Cockpit’                 1609-01-01         ‘Drury Lane’
                 integer    string                    date               string

 Figure 20: Example rows from the interval function dST Stage in the interval given by the time points
‘1598-01-01’ and ‘1998-01-01’ for the anchor ST Stage.


6.3.5. Natural Key View
    The natural key of the anchor PE Performance is composed of when and where it was held. When it
was held can be found in the attribute PE DAT Performance Date and where it was held in the attribute
ST LOC Stage Location. A performance is uniquely identified by the values in these two attributes. However,
these attributes belong to different anchors, which is commonplace in an anchor model. The parts that make
up a natural key may be found in attributes spread out over several anchors.

                                                          18
1    create table AC Actor (                             58     create view lAC part PR in RAT got as
2       AC ID int not null,                              59     select
3       primary key clustered (                          60          [AC PR RAT].AC ID part,
4          AC ID asc                                     61          [AC PR RAT].PR ID in,
5       )                                                62          [AC PR RAT].RAT ID got,
6    );                                                  63          [RAT].RAT Rating,
7    create table GEN Gender (                           64          [AC PR RAT].AC part PR in RAT got ValidFrom
8       GEN ID tinyint not null,                         65     from
9       GEN Gender varchar(42) not null unique,          66         AC part PR in RAT got [AC PR RAT]
10      primary key clustered (                          67      left join
11         GEN ID asc                                    68         RAT Rating [RAT]
12      )                                                69     on
13   );                                                  70          [RAT].RAT ID = [AC PR RAT].RAT ID got
14   create table AC GEN Actor Gender (                  71     where
15      AC ID int not null foreign key                   72          [AC PR RAT].AC part PR in RAT got ValidFrom = (
16         references AC Actor ( AC ID ),                73          select
17      GEN ID tinyint not null foreign key              74             max(sub.AC part PR in RAT got ValidFrom)
18         references GEN Gender ( GEN ID ),             75         from
19      primary key clustered ( AC ID asc )              76             AC part PR in RAT got [sub]
20   );                                                  77         where
21   create table AC NAM Actor Name (                    78             [sub ]. AC ID part = [AC PR RAT].AC ID part
22      AC ID int not null foreign key                   79         and
23         references AC Actor ( AC ID ),                80             [sub ]. PR ID in = [AC PR RAT].PR ID in
24      AC NAM Actor Name varchar(42) not null,          81     );
25      AC NAM ValidFrom date not null,                  82     create function pAC Actor (
26      primary key clustered (                          83         @timepoint date
27         AC ID asc,                                    84     ) returns table return
28         AC NAM ValidFrom desc                         85     select
29      )                                                86          [AC].AC ID,
30   );                                                  87          [AC GEN].GEN ID,
31   create table PR Program (                           88          [GEN].GEN Gender,
32      PR ID int not null,                              89          [AC NAM].AC NAM Actor Name,
33      primary key clustered (                          90          [AC NAM].AC NAM ValidFrom
34         PR ID asc                                     91     from
35      )                                                92         AC Actor [AC]
36   );                                                  93      left join
37   create table RAT Rating (                           94         AC GEN Actor Gender [AC GEN]
38      RAT ID tinyint not null,                         95     on
39      RAT Rating varchar(42) not null unique,          96          [AC GEN].AC ID = [AC].AC ID
40      primary key clustered (                          97      left join
41         RAT ID asc                                    98         GEN Gender [GEN]
42      )                                                99     on
43   );                                                  100         [GEN].GEN ID = [AC GEN].GEN ID
44   create table AC part PR in RAT got (                101     left join
45      AC ID part int not null foreign key              102        AC NAM Actor Name [AC NAM]
46         references AC Actor ( AC ID ),                103    on
47      PR ID in int not null foreign key                104         [AC NAM].AC ID = [AC].AC ID
48         references PR Program ( PR ID ),              105    and
49      RAT ID got tinyint not null foreign key          106         [AC NAM].AC NAM ValidFrom = (
50         references RAT Rating ( RAT ID ),             107         select
51      AC part PR in RAT got ValidFrom date not null,   108            max([sub].AC NAM ValidFrom)
52      primary key clustered (                          109        from
53         AC ID part asc,                               110            AC NAM Actor Name [sub]
54         PR ID in asc,                                 111        where
55         AC part PR in RAT got ValidFrom desc          112            [sub ]. AC ID = [AC].AC ID
56      )                                                113        and
57   );                                                  114            [sub ]. AC NAM ValidFrom <= @timepoint
                                                         115    );



     Figure 21: Definitions in Transact-SQL, illustrating how primary keys, clustered indexes, and other constraints
     are defined in order to enable table elimination.


         When data is added to an anchor database it has to be determined if it is for an already known entity or
     for a new one. This is done by looking at the natural key, and as this often has to be done frequently, views
     for each anchor table is provided that act as a look-ups for its natural keys. Such a view joins the necessary
     anchor, attribute, and tie tables in order to translate natural key values into their corresponding identifiers.
     Note that it is possible for a single anchor to have several differently composed natural keys. Parts of the



                                                               19
key may also consist of historized attributes, in which case the view may have several rows for the same
identifier. An example of a natural key view can be seen in Figure 22.

                         select * from nPE Performance
                         PE ID      PE DAT Performance Date    ST LOC Stage Location
                         #911       1994-12-10 20:00           ‘Maiden Lane’
                         #912       1994-12-15 21:30           ‘Maiden Lane’
                         #913       1994-12-22 19:30           ‘Drury Lane’
                         integer    date                       string

Figure 22: Example rows from a natural key view nPE Performance for the anchor PE Performance, mapping
the identity PE ID to its natural key, which is composed of the attributes PE DAT Performance Date and
ST LOC Stage Location.


6.4. Utilizing Table Elimination
    Modern query optimizers utilize a technique called table (or join) elimination [18], which in practice
implies that tables containing not selected attributes in queries are automatically eliminated. The optimizer
will remove table T from the execution plan of a query if the following two conditions are fulfilled:

  (i) no column from T is explicitly selected
 (ii) the number of rows in the returned data set is not affected by the join with T

    In order to take advantage of table elimination primary and foreign keys have to be defined in the database.
Some examples showing how this is done for anchor, knot, attribute and tie tables can be seen in the left
side of Figure 21. The views and functions defined in Section 6.3 are created in order to take advantage of
table elimination, see the right side of Figure 21 for a couple of examples. The anchor table is used as the
left table in the view (or function) with the attribute tables left outer joined. The left join ensures that the
number of rows retrieved is at least as many as in the anchor table, provided that no conditions are set in
the query. Furthermore, since the join is based on the primary key in the attribute table, uniqueness is also
ensured, hence the number of resulting rows is equal to the number of the rows in the anchor table. If for an
anchor having some attribtues, a subset of these are used in a query over the defined views or functions, the
others can be eliminated and not touched during execution.
    If a condition on the values in an attribute table is set in the query, such that it selects a proper subset
of the rows, the foreign key constraint ensures that all rows must have a corresponding instance in the
anchor table. The optimizer can use this fact to eliminate the anchor table and work with the subset as an
intermediate result set. Once this is done, the same logic as above applies, and attributes not present in the
query can be eliminated.
    Typical queries only retrieve a small number of attributes, which implies that table elimination is
frequently used, yielding reduced access time.


7. Performance in Anchor Databases

   There are several considerations when it comes to performance in anchor databases, in particular how
they perform when reading and writing data in data warehouse and OLTP environments. Experiments have
been carried out that test the behaviour of reading data in data warehouse environments, using queries that
scan large amounts of data. As for writing data in data warehouse environments as well as reading and
writing data in OLTP environments we do not have any experimental results but only our experience from
actual running environments. In these cases, performance has not been an issue.




                                                         20
7.1. Performance Influencing Conditions
    To summarize the results when reading data in a data warehouse environment there are situations where
an anchor database performs relatively better and worse than databases in which other modeling techniques
have been used. Given a body of information and the intended searches over it, the following conditions
comparatively improves performance in an anchor database with respect to a less normalized model. It is
assumed that this body of information can be described using entities, identifiers, attributes and values.
(a) The data is time dependent and its changes need to be kept.
(b) The data is sparse, i. e. if many entities lack some of their attribute values.
(c) The ratio between unique data values and their occurrences is low.
(d) The ratio between the amount of data and the amount of data residing in identifiers is high.
(e) There are many identifiers.
(f) There are many attributes.
(g) Most searches address only a small proportion of the attributes in the entities over which the search is
    done.
(h) Most searches include conditions that impose bounds on values.
    For data warehouse environments, many of the conditions above are typically fulfilled. It is often required
that a history is kept, that data is sparse by nature or by asynchronous arrival, that there are data that
can be used for categorization and have only a few unique values, that some data values are of considerable
length, that there are many rows in some tables, that some entities have many attributes, that most queries
use only a few attributes, and that most queries have conditions limiting the number of rows in the result
sets. However, these conditions can be fulfilled to different degrees. Therefore, we carried out an experiment
in order to determine at what degree an anchor database performs better than a less normalized database.

                                                                                    Scenario
                          Condition                                     1       2       3      4
                          (a)    Number of versions per instance        1       1       1      2
                          (b)    Degree of sparsity                     0%      0%      50%    50%
                          (c)    Number of knots per attributes         0       1/3     1/3    1/3
                          (d)    Average data/identifiers ratio          1/4     3/1     3/1    3/1
                                                                            Tested combinations
                          (e)    Number   of   rows                           0.1, 1, 10 million
                           (f)   Number   of   attributes                          6, 12, 24
                          (g)    Number   of   queried attributes              1, 1/3, 2/3, all
                          (h)    Number   of   conditioned attributes              none, all

Figure 23: A table showing to what degree different conditions were fulfilled in the four scenarios used in the
experiment and which combinations have been tested for each scenario. The scenarios and combinations
yield a total of 288 tests.


7.2. Experimental Verification
    The query times for varying degrees of condition fulfillment have been measured in a series of tests in
which an anchor database was compared with a less normalized database containing the same information.
The experiment consisted of four scenarios, see Figure 23 for which the data and requirements changed. In
each scenario a number of combinations were tested. In total 288 measurements per model were made and
the query times compared2 . The queries were run against the latest view in an anchor database and in a

   2 The tests were performed using an Intel Core 2 Quad 2.6GHz CPU, 8GB of RAM, and two SATA hard disks. The database

used was Microsoft SQL Server 2008, set up with data and tempdb on separate disks. Scripts and detailed results are available
from http://www.anchormodeling.com.

                                                             21
database where the anchor and all its attribute tables have been de-normalized into a single table. Ratios
that compare query times in the anchor database with those for the single table were calculated and can be
seen in the tables in Figure 24.
    In the first scenario the information consisted solely of one byte attributes together with a four byte key.
The number of unique attribute values were assumed to be at least as many as the number of identifiers,
which prevented us from using knotted attribute tables in the anchor database. In practice there will of
course be several repetitions of values since one byte can store much fewer unique values than the maximum
number of rows in the tests, but for the sake of getting a high ratio between unique values and the number of
their occurrences we decided not to make use of this fact. In the second scenario, the information consisted
of six different data types together with a four byte key. The average size of a data value was 12 bytes. In
the cases when the database had 12 and 24 attributes, these six types were repeated. In the third scenario,
to simulate sparseness half of the rows in every attribute table found in the second scenario was removed,
but the all instances in the anchor table kept. Finally, the fourth scenario was based on the third with the
addition of historization. One out of every six attributes was historized and one extra version was added for
each instance of the key in the table, effectively doubling the number of rows.

7.3. Explanation of the Results
    From the results it is obvious that there are situations where an anchor database outperforms a less
normalized database as well as situations where it does not. The more conditions that are fulfilled to some
degree the better the anchor database performs. In some cases, a single condition fulfilled to a degree
high enough is sufficient for an anchor database to outperform the single table database. This behaviour
can be seen in the graphs in Figure 25, which are extrapolated from the results of the experiment. Note
that the number of measurements in some cases are few and that the graphs should be seen as showing an
approximative behaviour. In the graphs a single condition is allowed to change, while all others remain fixed.
What is seen in the graphs can be explained as follows.
    In (a), see Figure 25a, historizing a column value in the single table database will duplicate information.
In an anchor database, only the table for the affected attribute will gain extra rows as new versions are
added. The impact on performance is evident when using time dependent queries, such as finding the latest
version or the version that was valid at a certain point in time. This is due to the fact that self-joins or
sub-selects have to be made and the narrow and optimally ordered attribute tables perform much better
than the wide single table, even though it had indexes on the historization columns.
    In (b), see Figure 25b, sparse data is represented by the absence of a row in an anchor database, thereby
reducing the amount of data that needs to be scanned during a query. In the single table database,“holes”
in the data will be represented by null values, which a query manager must inspect in order to take the
appropriate action. Therefore, the query time is constant for the single table database, and decreasing down
to zero as all rows go absent in the anchor database.
    In (c), see Figure 25c, the knot construction is beneficial for performance when there is a small number of
unique values shared by many instances, and these values are longer than the key introduced in the knot.
Rather than having to repeat a redundant value, a knot table can be used and its key referenced using a
smaller data type3 . If all attribute tables are knotted, the query time becomes significantly shorter in the
anchor database compared to the single table database, in which no changes are made and the query time
remains constant.
    In (d), see Figure 25d, for every instance of an entity its identity is propagated into many tables in an
anchor database, increasing the total size of the database. The less the overhead is relative the information
not stored in keys, the less the negative performance impact will be. The single table is not affected, since all
attributes reside in the same row as the identity, compared to several attribute tables. If the amount of data
is much larger than the amount residing in keys the query time in an anchor database will approach that in
the single table database.

   3 Although not examined in the experiment, in cases where a knot is unsuitable, perhaps due to values that cannot be

predicted, the candidate attribute table can be compressed. This provides a higher degree of selectivity than can be achieved in
a corresponding single table database. Note that compressed tables trade off less disk space for higher cpu utilization.

                                                              22
Scenario                     Number of attributes in the query and millions of rows
                             Single attribute     1/3 of attributes     2/3 of attributes      All attributes
                Modeled
                attributes   0.1    1       10    0.1       1     10    0.1       1    10     0.1         1    10
           1        6        −2    −2       −2    −2       −4     −3    −4       −8    −6     −4         −9    −9
                   12         1    −2        1    −3       −6     −4    −5      −13   −10     −7        −17   −16
                   24        −2    −2        1    −6      −10     −7    −8      −21    −7     −9        −27   −16
           2        6         1    +2       +2    −7      −2       1    −7      −3     −5     −5         −2   −12
                   12        +2    +3       +4    −6      −2      −3    −8      −2    −12     −9         −4   −26
                   24        +2    +6       +5    −3      −2      −6    −5      −4     −6     −7        −14   −22
           3        6         1    +2       +3    −2        1      1    −3       1     1      −3        −2     −4
                   12        +2    +4       +5    −2        1      1    −3      −2    −5      −4        −2    −13
                   24        +3    +7       +8    −2        1     −3    −3      −2    −3      −5        −3    −13
           4        6        +2    +2       +3     1      +2      +2     1        1   +2      −2          1    1
                   12        +2    +4       +3     1      +2      +2     1        1   −2      −2          1   −5
                   24        +4    +6       +6     1       1       1     1        1    1      −2          1   −4
                                                            (a)


            Scenario                    Number of attributes in the query and millions of rows
                              Single attribute      1/3 of attributes     2/3 of attributes     All attributes
                Modeled
                attributes   0.1        1    10     0.1     1      10     0.1     1     10     0.1        1   10
            1        6       −2      1       −2     −2     −2      −2     −2     −2     −3     −2        −2   −4
                    12       −2     −3        1      1     −2      −2     −2     −3     −4     −4        −4   −7
                    24        1     −3        1     −2     −4      −4     −2     −6     −4     −4        −8   −6
            2        6       +2      1       +3     +2      1      +2      1     −2      1      1        −3   −2
                    12       +2     +2       +5      1     −2       1     −2     −4     −3     −2        −4   −4
                    24       +3     +3      +10     −2     −2      −2     −3     −4     −2     −3        −4   −3
            3        6       +2      1       +4     +2     +5      +5     +2     −2     +2     +2        −2    1
                    12       +2     +2       +7      1      1      +3      1     −2      1      1        −3    1
                    24       +3     +3      +14      1     −2      +2     −2     −2     +2     −2        −3    1
            4        6       +3     +3       +7     +3     +7      +9      1      1     +3          1     1   +2
                    12       +4     +4       +8     +2     +2      +3      1      1      1          1     1    1
                    24       +8    +11      +14     +2      1      +2     +2      1     +2          1     1    1
                                                            (b)

Figure 24: Query times in an anchor database compared to a de-normalized single table database for queries
without (a) and with (b) conditions. When the number is positive +n the anchor database is n times faster
than the single table database, when negative −n the anchor database is n times slower, and when 1 they
are roughly the same.


   In (e), see Figure 25e, the larger size of a row in the single table compared to the normalized tables in
the anchor database causes more data having to be read during a query, provided that not all attributes are
used in the query. Eventually the time it takes to read this data from the single table will be longer than the
time it takes to do a join in the corresponding anchor database. Furthermore, the less attributes that are
used in the query the sooner the anchor database will overtake the single table database.
   In (f), see Figure 25f, the query time in an anchor database is independent of the number of attribute
tables for a query that does not change between tests. In contrast, for the single table the amount of data
that needs to be scanned during a query grows with the number of attributes it contains and queries take

                                                            23
(a)                      (b)                      (c)                      (d)




                  (e)                      (f)                      (g)                      (h)

Figure 25: Extrapolated graphs comparing query times in an anchor database (dashed line) to a de-normalized
single table (solid line) for (a) increased level of historization, (b) increased sparsity, (c) decreased number of
unique values, (d) increased amount of data per key, (e) increased number of rows, (f) increased number
of attributes in the model, (g) increased number of attributes in the query, and (h) increased number of
conditions in the query.


longer to execute. As long as a query does not use all attributes the anchor database will be faster than the
single table. Furthermore, the more rows that are scanned the sooner the anchor database will overtake the
single table database, due to the difference in scanned data size being multiplied by the number of rows.
    In (g), see Figure 25g, the performance will increase for the anchor database the fewer attributes that are
used in the query. This is thanks to table elimination, which effectively reduces the number of joins having
to be done and the amount of data having to be scanned. If the attributes after table elimination are few
enough, the extra cost of the remaining joins with narrow attribute tables will take less time than scanning
the wide single table, in which data not queried for is also contained.
    In (h), see Figure 25h, for every added condition limiting the number of rows in the final result set, the
intermediate result sets stemming from the join operations in an anchor database will become smaller and
smaller. In other words, progressively less data will have to be handled, which speeds up any remaining joins.
In the single table database, the whole table will be scanned regardless of the limiting conditions, unless it is
indexed. However, it is not common that a wide table has indexes matching every condition, which is why
no such indexes were used in the experiment.

7.4. Conclusions from the Experiment
    One limitation of the experiment is the small number of steps in which the degrees of condition fulfilment
are allowed to vary. Due to the large number of conditions, a set of tests had to be chosen that would
yield indicative results without taking too much time to actually perform. A deeper investigation into the
relationships between the conditions may be done as further research. The behaviour in other RDBMS could
be different and therefore need to be verified. Furthermore, the performance on different types of hardware
should also be measured and compared.
    The conclusion is that in many situations anchor databases are less I/O intense under normal querying
operations than in less normalized databases. This makes anchor databases suitable in situations where I/O
tends to become a bottleneck, such as in data warehousing [17]. Some of the results can also be carried over
to the domain of OLTP, where queries often pinpoint and retrieve a small amount of data through conditions.
In a situation where an anchor database performs worse, the impact of that disadvantage will have to be
weighed against other advantages when using Anchor Modeling.

                                                        24
8. Benefits

    The Anchor Modeling approach offers several benefits. The most important of them are categorized and
listed in the following subsections.

8.1. Ease of Modeling
Simple concepts and notation Anchor models are constructed using a small number of simple concepts
    (Section 2). This simplicity and the use of modeling guidelines (Section 4) reduce the number of options
    available when solving a modeling problem, thereby reducing the risk of introducing errors in an anchor
    model.
Historization by design Managing different versions of information is simple, as Anchor Modeling offers na-
     tive constructs for information versioning in the form of historized attributes and ties (Sections 2.3, 2.4).
Iterative and incremental development Anchor Modeling facilitates iterative and agile development,
     as it allows independent work on small subsets of the model under consideration, which later can be
     integrated into a global model. Changing requirements is handled by additions without affecting the
     existing parts of a model (cf. bus architecture [15]).

Reduced translation logic The symbols introduced to graphically represent tables in an anchor model
    (Section 2) can also be used for conceptual and logical modeling. This gives a near 1-1 relationship
    between all levels of modeling, which simplifies, or even eliminates, the need for translation logic in
    order to move between them.
Reusability and automation The small number of modeling constructs together with the naming conven-
    tion (Section 2.5) in an anchor model yield a high degree of structure, which can be taken advantage of
    in the form of reusability and automation. For example, ready-made templates for recurring tasks can
    be made and automatic code generation is possible, speeding up development.

8.2. Simplified Database Maintenance
Ease of attribute changes In an anchor database, data is historized on attribute level rather than row
     level. This facilitates tracing attribute changes directly instead of having to analyze an entire row in
     order to derive which of its attributes have changed. In addition, the predefined views and functions
    (Section 6.3) also simplify temporal querying. With the additional use of metadata it is also possible to
     trace when and what caused an attribute value to change.
Absence of null values There are no null values in an anchor database. This eliminates the need to
    interpret null values [21] as well as waste of storage space.

Simple index design The clustered indexes in an anchor database are given for attribute, anchor and knot
    tables. No analysis of what columns to index need be undertaken as there are unambiguous rules for
    determining what indexes are relevant.
Asynchronous arrival of data In an anchor database asynchronous arrival of data can be handled in
    a simple way. Late arriving data will lead to additions rather than updates, as data for a single
    attribute is stored in a table of its own (compared to other approaches where a table may include
    several attributes) [15, pp. 271–274].




                                                       25
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul

Mais conteúdo relacionado

Semelhante a Anchor Modeling27 Feb Paul

NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITYNUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
International Journal of Technical Research & Application
 
vlsi qb.docx imprtant questions for all units
vlsi qb.docx imprtant questions for all unitsvlsi qb.docx imprtant questions for all units
vlsi qb.docx imprtant questions for all units
nitcse
 
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a Stand of St...
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a  Stand of St...Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a  Stand of St...
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a Stand of St...
IJMER
 
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stressStructural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
IAEME Publication
 
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stressStructural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
IAEME Publication
 
用R语言做曲线拟合
用R语言做曲线拟合用R语言做曲线拟合
用R语言做曲线拟合
Wenxiang Zhu
 

Semelhante a Anchor Modeling27 Feb Paul (20)

NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITYNUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
NUMERICAL SIMULATION OF FLOW INSIDE THE SQUARE CAVITY
 
vlsi qb.docx imprtant questions for all units
vlsi qb.docx imprtant questions for all unitsvlsi qb.docx imprtant questions for all units
vlsi qb.docx imprtant questions for all units
 
Experimental Study on Tuned Liquid Damper and Column Tuned Liquid Damper on a...
Experimental Study on Tuned Liquid Damper and Column Tuned Liquid Damper on a...Experimental Study on Tuned Liquid Damper and Column Tuned Liquid Damper on a...
Experimental Study on Tuned Liquid Damper and Column Tuned Liquid Damper on a...
 
Application of Nanotechnology in Civil Engineering
Application of Nanotechnology in Civil EngineeringApplication of Nanotechnology in Civil Engineering
Application of Nanotechnology in Civil Engineering
 
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a Stand of St...
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a  Stand of St...Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a  Stand of St...
Wavelet Analysis of Vibration Signature of a Bevel Gear Box in a Stand of St...
 
Ecet 380 Success Begins / snaptutorial.com
Ecet 380  Success Begins / snaptutorial.comEcet 380  Success Begins / snaptutorial.com
Ecet 380 Success Begins / snaptutorial.com
 
FSM 8108 VITE presented at Spie Optics Photonics 2017
FSM 8108 VITE presented at Spie Optics Photonics 2017FSM 8108 VITE presented at Spie Optics Photonics 2017
FSM 8108 VITE presented at Spie Optics Photonics 2017
 
Wires and Cables - A Productive View
Wires and Cables - A Productive ViewWires and Cables - A Productive View
Wires and Cables - A Productive View
 
My VLSI.pptx
My VLSI.pptxMy VLSI.pptx
My VLSI.pptx
 
130 133
130 133130 133
130 133
 
Performance Analysis of Corporate Feed Rectangular Patch Element and Circular...
Performance Analysis of Corporate Feed Rectangular Patch Element and Circular...Performance Analysis of Corporate Feed Rectangular Patch Element and Circular...
Performance Analysis of Corporate Feed Rectangular Patch Element and Circular...
 
Final Report2
Final Report2Final Report2
Final Report2
 
Thermoluminsence dosimeter
Thermoluminsence dosimeterThermoluminsence dosimeter
Thermoluminsence dosimeter
 
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stressStructural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
 
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stressStructural behavior of reed evaluation of tensilestrength, elasticityand stress
Structural behavior of reed evaluation of tensilestrength, elasticityand stress
 
15 04-0662-02-004a-channel-model-final-report-r1
15 04-0662-02-004a-channel-model-final-report-r115 04-0662-02-004a-channel-model-final-report-r1
15 04-0662-02-004a-channel-model-final-report-r1
 
Anteena Design M.Tech/B.Tech/P.Hd Project
Anteena Design M.Tech/B.Tech/P.Hd ProjectAnteena Design M.Tech/B.Tech/P.Hd Project
Anteena Design M.Tech/B.Tech/P.Hd Project
 
Optical fibres by Raveendra Bagade
Optical fibres by Raveendra BagadeOptical fibres by Raveendra Bagade
Optical fibres by Raveendra Bagade
 
Unit 8
Unit 8Unit 8
Unit 8
 
用R语言做曲线拟合
用R语言做曲线拟合用R语言做曲线拟合
用R语言做曲线拟合
 

Mais de Department of Computer and Systems Sciences (6)

Göran Goldkuhl
Göran GoldkuhlGöran Goldkuhl
Göran Goldkuhl
 
Anchor Modeling27 Feb Paul
Anchor Modeling27 Feb PaulAnchor Modeling27 Feb Paul
Anchor Modeling27 Feb Paul
 
Businessbehaviormodeljuinpub
BusinessbehaviormodeljuinpubBusinessbehaviormodeljuinpub
Businessbehaviormodeljuinpub
 
Risk-aware i* models
Risk-aware i* modelsRisk-aware i* models
Risk-aware i* models
 
Anchor Modeling
Anchor ModelingAnchor Modeling
Anchor Modeling
 
Coordination Services with REA
Coordination Services with REACoordination Services with REA
Coordination Services with REA
 

Anchor Modeling27 Feb Paul

  • 1. Anchor Modeling O. Regardt Teracom, Kakn¨stornet, 102 52 Stockholm a L. R¨nnb¨ck o a re!sight, Kungsholms Strand 179, 112 48 Stockholm M. Bergholtz, P. Johannesson, P. Wohed DSV, Stockholms Universitet, Kista Abstract Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The main reason for this state of affairs is that the environment of a data warehouse is in constant change, while the warehouse itself needs to provide a stable and consistent interface to information spanning extended periods of time. In this paper, we propose a modeling technique, called Anchor Modeling, that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications, to the data warehouse. This ensures that existing data warehouse applications will remain unaffected by the evolution of the data warehouse, i.e. existing views and functions will not have to be modified as a result of changes in the warehouse model. We provide a formal and technology independent definition of anchor models and show how anchor models can be realised as relational databases. We also investigate performance issues and report on a number of lab experiments, which show that under certain conditions anchor databases perform substantially better than databases constructed using traditional modeling techniques. Keywords: anchor modeling, database modeling, normalization, 6NF, data warehousing, agile development, temporal databases, table elimination 1. Introduction Maintaining and evolving data warehouses is a complex, error prone, and time consuming activity. The main reason for this state of affairs is that the environment of a data warehouse is in constant change, while the warehouse itself needs to provide a stable and consistent interface to information spanning extended periods of time. Sources that deliver data to the warehouse change continuously over time and sometimes dramatically. The information retrieval needs, such as analytical and reporting needs, also change. In order to address these challenges, data models of warehouses have to be modular, flexible, and track changes in the handled information [16]. However, many existing warehouses suffer from having a model that does not fulfill these requirements. One third of implemented warehouses have at some point, usually within the first four years, changed their architecture, and less than a third quotes their warehouses as being a success [24]. In this paper, we propose a modeling technique, called Anchor Modeling, that offers non-destructive extensibility mechanisms, thereby enabling robust and flexible representations of changes. A positive consequence of this is that all previous versions of a schema will be available at any given time as parts of Email addresses: olle.regardt@teracom.se (O. Regardt), lars.ronnback@resight.se (L. R¨nnb¨ck), maria@dsv.su.se o a (M. Bergholtz), pajo@dsv.su.se (P. Johannesson), petia@dsv.su.se (P. Wohed) Preprint submitted to DKE February 27, 2010
  • 2. the complete schema [2]. Using Anchor Modeling results in data models in which only small changes are needed when large changes occur in the environment, like adding or switching a source system or analytical tool. This reduced need for redesign extends the longevity of a data warehouse, reduces the implementation time, and simplifies maintenance [23]. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications, to the data model. Applications thereby remain unaffected by the evolution of the data model and thus do not have to be modified [21]. Furthermore, evolution through extensions rather than modifications results in modularity, making it possible to decompose data models into small, stable and manageable components. This modularity is also of great value in agile development where short iterations are required. It is possible to construct an initial model with a small number of agreed upon business terms, which later can be seamlessly extended into a final model. This way of working can improve on the state of the art in data warehouse design, where close to half of current data warehouse projects are either behind schedule or over budget [24, 4], partly due to having a too large initial project scope. An anchor model that is realised as a relational database schema will have a high degree of normalization, provide reuse of data and offer the ability to store historical data. The high decomposition of the database stems from the fact that attributes become separate tables in the schema. This differs significantly from the currently popular approach of using de-normalized multidimensional models [15] for data warehouses. Even though the origin of Anchor Modeling were the requirements found in such environments, it is a generic modeling approach, also suitable for other types of systems. The paper is organized as follows. Section 2 defines Anchor Modeling in a technology independent way and proposes a naming convention, Section 3 introduces a running example, Section 5 shows how an anchor model can be realised as a relational database, Section 4 suggests a number of Anchor Modeling guidelines, and physical implementation is described in Section 6. Section 7 investigates performance issues of anchor databases and introduces a number of conditions that influence performance. In Section 8 advantages of Anchor Modeling are discussed, Section 9 contrasts Anchor Modeling to alternative approaches in the literature, and Section 10 concludes the paper and gives directions for further research. 2. Basic Notions of Anchor Modeling In this section, we introduce the basic notions of Anchor Modeling by first explaining them informally and then giving formal definitions using the relational model. The basic building blocks in Anchor Modeling are anchors, knots, attributes, and ties. A meta model for Anchor Modeling in UML can be seen in Figure 1. Definition 2.1 (Identities). Let I be an infinite set of symbols, which are used as identities. Definition 2.2 (Data type). Let D be a data type. The domain of D is a set of data values. Definition 2.3 (Time type). Let T be a time type. The domain of T is a set of time values. 2.1. Anchors An anchor represents a set of entities, such as a set of actors or events. See Figure 2. Definition 2.4 (Anchor). An anchor A is a string. An extension of an anchor is a subset of I. An example of an anchor is AC Actor, see Figure 2. An example of an extension of AC Actor is {#4711, #4712, #4713}, a set of identities. 2.2. Knots A knot is used to represent a fixed set of entities that do not change over time. While anchors are used to represent arbitrary entities, knots are used to manage properties that are shared by many instances of some anchor. A typical example of a knot is GEN Gender, see Figure 3, which includes two instances, ‘Male’ and ‘Female’. This property, gender, is shared by many instances of the AC Actor anchor, thus using a knot minimizes redundancy. Rather than repeating the strings a single bit per instance is sufficient. 2
  • 3. ANCHOR 1 1 DATA TYPE range type 1 1 1..* 0..* 1 domain 0..* KNOT ATTRIBUTE TIE 1 consists_of ANCHOR ROLE 1 2..* range 1 range type 0..* 0..* range KNOTTED ATTRIBUTE KNOT ROLE consists_of KNOTTED TIE 1..* 1 0..* STATIC HISTORIZED KNOTTED STATIC KNOTTED HISTORIZED KNOTTED KNOTTED HISTORIZED TIE STATIC TIE ATTRIBUTE ATTRIBUTE ATTRIBUTE ATTRIBUTE HISTORIZED TIE STATIC TIE 0..* time range 0..* 0..* 0..* 0..* 1 TIME TYPE 1 time range 1 time range 1 time range Figure 1: A meta model for Anchor Modeling in UML. AC_Actor Figure 2: An anchor is shown as a filled square. GEN_Gender Figure 3: A knot is shown as an outlined square. Definition 2.5 (Knot). A knot K is a string. A knot has a domain, which is I. A knot has a range, which is a data type D. An extension of a knot K with range D is a bijective relation over I × D. An example of a knot is GEN Gender, see Figure 3. Its domain is I and its range is the data type string. An example of an extension of this knot is the set of identity-value pairs { #0, ‘Male’ , #1, ‘Female’ }. 2.3. Attributes Attributes are used to represent properties of anchors. We distinguish between four kinds of attributes: static, historized, knotted static, and knotted historized, see Figure 4. A static attribute is used to represent arbitrary properties of entities (anchors), where it is not needed to keep the history of changes to the attribute values. A historized attribute is used when changes of the attribute values need to be recorded. A value is considered valid until it is replaced by one with a later time. Valid time [6] is hence represented as an open interval with an explicitly specified beginning. The interval is implicitly closed when an instance with a later valid time is added for the same anchor instance. A knotted static attribute is used to represent relationships between anchors and knots, i.e. to relate an anchor to properties that can take on only a fixed, typically small, number of values. Finally a knotted historized attribute is used when the relation to a value in the knot is not stable but may change over time. Definition 2.6 (Static Attribute). A static attribute BS is a string. A static attribute BS has an anchor A for domain and a data type D for range. An extension of a static attribute BS is a relation over I × D. An example of a static attribute is ST LOC Stage Location, see Figure 4, with domain ST Stage and the data type date as range. An example of an extension of this static attribute is the set of identity-value pairs { #55, ‘Maiden Lane’ , #56, ‘Drury Lane’ }. Definition 2.7 (Historized Attribute). A historized attribute BH is a string. A historized attribute BH has an anchor A for domain, a data type D for range, and a time type T as time range. An extension of a historized attribute BH is a relation over I × D × T. 3
  • 4. ST_LOC_Stage_Location ST_NAM_Stage_Name (a) (b) AC_PLV_Actor_ProfessionalLevel AC_GEN_Actor_Gender GEN_Gender PLV_ProfessionalLevel (c) (d) Figure 4: A static attribute (a) is shown as an outlined circle and a historized attribute (b) is shown as a circle with a double outline. A knotted static attribute (c) is shown as an outlined circle and a knotted historized attribute (d) is shown as a circle with a double outline, both of which must reference a knot. An example of a historized attribute is ST NAM Stage Name, see Figure 4, with the domain ST Stage, the data type string as range, and the time type date as time range. An example of an extension of a historized attribute is the set of identity-value-time point triples { #55, ‘The Globe Theatre’, 1599-01-01 , #55, ‘Shakespeare’s Globe’, 1997-01-01 , #56, ‘Cockpit’, 1609-01-01 }. Definition 2.8 (Knotted Static Attribute). A knotted static attribute BKS is a string. A knotted static attribute BKS has an anchor A for domain and a knot K for range. An extension of a knotted static attribute BKS is a relation over I × I. An example of a knotted static attribute is AC GEN Actor Gender, see Figure 4, with domain AC Actor and range GEN Gender. An example of an extension of a knotted static attribute is the set of identity-identity pairs { #4711, #0 , #4712, #1 }. Definition 2.9 (Knotted Historized Attribute). A knotted historized attribute BKH is a string. A knotted historized attribute BKH has an anchor A for domain, a knot K for range, and a time type T for time range. An extension of a knotted historized attribute BKH is a relation over I × I × T. An example of a knotted historized attribute is AC PLV Actor ProfessionalLevel, see Figure 4, with domain AC Actor, range PLV ProfessionalLevel, and the time type date for time range. An example of an extension of a knotted historized attribute is the set of identity-identity-time point triples { #4711, #4, 1999-04-21 , #4711, #5, 2003-08-21 , #4712, #3, 1999-04-21 }. 2.4. Ties A tie represents associations between two or more anchor entities and optional knot entities. Similarly to attributes, ties come in four variants, static, historized, knotted static, and knotted historized. See Figure 5. As the same entity may appear more than once in a tie a occurrences need to be qualified using the concept of roles. Definition 2.10 (Anchor Role). An anchor role is a string. Every anchor role has a type, which is an anchor. An example of an anchor role is atLocation, with type ST Stage. Definition 2.11 (Knot Role). A knot role is a string. Every knot role has a type, which is a knot. An example of a knot role is having, with type PAT ParentalType. Definition 2.12 (Static Tie). A static tie TS is a set of anchor roles. An instance tS of a static tie TS = {R1 , . . . , Rn } is a set of pairs Ri , vi , i = 1, . . . , n, where vi ∈ I and n ≥ 2. An extension of a static tie TS is a set of instances of TS . 4
  • 5. PE_wasHeld_ST_atLocation ST_atLocation_PR_isPlaying (a) (b) AC_parent_AC_child_PAT_having AC_part_PR_in_RAT_got PAT_ParentalType RAT_Rating (c) (d) Figure 5: A static tie (a) is shown as a filled diamond and a historized tie (b) is shown as a filled diamond with an extra outline. A knotted static tie (c) is shown as a filled diamond and a knotted historized tie (d) is shown as a filled diamond with an extra outline, both of which must reference at least one knot. An example of a static tie is PE wasHeld ST atLocation = {wasHeld, atLocation}, see Figure 5, where the type of wasHeld is PE Performance and the type of atLocation is ST Stage. An example of an extension of a static tie is a set of sets of role-identity pairs {{ wasHeld, #911 , atLocation, #55 }, { wasHeld, #912 , atLocation, #55 }, { wasHeld, #913 , atLocation, #56 }}. Definition 2.13 (Historized Tie). A historized tie TH is a set of anchor roles and a time type T. An instance tH of a historized tie TH = {R1 , . . . , Rn , T} is a set of pairs Ri , vi , i = 1, . . . , n and a time point p, where vi ∈ I, p ∈ T, and n ≥ 2. An extension of a historized tie TH is a set of instances of TH . An example of a historized tie is ST atLocation PR isPlaying = {atLocation, isPlaying, date}, see Figure 5, where the type of atLocation is ST Stage and the type of isPlaying is PR Program. An example of an extension of a historized tie is a set of sets of role-identity pairs and a time point {{ atLocation, #55 , isPlaying, #17 , 2003-12-13}, { atLocation, #55 , isPlaying, #23 , 2004-04-01}, { atLocation, #56 , isPlaying, #17 , 2003-12-31}}. Definition 2.14 (Knotted Static Tie). A knotted static tie TKS is a set of anchor and knot roles. An instance tKS of a static tie TKS = {R1 , . . . , Rn , S1 , . . . , Sm } is a set of pairs Ri , vi , i = 1, . . . , n and Sj , wj , j = 1, . . . , m, where vi ∈ I, wj ∈ I, n ≥ 2, and m ≥ 1. An extension of a knotted static tie TKS is a set of instances of TKS . An example of a knotted static tie is AC parent AC child PAT having = {parent, child, having}, see Figure 5, where the type of parent and child is AC Actor and the type of having is PAT ParentalType. An example of an extension of a knotted static tie is a set of sets of role-identity pairs {{ parent, #4711 , child, #4713 , having, #1 }, { parent, #4712 , child, #4713 , having, #0 }}. Definition 2.15 (Knotted Historized Tie). A knotted historized tie TKH is a set of anchor and knot roles and a time type T. An instance tKH of a historized tie TKH = {R1 , . . . , Rn , S1 , . . . , Sm , T} is a set of pairs Ri , vi , i = 1, . . . , n, Sj , wj , j = 1, . . . , m, and a time point p, where vi ∈ I, wj ∈ I, p ∈ T, n ≥ 2, and m ≥ 1. An extension of a knotted historized tie TKH is a set of instances of TKH . An example of a knotted historized tie is AC part PR in RAT got = {part, in, got, date}, see Figure 5, where the type of part is AC Actor, the type of in is PR Program and the type of got is RAT Rating. An example of an extension of a knotted historized tie is a set of sets of role-identity pairs and a time point {{ part, #4711 , in, #17 , got, #8 , 2005-03-12}, { part, #4711 , in, #17 , got, #10 , 2008-08-29}, { part, #4712 , in, #17 , got, #10 , 2006-10-17}}. Definition 2.16 (Identifier). Let T be a (static, historized, knotted, or knotted historized) tie. An identifier for T is a subset of T. Further, if T is a historized or knotted historized tie, where T is the time type in T, every identifier for T must contain T. 5
  • 6. An identifier is similar to a key in relational databases, i.e. it should be a minimal set of roles that uniquely identifies instances of a tie. Definition 2.17 (Anchor Schema). An anchor schema is a 13-tuple A, K, BS , BH , BKS , BKH , RA , RK , TS , TH , TKS , TKH , I , where A is a set of anchors, K is a set of knots, BS is a set of static attributes, BH is a set of historized attributes, BKS is a set of knotted static attributes, BKH is a set of knotted historized attributes, RA is a set of anchor roles, RK is a set of knot roles, TS is a set of static ties, TH is a set of historized ties, TKS is a set of knotted static ties, TKH is a set of knotted historized ties, and I is a set of identifiers. The following must hold for an anchor schema: (i) for every attribute B ∈ BS ∪ BH ∪ BKS ∪ BKH , domain(B) ∈ A (ii) for every attribute B ∈ BKS ∪ BKH , range(B) ∈ K (iii) for every anchor role RA ∈ RA , type(RA ) ∈ A (iv) for every knot role RK ∈ RK , type(RK ) ∈ K (v) for every tie T in TS ∪ TH ∪ TKS ∪ TKH , T ⊆ RA ∪ RK Definition 2.18 (Anchor Model). Let A = A, K, BS , BH , BKS , BKH , RA , RK , TS , TH , TKS , TKH , I be an anchor schema. An anchor model for A is a quadruple extA , extK , extB , extT , where extA is a function from anchors to extensions of anchors, extK is a function from knots to extensions of knots, extB is a function from attributes to extensions of attributes, and extT is a function from ties to extensions of ties. Let proji be the i th projection map. An anchor model must fulfil the following conditions: (i) for every attribute B ∈ BS ∪ BH ∪ BKS ∪ BKH , proj1 (extB (B)) ⊆ extA (domain(B)) (ii) for every attribute B ∈ BKS ∪ BKH , proj2 (extB (B)) ⊆ proj1 (extK (range(B))) (iii) for every tie T and anchor role RA ∈ T , {v | RA , v ∈ extT (T )} ⊆ extA (type(RA )) (iv) for every tie T and knot role RK ∈ T , {v | RK , v ∈ extT (T )} ⊆ proj1 (extK (type(RK ))) (v) every extension of a tie T ∈ TS ∪ TH ∪ TKS ∪ TKH shall respect the identifiers in I, where an extension extT (T ) of a tie respects an identifier I ∈ I if and only if it holds that whenever two instances t1 and t2 of extT (T ) agree on all roles in I then t1 = t2 . 2.5. Naming Convention Entities in an anchor model can be named in any manner, but having a naming convention outlining how names should be constructed increases understandability and simplifies working with a model. If the same convention is used consistently over many installations the induced familiarity will speed up recognition by several orders of magnitude. A good naming convention should fulfil a number of criteria, some of which may conflict with others. Names should be short, but long enough be intelligible. They should be unique, but have parts in common for related entities. They should be unambiguous, without too many rules having to be introduced. Further, in order to be used in many different representations, the less “exotic” characters they contain the better. Definition 2.19 (Naming functions). The following are naming functions, further defined in Figure 6: (i) NI : A ∪ K → string is a total injection (ii) NIA : BS ∪ BH ∪ BKS ∪ BKH ∪ TS ∪ TH ∪ TKS ∪ TKH → string is a total injection (iii) NIK : BKS ∪ BKH ∪ TKS ∪ TKH → string is a total injection (iv) ND : BS ∪ BH ∪ K → string is a total injection (v) NT : BH ∪ BKH ∪ TH ∪ TKH → string is a total injection The convention suggested in the table in Figure 6 use only regular letters, numbers, and the underscore character. This ensures that the same names can be used in a variety of representations, such as in a relational database, in XML, or in a programming language. Names consist of two parts, a mnemonic followed by a descriptor. From the mnemonic, both the type of entity and relations to other entities can be deduced, and it also serves as a unique identifier for an entity, which can be used as a shorter alias for the full name. While 6
  • 7. Parts NT (T ) ::= f : MT , dl, vf NIK (T ) ::= f : MK, dl, id, dl, f : rK NIA (T ) ::= f : MA, dl, id, dl, f : rA NT (B) ::= f : MB, dl, vf ND (B) ::= f: B NIK (B) ::= f : MK, dl, id NIA (B) ::= f : MA, dl, id ND (K) ::= f: K NI (K) ::= f : MK, dl, id NI (A) ::= f : MA, dl, id Entities T ::= f : MT B ::= f : MB, dl, f : DB K ::= f : MK, dl, f : DK A ::= f : MA, dl, f : DA Descriptors DB ::= f : DA, dl, uwords DK ::= uwords DA ::= uwords Mnemonics MT ::= f : MA, dl, f : rA , (dl, f : MA, dl, f : rA )+, (dl, f : MK, dl, f : rK )* MB ::= f : MA, dl, (upper | number){3} MK ::= (upper | number){3} MA ::= (upper | number){2} Roles rK ::= lwords rA ::= lwords Primitives vf ::= ‘ValidFrom’ id ::= ‘ID’ dl ::= ‘’ lwords ::= (lower)+, (uword)? uwords ::= (upper, (lower | number)*)+ number ::= [0-9] upper ::= [A-Z] lower ::= [a-z] Figure 6: A suggested naming convention described using Extended Backus-Naur Form and a context sensitive function f that fetches already defined strings while respecting the connections between entities in an anchor model. the mnemonic helps with the overview it does little for understanding what a model actually represents. The descriptors, on the other hand, should be intelligible enough to give this understanding. The naming convention will yield unique but not unambiguous names for entities, i. e. the order of strings in a tie may change. Further, the data type parts are named identically to the entity in which they are contained and foreign keys keep the same name as the key they are referencing. 7
  • 8. 3. Running Example The scenario in this example is based on a business arranging stage performances. It is an extension of the example discussed in [14]. An anchor schema for this example is shown in Figure 7. The business entities are the stages on which performances are held, programs defining the actual performance, i e what is performed on the stage, and actors who carry out the performances. Stages have two attributes, a name and a location. The name may change over time, but not the location. Programs just have name, which describes the act. If a program changes name it is considered to be a different act. Actors have a name, but since this may be a pseudonym it could change over time. They also have a gender, which we assume does not change. The business also keeps track of an actor’s professional level, which changes over time as the actor gains experience. Actors get a rating for each program they have performed. If they perform consistently better or worse, the rating changes to reflect this. Stages are playing exactly one program at any given time and they change program every now and then. To simplify casting the parenthood between actors is needed so that it is possible to determine if an actor is the mother or father of another actor. A performance is an event bound in space and time that involves a stage on which a certain program is performed by one or more actors. It is uniquely identified by when and where it was held. The managers of the business also need to know how large the audience was at every performance and the total revenue gained from it. Four anchors PE Performance, ST Stage, PR Program and AC Actor capture the present entities. Attributes such as PR NAM Program Name and PE DAT Performance Date capture properties of those entities. Some attributes, e.g. AC NAM Actor Name and ST NAM Stage Name are historized, to capture the fact that they are subject to changes. The fact that an actor has a gender, which is one of two values, is captured through the knot GEN Gender and a knotted attribute called AC GEN Actor Gender. Similarly, since the business also keeps tracks of the professional level of actors, the knot PLV ProfessionalLevel and the knotted attribute AC PLV Actor ProfessionalLevel are introduced, which in addition is historized to capture attribute value changes. Furthermore, the relationships between the anchors are captured through ties. In the example the following ties are introduced: PE in AC wasCast, PE wasHeld ST atLocation, PE at PR wasPlayed, ST at- Location PR isPlaying, AC part PR in RAT got, and AC parent AC child PAT having to capture the ex- isting binary relationships. The historized tie ST atLocation PR isPlaying is used to capture the fact that stages change programs. The tie AC part PR in RAT got is knotted to show that actors get ratings on the programs they are playing and historized for capturing the changes in these ratings. A small black circle on a tie edge in Figure 7 indicates that the connected anchor is part of the primary key for the tie, while a white circle indicates that it is not. 4. Guidelines for Designing Anchor Models Anchor Modeling has been used in a number of industrial projects in the insurance business and retail business, and for the most part in data warehouse environments. Based on this experience and a number of well-known modeling problems such as reification of relationships [11], power types [8], update-anomalies in databases due to functional dependencies between too many attributes modeled within one entity in conceptual schemas [30], the following guidelines have been formulated. These provide a deeper understanding of the approach as well as support the building of anchor models in a way that gives desired properties. 4.1. Modeling Core Entities and Transactions Core entities in the domain of interest should be represented as anchors in an anchor model. A well-known problem in conceptual modeling is to determine whether a transaction should be modeled as a relationship or as an entity [11]. In anchor models, the question is formulated as determining whether a transaction should be modeled as an anchor or as a tie. When a transaction has some property, like PE DAT Performance Date in Figure 7, it should be modeled as an anchor. It can be modeled as a tie only if the transaction has no properties. 8
  • 9. PLV_ProfessionalLevel GEN_Gender AC_PLV_Actor_ProfessionalLevel AC_GEN_Actor_Gender PAT_ParentalType AC_NAM_Actor_Name AC_parent_AC_child_PAT_having AC_Actor PE_in_AC_wasCast RAT_Rating AC_part_PR_in_RAT_got PE_at_PR_wasPlayed PR_Program PE_Performance ST_atLocation_PR_isPlaying PE_wasHeld_ST_atLocation ST_Stage ST_NAM_Stage_Name ST_LOC_Stage_Location Anchor Knot (Knotted) Static Attribute (Knotted) Static Tie (Knotted) Historized Attribute (Knotted) Historized Tie Figure 7: An example anchor model illustrating different modeling concepts. Guideline 1: Use anchors for modeling core entities and transactions. A transaction can only be modeled as a tie if it has no properties. 4.2. Modeling Attribute Values Historized attributes are used when versioning of attribute values are of importance. A data warehouse, for instance, is not solely built to integrate data but also to keep a history of changes that have taken place. In anchor models, historized attributes take care of versioning by coupling versioning/history information to an attribute value. Guideline 2a: Use a historized attribute if versioning of attribute values are of importance, otherwise use a static attribute. A knot represents a type with a fixed set of instances that do not change over time. Conceptually a knot can be thought of as a an anchor with exactly one static attribute combined into a single entity. In many respects, knots are similar to the concept of power types [8], which are types with a fixed and small 9
  • 10. set of instances representing categories of allowed values. In Figure 4 the anchor AC Actor gets its gender attribute via a knotted static attribute, AC GEN Actor Gender, rather than storing the actual gender value (i.e. the string ‘Male’ or ‘Female’) of an actor directly in a static attribute. The advantage of using knots is reuse, as attributes can be specified through references to a knot identifier instead of a value. The latter is undesirable because of the redundancy it introduces, i.e. long attribute values have to be repeated resulting in increased storage requirements and update anomalies. Guideline 2b: Use a knotted attribute if attribute values represent categories or can take on only a fixed small set of values, otherwise use a regular attribute. Guidelines 2a and 2b may be combined, i.e. when both attribute values represent categories and the versioning of these categories are of importance. Guideline 2c: Use a knotted historized attribute if attribute values represent categories or a fixed small set of values and the versioning of these are of importance. 4.3. Modeling Relationships A static tie is used for relationships that do not change over time. For example, the actors who took part in a certain performance will never change. Typically, anchors modeling transactions or events statically relate to other anchors, i.e. a transaction or event happens at a particular time and does not change afterwards. A historized tie is used for relationships that change over time. For example, the program played at a specific stage will change over time. At any point in time, exactly one relationship will be valid. Guideline 3a: Use a historized tie if a relationship may change over time, otherwise use a static tie. A knotted tie is used for relationships where the instances fall within certain categories. For example, if we have two anchors, AC Actor and PR Program, a relationship between the two may be categorized as ‘Good’, ‘Bad’ or ‘Mediocre’ indicating how well the actor performed the program. A knot RAT Rating is used to model such categories. Guideline 3b: Use a knotted tie if the instances of a relationship belong to certain categories, otherwise use a regular tie. Guidelines 3a and 3b can also be combined in the case when the category a relationship belongs to may change over time. Guideline 3c: Use a knotted historized tie if the instances of a relationship belong to certain categories and the relationship may change over time. 4.4. Modeling Changing States To be able to manage states of instances in the extensions of anchors, attributes, and ties in an anchor model, knots have to be used that model the different states. For example, the example in Section 3 could be extended to handle the situations when an actor resigns, when an actor stops having a professional level due to inactivity, or when a stage is not playing any program at all. The proper way to handle such situations in an anchor model is to introduce a knot encoding values of the type ‘valid’ or ‘not valid’. Since historized attributes and ties only model the valid time of a relationship they cannot capture the fact that an attribute value or relationship is no longer valid. Guideline 4a: Use a knotted historized attribute if an entity has changing states. For example the knot ACT Active with the extension { #0, ‘Resigned’ , #1, ‘Active’ } and a knotted historized attribute AC ACT Actor Active can be used to determine whether an actor is still active or not. Guideline 4b: Use an additional knotted historized attribute if an attribute has changing states, otherwise use a single attribute. For example the knot PLE ProfessionalLevelExpired with the extension { #0, ‘Expired’ , #1, ‘Valid’ } and a knotted historized attribute AC PLE Actor ProfessionalLevelExpired can be used to determine whether an actor’s professional level has expired or not. 10
  • 11. Guideline 4c: Use a knotted historized tie if a relationship has changing states, otherwise use a (knotted) static tie. For example the knot PST PlayingStatus with the extension { #0, ‘Started’ , #1, ‘Stopped’ } and a knotted historized tie ST atLocation PR isPlaying PST currentStatus can be used to determine the durations with which programs are playing at the stages. 4.5. Modeling Large Relationships A lot of the advantages of Anchor Modeling such as no partial attributes [? ] (which translate to no null-values in a relational database), no update anomalies in relational databases based on Anchor Modeling etc. relies on a high degree of decomposition, where attributes are modelled as entities of their own and relationships are made as small as possible. For any tie having three or more roles, it may in some cases be necessary to model it as several smaller ties. Three of the ties in the example PE in AC wasCast, PE wasHeld ST atLocation, and PE at PR wasPlayed, see Section 3, could have been modeled as a single tie PE in AC wasCast ST atLocation PR wasPlayed. However, this is not possible if the instances may be only partially complete, i.e. lacking information for some of the roles. By breaking this relationship into three ties, the information about which actors partook in a certain performance can be added independently and at different times from where it was held and what was played. Guideline 5a: Use several ties, if partially complete instances may result from a large relationship. Another tie AC part PR in RAT got modeling the rating that an actor has got for playing a part in a cer- tain program could be extended to having two different kinds of ratings: artistic performance and technical per- formance, both of which could be modeled as references to one and the same knot RAT Rating with the exten- sion { #1, ‘Bad’ , #2, ‘Mediocre’ #2, ‘Good’ }. The new tie AC part PR in RAT artistic RAT technical having the extension {{ part, #4711 , in, #17 , artistic, #2 , technical, #2 , 2005-03-12}, { part, #4711 , in, #17 , artistic, #3 , technical, #2 , 2008-08-29}}}, suffers from the modeling counterpart of update anomalies [30], e.g. we will not know to which of the two ratings (artistic or technical) the historization date refers. The situation occurs in historized ties when more than one role is not in the identifier for the tie. If this situation occurs it is better to model the relationship as several ties for which those that are still historized have exactly one role outside the identifier. In the example given this would correspond to breaking up the tie into two ties AC part PR in RAT artistic and AC part PR in RAT technical, for which there can be no doubt as to what has changed between versions. Guideline 5b: Include in a (knotted) historized tie at most one role outside the identifier. For a knotted historized tie having one or more knot roles, it is sometimes desirable to remodel the tie such that it only contains anchor roles. This can be achieved by the introduction of a new anchor on which the knots instead appear through knotted attributes and a role for that anchor that replaces the knot roles in the tie. Consider a situation in which we have a knotted historized tie of high cardinality with all anchor roles and all but one knot role in its identifier. For every new version that is added only the historization part of the identifier change and the rest of it will be duplicated, and if that remainder is long it leads to unnecessary duplication of data. For example, let an extension of the tie be {{ RA1 , #1 , . . . , RAn , #n , RK , #21 , 2010-02-25 10:52:21}, { RA1 , #1 , . . . , RAn , #n , RK , #22 , 2010-02-25 10:52:22}}. A remodeling to a static tie is possible. The new tie has {{ RA1 , #1 , . . . , RAn , #n , RA new , #538 }}, the introduced anchor has {#538}, and the intro- duced knotted historized attribute has { #538, #21, 2010-02-25 10:52:21 , #538, #22, 2010-02-25 10:52:22 } as extensions. The new model duplicates less data and makes the retrieval of the history over changes in the knotted identities (and values if joined) optional, but it also puts the values “farther away” from the tie. Whether or not to remodel will depend on the amount of duplication and how fast this duplication will occur by the speed with which new versions arrive. Guideline 5c: Replace the knot roles in a knotted historized tie with an anchor role and introduce its type as a new anchor with corresponding knotted historized attributes if it has one or more knot roles outside the identifier and the introduction of new versions leads to excessive duplication of data. 11
  • 12. 5. From Anchor Model to Relational Database An anchor schema can be transformed into a relational database schema through the translation rules given below. For each construct in the anchor model there exists a translation rule that maps the construct into a relational database construct such as relation schema (table definition), relation (table), column, row etc. The names used in the relational database are derived from the naming convention in Section 2.5. The highly decomposed relational database schemas that result from anchor models facilitate traceability through metadata, capturing information such as creator, source, and time of creation. Although important, metadata is not discussed further since its use does not differ from that of other modeling techniques. 5.1. Anchors An anchor A is mapped to a relation scheme A(S), where S = NI (A). The domain of S is I. S is the primary key of the relation scheme A(S). AC Actor AC ID (PK) #4711 #4712 #4713 integer Figure 8: Example rows from the anchor AC Actor. For example the anchor AC Actor with integer as domain, see Figure 8, corresponds to the relation scheme AC Actor(AC ID). 5.2. Knots A knot K with range D is mapped to a relation scheme K(I, V ), where I = NI (K) and V = ND (K). The domain of I is I, and the domain of V is D. I is the primary key. GEN Gender GEN ID (PK) GEN Gender #0 ‘Male’ #1 ‘Female’ bit string Figure 9: Example rows from the knot GEN Gender. For example the knot GEN Gender with string as range and an extension over bit × string, see Figure 9, corresponds to the relation scheme GEN Gender(GEN ID, GEN Gender). 5.3. Static Attributes A static attribute BS with domain A and range D is mapped to a relation scheme B(I, V ), where I = NIA (A) and V = ND (B). The domain of I is I and the domain of V is D. The column I is a primary key of B and a foreign key with respect to the relation scheme A(S). For example the attribute PE DAT Performance Date with domain PE Performance, datetime as range, and an extension over integer × datetime, see Figure 10, corresponds to the relation scheme PE DAT Performance Date(PE ID, PE DAT Performance Date). 12
  • 13. PE DAT Performance Date PE ID (PK, FK) PE DAT Performance Date #911 1994-12-10 20:00 #912 1994-12-15 21:30 #913 1994-12-22 19:30 integer datetime Figure 10: Example rows from the static attribute PE DAT Performance Date. 5.4. Historized Attributes A historized attribute BH with domain A, range D and time range T is mapped to a relation scheme B(I, V, P ), where I = NIA (A), V = ND (B) and P = NT (B). The domain of I is I, the domain of V is D and the domain of P is T. The columns I and P together constitute a primary key of the relation scheme B. The column I is a foreign key with respect to the relation scheme A(S). AC NAM Actor Name AC ID (PK, FK) AC NAM Actor Name AC NAM ValidFrom (PK) #4711 ‘Arfle B. Gloop’ 1972-08-20 #4711 ‘Voider Foobar’ 1999-09-19 #4712 ‘Nathalie Roba’ 1989-09-21 integer string date Figure 11: Example rows from the historized attribute AC NAM Actor Name. For example the historized attribute AC NAM Actor Name with AC Actor as domain, date as time range, and an extension over integer × string × date, see Figure 11, corresponds to the relation scheme AC NAM Actor Name(AC ID, AC NAM Actor Name, AC NAM ValidFrom). 5.5. Knotted Static Attributes A knotted static attribute BKS with domain A and range K is mapped to a relation scheme B(I1 , I2 ), where I1 = NIA (A) and I2 = NIK (K). The domain of I1 is I and the domain of I2 is I. The column I1 is a primary key of B and a foreign key with respect to the relation scheme A(S). The column I2 is a foreign key with respect to the relation scheme K(I, V ). AC GEN Actor Gender AC ID (PK, FK) GEN ID (FK) #4711 #0 #4712 #1 #4713 #1 integer bit Figure 12: Example rows from the knotted static attribute AC GEN Actor Gender. For example the knotted static attribute AC GEN Actor Gender with AC Actor as domain, GEN Gender as range, and an extension over integer×bit, see Figure 12 corresponds to the relation scheme AC GEN Ac- tor Gender(AC ID, GEN ID). 5.6. Knotted Historized Attributes A knotted historized attribute BKH with domain A, range K, and time range T is mapped to a relation scheme B(I1 , I2 , P ), where I1 = NIA (A), I2 = NIK (K) and P = NT (B). The domain of I1 is I, the domain 13
  • 14. of I2 is I and the domain of P is T. The columns I1 and P together constitute a primary key of the relation scheme B. The column I1 is a foreign key with respect to the relation scheme A(S). The column I2 is a foreign key with respect to the relation scheme K(I, V ). AC PLV Actor ProfessionalLevel AC ID (PK, FK) PLV ID (FK) AC PLV ValidFrom (PK) #4711 #4 1999-04-21 #4711 #5 2003-08-21 #4712 #3 1999-04-21 integer tinyint date Figure 13: Example rows from the knotted historized attribute AC PLV Actor ProfessionalLevel. For example the knotted historized attribute AC PLV Actor ProfessionalLevel with AC Actor as domain, PLV ProfessionalLevel as range, date as time range, and an extension over integer×tinyint×date, see Fig- ure 13, corresponds to the relation scheme AC PLV Actor ProfessionalLevel(AC ID, PLV ID, AC PLV Valid- From). 5.7. Static Ties A static tie TS = {R1 , . . . , Rn }, n ≥ 2, is mapped to a relation scheme T (M1 , . . . , Mn ) using an order- preserving map, such that Ri is mapped to Mi given a total order on TS . Every column Mi is a foreign key with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ). The primary key of the relation scheme T is I where I is the identifier of the static tie TS . PE in AC wasCast PE ID in (PK, FK) AC ID wasCast (PK, FK) #911 #4711 #912 #4711 #913 #4712 integer integer Figure 14: Example rows from the static tie PE in AC wasCast. For example the static tie PE in AC wasCast = {in, wasCast} having the type of in as the anchor PE Performance, the type of wasCast as the anchor PE Performance, and an extension over integer × integer, see Figure 14, corresponds to the relation scheme PE in AC wasCast(PE ID in, AC ID wasCast), with the primary key PE ID in, AC ID wasCast. 5.8. Historized Ties A historized tie TH = {R1 , . . . , Rn , P }, n ≥ 2 is mapped to a relation scheme T (M1 , . . . , Mn , P ) using an order-preserving map, such that Ri is mapped to Mi given a total order on TH . Every column Mi is a foreign key with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri , Mi = NIA (Ai ), and P is a column with the domain T. The primary key of the relation scheme T is I where I is the identifier of the historized tie TH . For example the historized tie ST atLocation PR isPlaying = {atLocation, isPlaying, date} having the type of atLocation as the anchor ST Stage, the type of isPlaying as the anchor PR Program, and an extension over integer × integer × date, see Figure 15, corresponds to the relation scheme ST atLocation PR is- Playing(ST ID atLocation, PR ID isPlaying, ST atLocation PR isPlaying ValidFrom), with the primary key ST ID atLocation, ST atLocation PR isPlaying ValidFrom. 14
  • 15. ST atLocation PR isPlaying ST ID atLocation (PK, FK) PR ID isPlaying (FK) ST atLocation PR isPlaying ValidFrom (PK) #55 #17 2003-12-13 #55 #23 2004-04-01 #56 #17 2003-12-31 integer integer date Figure 15: Example rows from the historized tie ST atLocation PR isPlaying. 5.9. Knotted Static Ties A knotted static tie TKS = {R1 , . . . , Rn , S1 , . . . , Sm }, n ≥ 2 and m ≥ 1, is mapped to a relation scheme T (M1 , . . . , Mn , N1 , . . . , Nm ) using an order-preserving map, such that Ri is mapped to Mi and Sj is mapped to Nj given a total order on TKS . Every column Mi is a foreign key with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ). Every column Nj is a foreign key with respect to the relation scheme Kj (I, V ), where Kj is the type of the knot role Sj and Nj = NIK (Kj ). The primary key of the relation scheme T is I where I is the identifier of the knotted static tie TKS . AC parent AC child PAT having AC ID parent (PK, FK) AC ID child (PK, FK) PAT ID having (PK, FK) #4711 #4713 #1 #4712 #4713 #0 #4712 #4714 #0 integer integer bit Figure 16: Example rows from the knotted static tie AC parent AC child PAT having. For example the knotted static tie AC parent AC child PAT having = {parent, child, having} having the type of parent and child as the anchor AC Actor, the type of having as the knot PAT ParentalType, and an extension over integer × integer × bit, see Figure 16, corresponds to the relation scheme AC par- ent AC child PAT having(AC ID parent, AC ID child, PAT ID having), with the primary key AC ID parent, AC ID child, PAT ID having. 5.10. Knotted Historized Ties A knotted historized tie TKH = {R1 , . . . , Rn , S1 , . . . , Sm , P }, n ≥ 2 and m ≥ 1, is mapped to a relation scheme T (M1 , . . . , Mn , N1 , . . . , Nm ) using an order-preserving map, such that Ri is mapped to Mi and Sj is mapped to Nj given a total order on TKH . Every column Mi is a foreign key with respect to the relation scheme Ai (S), where Ai is the type of the anchor role Ri and Mi = NIA (Ai ). Every column Nj is a foreign key with respect to the relation scheme Kj (I, V ), where Kj is the type of the knot role Sj and Nj = NIK (Kj ). The column P has the domain T. The primary key of the relation scheme T is I where I is the identifier of the knotted historized tie TKH . AC part PR in RAT got AC ID part (PK, FK) PR ID in (PK, FK) RAT ID got (FK) AC part PR in RAT got ValidFrom (PK) #4711 #17 #8 2005-03-12 #4711 #17 #10 2008-08-29 #4712 #17 #10 2006-10-17 integer integer tinyint date Figure 17: Example rows from the knotted historized tie AC part PR in RAT got. 15
  • 16. For example the knotted historized tie AC part PR in RAT got = {part, in, got, date} having the type of part as the anchor AC Actor, the type of in as the anchor PR Program, the type of got as the knot RAT Rating, and an extension over integer × integer × tinyint × date, see Figure 17, corresponds to the relation scheme AC part PR in RAT got(AC ID part, PR ID in, RAT ID got, AC part PR in RAT got ValidFrom), with the primary key AC ID part, PR ID in, AC part PR in RAT got ValidFrom. 6. Physical Implementation In this section the physical implementation of an anchor database is discussed. 6.1. Indexes in an Anchor Database Indexes in general, and clustered indexes in particular, are used to delimit the range of a table scan or remove the need for a table scan altogether as well as removing or delimiting the need to sort searched-for-data. A clustered index is an index with a sort order equal to the physical sort order on disc of the table to which the clustered index refers. In this section we will discuss what indexes on the various constructs in an anchor database will look like and why it is important to have these indexes from a performance point of view. See Figure 21 to see how some indexes are created for different table types in an anchor database. 6.1.1. Indexes on Anchor and Knot Tables Since both anchor and knot tables are referenced from a number of other tables (attribute and tie tables), a unique clustered index should be created on the primary key in each anchor and knot table to ensure that the number of records that have to be scanned in the queried tables will not be too large, e.g. the number of time-costly disc-accesses will be minimal. During query execution a unique index over the values in a knot table helps to improve performance when a condition is set on the column. Searching can then stop once a match has been found. 6.1.2. Indexes on Attribute Tables An attribute table contains a foreign key against an anchor table and optionally information on historization. A composite unique clustered index should be created on the foreign key column in ascending order together with the historization column in descending order1 . This means that on the physical media where the table data is stored the latest version for any given identity comes first. This will ensure more efficient retrieval of data since a good query optimizer will realize that a sort operation is unnecessary. The optimizer will also utilize the created clustered indexes when joining attribute tables with their anchor table, ensuring that no sorting has to be made since both the anchor and attribute tables are already physically ordered on disc. This also means that the accesses to the storage media can be done in large sequential chunks, provided that data is not fragmented on the storage media itself. 6.1.3. Indexes on Tie Tables Tie tables contain two or more foreign keys against an anchor table, one or more foreign keys towards knot tables, and optionally information on historization. Differently from anchor, knot and attribute tables, the indexes on tie tables are not given. A tie table should always have a unique clustered index over the primary key, however the order in which the columns appear can not be unambiguously determined from the identifier of the tie. An order has to be selected that fits the need best, perhaps according to what type of queries will be the most common. In some cases an additional index with another ordering of the columns may be necessary. In our running example in Section 3 one historized knotted tie AC part PR in RAT got models the current rating that an actor has got for performing a certain program, together with the time at which this 1 Note that the indexing order of the columns is of less importance. Database engines read data in large chunks into memory and reading forwards or backwards in memory is equally effective. We do, however, in order to be consistent, use the same ordering in all of our indexes. 16
  • 17. rating became relevant. The unique clustered index of the corresponding tie table is composed of the foreign keys against the anchor tables in ascending order, together with the historization column in descending order. This index will ensure that we do not have to scan the entire AC Actor table for each row of the entire PR Program table in order to find the values that we are looking for. 6.1.4. Partitioning of Tables In addition to indexes, access to the tables in the anchor database may be improved through partitioning. For example, if the number of actors and programs are very large and you mostly access the ones with the highest rating (see the tie table above), you can further improve performance by partitioning the tie table based on the rating. Partitioning, if the database engines support it, will split the underlying data on the physical media into as many chunks as the cardinality of the knot. This allows for having frequently accessed partitions on faster discs and less data to scan during query execution. The two best candidates for partitioning in an anchor database are knot values and the time type used for historization (i.e. we assume that users tend to access certain values of knots more frequently than other values, as well as certain dates being of more or less importance). 6.2. Loading Practices When loading data into an anchor database a zero update strategy is used. This means that only insert statements are allowed and that data is always added, never updated. Delete statements are allowed only when applied to remove erroneous data. A complete history is thereby stored for accurate information [20]. If you have non-persistent entities or relations in your model you should model these with a state knot. That way you will be keeping a history of when they ceased to exist. You must keep this strategy in mind when building your model. A zero update strategy together with properly maintained metadata also alleviates the need for transactions. Through metadata it is always possible to find the rows that belong to a certain batch, came from a specific source system, were loaded by a particular account, or were added between some given dates. Instead of having a fault recovery system in place every time data is added, it can be prepared and used only after an error is found. Since nothing has been updated, the introduced errors are in the form of rows, and these can easily be found. Scripts can be made that allow for removal of erroneous rows, regardless of when they were added. Data will never change once it is in place. Its integrity is guaranteed. A disadvantage that is avoided by not using updates is their higher cost in terms of performance when compared to insert statements. By not doing updates there is also no row locking on existing rows, a fact that lessens the impact on read performance when writing is done in the database. 6.3. Views and Functions Due to the large number of tables and the added complexity from handling historical data, an abstraction layer in the form of views and functions is added to simplify querying in an anchor database. It de-normalizes the anchor database and retrieves data from a given temporal perspective. There are three different types of views and functions for each anchor table corresponding to the most common use cases when querying data: latest view, point-in-time function, and interval function [26, 20], all of which are based on an abstract complete view. Views and functions for tie tables are created in a way analogous with those for anchor tables. See the right side of Figure 21 for a couple of examples of how these views and functions are created. 6.3.1. Complete View The complete view of an anchor table is a de-normalization of it and its associated attribute tables. It is constructed by left outer joining the anchor table with all its associated attribute tables. 6.3.2. Latest View The latest view of an anchor table is a view based on the complete view, where only the latest values for historized attributes are included. In order to find the latest version a sub-select is added that ensures that the time point used for historization is the latest one for each identity. See Figure 18. 17
  • 18. select * from lST Stage ST ID ST NAM Stage Name ST NAM ValidFrom ST LOC Stage Location #55 ‘Shakespeare’s Globe’ 1997-01-01 ‘Maiden Lane’ #56 ‘Cockpit’ 1609-01-01 ‘Drury Lane’ integer string date string Figure 18: Example rows from the latest view lST Stage for the anchor ST Stage. 6.3.3. Point-in-time Function The point-in-time function is a function for an anchor table with a time point as an argument returning a data set. It is based on the complete view where for each attribute only its latest value before or at the given time point is included. A sub-select is added that ensures that the historization time is the latest one earlier than or on the given time point for each identity. See Figure 19. select * from pST Stage(‘1608-01-01’) where ST NAM Stage Name is not null ST ID ST NAM Stage Name ST NAM ValidFrom ST LOC Stage Location #55 ‘The Globe Theatre’ 1599-01-01 ‘Maiden Lane’ integer string date string Figure 19: Example rows from the point-in-time function pST Stage given the time point ‘1608-01-01’ for the anchor ST Stage. The absence of a row for the stage on Drury Lane (see the previous Figure 18) in the returned data set of Figure 19 indicates that no data is present for it (which in this case depends on the fact that the theatre was not yet built in 1608). Without the condition on the column ST NAM Stage Name a row with a null value in that column would have been shown. 6.3.4. Interval Function The interval function is a function for an anchor table taking two time points as arguments and returning a data set. It is based on the complete view where for each attribute only values between the given time points are included. Here the sub-select must ensure that the time point used for historization lies within the two provided time points. See Figure 20. select * from dST Stage(‘1598-01-01’, ‘1998-01-01’) ST ID ST NAM Stage Name ST NAM ValidFrom ST LOC Stage Location #55 ‘The Globe Theatre’ 1599-01-01 ‘Maiden Lane’ #55 ‘Shakespeare’s Globe’ 1997-01-01 ‘Maiden Lane’ #56 ‘Cockpit’ 1609-01-01 ‘Drury Lane’ integer string date string Figure 20: Example rows from the interval function dST Stage in the interval given by the time points ‘1598-01-01’ and ‘1998-01-01’ for the anchor ST Stage. 6.3.5. Natural Key View The natural key of the anchor PE Performance is composed of when and where it was held. When it was held can be found in the attribute PE DAT Performance Date and where it was held in the attribute ST LOC Stage Location. A performance is uniquely identified by the values in these two attributes. However, these attributes belong to different anchors, which is commonplace in an anchor model. The parts that make up a natural key may be found in attributes spread out over several anchors. 18
  • 19. 1 create table AC Actor ( 58 create view lAC part PR in RAT got as 2 AC ID int not null, 59 select 3 primary key clustered ( 60 [AC PR RAT].AC ID part, 4 AC ID asc 61 [AC PR RAT].PR ID in, 5 ) 62 [AC PR RAT].RAT ID got, 6 ); 63 [RAT].RAT Rating, 7 create table GEN Gender ( 64 [AC PR RAT].AC part PR in RAT got ValidFrom 8 GEN ID tinyint not null, 65 from 9 GEN Gender varchar(42) not null unique, 66 AC part PR in RAT got [AC PR RAT] 10 primary key clustered ( 67 left join 11 GEN ID asc 68 RAT Rating [RAT] 12 ) 69 on 13 ); 70 [RAT].RAT ID = [AC PR RAT].RAT ID got 14 create table AC GEN Actor Gender ( 71 where 15 AC ID int not null foreign key 72 [AC PR RAT].AC part PR in RAT got ValidFrom = ( 16 references AC Actor ( AC ID ), 73 select 17 GEN ID tinyint not null foreign key 74 max(sub.AC part PR in RAT got ValidFrom) 18 references GEN Gender ( GEN ID ), 75 from 19 primary key clustered ( AC ID asc ) 76 AC part PR in RAT got [sub] 20 ); 77 where 21 create table AC NAM Actor Name ( 78 [sub ]. AC ID part = [AC PR RAT].AC ID part 22 AC ID int not null foreign key 79 and 23 references AC Actor ( AC ID ), 80 [sub ]. PR ID in = [AC PR RAT].PR ID in 24 AC NAM Actor Name varchar(42) not null, 81 ); 25 AC NAM ValidFrom date not null, 82 create function pAC Actor ( 26 primary key clustered ( 83 @timepoint date 27 AC ID asc, 84 ) returns table return 28 AC NAM ValidFrom desc 85 select 29 ) 86 [AC].AC ID, 30 ); 87 [AC GEN].GEN ID, 31 create table PR Program ( 88 [GEN].GEN Gender, 32 PR ID int not null, 89 [AC NAM].AC NAM Actor Name, 33 primary key clustered ( 90 [AC NAM].AC NAM ValidFrom 34 PR ID asc 91 from 35 ) 92 AC Actor [AC] 36 ); 93 left join 37 create table RAT Rating ( 94 AC GEN Actor Gender [AC GEN] 38 RAT ID tinyint not null, 95 on 39 RAT Rating varchar(42) not null unique, 96 [AC GEN].AC ID = [AC].AC ID 40 primary key clustered ( 97 left join 41 RAT ID asc 98 GEN Gender [GEN] 42 ) 99 on 43 ); 100 [GEN].GEN ID = [AC GEN].GEN ID 44 create table AC part PR in RAT got ( 101 left join 45 AC ID part int not null foreign key 102 AC NAM Actor Name [AC NAM] 46 references AC Actor ( AC ID ), 103 on 47 PR ID in int not null foreign key 104 [AC NAM].AC ID = [AC].AC ID 48 references PR Program ( PR ID ), 105 and 49 RAT ID got tinyint not null foreign key 106 [AC NAM].AC NAM ValidFrom = ( 50 references RAT Rating ( RAT ID ), 107 select 51 AC part PR in RAT got ValidFrom date not null, 108 max([sub].AC NAM ValidFrom) 52 primary key clustered ( 109 from 53 AC ID part asc, 110 AC NAM Actor Name [sub] 54 PR ID in asc, 111 where 55 AC part PR in RAT got ValidFrom desc 112 [sub ]. AC ID = [AC].AC ID 56 ) 113 and 57 ); 114 [sub ]. AC NAM ValidFrom <= @timepoint 115 ); Figure 21: Definitions in Transact-SQL, illustrating how primary keys, clustered indexes, and other constraints are defined in order to enable table elimination. When data is added to an anchor database it has to be determined if it is for an already known entity or for a new one. This is done by looking at the natural key, and as this often has to be done frequently, views for each anchor table is provided that act as a look-ups for its natural keys. Such a view joins the necessary anchor, attribute, and tie tables in order to translate natural key values into their corresponding identifiers. Note that it is possible for a single anchor to have several differently composed natural keys. Parts of the 19
  • 20. key may also consist of historized attributes, in which case the view may have several rows for the same identifier. An example of a natural key view can be seen in Figure 22. select * from nPE Performance PE ID PE DAT Performance Date ST LOC Stage Location #911 1994-12-10 20:00 ‘Maiden Lane’ #912 1994-12-15 21:30 ‘Maiden Lane’ #913 1994-12-22 19:30 ‘Drury Lane’ integer date string Figure 22: Example rows from a natural key view nPE Performance for the anchor PE Performance, mapping the identity PE ID to its natural key, which is composed of the attributes PE DAT Performance Date and ST LOC Stage Location. 6.4. Utilizing Table Elimination Modern query optimizers utilize a technique called table (or join) elimination [18], which in practice implies that tables containing not selected attributes in queries are automatically eliminated. The optimizer will remove table T from the execution plan of a query if the following two conditions are fulfilled: (i) no column from T is explicitly selected (ii) the number of rows in the returned data set is not affected by the join with T In order to take advantage of table elimination primary and foreign keys have to be defined in the database. Some examples showing how this is done for anchor, knot, attribute and tie tables can be seen in the left side of Figure 21. The views and functions defined in Section 6.3 are created in order to take advantage of table elimination, see the right side of Figure 21 for a couple of examples. The anchor table is used as the left table in the view (or function) with the attribute tables left outer joined. The left join ensures that the number of rows retrieved is at least as many as in the anchor table, provided that no conditions are set in the query. Furthermore, since the join is based on the primary key in the attribute table, uniqueness is also ensured, hence the number of resulting rows is equal to the number of the rows in the anchor table. If for an anchor having some attribtues, a subset of these are used in a query over the defined views or functions, the others can be eliminated and not touched during execution. If a condition on the values in an attribute table is set in the query, such that it selects a proper subset of the rows, the foreign key constraint ensures that all rows must have a corresponding instance in the anchor table. The optimizer can use this fact to eliminate the anchor table and work with the subset as an intermediate result set. Once this is done, the same logic as above applies, and attributes not present in the query can be eliminated. Typical queries only retrieve a small number of attributes, which implies that table elimination is frequently used, yielding reduced access time. 7. Performance in Anchor Databases There are several considerations when it comes to performance in anchor databases, in particular how they perform when reading and writing data in data warehouse and OLTP environments. Experiments have been carried out that test the behaviour of reading data in data warehouse environments, using queries that scan large amounts of data. As for writing data in data warehouse environments as well as reading and writing data in OLTP environments we do not have any experimental results but only our experience from actual running environments. In these cases, performance has not been an issue. 20
  • 21. 7.1. Performance Influencing Conditions To summarize the results when reading data in a data warehouse environment there are situations where an anchor database performs relatively better and worse than databases in which other modeling techniques have been used. Given a body of information and the intended searches over it, the following conditions comparatively improves performance in an anchor database with respect to a less normalized model. It is assumed that this body of information can be described using entities, identifiers, attributes and values. (a) The data is time dependent and its changes need to be kept. (b) The data is sparse, i. e. if many entities lack some of their attribute values. (c) The ratio between unique data values and their occurrences is low. (d) The ratio between the amount of data and the amount of data residing in identifiers is high. (e) There are many identifiers. (f) There are many attributes. (g) Most searches address only a small proportion of the attributes in the entities over which the search is done. (h) Most searches include conditions that impose bounds on values. For data warehouse environments, many of the conditions above are typically fulfilled. It is often required that a history is kept, that data is sparse by nature or by asynchronous arrival, that there are data that can be used for categorization and have only a few unique values, that some data values are of considerable length, that there are many rows in some tables, that some entities have many attributes, that most queries use only a few attributes, and that most queries have conditions limiting the number of rows in the result sets. However, these conditions can be fulfilled to different degrees. Therefore, we carried out an experiment in order to determine at what degree an anchor database performs better than a less normalized database. Scenario Condition 1 2 3 4 (a) Number of versions per instance 1 1 1 2 (b) Degree of sparsity 0% 0% 50% 50% (c) Number of knots per attributes 0 1/3 1/3 1/3 (d) Average data/identifiers ratio 1/4 3/1 3/1 3/1 Tested combinations (e) Number of rows 0.1, 1, 10 million (f) Number of attributes 6, 12, 24 (g) Number of queried attributes 1, 1/3, 2/3, all (h) Number of conditioned attributes none, all Figure 23: A table showing to what degree different conditions were fulfilled in the four scenarios used in the experiment and which combinations have been tested for each scenario. The scenarios and combinations yield a total of 288 tests. 7.2. Experimental Verification The query times for varying degrees of condition fulfillment have been measured in a series of tests in which an anchor database was compared with a less normalized database containing the same information. The experiment consisted of four scenarios, see Figure 23 for which the data and requirements changed. In each scenario a number of combinations were tested. In total 288 measurements per model were made and the query times compared2 . The queries were run against the latest view in an anchor database and in a 2 The tests were performed using an Intel Core 2 Quad 2.6GHz CPU, 8GB of RAM, and two SATA hard disks. The database used was Microsoft SQL Server 2008, set up with data and tempdb on separate disks. Scripts and detailed results are available from http://www.anchormodeling.com. 21
  • 22. database where the anchor and all its attribute tables have been de-normalized into a single table. Ratios that compare query times in the anchor database with those for the single table were calculated and can be seen in the tables in Figure 24. In the first scenario the information consisted solely of one byte attributes together with a four byte key. The number of unique attribute values were assumed to be at least as many as the number of identifiers, which prevented us from using knotted attribute tables in the anchor database. In practice there will of course be several repetitions of values since one byte can store much fewer unique values than the maximum number of rows in the tests, but for the sake of getting a high ratio between unique values and the number of their occurrences we decided not to make use of this fact. In the second scenario, the information consisted of six different data types together with a four byte key. The average size of a data value was 12 bytes. In the cases when the database had 12 and 24 attributes, these six types were repeated. In the third scenario, to simulate sparseness half of the rows in every attribute table found in the second scenario was removed, but the all instances in the anchor table kept. Finally, the fourth scenario was based on the third with the addition of historization. One out of every six attributes was historized and one extra version was added for each instance of the key in the table, effectively doubling the number of rows. 7.3. Explanation of the Results From the results it is obvious that there are situations where an anchor database outperforms a less normalized database as well as situations where it does not. The more conditions that are fulfilled to some degree the better the anchor database performs. In some cases, a single condition fulfilled to a degree high enough is sufficient for an anchor database to outperform the single table database. This behaviour can be seen in the graphs in Figure 25, which are extrapolated from the results of the experiment. Note that the number of measurements in some cases are few and that the graphs should be seen as showing an approximative behaviour. In the graphs a single condition is allowed to change, while all others remain fixed. What is seen in the graphs can be explained as follows. In (a), see Figure 25a, historizing a column value in the single table database will duplicate information. In an anchor database, only the table for the affected attribute will gain extra rows as new versions are added. The impact on performance is evident when using time dependent queries, such as finding the latest version or the version that was valid at a certain point in time. This is due to the fact that self-joins or sub-selects have to be made and the narrow and optimally ordered attribute tables perform much better than the wide single table, even though it had indexes on the historization columns. In (b), see Figure 25b, sparse data is represented by the absence of a row in an anchor database, thereby reducing the amount of data that needs to be scanned during a query. In the single table database,“holes” in the data will be represented by null values, which a query manager must inspect in order to take the appropriate action. Therefore, the query time is constant for the single table database, and decreasing down to zero as all rows go absent in the anchor database. In (c), see Figure 25c, the knot construction is beneficial for performance when there is a small number of unique values shared by many instances, and these values are longer than the key introduced in the knot. Rather than having to repeat a redundant value, a knot table can be used and its key referenced using a smaller data type3 . If all attribute tables are knotted, the query time becomes significantly shorter in the anchor database compared to the single table database, in which no changes are made and the query time remains constant. In (d), see Figure 25d, for every instance of an entity its identity is propagated into many tables in an anchor database, increasing the total size of the database. The less the overhead is relative the information not stored in keys, the less the negative performance impact will be. The single table is not affected, since all attributes reside in the same row as the identity, compared to several attribute tables. If the amount of data is much larger than the amount residing in keys the query time in an anchor database will approach that in the single table database. 3 Although not examined in the experiment, in cases where a knot is unsuitable, perhaps due to values that cannot be predicted, the candidate attribute table can be compressed. This provides a higher degree of selectivity than can be achieved in a corresponding single table database. Note that compressed tables trade off less disk space for higher cpu utilization. 22
  • 23. Scenario Number of attributes in the query and millions of rows Single attribute 1/3 of attributes 2/3 of attributes All attributes Modeled attributes 0.1 1 10 0.1 1 10 0.1 1 10 0.1 1 10 1 6 −2 −2 −2 −2 −4 −3 −4 −8 −6 −4 −9 −9 12 1 −2 1 −3 −6 −4 −5 −13 −10 −7 −17 −16 24 −2 −2 1 −6 −10 −7 −8 −21 −7 −9 −27 −16 2 6 1 +2 +2 −7 −2 1 −7 −3 −5 −5 −2 −12 12 +2 +3 +4 −6 −2 −3 −8 −2 −12 −9 −4 −26 24 +2 +6 +5 −3 −2 −6 −5 −4 −6 −7 −14 −22 3 6 1 +2 +3 −2 1 1 −3 1 1 −3 −2 −4 12 +2 +4 +5 −2 1 1 −3 −2 −5 −4 −2 −13 24 +3 +7 +8 −2 1 −3 −3 −2 −3 −5 −3 −13 4 6 +2 +2 +3 1 +2 +2 1 1 +2 −2 1 1 12 +2 +4 +3 1 +2 +2 1 1 −2 −2 1 −5 24 +4 +6 +6 1 1 1 1 1 1 −2 1 −4 (a) Scenario Number of attributes in the query and millions of rows Single attribute 1/3 of attributes 2/3 of attributes All attributes Modeled attributes 0.1 1 10 0.1 1 10 0.1 1 10 0.1 1 10 1 6 −2 1 −2 −2 −2 −2 −2 −2 −3 −2 −2 −4 12 −2 −3 1 1 −2 −2 −2 −3 −4 −4 −4 −7 24 1 −3 1 −2 −4 −4 −2 −6 −4 −4 −8 −6 2 6 +2 1 +3 +2 1 +2 1 −2 1 1 −3 −2 12 +2 +2 +5 1 −2 1 −2 −4 −3 −2 −4 −4 24 +3 +3 +10 −2 −2 −2 −3 −4 −2 −3 −4 −3 3 6 +2 1 +4 +2 +5 +5 +2 −2 +2 +2 −2 1 12 +2 +2 +7 1 1 +3 1 −2 1 1 −3 1 24 +3 +3 +14 1 −2 +2 −2 −2 +2 −2 −3 1 4 6 +3 +3 +7 +3 +7 +9 1 1 +3 1 1 +2 12 +4 +4 +8 +2 +2 +3 1 1 1 1 1 1 24 +8 +11 +14 +2 1 +2 +2 1 +2 1 1 1 (b) Figure 24: Query times in an anchor database compared to a de-normalized single table database for queries without (a) and with (b) conditions. When the number is positive +n the anchor database is n times faster than the single table database, when negative −n the anchor database is n times slower, and when 1 they are roughly the same. In (e), see Figure 25e, the larger size of a row in the single table compared to the normalized tables in the anchor database causes more data having to be read during a query, provided that not all attributes are used in the query. Eventually the time it takes to read this data from the single table will be longer than the time it takes to do a join in the corresponding anchor database. Furthermore, the less attributes that are used in the query the sooner the anchor database will overtake the single table database. In (f), see Figure 25f, the query time in an anchor database is independent of the number of attribute tables for a query that does not change between tests. In contrast, for the single table the amount of data that needs to be scanned during a query grows with the number of attributes it contains and queries take 23
  • 24. (a) (b) (c) (d) (e) (f) (g) (h) Figure 25: Extrapolated graphs comparing query times in an anchor database (dashed line) to a de-normalized single table (solid line) for (a) increased level of historization, (b) increased sparsity, (c) decreased number of unique values, (d) increased amount of data per key, (e) increased number of rows, (f) increased number of attributes in the model, (g) increased number of attributes in the query, and (h) increased number of conditions in the query. longer to execute. As long as a query does not use all attributes the anchor database will be faster than the single table. Furthermore, the more rows that are scanned the sooner the anchor database will overtake the single table database, due to the difference in scanned data size being multiplied by the number of rows. In (g), see Figure 25g, the performance will increase for the anchor database the fewer attributes that are used in the query. This is thanks to table elimination, which effectively reduces the number of joins having to be done and the amount of data having to be scanned. If the attributes after table elimination are few enough, the extra cost of the remaining joins with narrow attribute tables will take less time than scanning the wide single table, in which data not queried for is also contained. In (h), see Figure 25h, for every added condition limiting the number of rows in the final result set, the intermediate result sets stemming from the join operations in an anchor database will become smaller and smaller. In other words, progressively less data will have to be handled, which speeds up any remaining joins. In the single table database, the whole table will be scanned regardless of the limiting conditions, unless it is indexed. However, it is not common that a wide table has indexes matching every condition, which is why no such indexes were used in the experiment. 7.4. Conclusions from the Experiment One limitation of the experiment is the small number of steps in which the degrees of condition fulfilment are allowed to vary. Due to the large number of conditions, a set of tests had to be chosen that would yield indicative results without taking too much time to actually perform. A deeper investigation into the relationships between the conditions may be done as further research. The behaviour in other RDBMS could be different and therefore need to be verified. Furthermore, the performance on different types of hardware should also be measured and compared. The conclusion is that in many situations anchor databases are less I/O intense under normal querying operations than in less normalized databases. This makes anchor databases suitable in situations where I/O tends to become a bottleneck, such as in data warehousing [17]. Some of the results can also be carried over to the domain of OLTP, where queries often pinpoint and retrieve a small amount of data through conditions. In a situation where an anchor database performs worse, the impact of that disadvantage will have to be weighed against other advantages when using Anchor Modeling. 24
  • 25. 8. Benefits The Anchor Modeling approach offers several benefits. The most important of them are categorized and listed in the following subsections. 8.1. Ease of Modeling Simple concepts and notation Anchor models are constructed using a small number of simple concepts (Section 2). This simplicity and the use of modeling guidelines (Section 4) reduce the number of options available when solving a modeling problem, thereby reducing the risk of introducing errors in an anchor model. Historization by design Managing different versions of information is simple, as Anchor Modeling offers na- tive constructs for information versioning in the form of historized attributes and ties (Sections 2.3, 2.4). Iterative and incremental development Anchor Modeling facilitates iterative and agile development, as it allows independent work on small subsets of the model under consideration, which later can be integrated into a global model. Changing requirements is handled by additions without affecting the existing parts of a model (cf. bus architecture [15]). Reduced translation logic The symbols introduced to graphically represent tables in an anchor model (Section 2) can also be used for conceptual and logical modeling. This gives a near 1-1 relationship between all levels of modeling, which simplifies, or even eliminates, the need for translation logic in order to move between them. Reusability and automation The small number of modeling constructs together with the naming conven- tion (Section 2.5) in an anchor model yield a high degree of structure, which can be taken advantage of in the form of reusability and automation. For example, ready-made templates for recurring tasks can be made and automatic code generation is possible, speeding up development. 8.2. Simplified Database Maintenance Ease of attribute changes In an anchor database, data is historized on attribute level rather than row level. This facilitates tracing attribute changes directly instead of having to analyze an entire row in order to derive which of its attributes have changed. In addition, the predefined views and functions (Section 6.3) also simplify temporal querying. With the additional use of metadata it is also possible to trace when and what caused an attribute value to change. Absence of null values There are no null values in an anchor database. This eliminates the need to interpret null values [21] as well as waste of storage space. Simple index design The clustered indexes in an anchor database are given for attribute, anchor and knot tables. No analysis of what columns to index need be undertaken as there are unambiguous rules for determining what indexes are relevant. Asynchronous arrival of data In an anchor database asynchronous arrival of data can be handled in a simple way. Late arriving data will lead to additions rather than updates, as data for a single attribute is stored in a table of its own (compared to other approaches where a table may include several attributes) [15, pp. 271–274]. 25