Database design guide

Source: http://www.smart-it-consulting.com/

Before you set up a new database, you usually spend a lot of time at the
whiteboard. Here are some basic tips: my dos and don'ts of database
design. Most probably they will reduce your effort and help you arrive at a
clean database design. I didn't write the book on database design, but I
think the experience I gained in many projects could be helpful in some
cases. My examples refer to Progress® databases and Progress Software®
4GL, but you'll get the idea even if you use another database system.

Let's start with a few naming conventions. Using dashes, spaces,
digits and special characters in names is a bad idea, even though your
database and operating system might handle these characters (Cobol semantics
like CUST-NAME-1 are ugly and outdated). Ensure that each name (this applies
to tables, prefixes, attributes, sequences etc.) stays unique within the
scope of your enterprise-wide databases after conversion to uppercase or
lowercase. Check your spelling, because renaming tables and attributes
afterwards is a PITA.

CamelCase all names. Acronyms and abbreviations should not be used in
names when they aren't well known by your users. If you can't avoid them,
capitalize only the first character, especially in composite names
like UpsServiceTypes or Customer.VatId. Well, there is no rule without an
exception: ID (unique tuple identifier in a table) and OID (enterprise-wide
unique object identifier) are always written in capital letters, as
long as the abbreviation is part of the name of a technical attribute
(e.g. VatId, the Value Added Tax Identifier, vs. CustID or CustOID, the
primary key of Customers).
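
The naming rules so far lend themselves to an automated check. The following is only a minimal sketch in Python, not part of any Progress tool: the regular expression and the function names are assumptions made for illustration, and the pattern deliberately rejects digits, so the '2' and '4' link-table names introduced later would need their own rule.

```python
import re

# Hypothetical checker for the naming rules above. The pattern is an
# assumption: one or more capitalized words, optionally ending in OID or ID.
CAMEL_CASE = re.compile(r"^(?:[A-Z][a-z]+)+(?:OID|ID)?$")


def is_valid_name(name):
    """True if name is CamelCase without dashes, spaces, digits or specials."""
    return bool(CAMEL_CASE.match(name))


def case_collisions(names):
    """Return groups of names that become identical once case is ignored."""
    seen = {}
    for name in names:
        seen.setdefault(name.lower(), []).append(name)
    return [group for group in seen.values() if len(group) > 1]
```

With this sketch, is_valid_name accepts UpsServiceTypes, VatId and CustOID, and rejects CUST-NAME-1 as well as UPSServiceTypes.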

Avoid language mixups, especially if you're not a native speaker and/or your
application has no English user interface. Names
like Buddies.BudVerjaardag sound plain silly, but Maat.MtVerjaardag is
understandable (at least if your understanding of Dutch is flawed). Check
your spelling. Once your application is running, it's hard to live with typos.

Table names and labels designate the business object. Don't use technical
wording or geek speech. Persistent instances of customers live in a table
named Customers, assigned UPC numbers in AssignedUpcNumbers, UPS
shipments in UpsShipments and UPS parcels in UpsParcels. Since you store
more than one instance of a business object in each table, use plural only.

Each table is identified by an (enterprise-wide) unique prefix. Never
use a prefix twice. If you have both Invoices and Inventories, assign
different prefixes like Inv for Invoices and Ivt for Inventories. The prefix
is part of each attribute name and should be used in related sequence and
index names as well.

So far, so easy. When it comes to attribute names, naming conventions
become more complicated. Let's start with technical attributes, because
they leave no room for interpretation.

In order to guarantee uniqueness, each table has a technical primary
key (a surrogate primary key populated by the create trigger with a unique
sequence value or, preferably, a UUID), which will never get a
business meaning. Don't argue: primary keys with business meaning as
well as composite keys are a bad idea. There is nothing to be said against
additional unique columns with business meaning, but do not merge the
underlying technical implementation with your business logic. Name the
primary key table prefix + OID (or ID), e.g. CustOID or CustID. If an
object has children or is an attribute of other objects, use the unchanged
and unextended name of the parent table's primary key as the foreign key in
the child table or the referencing table.

Say you have a table Invoices and a table Addresses:

    Addresses.AdrOID [primary key]
    Addresses.AdrOtherAttributes ...
    Invoices.InvOID [primary key]
    Invoices.AdrOID [foreign key]
    Invoices.InvOtherAttributes ...

Index Invoices.AdrOID and you can code

    FOR EACH Addresses OF Invoices:
        /* Do something. */
    END.

or

    FOR EACH Invoices WHERE Invoices.InvNetAmount >= 1000.00,
        EACH Addresses OF Invoices WHERE Addresses.AdrZipCode BEGINS '34':
        /* Do something. */
    END.

instead of

    FOR EACH Addresses WHERE Addresses.AdrOID = Invoices.AdrOID:
        /* Do something. */
    END.
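
If you are not on Progress, the same idea carries over to SQL. The sketch below uses Python's sqlite3 module with an in-memory database. The sample rows and the index name InvAdrOID are made up, and the key is filled in by application code rather than by a create trigger. Because the foreign key carries the unchanged name of the parent's primary key, JOIN ... USING (AdrOID) reads much like OF.

```python
import sqlite3
import uuid


def new_oid():
    """Surrogate key without business meaning (a random UUID)."""
    return str(uuid.uuid4())


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Addresses (
        AdrOID     TEXT PRIMARY KEY,
        AdrZipCode TEXT NOT NULL
    );
    CREATE TABLE Invoices (
        InvOID       TEXT PRIMARY KEY,
        AdrOID       TEXT NOT NULL REFERENCES Addresses (AdrOID),
        InvNetAmount REAL NOT NULL
    );
    CREATE INDEX InvAdrOID ON Invoices (AdrOID);
""")

# Made-up sample data: two addresses, three invoices.
adr_a, adr_b = new_oid(), new_oid()
con.executemany("INSERT INTO Addresses VALUES (?, ?)",
                [(adr_a, "34117"), (adr_b, "80331")])
con.executemany("INSERT INTO Invoices VALUES (?, ?, ?)",
                [(new_oid(), adr_a, 1500.00),
                 (new_oid(), adr_b, 2500.00),
                 (new_oid(), adr_a, 200.00)])

# USING (AdrOID) joins on the identically named key pair; the WHERE clause
# carries nothing but business logic.
rows = con.execute("""
    SELECT InvNetAmount, AdrZipCode
    FROM Invoices JOIN Addresses USING (AdrOID)
    WHERE InvNetAmount >= 1000.00 AND AdrZipCode LIKE '34%'
""").fetchall()
```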

There is one exception to this rule. Sometimes an object is an attribute of
another object multiple times, without being a class itself. Different roles are
marked by a number sign '#'. The most important foreign key name is kept
as is, other roles are extended by '#Role':

    Invoices.AdrOID [billing address]
    Invoices.AdrOID#Delivery [delivery address]

Actually, this is way beyond a clean (normalized) database design. Also,
most design tools will not handle such non-normalized structures. If
possible, you should avoid attribute name extensions and normalize
instead. To bring this point home, let's say your customers provide
permanent delivery addresses. By the way, delivery addresses tend to have
their own attributes and behavior. Most probably a bunch of shipping
addresses is an attribute of Customers:

    DeliveryAddresses.DelAdrOID [primary key]
    DeliveryAddresses.CustOID [foreign key]
    DeliveryAddresses.AdrOID [foreign key]
    DeliveryAddresses.DelAdrDispatchType [another attribute, which in real life would be the reference to a carrier]

Invoices normalized:

    Invoices.AdrOID [billing address]
    Invoices.DelAdrOID [delivery address]
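
To see how the normalized variant resolves both roles, here is a small sketch in Python with sqlite3. Integer keys are used only to keep the sample short, and AdrCity, CustName and the sample rows are assumptions added for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Addresses (AdrOID INTEGER PRIMARY KEY, AdrCity TEXT);
    CREATE TABLE Customers (CustOID INTEGER PRIMARY KEY, CustName TEXT);
    CREATE TABLE DeliveryAddresses (
        DelAdrOID          INTEGER PRIMARY KEY,
        CustOID            INTEGER NOT NULL REFERENCES Customers (CustOID),
        AdrOID             INTEGER NOT NULL REFERENCES Addresses (AdrOID),
        DelAdrDispatchType TEXT
    );
    CREATE TABLE Invoices (
        InvOID    INTEGER PRIMARY KEY,
        -- billing address
        AdrOID    INTEGER NOT NULL REFERENCES Addresses (AdrOID),
        -- delivery address
        DelAdrOID INTEGER NOT NULL REFERENCES DeliveryAddresses (DelAdrOID)
    );
    INSERT INTO Addresses VALUES (1, 'Hamburg'), (2, 'Bremen');
    INSERT INTO Customers VALUES (1, 'Example Ltd');
    INSERT INTO DeliveryAddresses VALUES (1, 1, 2, 'Courier');
    INSERT INTO Invoices VALUES (1, 1, 1);
""")

# Both roles resolve through unextended key names; no '#Role' suffix needed.
row = con.execute("""
    SELECT Bill.AdrCity, Ship.AdrCity, DelAdrDispatchType
    FROM Invoices
    JOIN Addresses AS Bill ON Bill.AdrOID = Invoices.AdrOID
    JOIN DeliveryAddresses USING (DelAdrOID)
    JOIN Addresses AS Ship ON Ship.AdrOID = DeliveryAddresses.AdrOID
""").fetchone()
```

The billing city comes straight from Invoices.AdrOID, the shipping city via DeliveryAddresses, which also contributes its own attribute.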

Let's come to attributes with business meaning. Besides technical
attributes in different roles, I can think of other cases where it is necessary
to extend attribute names, for example default values. As long as there is
just one default value, put it in the attribute's definition. Otherwise you
need a table storing those values:

    Discounts.DiscOID
    Discounts.DiscAppliesToBusinessType [e.g. wholesale, distributors, retail...]
    Discounts.DiscPercent

Since discounts given to customers are calculated individually, the
percentage can vary from customer to customer and it makes no sense to
reference Discounts in Customers. However, in the interest of a readable
model it is good style to mark the source, therefore the attribute discount
percent of Customers keeps its source:

    Customers.CustOID
    Customers.DiscPercent#Cust

There are other advantages of consistent naming rules. In commercial
applications you're dealing with discount percentages in tons of objects.
Imagine you need to analyze your enterprise-wide discount policy. Finding
all instances of discount percentages can become a PITA in complex
systems. Provided the naming is consistent, you can search your system tables
for 'DiscPercent*' and you get a complete list:

    Discounts.DiscPercent
    Customers.DiscPercent#Cust
    Invoices.DiscPercent#Inv
    InvoiceLines.DiscPercent#InvLine
    ...
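
What such a catalog search looks like depends on your database system. As a rough illustration, the sketch below runs the search against SQLite's catalog (sqlite_master plus PRAGMA table_info) from Python. The helper find_attributes is a made-up name, and a Progress schema would be queried through its own metaschema tables instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Discounts (DiscOID INTEGER PRIMARY KEY, DiscPercent REAL);
    CREATE TABLE Customers (CustOID INTEGER PRIMARY KEY,
                            "DiscPercent#Cust" REAL);
    CREATE TABLE Invoices (InvOID INTEGER PRIMARY KEY,
                           "DiscPercent#Inv" REAL, InvNetAmount REAL);
""")


def find_attributes(con, prefix):
    """List Table.Column for every column whose name starts with prefix."""
    hits = []
    tables = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    for (table,) in tables.fetchall():
        for column in con.execute(f'PRAGMA table_info("{table}")'):
            if column[1].startswith(prefix):  # column[1] is the column name
                hits.append(f"{table}.{column[1]}")
    return hits


found = find_attributes(con, "DiscPercent")
```

Note that SQLite needs double quotes around names containing '#'.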

If your application is to be used by a group of (affiliated) companies, where
each company represents a separate client in the multi-client
capable accounting system, things become difficult. The easiest solution
is to split your ERP database physically. Keep all common
objects like countries, currencies, users, clients (= accounting clients) etc. in
one database, and all company-related objects in another database. Connect
your users to the first ERP database and the accounting database, let them
choose a client, then create an alias for the client's ERP database to ensure
that all client databases can share the same programs. Large operations tend
to buy and sell subsidiary companies every once in a while. Using
physical client databases makes this kind of move a simple and painless
task.
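
The alias trick can be imitated in other systems. The following sketch is only an analogy, not the Progress mechanism: it uses SQLite's ATTACH DATABASE from Python to bind the fixed schema name 'client' to whichever client database the user has chosen. The file names, sample invoice numbers and helper functions are all assumptions.

```python
import os
import shutil
import sqlite3
import tempfile


def create_client_db(path, numbers):
    """Create one physical client database holding company-related objects."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE Invoices (InvOID INTEGER PRIMARY KEY, InvNumber INTEGER)")
    db.executemany("INSERT INTO Invoices (InvNumber) VALUES (?)",
                   [(n,) for n in numbers])
    db.commit()
    db.close()


def use_client(common, path):
    """Point the fixed schema name 'client' at the chosen client's database."""
    names = [row[1] for row in
             common.execute("PRAGMA database_list").fetchall()]
    if "client" in names:
        common.execute("DETACH DATABASE client")
    common.execute("ATTACH DATABASE ? AS client", (path,))


def invoice_numbers(common):
    """Shared program code: it only ever addresses client.Invoices."""
    rows = common.execute(
        "SELECT InvNumber FROM client.Invoices ORDER BY InvNumber")
    return [number for (number,) in rows.fetchall()]


workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "client_a.db")
path_b = os.path.join(workdir, "client_b.db")
create_client_db(path_a, [1001, 1002])
create_client_db(path_b, [2001])

common = sqlite3.connect(":memory:")  # stands in for the common database
use_client(common, path_a)
numbers_a = invoice_numbers(common)
use_client(common, path_b)
numbers_b = invoice_numbers(common)

common.close()
shutil.rmtree(workdir)
```

The same invoice_numbers function serves both clients, because only the binding of the name 'client' changes.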

Unfortunately, sometimes a developer's life is not that easy. In a
multicorporate enterprise many subsidiary companies work on the same
projects, billing their time and material partly within the group. That means
subsidiary companies share access to a lot more business objects than just
countries and currencies. Besides a ton of group-wide objects, templates to
ensure enterprise-wide identical customer account numbers and such stuff,
you need the attribute accounting client in many objects. Do not use the
same attribute name in all tables, because database systems and design
tools can't determine the primary relations if you do. Name the
column client number (or client OID) differently in each table, using the
source pointers explained before, e.g.

    Invoices.ClientNumber#Inv
    Customers.ClientNumber#Cust
    Vendors.ClientNumber#Vend
    ...

All of this leads to the conclusion that consistent naming is a good
idea in general. In other words: without a strong naming convention your
project will fail. Each and every name must be self-explanatory, and similar
meanings must be expressed in identical wording. Some examples:

Invoices.InvPrinted says whether an invoice has been printed or
not, Invoices.InvDatePrinted stores the date of the last
printout, and Invoices.InvPrintCounter tells us how many times an invoice
has been printed so far and can be used to mark copies. The same goes for
order confirmations and other forms:
OrderConfirmations.OrdConfPrinted, OrderConfirmations.OrdConfDatePrinted,
OrderConfirmations.OrdConfPrintCounter and so on.

Look at the first attribute in my example. In common
speech Invoices.InvPrinted can stand for a Boolean value as well as for a
date. To avoid any confusion, you can make it even clearer by naming the
logical attribute Invoices.InvIsPrinted, which leads to perfectly
understandable code like ...

    FOR EACH Invoices WHERE NOT Invoices.InvIsPrinted AND
            Invoices.InvDateCreate <= (TODAY - 10) AND
            Invoices.InvIsDispatched,
        EACH Customers OF Invoices,
        EACH Staff OF Customers:
        ASSIGN
            lOk = sendEmail(Staff.StEmailAddy,
                      'Send out invoice # ' + STRING(Invoices.InvNumber) + ' $'
                      + STRING(Invoices.InvGrossAmount),
                      'To ' + crlf + getMailAddress(Customers.AdrOID) + crlf
                      + 'immediately')
            Staff.StBrowniePoints = Staff.StBrowniePoints - 1.
    END.
... and more examples. All types of amounts are addressed by the same
name:

    Invoices.InvNetAmount    Orders.OrdNetAmount ...
    Invoices.InvTaxAmount    Orders.OrdTaxAmount ...
    Invoices.InvGrossAmount  Orders.OrdGrossAmount ...

All numbers are called 'Number' and not 'No', 'Num' ('Num' usually means
'number of') or whatever:

    Invoices.InvNumber
    Customers.CustNumber (if there is a numeric customer number)
    Products.PrdUpcNumber
    Countries.CoIsoNumber (ISO 3166 numeric code)
    ...

Alphanumeric codes are (usually) named 'Code' like

    Countries.CoIsoCode (ISO 3166 alphanumeric code)
    Products.PrdCode (or Products.PrdSku)
    Customers.CustAccountCode
    Currencies.CurrIsoCode
    ...

Borderline cases are 3rd party, non-unique technical keys with business
meaning like the UPS 1Z Tracking Number, which contains both digits and
letters. I'd call it UpsParcels.UpsP1zTrackingNumber, because the term is a
matter of common knowledge and, technically speaking, '1Z' even indicates
an alphanumeric value.

The same goes for all common name components like 'description',
'remarks', 'name', 'quantity', 'price' and so on. I guess you've got the idea.
If possible, try to express the data type in the attribute name, and not only
for attributes of the types date and logical. 'Url' or 'Description' indicate
a single-line character field, 'LongDescription', 'Remarks' or 'Notes' usually
get stored in large text fields, 'Percent', 'Amount' and 'Price' imply decimal
values, 'NumberOf' or 'PageNumber' represent integers and so on.

As for the visible parts of your model, there is not much more to say,
except: check your spelling before you save definitions, and assign a help
text to each attribute. Besides the above-mentioned object identifiers and
one-to-many relationships, you need a policy for many-to-many
relationships too. Those are a kind of technical class, making complex
relationships persistent. Users will never see their names or attributes, so
you may use geek speech. Here is a proven system: name those tables
by combining your unique table prefixes, delimited by the digits '2' (to) and
'4' (for). If your customers can belong to different groups, the table
representing the relationship 'customers [belonging] to customer groups' is
named Cust2CustGrp and contains only three keys:

    Cust2CustGrp.Cust2CustGrpOID [primary key]
    Cust2CustGrp.CustOID [foreign key]
    Cust2CustGrp.CustGrpOID [foreign key]

To handle all customers of a group you code

    FOR EACH Cust2CustGrp OF CustomerGroups,
        EACH Customers OF Cust2CustGrp:
        /* Do something. */
    END.

To get a list of all groups a customer belongs to you write:

    FOR EACH Cust2CustGrp OF Customers,
        EACH CustomerGroups OF Cust2CustGrp:
        /* Do something. */
    END.
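
For comparison, here is how the same link table might be queried in SQL, sketched with Python's sqlite3 module. CustName, CustGrpName, the sample rows and the two helper functions are assumptions added for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustOID INTEGER PRIMARY KEY, CustName TEXT);
    CREATE TABLE CustomerGroups (CustGrpOID INTEGER PRIMARY KEY,
                                 CustGrpName TEXT);
    CREATE TABLE Cust2CustGrp (
        Cust2CustGrpOID INTEGER PRIMARY KEY,
        CustOID    INTEGER NOT NULL REFERENCES Customers (CustOID),
        CustGrpOID INTEGER NOT NULL REFERENCES CustomerGroups (CustGrpOID)
    );
    INSERT INTO Customers VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO CustomerGroups VALUES (1, 'Wholesale'), (2, 'Retail');
    INSERT INTO Cust2CustGrp (CustOID, CustGrpOID)
        VALUES (1, 1), (1, 2), (2, 2);
""")


def customers_of_group(con, cust_grp_oid):
    """All customers of one group, resolved through the link table."""
    rows = con.execute("""
        SELECT CustName
        FROM Cust2CustGrp JOIN Customers USING (CustOID)
        WHERE CustGrpOID = ? ORDER BY CustName
    """, (cust_grp_oid,))
    return [name for (name,) in rows.fetchall()]


def groups_of_customer(con, cust_oid):
    """All groups one customer belongs to, resolved the other way round."""
    rows = con.execute("""
        SELECT CustGrpName
        FROM Cust2CustGrp JOIN CustomerGroups USING (CustGrpOID)
        WHERE CustOID = ? ORDER BY CustGrpName
    """, (cust_oid,))
    return [name for (name,) in rows.fetchall()]
```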

In some rare cases these mainly technical classes have other attributes.
Pragmatically, here I'd go for a descriptive table label and stick with the
geeky table name. Actually, most probably those attributes are simple
connections, keeping the table itself invisible to users. E.g. if you have a
table storing Xmas present types, you could assign the type (or value) of
presents depending on one of the groups assigned to your customers:

    XmasPresentTypes.XptOID
    XmasPresentTypes.XptPostcardOnly
    XmasPresentTypes.XptPrice

    Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
    Cust2CustGrp4Xpt.Cust2CustGrpOID [foreign key]
    Cust2CustGrp4Xpt.XptOID [foreign key]

or

    Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
    Cust2CustGrp4Xpt.CustOID [foreign key]
    Cust2CustGrp4Xpt.CustGrpOID [foreign key]
    Cust2CustGrp4Xpt.XptOID [foreign key]

Pick whatever fits your needs best.

Now let's come to another important rule: Separate all technical stuff
from your business logic. You can't avoid technical attributes in tables
representing business objects, but you can and you should handle them
separately. For example you can assign values like

    Table.PrefixOID
    Table.PrefixUserLastUpdate (if you don't log user activities, you probably need to store these data on creation too)
    Table.PrefixDateLastUpdate
    Table.PrefixTimeLastUpdate
    Table.PrefixIsActive or Table.PrefixIsDeleted

in database triggers. Be aware that in n-tier architectures database triggers
usually do not know the user. If you need to log user activities, you can
implement this feature in your key wrapping widgets. Since your
technical primary keys can't be used in user interfaces, you create a key
wrapping widget for each primary key. This widget knows the invisible
primary key and enables the user to choose or enter one or more attributes
with business meaning, which can be used to identify an object. In a
data viewer, those widgets appear just like fill-in fields with a search
button, or like combo boxes. In the background they pass the values of
technical keys as well as the screen values of their visible attributes with
business meaning to an application server, or to another process handling
your persistent objects.

Back to logging. Since every data viewer must contain at least one key
wrapping widget (one handling the primary key and probably a few others
handling foreign keys), you can determine the current user here. Just pass
another hidden value to your persistence handler. Then in the database
trigger you compare the old and new buffer, logging changes only. With a
Progress® database, you can fully automate user activity logging using
generated includes in write triggers, made up by a tool accessing the virtual
system tables (VST). By the way, you should assign values to primary keys
in create triggers only. At this point, recall another important rule of
state-of-the-art software design: Do not put any business logic into the user
interface code. Think SOA and encapsulate technical services as well as
audit trail requirements.
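
The 'compare old and new, log changes only' step might look as follows outside Progress. This is a hedged sketch in Python with an SQLite trigger. The ChangeLog table, its Log* attributes, the trigger name and the save_customer handler are invented for illustration, and only one attribute is watched to keep it short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (
        CustOID            INTEGER PRIMARY KEY,
        CustName           TEXT,
        CustUserLastUpdate TEXT
    );
    CREATE TABLE ChangeLog (
        LogOID       INTEGER PRIMARY KEY,
        LogTable     TEXT,
        LogAttribute TEXT,
        LogOldValue  TEXT,
        LogNewValue  TEXT,
        LogUser      TEXT
    );
    -- Write trigger: compare old and new values, log real changes only.
    CREATE TRIGGER CustWriteLog AFTER UPDATE ON Customers
    WHEN OLD.CustName IS NOT NEW.CustName
    BEGIN
        INSERT INTO ChangeLog
            (LogTable, LogAttribute, LogOldValue, LogNewValue, LogUser)
        VALUES
            ('Customers', 'CustName', OLD.CustName, NEW.CustName,
             NEW.CustUserLastUpdate);
    END;
    INSERT INTO Customers VALUES (1, 'Smith Ltd', 'setup');
""")


def save_customer(con, cust_oid, name, user):
    """Persistence handler: the user arrives as a hidden value from the UI."""
    con.execute(
        "UPDATE Customers SET CustName = ?, CustUserLastUpdate = ? "
        "WHERE CustOID = ?", (name, user, cust_oid))


save_customer(con, 1, "Smith Ltd", "alice")   # name unchanged, no log row
save_customer(con, 1, "Smith & Co", "bob")    # name changed, one log row
log = con.execute(
    "SELECT LogOldValue, LogNewValue, LogUser FROM ChangeLog").fetchall()
```

The trigger never needs to know who is logged in. It simply reads the user from the row that the persistence handler wrote.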

Another rule of thumb is: Do not delete physically. Admittedly, deletions are
technically possible, but they are way too expensive, not really necessary,
and furthermore you destroy information which as a rule you will need some
day. Deleting logically, on the other hand, perfectly keeps your referential
integrity, and it is way faster because your database server updates just one
column in a parent table, instead of bothering with often almost endless
cascading deletes along with RI checks. Adding a WHERE clause [NOT]
ParentTable.PrefixIsDeleted, or, much better, [NOT]
ParentTable.PrefixIsActive is cheap in comparison with all the nasty
side effects of physical deletion. Tell your delete button to set a logical
attribute isDeleted to true, or even dump the button and use a check box
instead, which allows your users to reactivate inactive objects.
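
A minimal sketch of logical deletion, again in Python with sqlite3 and under the assumption of a CustIsActive flag, shows that child rows stay untouched and that reactivating is a one-column update. The helper names and sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (
        CustOID      INTEGER PRIMARY KEY,
        CustName     TEXT,
        CustIsActive INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE Invoices (
        InvOID  INTEGER PRIMARY KEY,
        CustOID INTEGER NOT NULL REFERENCES Customers (CustOID)
    );
    INSERT INTO Customers (CustOID, CustName) VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO Invoices VALUES (10, 1), (11, 2);
""")


def set_active(con, cust_oid, is_active):
    """The 'delete' check box: flips one column, nothing cascades."""
    con.execute("UPDATE Customers SET CustIsActive = ? WHERE CustOID = ?",
                (1 if is_active else 0, cust_oid))


def active_customers(con):
    """Business queries simply add the cheap WHERE clause."""
    rows = con.execute(
        "SELECT CustName FROM Customers WHERE CustIsActive ORDER BY CustOID")
    return [name for (name,) in rows.fetchall()]


set_active(con, 2, False)                    # logical delete
after_delete = active_customers(con)
invoice_count = con.execute("SELECT COUNT(*) FROM Invoices").fetchone()[0]
set_active(con, 2, True)                     # reactivate
after_restore = active_customers(con)
```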

Large projects can easily exceed the physical limits set by your database
system. If you deal with very large amounts of data in particular entities,
ensure that primary keys of (physically sliced) mega entities are never used
as foreign keys in other tables. Only the (logical) mega 'table' keeps
knowledge about relations to other entities. That should not lead to
problems, because these entities are usually children of others (for
example sales transactions of sales slips of POS terminals of shops).
Implement a smart data access layer handling the requests from higher
application levels. Depending on key value ranges and/or date-time
attributes, the data access layer can determine in which table a requested
tuple is located and in which table a new tuple must be stored, while from
the higher level's perspective this conglomerate of tables appears as
one logical table.
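
As a rough sketch of such a data access layer, the Python class below routes rows to one physical SQLite table per year and presents them as a single logical table. Slicing by year, the class itself, the physical table names and the Stx prefix are all assumptions chosen for illustration. A real layer would also have to discover existing slices at startup.

```python
import sqlite3


class SalesTransactions:
    """Logical table backed by one physical table per year (hypothetical)."""

    def __init__(self, con):
        self.con = con
        self.years = set()  # slices created through this layer

    def _table_for(self, tx_date):
        """Pick (and create on demand) the slice for an ISO date string."""
        year = int(tx_date[:4])
        table = f"SalesTransactions{year}"
        if year not in self.years:
            self.con.execute(
                f"CREATE TABLE IF NOT EXISTS {table} ("
                "StxOID INTEGER PRIMARY KEY, StxDate TEXT, StxAmount REAL)")
            self.years.add(year)
        return table

    def insert(self, tx_date, amount):
        table = self._table_for(tx_date)
        self.con.execute(
            f"INSERT INTO {table} (StxDate, StxAmount) VALUES (?, ?)",
            (tx_date, amount))

    def find(self, first_date, last_date):
        """Query only the slices that can contain the requested range."""
        rows = []
        for year in sorted(self.years):
            if int(first_date[:4]) <= year <= int(last_date[:4]):
                rows += self.con.execute(
                    f"SELECT StxDate, StxAmount FROM SalesTransactions{year} "
                    "WHERE StxDate BETWEEN ? AND ? ORDER BY StxDate",
                    (first_date, last_date)).fetchall()
        return rows


con = sqlite3.connect(":memory:")
sales = SalesTransactions(con)
sales.insert("2004-12-30", 10.0)
sales.insert("2005-01-02", 20.0)
sales.insert("2005-03-01", 30.0)
found = sales.find("2004-12-01", "2005-01-31")
```

Callers only ever see insert and find. Which physical table holds a tuple stays a private matter of the layer.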

The next warning has, like the two rules above, the potential for a bunch of
articles: Avoid array fields. Most persistent arrays I have seen were the
work of lazy code monkeys who weren't able to look a step further. Although
some database systems like Progress® can handle array fields, most
database systems do not (why should they support tables in tables for
database designers not able to normalize properly?). Furthermore, lots of
front ends and underlying components as well as development tools will not
handle extended attributes. When migrating applications it's hard enough to
handle these constructs in established (legacy) databases, so don't create new
trouble. As for Progress® word indexes, which work like a charm with character
arrays, there is an alternative compatible with other databases. Just add a
word-indexed large text field and populate it with a string of the attributes
in question in your write trigger.

Modelers and developers who regard relational theory as set in stone will
most probably be offended by some of the code examples above. In former
paradigms it was, politely expressed, not the best practice to use syntax
like ChildTable OF ParentTable, because (using attributes with business
meaning as primary and foreign keys) it was not obvious which attribute
pair was used to join the objects. However, we got rid of that incredibly
stupid concept in the meantime. OF has evident advantages:

Provided the database design is clean, those misunderstandings caused by
omission cannot occur, because each and every join uses a single pair of
indexed technical keys in both tables. The technical implementation of
relationships has nothing to do with business logic, thus the consistent
usage of OF increases code readability. Actually, technical attributes should
not appear in any code handling business logic (admittedly with exceptions
like Table.PrefixIsActive, standing for not logically deleted, and other
technical attributes with at least a portion of business meaning).

If OF fails, you have a technical problem like a missing index on a foreign
key column, or (indexed) attribute names that are equal in both tables;
neither must happen. Fortunately the compiler will quit with an error
message in this case. That means the consistent usage of OF, followed by
a WHERE clause expressing business logic by testing attributes with business
meaning, protects you from logical errors as well as from errors and
omissions in the physical database design.

As I said in the beginning, my intention was not to write a book explaining
each and every aspect of database design. Most probably that's impossible,
because different business requirements need different solutions. I wrote
this article off the top of my head on a rainy Saturday afternoon, so please
don't expect completeness. And since I make a living with IT consulting,
you'll agree that it would be a bad idea to publish all my business secrets ;)

Author: Sebastian
Published: December 2004 LastUpdate: May 2005
