View Single Post
Old 03-29-2005, 12:59 PM
loquin's Avatar
loquin loquin is offline
Google Hound

Retired Moderator
* Guru *
Join Date: Nov 2001
Location: Arizona, USA
Posts: 12,400
Default Data Design

When I was growing up, one of the biggest thrills in my life was, when after passing my driving test, Dad tossed me the car keys, and said “Go take the car for a spin son. Just be back by dinner.”

Of course, he knew that I wouldn’t have time to get too far, since dinner was in a half-hour, but still, getting the keys was a thrill! Now, one of the things you will often hear about, at least when discussing databases, are primary keys, and foreign keys. They don’t refer to how to win an election, or where the key was made. They do give an indication as to how well your database is planned, though.

“Why would you want to plan a database anyway? That takes a lot of time to do right, and I just don’t have the time to waste on that relational stuff!” Many people DO feel that time spent on the careful design of a database, or on a program, for that matter, is time wasted. They feel that they can accomplish more by diving right in and building things. OK. The next time you take a trip, and you drive over a bridge, would you rather cross one that’s had enough time spent on the design? Or maybe you just want to use the one that was thrown together without the use of blueprints…hmmm. While there are cases, primarily with extremely simple, or one-run applications, where you can ignore good project practices and build from the get-go, time spent on the design of a database, or on a program, almost always takes less time overall.

A good database design can allow you to enter and update data easily, and will allow you to easily query, summarize, and create reports. In addition, it will be easy for you to modify the design later, and the database will be easy to document and to maintain. Since a database is a model of the process you're automating, designing a database requires you to make many decisions, including which tables to create, which fields the tables should contain, what relationships should be created, and what rules to create. These decisions will need to be made whether you follow good design practices or not; but the relational model will help guide you in the decision-making process.

However – the relational model cannot help you know the process you are trying to model. Before you begin modelling the business or process, make SURE you understand it! This will take careful study, and probably a lot of meetings with the people who know the process well.

The relational model was created in 1969 by Dr. E.F. Codd. This model is based on the mathematic disciplines of set theory and predicate logic. While you don’t need to know this math to apply the relational model, it’s good to know that sound mathematical principles form the basis for it.

In the relational model, Tables represent groups of things (entities) in the real world. These things can represent either objects, or events. For instance, tables might be created to represent objects such as Employees, Inventory, or Locations. You might also build a table to represent events like telephone calls, or customer orders, or work assignments.

You should never have duplicate records in a table. If you do, then you can’t uniquely identify or distinguish between the duplicate records programmatically. This can create all sorts of problems in trying to maintain the records. One way in which you can guarantee that all the records in a table are unique is to designate (or create) a primary key. The primary key is a field, or group of fields that you will use to uniquely identify a record in your table. For instance, the EmployeeID field might be the primary key for an employee table.

Candidate Key(s) is the term used to identify the potential keys in a table that could be used as a primary key. Keys can be simple (made up of one field,) or composite, meaning that they are made up of multiple fields. In deciding which candidate field(s) should be used for the primary key, you should choose a key that is Minimal (keys made up of fewer fields are desirable,) Stable (keys whose values are less likely to change are better,) and Simple and Familiar (because they are easier to remember.)

If there aren’t any acceptable candidate keys, either you are leaving out some important field, or you may need to have your system generate one for you. Such an automatic field is known as a surrogate key. In Access, an Auto Number field, in SQL Server, an Identity field, and in Oracle or PostgreSQL, a Sequence, can all make good surrogate keys.

In a relational database, it’s important to be able to establish relationships between related tables. For instance, in a sales system, suppose you have a table of customers, called tblCustomers, defined with the fields CustomerID (the Primary Key), FirstName, LastName, Address, City, State, AddressCode, and PhoneNum. In addition, you’ve created a table with order details, called tblOrders, which includes fields such as OrderNum, OrderDate and OrderValue. As it is, however, you have no way of relating the two tables together. You Could just copy all the information about the customer from the customer table into new fields in the orders table. And, that would work. Just not very well. First, this means that every time Joe Smith places an order, you get to create a new copy of his name and address. Not only does that take up a lot of space, but what do you do when customer Joe calls up and changes his address? Do you have to go back through your records and update all his orders? Instead, a better approach would be to just store a reference to the customer record. You would modify the table tblOrders by adding just the CustomerID field. This key field, which references a different table, is called a Foreign Key. Your Orders table now has TWO keys – the Primary Key (OrderID) and the Foreign Key (CustomerID.)

This type of relationship between the two tables is known as a one-to-many relationship. The one-to-many relationship is the most common relationship in a database application. In our example, the customer (one) can have none, one, or many orders.

The second type of relationship which may need to be addressed in your application is the one-to-one relationship. Tables are said to have a one-to-one relationship if a record in the first table can be referenced by at most, one record in the second table. While very few true one-to-one relationships can be defined in the real world, a one-to-one relationship is quite often defined artificially, typically for security reasons, or to overcome limitations in the database used. In many cases, for instance, you may wish to keep sensitive client information, such as name, address, and social security number, separate from other information. For example, in a hospital database, the client medical data, such as birth date, sex, and allergy information, should be maintained separately from the address and billing information. Different hospital users would be assigned different access levels into the database. The nurses and doctors, for instance, could see the patient’s allergy data, while users in the billing department would not be cleared to review this information. Even though more advanced relational database systems like SQL Server and Oracle allow security restrictions to individual fields within tables, one-to-one relationships are often utilized as well.

The third type of relationship that you may need is the many-to-many relationship. In this relationship, zero to many record from the first table may be related to zero to many records in the second table. An example of this might be a track meet. In a track meet, there are many different events, and one event will have many participants. Conversely, a given participant may be participating in many events. Although it is not possible, in a relational database system, to directly define a many-to-many relationship, this relationship is easily created by building an intersection table, also known as a linking table. An intersection table holds foreign keys to the two many tables of the relationship. The primary key of the intersection table is a compound key, consisting of both of the foreign keys. In our track meet example, the event table, tblEvent, could have fields named EventID (PK), EventName, and EventDescription. The Participant table, tblParticipant, would hold fields ParticipantID, FirstName, LastName, Sex, Age, etc. The intersection table would have, at a minimum, the foreign key fields EventID and ParticipantID. Since the primary key for this table is EventID and ParticipantID, there can only be one unique combination of event and participant, which accurately models the real-world process of a track meet. Intersection tables are often used to model object-events, such as appointments, which pair people and times. In fact, in addition to the people/time intersection table, a working appointment database would also employ an intersection table joining time to location, and a third intersection table joining people to location.

Even though applying primary and foreign keys in your appointment book database just doesn’t have quite the same thrill as taking your Dad’s ’68 Mustang out on your first solo, you can get a lot of satisfaction in getting the relationships set up properly, so that you provide a well-designed database for your project.

FYI: There's a very nice CBT (Computer Based Training) module (flash) on Referential Integrity at the PostgreSQL site. The information provided is general enough to be useful on any database which truly supports R.I. Recommended.
"I have my standards. They may be low, but I have them!" ~ Bette Middler
"It's a book about a Spanish guy called Manual. You should read it." ~ Dilbert
"To understand recursion, you must first understand recursion." ~ unknown

Last edited by webbone; 01-09-2009 at 05:38 PM.
Reply With Quote