Sunday, March 11, 2012

Two Primary Key design principles

During my years of database design, I found 2 things that always make my life easier when it comes to primary keys:
  1. I recommend to always use Surrogate (otherwise known as "dummy") primary keys. Never assign a "meaningful" column to be the primary key of a table. On Oracle, these are implemented by using sequences and before-insert triggers. In Microsoft's world, SQL Server and MS Access have the auto increment columns. I was once involved in a project where the Id Number was the primary key of an an "Individual" table. The designers took the word of the business people that an individual's Id Number never changes, so they assigned this field to the primary key of the table. Then they built a system of 300 tables, and about 100 of those had a foreign key referencing the above table/field. And of course after a couple of years in production, a case came in where an individual had lost their ID Number and a new one was issued. This was a nightmare: We had to write oracle scripts that dropped the foreign keys, do the update and then re create the foreign keys.
    The rule is, as long as the the primary key column has some meaning to the user, then this column may change and databases do not generally like updates on primary key values. There are ways to do it, I know, but why go through the trouble? Just always use Surrogate keys and you will be free from issues like changing or removing records by primary keys.
    Another reason to use surrogate primary keys is if you have some kind of user interface framework that inserts/updates/deletes rows, usually these frameworks work by identifying rows by primary keys. In case you do not have surrogate/dummy keys in your tables, you user interface can seriously mess up your data. Consider this scenario:
    1. User requests to edit record where id number equals '1000'. The system finds it by issuing the appropriate sql where clause ( "WHERE ID_NUMBER = 1000") and shows it on the screen.
    2. The user updates the data on the screen, and then mistakenly enters '1001' in the id number text box. She then submits the changes
    3. The system will then update using where clause "where id number='1001'" since it was changed by the user on the screen, and mistakenly update record 1001 instead of 1000! If you thing this is far fetched, think again! It happened within the first couple of days of UAT for a system I was involved in migrating from Informix to Java 2EE/Oracle. We ended up using the Oracle ROWID instead of the primary key to identify records for the user interface. Probably the best solution to begin with, but had the database use surrogate primary keys then we would not have a problem since user interface users would never see or be able to change the primary key value.
  2. I recommend to Never use composite primary keys. Even if it makes sense from a normalization point of view, I always use a a SINGLE sequence/auto increment column to be the primary key of the table. This helps greatly when mapping the database table to some kind of Object Relation Mapping tool. Also, if again you will have to use a User Interface framework to manage your rows, having a sigle primary key column helps to keep the where clauses that identify rows short and easy to read and maintain. As an example, consider the case where you have a many to many relationship between an INDIVIDUAL table and a VEHICLES table. In other words, an individual can own many vehicles, and a vehicle can be owned by many individuals.   Additionally, a vehicle cannot be owned by an individual on the same date.  If we use a composite key, we will end up with something like this:


    With the above design, in order to identify a record for selects updates and deletes, your WHERE clauses have to include all 3 fields that logically define a unique record: SELECT * FROM VEHICLE_OWNER WHERE VEH_ID=:1 and IND_ID=:2 and OWN_START_DATE=:3
    Alternatively, you can use a single surrogate column for the primary key. You can then add a unique constraint to the table on the 2 foreign keys and the OWN_START_DATE field. Here is the design:

    With the above design, your WHERE clauses are kept short and maintanable:   SELECT * FROM VEHICLE_OWNER WHERE VEH_OWNER_ID=:1
    In addition, if in the future you introduce another table which needs a foreign key to VEHICLE_OWNER, you can easily extend your design since you only have a single field as the primary key. You simply add the VEH_OWNER_ID to that new table and create the foreign key. Had you use composite fields as the primary key, then you would have to drop the composite primary key, add a new column, populate values and create a single field primary key.

Monday, March 5, 2012

The Single Responsibility Principle (SRP)

Simply put: A class should have one and only one task to perform. Or another way to put it: A class should have one and only one reason to exist. Object Mentor says it (and describes it) best at http://www.objectmentor.com/resources/articles/srp.pdf

The single responsibility principle is one of those things that sounds too theoretical. How can a class have just a single task to do? What if my system has 1000 tasks to be performed, do I have to create 1000 classes? Well, YES :-). And I am here to tell you that this principle works, in practice.

What did I gain by following this principle?
  1. Improved the maintainability of my code: Since a class does only one thing, I can touch it w/o breaking any code that does other things. Simple :-)
  2. Improved the extensibility of my code:
And the drawbacks?
  1. The so called 'Class Explosion'. Your application may end up with too many classes to manage.
  2. Yes, I can change code in classes without worrying too much about affecting other code, but first I have to find the code I need to change. And with myriad of classes it can get tricky to pinpoint what you want to change