Saturday, Aug 17th

Last update12:59:40 PM GMT

Database Normalization - Part 1

Write e-mail


Database NormalizationOne of the most important concepts in databases is that of Normalization. In this article I'll try to build an understanding on that.

There are two points to understand about tables and columns that are the essence of any database:

  • Tables store data about an entity 
    An entity may be a person, a book, a car or a part in it, or any other tangible or intangible object, but the primary consideration is that a table should only contain data about one thing.
  • Columns contain the attributes of the entity 
    Just as a table will contain data about a single entity, each column should only contain one attribute of that entity. If, for example, you're creating a table of addresses, there's no point in having a single column contain the city, state, and postal code when it is just as easy to create three columns and record each attribute separately.

Before we discuss normalization, let's look at some common flaws in database designs and the problems they cause. To illustrate these problems, I'll use the following sample table, which I'll call "BadTable":

Student Name Mentor Name SubjectID1 SubjectName1 SubjectTeacherName1 SubjectID2 SubjectName2 SubjectTeacherName2
Sunil Ravi C C Language Programming Yuvraj Java Intro to Java Rohit
Sachin Kapil Java Intro to Java Rohit .Net Programming .Net Ashish
Kapil Bishan MP Microprocessor Programming Anil C++ Object Oriented Programming in C++ Zaheer
Azhar Suresh C C Language Programming Yuvraj MP Microprocessor Programming Anil

Table-1 Bad Table

Let's look at some of the problems with this structure:

  • Repeating Groups 
    The Subject ID, subject name, and teacher are repeated for each class. If a student needs a third class, you need to go back and modify the table design in order to record it. Although you could add SubjectID3, SubjectID4, SubjectID5, etc., along with the associated subject name and teacher fields, no matter how far you take it, one day there would be someone who wants another class. Additionally, adding all those fields when most students would never use them is a waste of storage.
  • Inconsistent Data 
    Let's say that after entering these rows, you discover that Yuvraj's Subject is actually titled "Advanced Programming in C". In order to reflect this change, you would need to examine all the rows and change each individually. This always introduces the element of errors if one of the changes is omitted or done incorrectly.
  • Delete Inconsistencies
    If you no longer wished to track Rohit's Intro to Java class, you would need to delete two students, two mentors, and one additional teacher in order to do it. If you remove the first two rows of the table, all of the data is deleted with the reference to the Subject.
  • Insert Inconsistencies
    Perhaps the department head wishes to add a new class - let's call it "Advanced Java Programming" - but hasn't yet set up a schedule or even an teacher. What would you enter for the student, mentor, and instructor names?

As you can see, this single flat table has introduced a number of problems - all of which can be solved by normalizing the table design.

Return to Top of Page

Database Normalization 

Normalization is a process that is too often described using complex theory and jargon. I'm going to (hopefully) try to provide a plain English explanation of what database normalization means, the various forms of normalization, and how to normalize an unnormalized database.

What is Normalization? 

Normalization is essentially the process of taking a wide table with lots of columns but few rows and redesigning it as many narrow tables with fewer columns but more rows. A properly normalized design allows you to use storage space efficiently, eliminate redundant data, reduce or eliminate inconsistent data, and ease the data maintenence burden. Before looking at the forms of normalization, you need to know one cardinal rule for normalizing a database:

You should be able to reconstruct the original flat view of the data.

If you violate this rule, you would have defeated the original purpose of normalizing the design.

Forms of Normalization

Relational database theorists have divided normalization into several rules called normal forms.

  • First Normal Form 
    No repeating groups.
  • Second Normal Form 
    No nonkey attributes depend on a portion of the primary key.
  • Third Normal Form 
    No attributes depend on other nonkey attributes.

Also, for a database to be in second normal form, it must also be in first normal form, and for a database to be in third normal form, it must meet the requirements for both first and second normal forms. There are additional forms of normalization as well, but these are rarely applied. 

In the part 2 of this article, we will take a look at the First, Second and Third forms of Normalization in detail.

Let me add a disclaimer here normalization is not the answer to all your problems in developing a database design. As you will see later, there may times when it's prudent to de-normalize a structure. We will discuss these at a later stage.


Share this post

Add comment

Please refrain from using slang or abusive language in the comments.
To avoid waiting for your comment to be published after being reviewed, please log in as a registered user.

Security code

Web Hosting