Tuesday, Aug 22nd

Last update12:59:40 PM GMT

Database Normalization - Part 2

Write e-mail

Lets continue from the first part of Normalization and understand each form of normalization nowNormal Forms

argaiv1864

For reference, our table BadTable is shown below:

Student Name Mentor Name SubjectID1 SubjectName1 SubjectTeacherName1 SubjectID2 SubjectName2 SubjectTeacherName2
Sunil Ravi C C Language Programming Yuvraj Java Intro to Java Rohit
Sachin Kapil Java Intro to Java Rohit .Net Programming .Net Ashish
Kapil Bishan MP Microprocessor Programming Anil C++ Object Oriented Programming in C++ Zaheer
Azhar Suresh C C Language Programming Yuvraj MP Microprocessor Programming Anil

 First Normal Form

No repeating groups.

In the First Normal Form, we look for repeating groups of columns and eliminate them. Our aim is to reduce the width of the table. This is done by taking those columns that repeat and making a new table which is defined using those repeating groups. Now, instead of additional columns, the resulting table has more rows.

OK, so how do we look for repeating groups? Let's look at the columns in our sample BadTable:

  • StudentName
  • MentorName
  • SubjectID1
  • SubjectName1
  • SubjectTeacherName1
  • SubjectID2
  • SubjectName2
  • SubjectTeacherName2

In this example, the subject columns are repeated to allow the student to take two Subjects at the same time. The problem occurs when the student wants to take three Subjects or more. Now, you can go ahead and add SubjectID3, etc., to the table, but this would make the table bulky and clumsy. The proper solution is to remove the repeating group of columns and make another table.

Having a table with columns that end in xx1, xx2, etc., is a clear warning that there are repeating groups in the table and needs normalization.

Ok, so let us handle the repeating groups:

Students StudentSubjects
StudentID
StudentName
MentorName
SCStudentID
SCSubjectID
SCSubjectName
SCSubjectTeacherName

Table 2: First Normal Form 

The primary keys are shown in italics.

We've divided the tables so that the student can now take as many Subjects as he wants by removing the Subject information from the original table and creating two tables: one for the student information and one for the Subject list. The repeating group of columns in the original table is gone, but we can still reconstruct the original table using the StudentID and SCStudentID columns from the two new tables. The new field SCStudentID is a foreign key to the Students table. Our BadTable is now in a better shape.

Second Normal Form

No nonkey attributes depend on a portion of the primary key.

Second Normal Form really only applies to tables where the primary key is defined by two or more columns. The essense is that if there are columns which can be identified by only part of the primary key, they need to be in their own table.

Let's look at the sample tables for an example. In the StudentSubjects table, the primary key is the combination of SCStudentID and SCSubjectID. However, the table also contains the SCSubjectName and theSCSubjectTeacherName columns. These columns are only dependent on the SCSubjectID column. In other words, the subject name and Teacher's name will be the same regardless of the student. How do we resolve this problem? Let's revisit the sample tables.

Students StudentSubjects Subjects
StudentID
StudentName
MentorName
SCStudentID
SCSubjectID
SubjectID
SubjectName
SubjectTeacherName

Table 3: Second Normal Form

What we've done is to remove the details of the Subject information to their own table Subjects. The relationship between students and Subjects has at last revealed itself to be a many-to-many relationship. Each student can take many Subjects and each Subject can have many students. The StudentSubjects table now contains only the two foreign keys to Students and Subjects.


We're almost done normalizing this small sample, but before taking the last step, lets add a little more detail to the sample tables to make them look something more like the real world.

Students StudentSubjects Subjects
StudentID
StudentName
StudentPhone
StudentAddress
StudentCity
StudentState
StudentZIP
MentorName
MentorPhone
SCStudentID
SCSubjectID
SubjectID
SubjectName
SubjectTeacherName
SubjectTeacherPhone

Table 4: Detail Columns Added

Return to Top


Third Normal Form

No attributes depend on other nonkey attributes.

This means that all the columns in the table contain data about the entity that is defined by the primary key. The columns in the table must contain data about only one thing. This is really an extension of Second Normal Form - both are used to remove columns that belong in their own table.

To complete the normalization, we need to look for columns that are not dependent on the primary key of the table. In the Students table, we have two data items about the student's Mentor: the name and phone number. The balance of the data pertains only to the student and so is appropriate in the Studentstable. The Mentor information, however, is not dependent on the student. If the student leaves the school, the Mentor and the Mentor's phone number will remain the same. The same logic applies to the Teacher information in the Subjects table. The data for the Teacher is not dependent on the primary key SubjectID since the Teacher will be unaffected if the Subject is dropped from the curriculum.

Let's complete the normalization for the sample tables.

Students Mentors Teachers StudentSubjects Subjects
StudentID
StudentName
StudentPhone
StudentAddress
StudentCity
StudentState
StudentZIP
StudentMentorID
MentorID
MentorName
MentorPhone
TeacherID
TeacherName
TeacherPhone
SCStudentID
SCSubjectID
SubjectID
SubjectName
SubjectTeacherID

Table 5: Third Normal Form


Some additional modifications

There's one more modification that could be made to this design. If you look at the Mentors and Teachers tables, you can see that the columns are essentially the same: a name and phone number. These two tables could be combined into a single common table called Staff or Faculty. This would make the design simpler by using one less table. The disadvantage is that you may need to record different additional details for Teachers than you would record for Mentors. One possible way of resolving this conflict would be to go one further step and create a Staff table that records basic information that any organization would retain about an employee, such as name, address, phone, Social Security number, date of birth, etc. Then you would have the Mentors and Teachers tables contain foreign keys to the Staff table for appropriate individual along with any additional details that might need to be stored specifically for that particular role. The benefit of this approach is that if a member of the staff is both an Mentor and an Teacher (this would often be the case in a college or university), the basic information would not need to be duplicated for both roles.

Return to Top

 

Don't go overboard with Third Normal Form or you'll wreak havoc on performance. If you look at the Students table, you can see that in fact any city and state in India could be identified by the PIN code. However, it may not be practical to design the database so that every time you need to get an address you have to join the row from Students to a table containing the (approximately) 1 million PIN codes in India. A general guideline is that if you routinely run queries which join more than four tables, you may need to consider denormalizing the design.

There are some other Normal Forms taught in engineering colleges as well, like the BCNF, 4NF and 5NF forms and a very recently introduced 6NF. Although these concepts are used, they are more theoritical than practical and trust me, companies wont ask you a lot of questions on them during their interviews.

References

Share this post



Add comment

Please refrain from using slang or abusive language in the comments.
To avoid waiting for your comment to be published after being reviewed, please log in as a registered user.


Security code
Refresh

Web Hosting