Codd proposed Normalization in his paper that outlined the relational model. It is based on the concept of Normal forms. A relation/table is said to be in a particular normal form if it satisfies constraints specified for that normal form. See pages 126-128 for an overview of normal forms (i.e 1NF - 5NF and DK/NF).
Normalization is essentially the process of removing redundant data from relational tables by splitting the tables into smaller tables by projection. The idea is to do this without loss of information. That is, we should be able to combine the new tables by a natural join to recreate the original tables.
To accomplish this we employ a series of normalization rules. We will address rules that allow us to construct tables in third normal form.
First, Second & Third Normal Forms
A relational table is by definition in first normal form (1NF). That is, the attributes/columns of the table must be single valued and so cannot contain repeating groups or arrays. However, tables in 1NF may be susceptible to a variety of update anomalies. To avoid these anomalies, all relational tables should be in, at least, third normal form (3NF).
Second normal form (2NF) and third normal form deal with the dependence between non-key and key attributes/columns. Under 2NF and 3NF, the following general rule applies:
In addition, the table must satisfy 1NF. Following is a discussion of 2NF and 3NF, with rules to transform tables that violate these normal forms into compliant tables.
2NF
2NF is violated when a non-key attribute/column is a fact about a subset of a key. Notice that this is only relevant when the key is a composite key (i.e consists of several attributes/columns). Consider the following relation/table:
The attribute/column WhseAddress is a fact about WhseNumber alone and so is a fact about a subset of the primary key of the relation/table INVENTORY. The following problems may arise from this situation:
To address this weakness we may apply the following transformation:
Hence, the INVENTORY table is transformed by projection into two tables.
3NF
3NF is violated when a non-key attribute/column is a fact about another nonkey attribute. Notice that this is quite different than the situation presented for 2NF above where a non-key attribute was a fact about a subset of a key. Consider the following relation/table:
Let us assume that each department is located in one place. For example, the IT department is located on the 5th floor of the CNA building. Location, is a fact about Department (in addition to being a fact about employee). The problems that may arise from this situation are similar to those above.
To address this weakness we may apply the following transformation:
Hence, the EMPLOYEE table is transformed by projection into two tables.
Readings: Chapter 5, pg 126-130 (optionally, 131-145)