Database Normalization

Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data. It divides larger tables to smaller tables and links them using relationships.

images/articles/mysql/database-normalization.jpg

In relational database design, you not only want to create a structure that stores data, but you also want to store in a way that minimizes potential errors when you work with the data. Data can be manipulated in three following ways:

  1. Insert new data
  2. Delete unwanted data
  3. Update existing data

So, in an without normalized design, there are three problems that can occur when you work with the data:

  1. INSERT ANOMALY: When it is impossible to insert certain types of data into the database.
  2. DELETE ANOMALY: Unintended loss of additional data, that you wanted to preserve.
  3. UPDATE ANOMALY: Updating the value of a column leads to database inconsistencies (different rows on the table have different values).

To address these problems, you go through the process of normalization. Database normalization means to increase the number of tables in the database, while decreasing the amount of data stored in each table.

First Normal Form (1NF)

A database is in first normal form if it:

  • Contains only atomic values (Each table cell should contain a single value)
  • Contains no repeating groups (Each record needs to be unique)

An atomic value is a value that cannot be divided. A repeating group means that a table contains two or more columns that are closely related.

 

Second Normal Form (2NF)

A database is in second normal form if it:

  • Is in first normal form
  • All non-key attributes are fully functional dependent on the primary key (Single Column Primary Key)

In a table, if attribute B is functionally dependent on A, but is not functionally dependent on a proper subset of A, then B is considered fully functional dependent on A. If the primary key is not a composite key, all non-key attributes are always fully functional dependent on the primary key.

Third Normal Form (3NF)

A database is in third normal form if it:

  • Is in second normal form
  • There is no transitive functional dependency

Transitive functional dependency means that the database has the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, C is transitively dependent on A via B.

Keys

A key is a value used to identify a record in a table uniquely. A key can be a single column or combination of multiple columns (composite key). Columns in a table that are not used to identify a record uniquely are called non-key columns.

Primary Key

A primary key is a single column value used to identify a database record uniquely. It has following attributes:

  • A primary key cannot be NULL
  • A primary key value must be unique
  • The primary key values cannot be changed
  • The primary key must be given a value when a new record is inserted

Composite Key

A composite key is a primary key composed of multiple columns used to identify a record uniquely.

Foreign Key

Foreign Key references the primary key of another table. It helps to connect database tables. A foreign key can have a different name from its primary key. It ensures rows in one table have corresponding rows in another.

Unlike the primary key, they do not have to be unique. Foreign keys can be null even though primary keys can not. Use of foreign keys help in maintaining referential integrity.