1. Home
SQL

SQL Tutorial: Learn Structured Query Language Basics

Learn all SQL tutorial concepts in detail and master your career today.

  • 59
  • 9 Hours
right-top-arrow

Tutorial Playlist

46 Lessons
27

Understanding Normalization in SQL: A Comprehensive Guide

Updated on 12/06/202441 Views

Introduction

Normalization is a fundamental concept in SQL and database management. Understanding this is essential for database administrators and developers to design robust, scalable, and maintainable database systems. In this tutorial, we will delve into the intricacies of Normalization in SQL, exploring its significance, principles, and various normal forms such as 1NF, 2NF, 3NF, and BCNF.

Overview

Normalization in SQL is a crucial aspect of database design aimed at structuring data optimally to enhance efficiency and maintain data integrity. It involves breaking down large, complex tables into smaller, more manageable entities and establishing relationships between them. By adhering to specific rules and normal forms, such as 1NF, 2NF, 3NF, and BCNF, normalization helps eliminate data redundancy and minimize dependency, thus preventing anomalies and ensuring consistency in database operations. This overview provides a glimpse into the significance of normalization in SQL, setting the stage for a deeper exploration of its principles and applications.

What is Normalization?

Normalization in SQL is a process used to organize a database efficiently by reducing redundancy and dependency. It involves breaking down large tables into smaller, more manageable entities and establishing relationships between them. This ensures data integrity, minimizes data duplication, and improves database performance.Explanation with Examples:Consider a simple example of a database table storing information about students and their courses:

| Student_ID | Student_Name | Course_ID | Course_Name |

|------------------|---------------------|----------------|--------------------|

| 1                   | Alice                | 101             | Math              |

| 1                   | Alice                | 102             | Science         |

| 1                   | Alice                | 103             | English          |

| 2                   | Bob                 | 101             | Math              |

| 2                   | Bob                 | 104             | History           |

In this table, we can see that there is redundancy in the "Student_Name" column. For example, "Alice" appears multiple times, each time associated with different courses. This redundancy can lead to data inconsistency and makes the database more prone to errors.To normalize this table, we can split it into two separate tables: one for students and another for courses.

Before Normalization:

Students Table:

| Student_ID | Student_Name |

|-----------------|---------------------|

| 1                 | Alice                 |

| 2                 | Bob                  |

Courses Table:

| Course_ID | Course_Name |

|----------------|---------------------|

| 101            | Math                |

| 102            | Science           |

| 103            | English            |

| 104            | History             |

After Normalization:

Enrollment Table:

| Student_ID | Course_ID |

|----------------|-----------------|

| 1                | 101              |

| 1                | 102              |

| 1                | 103              |

| 2                | 101              |

| 2                | 104              |

In the normalized structure, the "Students" table contains unique student information, and the "Courses" table contains unique course information. The "Enrollment" table establishes a many-to-many relationship between students and courses, eliminating redundancy.

What is the need for a Normalization?

Normalization in SQL is essential for several reasons, aiming to improve database efficiency, reduce redundancy, and ensure data integrity. Let's explore the need for normalization with examples:

1. Elimination of Redundancy:

Consider a database table storing customer information, including their addresses:

| Customer_ID | Customer_Name | Street_Address |      City      | State | Zip_Code |

|-------------------|-------------------------|----------------------|---------------|---------|--------------|

| 1                    | John Doe             | 123 Main St       | Anytown   | NY     | 12345     |

| 2                    | Jane Smith          | 456 Elm St         | Otherville | NY     | 12345     |

In this table, both John Doe and Jane Smith have the same zip code. This redundancy wastes storage space and increases the risk of data inconsistency. By normalizing the table and storing zip codes in a separate table, we can eliminate redundancy:

Before Normalization:

| Zip_Code | City           | State |

|--------------|-------------- |---------|

| 12345       |  Anytown  | NY    |

| 12345       | Otherville | NY    |


After Normalization:

| Zip_Code |   City       | State |

|--------------|--------------|---------|

| 12345      | Anytown   | NY    |

| 12346      | Otherville | NY    |

2. Minimization of Dependency:

In a denormalized table, changes to one piece of data can lead to inconsistencies elsewhere. For example, if a student changes their address, updating it in a denormalized table would require changing multiple records. Normalization reduces dependency by separating data into logical units.

Before Normalization:

| Student_ID | Student_Name | Address           | City       | State | Zip_Code |

|------------------|---------------------|--------------------|------------|---------|--------------|

| 1                   | John Doe        | 123 Main St    | Anytown| NY     | 12345      |

| 2                   | Jane Smith      | 456 Elm St     | Otherville| NY    | 12346     |

After Normalization:

Students Table:

| Student_ID | Student_Name |

|----------------|----------------------|

| 1                | John Doe          |

| 2                | Jane Smith       |

Addresses Table:

| Address_ID | Student_ID | Street_Address |      City          | State | Zip_Code |

|-----------------|-----------------|----------------------|-------------------|----------|-------------|

| 1                 | 1                  | 123 Main St      | Anytown        | NY      | 12345    |

| 2                 | 2                  | 456 Elm St       | Otherville       | NY      | 12346    |\

Normalization minimizes dependency and simplifies data updates by separating student information from address details.

Types of Normalization:

Normalization in SQL is divided into several normal forms, each with its own rules to reduce redundancy and dependency in database tables. Let's explore the common types of normalization with examples:

1. First Normal Form (1NF):

1NF requires that each column in a table contain atomic values, meaning each value cannot be further divided. It also prohibits repeating groups or arrays within a single row.

Example: Consider a table storing employee information with multiple phone numbers in a single column:

| Employee_ID | Employee_Name  |             Phone_Numbers          |

|-------------------|--------------------------|------------------------------------------|

| 1                   | John Doe                | 123-456-7890, 9876543210     |

| 2                   | Jane Smith              | 555-555-5555, 9998887777    |

To convert this table into 1NF, we split the phone numbers into separate rows:

Before Normalization:

| Employee_ID | Employee_Name  |             Phone_Numbers          |

|-------------------|--------------------------|------------------------------------------|

| 1                   | John Doe                | 123-456-7890, 9876543210     |

| 2                   | Jane Smith              | 555-555-5555, 9998887777    |

After Normalization (1NF):

| Employee_ID | Employee_Name | Phone_Number   |

|--------------------|------------------------|-------------------------|

| 1                     | John Doe            | 123-456-7890       |

| 1                     | John Doe            | 9876543210         |

| 2                     | Jane Smith         | 555-555-5555       |

| 2                     | Jane Smith         | 9998887777         |

2. Second Normal Form (2NF):

2NF builds upon 1NF and requires that every non-key attribute be fully functionally dependent on the primary key.

Example: Consider a table storing orders with order details:

| Order_ID | Customer_ID | Product_ID | Quantity | Product_Name | Unit_Price |

|--------------|-------------------|-----------------|--------------|---------------------|---------------|

| 1              | 101               | 201              | 2              | Laptop             | 1000         |

| 2              | 102               | 202              | 3              | Smartphone     | 500          |

In this table, "Product_Name" and "Unit_Price" are not fully dependent on the primary key (Order_ID). To achieve 2NF, we split the table into two:

Before Normalization:

| Order_ID | Customer_ID | Product_ID | Quantity | Product_Name | Unit_Price |

|--------------|-------------------|-----------------|--------------|---------------------|---------------|

| 1              | 101               | 201              | 2              | Laptop             | 1000         |

| 2              | 102               | 202              | 3              | Smartphone     | 500          |

After Normalization (2NF):

Orders Table:

| Order_ID | Customer_ID | Product_ID | Quantity |

|--------------|--------------------|----------------|-------------|

| 1              | 101                 | 201            | 2            |

| 2              | 102                 | 202            | 3            |

Products Table:

| Product_ID | Product_Name | Unit_Price |

|-----------------|---------------------|----------------|

| 201             | Laptop              | 1000          |

| 202             | Smartphone     | 500            |

3. Third Normal Form (3NF):

Third Normal Form (3NF) is a level of database normalization that builds upon the principles of First Normal Form (1NF) and Second Normal Form (2NF). It ensures that every non-key attribute in a table is non-transitively dependent on the primary key. In simpler terms, it eliminates any dependencies between non-key attributes.Example:Consider a table storing information about employees and their departments:

| Employee_ID | Employee_Name | Department_ID | Department_Name | Manager_ID |

|-------------------|-------------------------|----------------------|---------------------------|-----------------|

| 1                    | John Doe             | 101                    | IT                            | 5                 |

| 2                    | Jane Smith          | 102                    | Marketing                | 6                 |

| 3                    | Alice Johnson      | 101                    | IT                            | 5                 |

In this table, both "Department_Name" and "Manager_ID" are functionally dependent on "Department_ID", which is the primary key. However, "Manager_ID" is also functionally dependent on "Employee_ID", creating a transitive dependency.To achieve 3NF, we split the table into two:

Before Normalization:

| Employee_ID | Employee_Name | Department_ID | Department_Name | Manager_ID |

|-------------------|-------------------------|----------------------|---------------------------|-----------------|

| 1                    | John Doe             | 101                    | IT                            | 5                 |

| 2                    | Jane Smith          | 102                    | Marketing                | 6                 |

| 3                    | Alice Johnson      | 101                    | IT                            | 5                 |

After Normalization (3NF):

Employees Table:

| Employee_ID | Employee_Name | Department_ID | Manager_ID |

|-------------------|-------------------------|----------------------|------------------|

| 1                    | John Doe             | 101                    | 5                  |

| 2                    | Jane Smith          | 102                    | 6                  |

| 3                    | Alice Johnson      | 101                    | 5                  |

Departments Table:

| Department_ID | Department_Name | Manager_ID |

|----------------------|---------------------------|-----------------|

| 101                   | IT                            | 5                 |

| 102                   | Marketing                | 6                 |

In the normalized structure, each table represents a distinct entity with no transitive dependencies. The Employees table contains employee details, while the Departments table stores department information.

4. Boyce-Codd Normal Form (BCNF):

Boyce-Codd Normal Form (BCNF) is a stricter level of normalization compared to the Third Normal Form (3NF). It ensures that every determinant of a functional dependency is a candidate key, meaning no non-prime attribute is functionally dependent on another non-prime attribute. BCNF eliminates all non-trivial functional dependencies.Example:

Consider a table representing a library system where borrowers check books out:

| Borrower_ID | Book_ID | Borrower_Name |      Book_Title    | Return_Date |

|------------------|-------------|-------------------------|---------------------|-------------------|

| 1                   | 101        | John Doe             | SQL Basics      | 2024-03-15    |

| 2                   | 102        | Jane Smith           | Python Guide  | 2024-03-20    |

| 3                   | 101        | Alice Johnson       | SQL Basics     | 2024-03-18    |

In this table, both "Book_Title" and "Return_Date" are functionally dependent on "Book_ID". However, "Return_Date" is also functionally dependent on "Borrower_ID", creating a non-trivial functional dependency.To achieve BCNF, we need to decompose the table:

Before Normalization:

| Borrower_ID | Book_ID | Borrower_Name |      Book_Title    | Return_Date |

|------------------|-------------|-------------------------|---------------------|-------------------|

| 1                   | 101        | John Doe             | SQL Basics      | 2024-03-15    |

| 2                   | 102        | Jane Smith           | Python Guide  | 2024-03-20    |

| 3                   | 101        | Alice Johnson       | SQL Basics     | 2024-03-18    |

After Normalization (BCNF):

Borrowers Table:

| Borrower_ID | Borrower_Name |

|------------------|-----------------------|

| 1                  | John Doe            |

| 2                  | Jane Smith         |

| 3                  | Alice Johnson     |

Books Table:

| Book_ID |    Book_Title    | Return_Date |

|-------------|---------------------|-------------------|

| 101        | SQL Basics      | 2024-03-15   |

| 102        | Python Guide   | 2024-03-20   |

Borrowings Table:

| Borrower_ID | Book_ID |

|-------------------|------------ |

| 1                    | 101        |

| 2                    | 102        |

| 3                    | 101        |

In the normalized structure, each table represents a distinct entity, and there are no non-trivial functional dependencies. 

Benefits of Normalization:

1. The Database does not have redundant data, so it is smaller in size, so less money will be spent on storage.

2. Because there is less data to search, it is much faster to run a query.

3. Because there is no data duplication, there is better data integrity and less risk of mistakes.

4. Because of the lack of duplication of data, there is less chance of storing two different copies of the data.

5. One change can be made, which will be instantly cascaded across related records.

Data Normalization in SQL:

I have explained this topic above in ‘Types of Normalization’.

Normalization in SQL Server (or SQL server database normalization or Database Normalization in SQL):

Normalization in SQL Server follows the same principles as normalization in SQL databases in general.

Conclusion:

Normalization in SQL is a fundamental process for organizing and structuring data in relational databases. It ensures data integrity by eliminating anomalies and inconsistencies, thereby improving the overall quality of the database. It also enhances query performance and simplifies data management tasks. It is essential for creating well-organized databases that can efficiently handle data storage, retrieval, and manipulation, ultimately contributing to the success and effectiveness of database-driven applications.

FAQs:

Q. Which normalization is best?

A. The "best" normalization level depends on the specific requirements of the database. However, the Third Normal Form (3NF) is commonly sufficient for most databases, balancing efficiency and data integrity.

Q. What is denormalization in SQL?

A. It involves intentionally introducing redundancy into a database to improve query performance by reducing the need for joins and data retrieval complexity.

Amit Chandra

Amit Chandra

Amit Chandra, PMP, SCPM, is a program and product management professional with over 15 years of experience in publishing, EDA and Insurance domai… Read More

Get Free Career Counselling
form image
+91
*
By clicking, I accept theT&Cand
Privacy Policy
image
Join 10M+ Learners & Transform Your Career
Learn on a personalised AI-powered platform that offers best-in-class content, live sessions & mentorship from leading industry experts.
right-top-arrowleft-top-arrow

upGrad Learner Support

Talk to our experts. We’re available 24/7.

text

Indian Nationals

1800 210 2020

text

Foreign Nationals

+918045604032

Disclaimer

upGrad does not grant credit; credits are granted, accepted or transferred at the sole discretion of the relevant educational institution offering the diploma or degree. We advise you to enquire further regarding the suitability of this program for your academic, professional requirements and job prospects before enr...