Normalization in SQL: 1NF, 2NF, 3NF & BCNF
Updated on Jul 03, 2023 | 10 min read | 7.8k views
Normalization is a systematic process for ensuring that a relational database model is efficient, suitable for general-purpose querying, and free of undesirable characteristics such as insertion, update, and deletion anomalies, which can compromise the integrity of the data. It also eliminates data redundancy and reduces the chance of inconsistency after insert, update, or delete operations.
In database design, normalization organizes data into logical structures that store each fact once, which reduces redundancy and minimizes update, insert, and deletion anomalies. SQL (Structured Query Language) is a popular language used to manage and manipulate relational databases. Normalization in SQL is a way of organizing the data stored in tables to improve the accuracy and efficiency of queries.
Normalization involves breaking data into its smallest logical units and creating relationships between them. This reduces duplication and makes updates simpler and safer, although reading the data back may require joining tables. It also helps protect the integrity of the database by ensuring that unrelated facts are not stored together in one table. For example, if the same address is repeated in many rows of a single table, updating it is error-prone: every row holding the old address must be correctly identified and changed. With normalization, the address is stored once in its own table and referenced wherever it is needed, so an update touches a single row.
Normalization can also reduce data storage costs, because redundant data is eliminated. Although a normalized design usually has more tables, each one is smaller and has a single, clear purpose, so the database stays better organized. As an example, consider a table that stores customer information such as name, address, phone number, and email address. By applying the principles of normalization, this table could be broken down into three separate tables (one for names, one for addresses, and one for contact details) linked by a customer identifier, which eliminates redundancy and duplication. This reduces the risk of update errors caused by duplicated values. Understanding how normalization works in SQL is essential for creating databases that stay consistent and perform well when retrieving data.
For a better understanding, consider the following schema: Student (Name, Address, Subject, Grade)
There are a few problems or inefficiencies in this schema.
1) Redundancy: The student's Address is repeated for each Subject the student is registered for.
2) Update anomaly: We may update the Address in one tuple (row) while leaving it unchanged in the other rows. The student would then no longer have a single, consistent address.
3) Insertion anomaly: We cannot record a student's Address unless the student is registered for at least one Subject. Similarly, when a student enrols in a new Subject, the new row may be inserted with a different Address.
4) Deletion anomaly: If a student discontinues all enrolled subjects, the student's Address is lost along with the deleted rows.
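The update and deletion anomalies listed above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `student`, the student "Asha", and the addresses are made-up sample values, not data from this article.

```python
import sqlite3

# In-memory database; the table mirrors Student(Name, Address, Subject, Grade).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (name TEXT, address TEXT, subject TEXT, grade TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("Asha", "12 Park Road", "Maths", "A"),    # hypothetical sample rows
        ("Asha", "12 Park Road", "Physics", "B"),  # address stored redundantly
    ],
)

# Update anomaly: the address is changed in only one of the student's rows.
conn.execute(
    "UPDATE student SET address = '7 Lake View' "
    "WHERE name = 'Asha' AND subject = 'Maths'"
)
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT address FROM student WHERE name = 'Asha'"
    )
)
print(addresses)  # the same student now has two different addresses

# Deletion anomaly: removing every enrolment also removes the address.
conn.execute("DELETE FROM student WHERE name = 'Asha'")
remaining = conn.execute(
    "SELECT COUNT(*) FROM student WHERE name = 'Asha'"
).fetchone()[0]
print(remaining)  # no row is left that records where the student lives
```

Nothing in the single-table schema prevents either outcome, which is why the fix has to come from the design rather than from more careful queries.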
Thus, it is important to represent the user data by relations that do not create anomalies following tuple add, delete, or update operations. This can only be achieved by a careful analysis of the integrity constraints, especially the database’s data dependencies.
The relations should be designed so that only those attributes are grouped that exist naturally together. This can mostly be done by a basic understanding of the meaning of all data attributes. However, we still need some formal measure to ensure our design goal.
Normalization is that formal measure. It answers the question of why a particular grouping of attributes will be better than any other.
Seven normal forms exist as of today: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and 6NF.
Let’s take an example of a schema that is not normalized. Suppose a designer wishes to record the names and telephone numbers of customers. They define a customer table as shown:
Customer ID | First Name | Surname | Telephone Numbers |
123 | Bimal | Saha | 555-861-2025 |
456 | Kapil | Khanna | 555-403-1659, 555-776-4100 |
789 | Kabita | Roy | 555-808-9633 |
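This table is not in 1NF. The Telephone Numbers column is not atomic, i.e. it does not hold a single scalar value: the row for customer 456 contains two numbers, which 1NF does not allow. To reach 1NF, the telephone numbers are moved into a separate table with one number per row: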
Customer ID | First Name | Surname |
123 | Bimal | Saha |
456 | Kapil | Khanna |
789 | Kabita | Roy |
Customer ID | Telephone Numbers |
123 | 555-861-2025 |
456 | 555-403-1659 |
456 | 555-776-4100 |
789 | 555-808-9633 |
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number link appears on its own record.
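As a rough illustration, the two-table design could be declared and queried as follows. This is a sketch using Python's sqlite3 module; the table and column names (`customer`, `customer_phone`, `telephone`) are assumptions chosen for the example, while the rows are the ones from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    surname     TEXT NOT NULL
);
-- One row per customer-to-telephone link, so every value is atomic.
CREATE TABLE customer_phone (
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    telephone   TEXT NOT NULL,
    PRIMARY KEY (customer_id, telephone)
);
""")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(123, "Bimal", "Saha"), (456, "Kapil", "Khanna"), (789, "Kabita", "Roy")],
)
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?)",
    [
        (123, "555-861-2025"),
        (456, "555-403-1659"),
        (456, "555-776-4100"),
        (789, "555-808-9633"),
    ],
)

# All numbers for one customer are recovered with a join.
kapil_numbers = [
    row[0]
    for row in conn.execute(
        "SELECT p.telephone "
        "FROM customer AS c "
        "JOIN customer_phone AS p ON p.customer_id = c.customer_id "
        "WHERE c.surname = 'Khanna' "
        "ORDER BY p.telephone"
    )
]
print(kapil_numbers)  # ['555-403-1659', '555-776-4100']
```

The composite primary key on `customer_phone` is one possible choice; it also stops the same number from being stored twice for the same customer.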
Each normal form has more constraining criteria than its predecessor. So any table that is in second normal form (2NF) or higher is, by definition, also in 1NF. On the other hand, a table that is in 1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.
A 1NF table is said to be in 2NF if and only if none of its nonprime attributes is functionally dependent on a part (proper subset) of a candidate key. (A nonprime attribute does not belong to any candidate key.)
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.
Overall, normalization in SQL is a vital part of creating an efficient and accurate database. By understanding how normalization works and applying it correctly, developers can keep their databases well organized and consistent with minimal effort. Normalization reduces redundant storage, keeps updates simple, and helps maintain the system's integrity, all of which are important considerations for any business organization or application developer.
For example, consider a relation R with candidate key AC and the functional dependencies BC → D, AC → BE, and B → E:
BC → D is in 2nd normal form because BC is not a proper subset of the candidate key AC,
AC → BE is in 2nd normal form because AC itself is the candidate key, and
B → E is in 2nd normal form because B is not a proper subset of the candidate key AC.
Thus the given relation R is in the 2nd Normal Form.
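The same reasoning can be checked mechanically. The sketch below is an illustration rather than a standard library routine: the helper names `closure`, `candidate_keys`, and `violates_2nf` are invented for this example, the attribute set {A, B, C, D, E} is assumed from the dependencies above, and the key search is brute force, so it only suits small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for the minimal sets whose closure is all attributes."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            is_minimal = not any(key <= candidate for key in keys)
            if is_minimal and closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violates_2nf(attributes, fds):
    """List (part of a key, nonprime attribute) pairs that break 2NF."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):  # proper, non-empty subsets only
            for part in combinations(sorted(key), size):
                determined = closure(set(part), fds) - set(part)
                for attr in sorted(determined - prime):
                    violations.append(("".join(part), attr))
    return violations


R = set("ABCDE")  # assumed attribute set
fds = [(set("BC"), set("D")), (set("AC"), set("BE")), (set("B"), set("E"))]

print(candidate_keys(R, fds))  # [{'A', 'C'}] (set order may vary when printed)
print(violates_2nf(R, fds))    # []

# Hypothetical variation: adding C -> D makes D depend on part of the key.
print(violates_2nf(R, fds + [(set("C"), set("D"))]))  # [('C', 'D')]
```

An empty result means no nonprime attribute depends on a proper subset of a candidate key, which matches the conclusion reached by hand.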
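A table is said to be in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
1) X contains A (that is, X → A is a trivial functional dependency),
2) X is a superkey, or
3) every attribute of A that is not in X is a prime attribute (it belongs to some candidate key).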
Another definition of 3NF states that every non-key attribute of R is non-transitively dependent (i.e. directly dependent) on the primary key of R. This means no nonprime attribute (one that is not part of any candidate key) is functionally dependent on other nonprime attributes. If there are two dependencies A → B and B → C, then from these FDs we may derive A → C. This dependency A → C is transitive.
Consider the following relation Order (Order#, Part, Supplier, UnitPrice, QtyOrdered) with the given set of FDs:
Order# → Part, Supplier, QtyOrdered and Supplier, Part → UnitPrice
Here Order# is the key of the relation.
Using Armstrong's axioms, we get
Order# → Part, Order# → Supplier, and Order# → QtyOrdered.
Order# → Part, Supplier and Supplier, Part → UnitPrice together give Order# → UnitPrice.
Thus, we see that all nonprime attributes depend on the key (Order#). However, the dependency of UnitPrice on Order# is transitive: it holds only through Part and Supplier. So this relation is not in 3NF. How do we bring it into 3NF?
As the relation stands, we cannot store the UnitPrice of a Part supplied by a Supplier unless someone places an order for that Part. So we decompose the table into two relations that follow 3NF:
Order (Order#, Part, Supplier, QtyOrdered) and Price Master (Part, Supplier, UnitPrice).
Now there are no transitive dependencies present. Both relations are in 3NF.
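One way the decomposed design might look in practice is sketched below with Python's sqlite3 module. The table names `orders` and `price_master` and the sample parts, supplier, and prices are assumptions made for this example (`orders` is used because ORDER is a reserved word in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
-- Price Master (Part, Supplier, UnitPrice): one price per part and supplier.
CREATE TABLE price_master (
    part       TEXT NOT NULL,
    supplier   TEXT NOT NULL,
    unit_price REAL NOT NULL,
    PRIMARY KEY (part, supplier)
);
-- Order (Order#, Part, Supplier, QtyOrdered): no price column any more.
CREATE TABLE orders (
    order_no    INTEGER PRIMARY KEY,
    part        TEXT NOT NULL,
    supplier    TEXT NOT NULL,
    qty_ordered INTEGER NOT NULL,
    FOREIGN KEY (part, supplier) REFERENCES price_master (part, supplier)
);
""")

# A price can now be recorded before anyone orders the part.
conn.execute("INSERT INTO price_master VALUES ('Bolt', 'Acme', 2.5)")
conn.execute("INSERT INTO price_master VALUES ('Nut', 'Acme', 1.0)")
conn.execute("INSERT INTO orders VALUES (1, 'Bolt', 'Acme', 40)")

# The unit price is looked up through a join instead of being stored per order.
order_total = conn.execute("""
    SELECT o.qty_ordered * p.unit_price
    FROM orders AS o
    JOIN price_master AS p
      ON p.part = o.part AND p.supplier = o.supplier
    WHERE o.order_no = 1
""").fetchone()[0]
print(order_total)  # 100.0

# The 'Nut' price exists even though no order refers to it.
unordered_prices = conn.execute(
    "SELECT COUNT(*) FROM price_master WHERE part = 'Nut'"
).fetchone()[0]
print(unordered_prices)  # 1
```

The composite foreign key is a design choice for the sketch: it guarantees that every order refers to a part and supplier pair that has a recorded price.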
There is more to normalization, namely BCNF, 4NF, 5NF, and 6NF. In short, BCNF is a stricter version of 3NF in which the last rule of 3NF (the exception for prime attributes) does not apply. For every nontrivial functional dependency X → A, the left-hand side X must be a superkey. (BCNF is sometimes informally called 3.5NF.) Normal forms from 4NF onward are rarely applied in everyday practice.
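The BCNF condition lends itself to a short check. The sketch below is illustrative only: `closure` and `bcnf_violations` are names invented for this example, and the Student/Course/Teacher relation is a hypothetical textbook-style case (each teacher teaches one course) that is not taken from this article.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != attributes:
            violations.append((sorted(lhs), sorted(rhs)))
    return violations


# Price Master (Part, Supplier, UnitPrice) from the 3NF example.
price_master = {"Part", "Supplier", "UnitPrice"}
price_fds = [({"Part", "Supplier"}, {"UnitPrice"})]
print(bcnf_violations(price_master, price_fds))  # []

# Hypothetical relation that is in 3NF (every attribute is prime)
# but not in BCNF, because Teacher alone is not a superkey.
enrolment = {"Student", "Course", "Teacher"}
enrolment_fds = [
    ({"Student", "Course"}, {"Teacher"}),
    ({"Teacher"}, {"Course"}),
]
print(bcnf_violations(enrolment, enrolment_fds))  # [(['Teacher'], ['Course'])]
```

The second relation passes 3NF only through the prime-attribute exception, which is the rule BCNF removes.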