B4, Normalization etc
How to transform an EERD to Database
- EERD already ensures that M:N relationships are resolved.
ERD |
Database |
|---|---|
| Entity Type | Relation/Table |
| Entity | Tuple/Row |
| Attribute | Attribute/Columns |
| relationships | Foreign Keys referencing Primary Keys |
| Primary Key | Primary Keys |
Where to place Foregin Keys in ERDs
If the relationship of tables A:B is
1:M
Place it on the M side
This ensures that each row in B has exactly one reference to a corresponding row in A
1:M self referencing
1:1
IF relationship is identifying
- Place it in the weaker entity
IF one entity is optional
- Place it in the optional entity
IF both entities are mandatory or optional
- Place it in either entity
Relational Heading Format
It is a way to define the schema of a table to be implemented
Syntax
- <relation>(single-valued Columns, {multi-valued columns})
- Solid Underscore the PK and dotted underline for the FK.
e.g.
Product(Name, Description)
Customer(Email, Name, DOB)
Order(Email, ProdName, quantity)
Normalization: Proper Form Relation
Normalization -> improve the quality of a relational database that transforms the database into a structured form that reduce redundancies and maintains consistency
0NF (Un-normalized Form)
Happens When:
- There are repeating groups
This happens when we store multiple values of the same type across multiple columns
e.g.
| Product1 | Product2 | Product3 |
|---|---|---|
| P1 | P2 | P3 |
| P2 | P4 | NULL |
This is bad since they are storing entries as attributes.
The columns are repeating instances of the same attribute 'Product'.
- There are multi-valued cells
This happens when there are multiple values in a single cell (comma separated, json etc)
e.g.
| StudentName | Courses |
|---|---|
| Jovan | Math,English,Science |
| Ben | Geography,English |
| Euzin | Alchohol Rehabilitation |
Each student takes multiple course, they are stored under a single Courses attribute.
This creates the following Problems
- Multi-valued cells/repeating groups violate relation properties
- There is no Primary Key Defined
Representing in Relational Heading Format
Given the following example; a table of products, customer and suppliers:
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 | ||
| RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 | ||
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
Customer_Product_Supplier(First_name, Email, {Prod_Code, Prod_Desc, Unit_px, Supplier_ID, Supplier_Name, Qty, Ttl_Amt})
1NF (First Normal Form)
How to transform:
- Form new rows so that each cell is single valued
- Fill in empty cells
- Determine the PK
e.g.
We will use the Customer Product Supplier table for the rest of the examples
- Form new rows (it looks like the 0NF, since we cannot make true 'multi-valued cells' in markdown)
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 | ||
| RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 | ||
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
- Fill in the empty cells
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
- Determine the PK
We can see that Email , Prod_Code is a valid primary key
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
Final 1NF Table
- No multi-valued cells or repeating groups
- PK is identified
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
Relational Heading Format
We can simply just remove the {} inside from our 0NF Relational Heading Format
Customer_Product_Supplier(First_name, Email, Prod_Code, Prod_Desc, Unit_px, Supplier_ID, Supplier_Name, Qty, Ttl_Amt)
Anomalies
Insert
Let us say that a new customer named Rachel wants to sign up in our store.
BUT since she hasn't bought a product yet, her Prod_Code value is NULL.
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
| Rachel | Rachel@gmail.com | ? | ? | ? | ? | ? | ? | ? |
Since the Prod_Code is part of the PK, she cannot be inserted as it violates the Entity Integrity Rule
Update
Lets say that we want to change the name of our supplier from King Dress Pte Ltd to King Kong Dress Pte Ltd.
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Kong Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
As you can see, changing a single entry is not enough. We need to change multiple entries for 'Fully Update' the Name.
This is very inefficient when out table gets very big.
Issue: Data can become inconsistent and redundant if the update is not done properly
Delete
Lets say that Linda wants to no longer be a member in our store
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Kong Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
However by deleting Linda's rows, we are actually removing products like HG6159 that is only bought by Linda
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
Solutions
- We need to store customer and product information separately
- Changing on table will not affect the other
- Functional Dependencies will be used to formalize the process (used in 2NF)
What are functional dependencies?
A--->B or B=f(A)
In simpler terms: If we know the value of A, we can find the value of B
We say that B is functionally depending on A
e.g 1
With someone's social security number, we can find out their name
Therefore, name is functionally depending on social security number
e.g. 2
Supplier_Id > Supplier_Name, Supplier_Phone, Supplier_Address
Product_Id > Product_Category, Product_desciption
// Example of one columns is FD on multiple other columns
Email, ProductCode > Qty
2NF (Second Normal Form)
- All non-key attributes FULLY DEPEND on the whole of its primary key.
Earlier we could see that one attributes only depended on part of the primary key
We can take those attributes and split them up based on the part of the primary key
How to transform: 1. Identify keys, and all of their combinations 2. Identify functional dependencies on the keys and combinations 3. Create relations for each key and combination if there are functional dependencies
e.g.
Consider the 1NF table we created earlier
| First_name | Prod_Code | Prod_Desc | Unit_px | Supplier_ID | Supplier_Name | Qty | Ttl_Amt | |
|---|---|---|---|---|---|---|---|---|
| David | DavidL@gmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 3 | 47.70 |
| David | DavidL@gmail.com | HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd | 2 | 39.60 |
| David | DavidL@gmail.com | RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd | 4 | 74.40 |
| Linda | LSoh@hotmail.com | HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd | 1 | 15.90 |
| Linda | LSoh@hotmail.com | HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd | 4 | 61.60 |
- Identify keys and combinations
Here we see that we can have the following keys:
- Prod_code
-
Email,Prod_code
-
Identify functional dependencies on the keys and combinations
| Keys | Columns Depending on this key |
|---|---|
| First_Name | |
| Prod_Code | Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name |
| Email,Prod_Code | Qty,Ttl_Amt |
- Create relations for each key combinations
From out functional dependencies we can create 3 tables
| Functional Dependencies | Relation |
|---|---|
| Email -> First_Name | Customer (Email, First_Name) |
| Prod_Code -> Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name | Product (Prod_Code, Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name) |
| Email, Prod_Code -> Qty, Ttl_Amt | Customer_Product_Supplier (Email, Prod_Code, Qty, Ttl_Amt) |
Note that Email and Prod_Code are foreign keys referencing 2 other tables, therefore they should be dotted_underline
They are also its own primary key, so they should be underlined with a solid line as well.
Final Created Tables
Customer Table
| First_Name | |
|---|---|
| DavidL@gmail.com | David |
| LSoh@hotmail.com | Linda |
Product Table
| Prod_Code | Prod_Desc | Unit_Px | Supplier_ID | Supplier_Name |
|---|---|---|---|---|
| HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd |
| HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd |
| RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd |
| HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd |
Customer_Product_Supplier Table
| Prod_Code | Qty | Ttl_Amt | |
|---|---|---|---|
| DavidL@gmail.com | HG7160 | 3 | 47.70 |
| DavidL@gmail.com | HG9298 | 2 | 39.60 |
| DavidL@gmail.com | RQ0207 | 4 | 74.40 |
| LSoh@hotmail.com | HG7160 | 1 | 15.90 |
| LSoh@hotmail.com | HG6159 | 4 | 61.60 |
Potential Issues
- There can still be redundancies in 2NF
- These are in the form of transitive dependencies
Transitive Dependencies:
When a functional dependency exists within the non-key attributes in a relation.
e.g.
Prod_Code > Supplier_ID > Supplier_Name
- Similar anomalies as 1NF will occur when supplier_id/name is updated/deleted
3NF (Third Normal Form)
- When it is already in 2NF AND there are no functional dependencies among non-key attributes
How to transform: 1. Separate Functional Dependency among non-key attributes 2. Create new relations using the separated Functional Dependencies
e.g.
From the above example, only the Product table has transitive dependencies
| Prod_Code | Prod_Desc | Unit_Px | Supplier_ID | Supplier_Name |
|---|---|---|---|---|
| HG7160 | Sale Dress White | 15.90 | S1001 | King Dress Pte Ltd |
| HG9298 | Sale Top + Skirt Red | 19.80 | S1001 | King Dress Pte Ltd |
| RQ0207 | Dress White | 18.60 | S1002 | Ladies Green Pte Ltd |
| HG6159 | Sale Dress Pink | 15.40 | S1003 | Best Clothing Pte Ltd |
Here we can see that supplier name depends on supplier id
Supplier_Id > Supplier_Name
Once we take this out, we can create a new table
Product
| Prod_Code | Prod_Desc | Unit_Px |
|---|---|---|
| HG7160 | Sale Dress White | 15.90 |
| HG9298 | Sale Top + Skirt Red | 19.80 |
| RQ0207 | Dress White | 18.60 |
| HG6159 | Sale Dress Pink | 15.40 |
Supplier
| Supplier_ID | Supplier_Name |
|---|---|
| S1001 | King Dress Pte Ltd |
| S1002 | Ladies Green Pte Ltd |
| S1003 | Best Clothing Pte Ltd |