Skip to content

B4, Normalization etc

How to transform an EERD to Database

  1. EERD already ensures that M:N relationships are resolved.
ERD Database
Entity Type Relation/Table
Entity Tuple/Row
Attribute Attribute/Columns
relationships Foreign Keys referencing Primary Keys
Primary Key Primary Keys

Where to place Foregin Keys in ERDs

If the relationship of tables A:B is

1:M

Place it on the M side

This ensures that each row in B has exactly one reference to a corresponding row in A

1:M self referencing

1:1

IF relationship is identifying

  • Place it in the weaker entity

IF one entity is optional

  • Place it in the optional entity

IF both entities are mandatory or optional

  • Place it in either entity

Relational Heading Format

It is a way to define the schema of a table to be implemented

Syntax

  1. <relation>(single-valued Columns, {multi-valued columns})
  2. Solid Underscore the PK and dotted underline for the FK.

e.g.


Product(Name, Description)

Customer(Email, Name, DOB)

Order(Email, ProdName, quantity)


Normalization: Proper Form Relation

Normalization -> improve the quality of a relational database that transforms the database into a structured form that reduce redundancies and maintains consistency

0NF (Un-normalized Form)

Happens When:

  1. There are repeating groups

This happens when we store multiple values of the same type across multiple columns

e.g.

Product1 Product2 Product3
P1 P2 P3
P2 P4 NULL

This is bad since they are storing entries as attributes.

The columns are repeating instances of the same attribute 'Product'.


  1. There are multi-valued cells

This happens when there are multiple values in a single cell (comma separated, json etc)

e.g.

StudentName Courses
Jovan Math,English,Science
Ben Geography,English
Euzin Alchohol Rehabilitation

Each student takes multiple course, they are stored under a single Courses attribute.

This creates the following Problems

  • Multi-valued cells/repeating groups violate relation properties
  • There is no Primary Key Defined

Representing in Relational Heading Format

Given the following example; a table of products, customer and suppliers:

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60

Customer_Product_Supplier(First_name, Email, {Prod_Code, Prod_Desc, Unit_px, Supplier_ID, Supplier_Name, Qty, Ttl_Amt})

1NF (First Normal Form)

How to transform:

  1. Form new rows so that each cell is single valued
  2. Fill in empty cells
  3. Determine the PK

e.g.

We will use the Customer Product Supplier table for the rest of the examples

  1. Form new rows (it looks like the 0NF, since we cannot make true 'multi-valued cells' in markdown)
First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60
  1. Fill in the empty cells
First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60
  1. Determine the PK

We can see that Email , Prod_Code is a valid primary key

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60

Final 1NF Table

  1. No multi-valued cells or repeating groups
  2. PK is identified
First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60

Relational Heading Format

We can simply just remove the {} inside from our 0NF Relational Heading Format


Customer_Product_Supplier(First_name, Email, Prod_Code, Prod_Desc, Unit_px, Supplier_ID, Supplier_Name, Qty, Ttl_Amt)

Anomalies

Insert

Let us say that a new customer named Rachel wants to sign up in our store.

BUT since she hasn't bought a product yet, her Prod_Code value is NULL.

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60
Rachel Rachel@gmail.com ? ? ? ? ? ? ?

Since the Prod_Code is part of the PK, she cannot be inserted as it violates the Entity Integrity Rule

Update

Lets say that we want to change the name of our supplier from King Dress Pte Ltd to King Kong Dress Pte Ltd.

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Kong Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60

As you can see, changing a single entry is not enough. We need to change multiple entries for 'Fully Update' the Name.

This is very inefficient when out table gets very big.

Issue: Data can become inconsistent and redundant if the update is not done properly

Delete

Lets say that Linda wants to no longer be a member in our store

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Kong Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40

However by deleting Linda's rows, we are actually removing products like HG6159 that is only bought by Linda

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60
Solutions
  • We need to store customer and product information separately
  • Changing on table will not affect the other
  • Functional Dependencies will be used to formalize the process (used in 2NF)

What are functional dependencies?

A--->B or B=f(A)

In simpler terms: If we know the value of A, we can find the value of B

We say that B is functionally depending on A

e.g 1

With someone's social security number, we can find out their name

Therefore, name is functionally depending on social security number

e.g. 2

Supplier_Id > Supplier_Name, Supplier_Phone, Supplier_Address

Product_Id > Product_Category, Product_desciption

// Example of one columns is FD on multiple other columns
Email, ProductCode > Qty

2NF (Second Normal Form)

  1. All non-key attributes FULLY DEPEND on the whole of its primary key.

Earlier we could see that one attributes only depended on part of the primary key

We can take those attributes and split them up based on the part of the primary key

How to transform: 1. Identify keys, and all of their combinations 2. Identify functional dependencies on the keys and combinations 3. Create relations for each key and combination if there are functional dependencies

e.g.

Consider the 1NF table we created earlier

First_name Email Prod_Code Prod_Desc Unit_px Supplier_ID Supplier_Name Qty Ttl_Amt
David DavidL@gmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 3 47.70
David DavidL@gmail.com HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd 2 39.60
David DavidL@gmail.com RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd 4 74.40
Linda LSoh@hotmail.com HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd 1 15.90
Linda LSoh@hotmail.com HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd 4 61.60
  1. Identify keys and combinations

Here we see that we can have the following keys:

  • Email
  • Prod_code
  • Email,Prod_code

  • Identify functional dependencies on the keys and combinations

Keys Columns Depending on this key
Email First_Name
Prod_Code Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name
Email,Prod_Code Qty,Ttl_Amt
  1. Create relations for each key combinations

From out functional dependencies we can create 3 tables

Functional Dependencies Relation
Email -> First_Name Customer (Email, First_Name)
Prod_Code -> Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name Product (Prod_Code, Prod_Desc, Unit_Px, Supplier_ID, Supplier_Name)
Email, Prod_Code -> Qty, Ttl_Amt Customer_Product_Supplier (Email, Prod_Code, Qty, Ttl_Amt)

Note that Email and Prod_Code are foreign keys referencing 2 other tables, therefore they should be dotted_underline

They are also its own primary key, so they should be underlined with a solid line as well.

Final Created Tables

Customer Table

Email First_Name
DavidL@gmail.com David
LSoh@hotmail.com Linda

Product Table

Prod_Code Prod_Desc Unit_Px Supplier_ID Supplier_Name
HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd
HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd
RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd
HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd

Customer_Product_Supplier Table

Email Prod_Code Qty Ttl_Amt
DavidL@gmail.com HG7160 3 47.70
DavidL@gmail.com HG9298 2 39.60
DavidL@gmail.com RQ0207 4 74.40
LSoh@hotmail.com HG7160 1 15.90
LSoh@hotmail.com HG6159 4 61.60

Potential Issues

  • There can still be redundancies in 2NF
  • These are in the form of transitive dependencies

Transitive Dependencies:

When a functional dependency exists within the non-key attributes in a relation.

e.g.

Prod_Code > Supplier_ID > Supplier_Name 
  • Similar anomalies as 1NF will occur when supplier_id/name is updated/deleted

3NF (Third Normal Form)

  • When it is already in 2NF AND there are no functional dependencies among non-key attributes

How to transform: 1. Separate Functional Dependency among non-key attributes 2. Create new relations using the separated Functional Dependencies

e.g.

From the above example, only the Product table has transitive dependencies

Prod_Code Prod_Desc Unit_Px Supplier_ID Supplier_Name
HG7160 Sale Dress White 15.90 S1001 King Dress Pte Ltd
HG9298 Sale Top + Skirt Red 19.80 S1001 King Dress Pte Ltd
RQ0207 Dress White 18.60 S1002 Ladies Green Pte Ltd
HG6159 Sale Dress Pink 15.40 S1003 Best Clothing Pte Ltd

Here we can see that supplier name depends on supplier id

Supplier_Id > Supplier_Name

Once we take this out, we can create a new table

Product

Prod_Code Prod_Desc Unit_Px
HG7160 Sale Dress White 15.90
HG9298 Sale Top + Skirt Red 19.80
RQ0207 Dress White 18.60
HG6159 Sale Dress Pink 15.40

Supplier

Supplier_ID Supplier_Name
S1001 King Dress Pte Ltd
S1002 Ladies Green Pte Ltd
S1003 Best Clothing Pte Ltd