Data Modeling is an important skill for any data worker to understand. Data Engineers in particular should understand this topic and be able to apply it to their work. Normalization is an important part of Data Modeling and part one can be found here.
For this post, we will be going over the first part of the Objective section of the Database Normalization Wikipedia page.
pt 2: Objectives - this post
Objective.0
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[1] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.[2])
Codd is this person who made the relational database model for relational databases. Interestingly, Codd was not a big fan of SQL. Despite his lack of enthusiasm, it’s the most common real-world implementation of his relational model for databases.
Objective.1
The objectives of normalization beyond 1NF (first normal form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies.
Insertion, update, and deletion on a database are complicated when a database isn’t normalized. The reason is that de-normalized data is not made into discrete parts. For example, if I make a change of street address, but not city or state. So it is much easier to update my address if the street address has its own column. Rather than having a single column for these discrete parts of an address; street, city, state, zip, etc.
Normal Forms allow for easier Data Manipulation Language so that you don’t have to do many operations for simple insertions, updates, or deletes.
From the following informative article:
Examples of DML:
INSERT – is used to insert data into a table.
UPDATE – is used to update existing data within a table.
DELETE – is used to delete records from a database table.
Objective.2
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
Increasing Normal Forms on your database allows you to insert data without having to consider new columns or tables, since a well normalized database has been split up so much, it would cover any new cases. However, this isn’t necessarily true if your business is adding completely new functions of business. A highly normalized database may become robust, but making big changes from a strict structure makes the database come around to a lack of flexibility to change.
Objective.3
3. To make the relational model more informative to users.
This one is a little debatable. As, you increase Normal Forms past 3rd or so, the database objects (schema, tables, columns) start to get split up a little too much. Queries get huge with a bunch of joins, because you have so many tables. On top of that, query results start to look like attack of the Primary and Foreign keys.
Objective.4
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
— E.F. Codd, "Further Normalization of the Data Base Relational Model"[3]
A common structure for queries allows for the queries to remain the same, even as the contents or data, changes over time. This allows us to automate and schedule queries for reports so that we don’t have to worry about changing them just because we have some more data over time. Theoretically.
Finale
I hope the point is made that all of these arguments are give and take. There is such a thing as “Too much of a good thing” (normalization), which brings rigidity to our database in the pursuit of robustness or data integrity. Thankfully, I don’t have to tackle these kinds of problems because I don’t have to synthesize (or create) a Relational Database model from scratch. But the above theory does help when we have to create a relational data warehouse so that we can make arguments for/against normalization given context.
Thanks for reading!