The Data Modeling Series pt.5/5
Digestible Data Modeling Blogging
Data Modeling is an important skill for any data worker. Data Engineers in particular should understand this topic and be able to apply it to their work.
This post is the final installment of a five-part series on data/dimensional modeling. For pt.5, we will go over the "Benefits of dimensional modeling" section of Wikipedia’s page on Dimensional Modeling.
Benefits of dimensional modeling
In part four of this series, we covered the tradeoffs between normalization and de-normalization. In this post, we will finish off with some arguments for dimensionally modeling your data warehouse.
Understandability. Compared to the normalized model, the dimensional model is easier to understand and more intuitive. In dimensional models, information is grouped into coherent business categories or dimensions, making it easier to read and interpret. Simplicity also allows software to navigate databases efficiently. In normalized models, data is divided into many discrete entities and even a simple business process might result in dozens of tables joined together in a complex way.
Right off the bat, this paragraph draws a distinction between the normalized and dimensional models. In practice, most databases don’t reach the higher normal forms; normalizing beyond 3NF (Third Normal Form) is typically not done. Meaning, we allow some duplicate data, some de-normalization, in exchange for understanding how the data in our database is organized.
Going back to our first post in the Data Modeling series, the business comes before the database. Our database is organized so that data is grouped by business needs. Highly normalized databases are split into so many tables that they become hard to navigate in queries. And even if you do know how to navigate a highly normalized database, query results are often unintelligible: keys, keys, and more keys.
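To make the "keys, keys, and more keys" point concrete, here is a small sketch using SQLite from Python. The table and column names (`orders`, `customers`, `products`) are hypothetical, not from the post; the point is that a raw query against a normalized table returns only surrogate keys, and every key must be joined back out to become readable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical normalized schema: an order row holds only surrogate keys.
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY,
                        customer_id INTEGER, product_id INTEGER);
INSERT INTO customers VALUES (1, 'Acme Corp');
INSERT INTO products  VALUES (7, 'Widget');
INSERT INTO orders    VALUES (100, 1, 7);
""")

# Querying the normalized table directly yields keys, keys, and more keys.
raw = cur.execute("SELECT * FROM orders").fetchall()
print(raw)  # -> [(100, 1, 7)]

# To get readable output, every key must be joined back to its own table.
readable = cur.execute("""
    SELECT o.order_id, c.name, p.name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    JOIN products  p ON p.product_id  = o.product_id
""").fetchall()
print(readable)  # -> [(100, 'Acme Corp', 'Widget')]
```

With only three tables the joins are manageable; in a fully normalized model of a real business process, the same readability fix can require dozens of joins.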
Query performance. Dimensional models are more denormalized and optimized for data querying, while normalized models seek to eliminate data redundancies and are optimized for transaction loading and updating. The predictable framework of a dimensional model allows the database to make strong assumptions about the data which may have a positive impact on performance. Each dimension is an equivalent entry point into the fact table, and this symmetrical structure allows effective handling of complex queries. Query optimization for star-joined databases is simple, predictable, and controllable.
If a database is denormalized, we duplicate some data, but we make it easier for analysts to query what they need from fewer database objects (schemas and tables) and with fewer joins.
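The symmetry Wikipedia describes can be sketched with a toy star schema, again in SQLite (the names `fact_sales`, `dim_store`, and `dim_date` are illustrative, not from the post). Each dimension is an equivalent entry point: here we filter through the date dimension and group through the store dimension, and the query shape would be identical if the roles were swapped.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical star schema: one fact table surrounded by dimension tables.
cur.executescript("""
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_date  (date_key  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (store_key INTEGER, date_key INTEGER, amount REAL);
INSERT INTO dim_store VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_date  VALUES (20240101, 2024), (20250101, 2025);
INSERT INTO fact_sales VALUES
  (1, 20240101, 100.0), (2, 20240101, 50.0), (1, 20250101, 75.0);
""")

# Filter by one dimension (date), group by another (store): a star join.
rows = cur.execute("""
    SELECT s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    JOIN dim_date  d ON d.date_key  = f.date_key
    WHERE d.year = 2024
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(rows)  # -> [('East', 100.0), ('West', 50.0)]
```

Because every query against this schema follows the same pattern, fact joined to dimensions, the optimizer's job is predictable, which is the performance argument the quoted paragraph makes.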
Extensibility. Dimensional models are scalable and easily accommodate unexpected new data. Existing tables can be changed in place either by simply adding new data rows into the table or executing SQL alter table commands. No queries or applications that sit on top of the data warehouse need to be reprogrammed to accommodate changes. Old queries and applications continue to run without yielding different results. But in normalized models each modification should be considered carefully, because of the complex dependencies between database tables.
Maintaining a normalized database requires a lot of work because any modification may violate one of the normal forms (NFs). However, a star schema/dimensional model is organized around business needs, so alterations to the database should not be a huge effort, as long as the business hasn’t drastically changed.
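The extensibility claim, that old queries keep running unchanged after the schema grows, can be demonstrated directly. In this sketch (the `dim_product` table and `category` column are hypothetical), a pre-existing query is re-run after an `ALTER TABLE ... ADD COLUMN` and returns exactly the same results as before.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical dimension table that an existing report already queries.
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
INSERT INTO dim_product VALUES (1, 'Widget');
""")
old_query = "SELECT product_key, name FROM dim_product"
before = cur.execute(old_query).fetchall()

# Extend the dimension in place with a new attribute...
cur.execute("ALTER TABLE dim_product ADD COLUMN category TEXT")
cur.execute("UPDATE dim_product SET category = 'Hardware'")

# ...and the old query still runs, yielding the same results as before.
after = cur.execute(old_query).fetchall()
print(before == after)  # -> True
```

Because the old query names its columns explicitly, the new attribute is invisible to it; nothing downstream needs to be reprogrammed.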
This last post was definitely a breeze compared to some of the other four. Major props to you if you have made it this far. Next week, we will continue our new series on database normalization, which we started in part four of this Data Modeling series. Thanks for reading!