In the fall and winter of 2019, I decided to write a series of posts about Data Modeling. Data Modeling is one of those topics that a lot of introductory learning material in the data world skips, so I wanted to fill that gap.
I worked through the dry Wikipedia articles on Data Modeling and tried to make them more digestible.
Why Data Modeling?
Data Modeling helps you understand how your company's database objects are organized and how they relate to one another. Understanding your company's data better helps you write more effective SQL, for example by knowing which keys to join tables on.
My posts:
Explaining Data Warehouses, Star Schemas, and Entity Relationship Diagrams (ERDs).
Conformed Dimensions and Referential Integrity with examples.
How to design a Star Schema for your Data Warehouse
Introduction to Normalization.
Benefits of Data Modeling: Understandability, Query Performance, and Extensible Databases.
Intermediate Normalization: Objectives
Advanced Normalization: Denormalization Anomalies
Working through Normalization examples in order of increasing Normal Form.
Part Nine TBD
Denormalization in practice
Conceptual, Logical, and Physical levels of Data Modeling.
Taking Suggestions!
Conclusion
Data Modeling is a collection of principles, rather than hard rules, and each principle has trade-offs that depend on your circumstances. It's definitely not the most exciting topic in the data world, but it can only help both analysis and engineering.
If you would like me to cover another Data Modeling topic and add it to this post, please comment below!
Data Engineering Content
A data modeling package for R, dm, was just released on CRAN. I highly recommend it, as it applies the concepts from these posts within a Tidyverse framework. To install it from the R console:
install.packages("dm")
Building Robust Pipelines with Spark (talk). I highly recommend the first 7 minutes, which give a great explanation of ETL.
About the Author and Newsletter
I automate data processes. I work mostly in Python, SQL, and bash. For more about my freelance work, check out ocelotdata.com.
At Modern Data Infrastructure, I democratize the knowledge it takes to understand an open-source, transparent, and reproducible Data Infrastructure. In my spare time, I collect Lenca pottery, walk my dog, and listen to music.
More at: What is Modern Data Infrastructure.