Actually, I have been painstakingly reading these books for months, and one of them for years. In an attempt to remind myself to read before the year ends, I decided to stack them.
1.Designing Data-Intensive Applications
When I studied music, every instrument group had method books. The most common for brass was Arban’s. It was originally written for cornet but either way, it was a hard book to get through. Much like this book.
I have been reading this book since February 13, 2019. I finished it December 22, 2019. It’s a bear. It gives you a lot of explanations to the questions of “what” and “how” but saves most answers to “why?” until the very last chapter. If you like cold explanations of facts, this book may be for you.
However, much like Arban’s, I will be re-reading this book in the future because it goes over quite a few concepts that I now understand in a general sense. Before reading this book, I only vaguely understood them. These concepts include, but are not limited to:
persistence and consistency across distributed databases
leader and follower databases
microbatching to stream processing
batch processing
More Unix command line tools like sed and awk
I highly recommend this book to software engineers, despite its dryness.
1b.Bookmark
This bookmark is a little business card shaped promo for SQL Performance Explained. It came with the SQL book and I just always kept it as a bookmark because it’s a reference to something technical.
2.The Pragmatic Programmer
I bought this book because my company has an engineering reading group. We are two chapters in and so far, it is quite good at dispensing common sense, best practices, or whatever you want to call it.
However, I am only two chapters in and already there is a high degree of individualism in this book. Keep in mind, it was written more than 20 years ago. I hope that more engineers have since realized the effects that culture, systems, and institutions can have on individual performance. That’s not the focus of this book, but I think it is worth bringing up. I am happy to be proven wrong in the following chapters.
2b.Bookmark
There is no meaningful bookmark here. Just a scrap piece of yellow notepad paper.
3.Database Reliability Engineering
I am a few chapters into this book as well. This book is aimed at Database Administrators (DBAs). However, as a Data Engineer, I can still find tons of value in it. As a Data Engineer, I don’t have to worry about maintaining the lower-level parts of databases, but going into those advanced (for me) fundamentals (for DBAs) helps me be more skilled in case I need to dip my toes into the DBA pool.
I haven’t read this book thoroughly, so I’ll let its own description do the rest of the talking:
You’ll examine a wide range of database persistence options, including how to implement key technologies to provide resilient, scalable, and performant data storage and retrieval
3b.Bookmark
This is a Dr. Seuss bookmark. The bottom changes from “Oh me” to “Oh my” depending on the angle that you look at it. Very cool stuff.
4.Learning Spark
Spark is used to make data processing faster. Netflix uses it a ton for all the data they’re moving around. Spark spreads the work across a cluster of machines, often reading its data from the Hadoop Distributed File System (HDFS). Basically, instead of using one machine to do a big thing, we use many machines to do many small things at the same time. A big job usually finishes sooner when it is split into small pieces that run in parallel.
Spark is kind of weird to me because it seems like a programming language but it’s not. Mostly not. It’s an abstraction layer that makes distributed processing, often on top of Hadoop, easy to use. This abstraction layer is called a “framework”. Spark programs can be written in Scala, Java, or Python. I use Python for the most part.
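To make the "many machines doing many small things" idea concrete, here is a minimal sketch in plain Python. It is not Spark and uses none of Spark's API: threads stand in for machines, and the function names (`split_into_partitions`, `count_words`, `distributed_word_count`) and the sample lines are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(items, num_partitions):
    """Deal the items into roughly equal chunks, one per worker."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def count_words(lines):
    """The 'small thing' each worker does: count words in its own chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, num_partitions=3):
    """Split the job, run the pieces side by side, then merge the results."""
    partitions = split_into_partitions(lines, num_partitions)
    # Threads are an assumption standing in for the machines in a cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = [
    "spark makes data processing faster",
    "many machines do many small things",
    "spark splits one big job into small jobs",
]
result = distributed_word_count(lines)
print(result.most_common(3))
```

The shape is the part that carries over: split the data, do the same small job on every piece, then combine the partial answers. A real framework also has to handle the hard parts this sketch ignores, such as moving data between machines and recovering from failures.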
4b.Bookmark
This bookmark is neat because it’s a conference badge from PyData 2018 in New York. I talked at that conference on the Value of Null Results. Most badges were the teal color but as a speaker, mine was orange, so I was very proud of that.
5.Practical Vim
This is another book that I’ve started and have been procrastinating on digging into. Vim is a text editor that lives inside your terminal. Learning Vim at this point is more of a vain pursuit. I think I should have a positive ROI in time saved with all the coding I plan on doing the rest of my life, but the main motivator for learning Vim is feeling and looking cool.
5b.Bookmark
The bookmark for this is a postcard from Maine. I visited Maine for my favorite tech conference, Monktoberfest, in October 2019. I had taken this book in the hopes that I would take a week to learn Vim. Instead, I got the flu as soon as the conference was over.
got lobstah!!
6.SQL Performance Explained
I left this book for the bottom because it’s the least urgent for me. I have performance tuned queries before, but have not dug into filling all the knowledge gaps I might have. I understand B-Tree indexing in databases and limiting the number of blocks a query has to read. In practical terms, I have removed unnecessary JOINs and suggested filtering on indexed columns, but further than that, not much.
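Here is a small sketch of what filtering on an index looks like, using the SQLite engine that ships with Python. The `orders` table, its columns, and the index name `idx_orders_customer` are made up for illustration, and other databases print their query plans differently.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's description of how it intends to run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to scan every row of the table.
plan_before = query_plan(conn, query, (42,))

# With a B-Tree index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, query, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan depends on the SQLite version, but the first plan should describe a scan of the whole table and the second should mention the index by name. That difference is the "reading fewer blocks" win in miniature.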
When I first started this book, a few years ago, it seemed like gibberish. I definitely look forward to reading it and solidifying my SQL performance chops. As stitchfix algos writes, ETL written in SQL is preferable for maintenance reasons. So I’m a big proponent of improving SQL knowledge.
6b.Bookmark
The MetroCard is a magnetic stripe card used for fare payment on transportation in the New York City area.
I think I was feeling inspired to move to New York, or at least some big city, when I chose this as my bookmark a few years ago. Neat!
Finale
Thanks for reading! Have a great holiday season.
About the Author
I am a Data Engineer based in Boise, Idaho. I connect APIs and Databases to a Data Warehouse and/or Data Lake. My focus is data infrastructure as code that is easy to stand up and troubleshoot.
Contracting - Freelancing
If you are an analyst stuck working on infrastructure when you’d rather be doing analysis, feel free to contact me. I can help build out your company’s data infrastructure remotely.