There are so many data-related open source products nowadays. On the one hand, that's great. On the other hand, it's hard for one person to grasp them all. To be sure, there's great documentation for all of them. And there are books, meetup sessions and conference talks that explain in depth what they are about. But sometimes you just want the gist: a quick look at a lot of products, so you can pick the ones you find useful. That kind of overview across multiple products is rare.
“Seven Databases in Seven Weeks, Second Edition” by Luc Perkins, Jim Wilson and Eric Redmond does exactly that. It describes seven different types of databases, along with their strengths and weaknesses.
The seven databases
The seven databases are PostgreSQL, HBase, MongoDB, CouchDB, Neo4J, DynamoDB and Redis. It's a good mix, and each has its own uses.
PostgreSQL is the regular RDBMS we have all learned to love. HBase is built to scale out and can handle large amounts of data. In fact, it isn't meant for small amounts of data at all. MongoDB can also handle large amounts of data and has a flexible data model. CouchDB's strength is robustness. It's built on the philosophy that networks and hardware are unreliable. It's also small enough to run on a smartphone.
Thanks to this book I discovered graph databases. Neo4J is great for data with many-to-many relationships. DynamoDB is a database offered by Amazon Web Services. You can start small, and there are few limits on how big it can grow (apart from your budget). Redis is a key-value store built for speed. Its commands can look like seemingly random strings of letters.
“Seven Databases...” explains where these databases shine, where to use them and where they probably wouldn't work. This is what I really liked about the book. The chapters pique your interest and give you enough information to get started.
There's a chapter for each database. Each chapter starts by explaining how to install the database and teaches you some simple commands to interact with it. Then the authors work through a project with it. For HBase, you load a large amount of text from Wikipedia to see where it excels. For Neo4J, a graph database, the exercise is “six degrees of Kevin Bacon”, based on a movie dataset. And with DynamoDB you load Internet of Things data. You can follow along and quickly get a feel for each database.
Depending on the type of database, the book also covers backups, high availability and sharding. It's actually quite amazing how much the authors manage to cover for seven databases in just 358 pages.
At a meetup last year I said that I rarely read (e)books to learn about open source products. But “Seven Databases in Seven Weeks, Second Edition” is the exception to the rule. I wish there were more books like this that quickly get you on your way.