When you start a new project there are massive number of ways to store your data to choose from. Here are some examples
- Relational Databases
- Document Databases
- Key Value Database
- File System
From experience 99% of the time for new projects the best data store to use is a Relational Database. Having said that the key factor is understanding your data and how you store/access that information and then choosing the appropriate data storage system. Don’t make the big mistake of using a technology just because it’s the current fad, as this will most likely end up with massive rewrite when the current fad doesn’t pan out. I’ve been wanting to use MongoDB for years, but have yet to find a project were this is a good match. Let the data choose the storage method, not what you want to use.
Every system will work and perform well with a small sample data set, so unless you have done lots of careful planning you are not going to find issues until you are live and in production and by then it may be too late to fix.
Here is a article on the issues caused by using the wrong data storage system.
Relational Databases make an excellent starting starting point and I will explain why.
Relational Databases scale really easily.
Relational databases scale very very well. For example stackoverflow.com which is in the worlds top 50 web sites is powered by Relational Database(Microsoft SQL Server). In fact it works so well they have close to 0% CPU usage on their database servers!! That’s insane!
One common technique with Relational Database is to add a read replicas. So all query’s(reports) go to this secondary server(you can add more if required). This technique reduces the load on the primary server. As reports tend to be the heaviest usage of databases adding read replicas allow you to scale these operations.
With Modern hardware e.g. SSD Hard Drives, Terabytes of Memory, 10’s of CPU cores, not many system are going to hit the limits which allow Relational Databases to scale amazing well. It’s a lot easy to scale vertically than horizontal. It’s also lot easy to maintain 1(using vertical scaling) server than 10’s(using horizontal scaling) of servers.
Relational Database’s are easy to program against.
The tools for Relational Databases are very easy to work with. When you use tools like Entity Framework for .Net or (Hibernate for Java)they are so simple to use, you almost forget that you are using a relational database. The ecosystem for Relational Databases is incredibly rich, from Monitoring servers, Reporting Engines, Code Generators etc there is no shortage of tools. Also if you hit an issue, someone else is bound to have hit it too. So sites like Stackoverflow.com have tons of helpful tips.
Relational Database keep your data safe
An important part of relational database is that they have a feature called referential integrity. It helps to ensure that any data you add or delete will be valid and won’t have any broken records. Also each field can have extra checks to help ensure that the entered data is correct.
It can be a pain at times but the extra safety the checks Relational Databases have are well worth the investment of using these features. Nothing is life is free, but for the low effort to enable and use referential integrity etc the gains are fantastic and they have saved me more than once. Some other features that relational databases use to keep you data safe are:
Relational Database are fast(very fast)
They have designed and had over 30 years of optimizations, which have made them very very fast at querying data. Some of Microsoft upgrades to there SQL server have been stunning. I have seen 2x to 10x performance upgrades just by upgrading to the latest version!!
One database I work on has 100,000,000’s of updates very year and hits peaks of 30 to 100 adds/updates per second and it isn’t even breaking a sweat. The database server is a Microsoft Azure P1 instance, so the actual hardware(my laptop is faster) it uses is very very low. You can buy hardware that would be 100x more powerful, so that is a lot of room for growth!
For you next project, I would recommend first modelling our your data structure on a white board. Think about what reports you will require and what indexes you will require. I am sure you will find for majority of applications a Relational Database is the best starting point. As you can always in the future use other Data Storage engines to supplement it required. I use Azure Blobs(file) and Tables(key/Value) to store data that doesn’t need to be in the database as required.
Image Credit: https://commons.wikimedia.org/wiki/File:Balanced_scale_of_Justice.svg