Information to help engineers decide whether to opt for a traditional relational database (RDBM) or what has been coined a NoSQL database (NoSQL).
October 2nd, 2019 • Buildable News & Resources
Any software engineer who participates in the early stages of an application build will inevitably be faced with the decision of selecting a database for storing its data.
Unfortunately, that is not an easy choice. Engineers can find themselves spinning in the myriad of database options, in particular when deciding whether to opt for a traditional relational database (RDBM) or what has been coined a NoSQL database (NoSQL).
Let's go over some differences between the two. Then, we’ll cover some NoSQL crossover functions you will find in modern RDBMS – which might be just what you are looking for.
There are many reasons why someone would want to use the comfort of a relational database. After all, these are the classic databases, with large communities of support, predictable behavior, and built-in mechanisms for data integrity and which can be queried by the infamous SQL language.
SQL – structured query language – is a widely used, versatile language that allows users to search for anything in a database and organize the results in a variety of ways. Rare is the software engineer who has never written SQL queries. These databases have been around for ages, and everyone knows (or should know) how to use them. There are several reputable options to pick from in the open-source community – MariaDB, PostgreSQL, and SQLite, to name a few – and there are several other respectable options in the commercial market as well – such as Microsoft SQL Server, Oracle, and MySQL.
So, if relational databases are so great, why was the NoSQL database invented?
Well, there are certain functions relational databases do not perform well at: working with non-relational data (e.g. JSON, XML, graph, and spatial); building hierarchical data structures; storing large numbers of documents in a distributed way; scaling horizontally; dealing with fault tolerance; or working with data that does not have a predefined schema structure and whose structure can evolve over time (e.g. JSON).
These are the types of situations in which relational databases struggle. Horizontal scaling alone, for example, addresses the ability of a database system to be spread across multiple nodes. While that can be accomplished in an RDBM with a technique called sharding – breaking one large database into multiple smaller, self-contained databases – this is painful to maintain and even more difficult to query globally.
Other capabilities, like fault tolerance, can be managed with replication, but that is also painful to set up and maintain and taxing to recover from. (That said, we have seen great results with SQL Azure – but more on that later.)
Another key differentiator between RDBMS and NoSQL is how they store data.
In a relational database, data is stored very structurally. Data is placed in tables, which contain rows. Rows are defined with columns, each with its own data type. This is what we call the database schema. The data to be stored must be mapped in advance and organized by defining relationships. Increasing the performance of the relational database is usually done vertically, by improving hardware (e.g. more CPU, storage, or network).
On the other hand, data in a NoSQL database can be stored more flexibly and in more than one manner, where the number of ways and the methods of organization are driven by the specifications of the application. Data can be filed into several types, key-value store, document store, graph store, or column store, to name a few.
The most obvious NoSQL public key-store is probably Amazon S3. It's not fast, but you don't have to worry about the availability of your data. Listing items in a bucket (folder) is an iterative process where you can retrieve 1000 objects at a time until there are no more objects to return. It changes how you access data.
The point is, a NoSQL database has been specifically designed to be agnostic to the structure of the data it stores. It does not necessarily know how its data is organized, but it does know how to retrieve it quickly or store it across multiple nodes for better availability. (Think of a distributed file server.) In other words, the database stores the object as is in a column.
Which leads us to another big differentiator: You can't use SQL to query a NoSQL database.
So, to the engineer who is building a new application, it should become obvious that the two types of databases have their use cases, and it should be even more obvious that neither should be solely responsible for storing all the data of the application. While you would want documents to be spread across nodes, you would not want your user records similarly spread. This would be difficult to manage and, more importantly, insecure. A user table should be stored in a relational database.
By the same token, it is fairly common to have a requirement where the structure of the data to be stored is not known in advance or could evolve over time, like user settings, survey entries, user input, metadata, etc. You would not want to store these in a traditional table with a large number of columns just to deal with the “in case we need it” approach.
Does this mean that the engineer building a new application with both relational and non-relational needs is now faced with setting up two databases side-by-side and having the application access one or the other depending on the situation? That would be tedious.
Enter NoSQL functions of traditional relational databases.
The reality is that most applications continue to have a large relational requirement while also exhibiting a small NoSQL requirement. To address this need, most modern relational databases now offer some NoSQL functions that alleviate some obvious NoSQL use cases, specifically JSON, XML, and spatial and graph data types.
However, the scalability or distribution requirements addressed by a NoSQL database will not be solved by these NoSQL functions. The XML type is useful, but rare is the programmer who does not have a love-hate relationship with XML. Spatial data has its uses, but unless you are dealing with geometric data sets (circles, points, etc.) then it will be of little service to you.
The two most exciting types are JSON and graph data types. These non-relational data types are very convenient, as we will describe below.
JSON is very much like XML in that you store data that represents an object but whose content the database does not actually care for.
You can extract data from JSON columns as sub-JSON objects (see JSON_VALUE for SQL Server) or as normal scalars (see JSON_QUERY for SQL Server), which means you can write queries to extract a specific property of the object. You can also use JSON objects as sub-tables which you can join with a regular table (see OPENJSON for SQL Server).
SQL Server, SQL Azure, MySQL, MariaDB, PostgreSQL, and SQLite all provide their own JSON manipulation functions.
The other very exciting bit is the ability to use graph data. Graph data consists of classic SQL tables that are then defined as either node, for storing data, or edge, for storing a relationship.
SQL Server provides a very interesting set of graph functions. These node and edge tables behave very much like traditional tables and can be queried normally, and the graph queries dramatically augment the ability to investigate relationships amongst the data. These are great tools for dealing with relational data.
Unfortunately, the NoSQL functions of these relational databases stop here.
We have yet to see relational databases provide clean support for sharding or distributing content across nodes. Distributing content would make it very difficult to maintain relationship integrity, which is a core promise of an RDBM.
That said, SQL Azure and Amazon Aurora, for example, are two mature, promising database choices that have begun to exhibit some cloud-style scaling. Users can essentially let the cloud services scale a database to large proportions. Unless you are building a new, Facebook-like application, chances are that most relational databases would be suitable for your application.
McMinnville Headquarters
Appointment only
620 NE 3rd Street
Suite A
McMinnville, OR 97128
Pacific City Office
Appointment only
35170 Brooten Road
Suite E
Pacific City, OR 97135
Portland Meeting Room
Appointment only
1355 NE Everett Street
Suite 100
Portland, OR 97209
Phone: (503) 468-4880
Email: connect@buildableworks.com
Talk with an expert at Buildable about your project.
Copyright © 2024 Buildable.
All Rights Reserved
Privacy Policy | Terms of Service