Relational vs. Non-relational Databases
Relational databases rely on schemas that describe tables. These schemas map the relationships between different tables so that primary and foreign keys can be used to retrieve, store or modify data on multiple tables with a single SQL query. While schemas potentially allow you to query your data in many different ways, updating a schema after an application is live is a challenging and clunky process that makes it very inconvenient to change your mind about what information to track and how you want to store data. Relational databases are considered extremely safe and reliable because they allow for ACID (atomic, consistent, isolated, and durable) transactions. To achieve this, relational databases can use read and write locks to prevent other users from reading or updating a record that is currently being updated, so users only have access to the most up-to-date information and the result of a set of concurrent transactions is that same as if each transaction had been executed serially. Additionally, since data is always committed to disk before a transaction is considered complete, relational databases guarantee that a set of changes across different tables happens atomically (for example, during a bank transaction you can guarantee that the value was deducted from one account AND added to the other, there is no chance that only one of these changes happened). These features makes relational databases slow and necessitates that transactions need to happen on a single machine, which makes them difficult to scale horizontally.
Essentially, non-relational databases sacrifice ACID transactions in favor of efficiency and scalability. Non-relational databases come in several flavors including key/value, graph, and document stores, each with different limitations. A fundamental characteristic of all non-relational databases is that they do not rely on schemas and SQL queries, although schemas can be defined and used by models in certain non-relational frameworks. These databases are easy to scale since they can be run on many different servers in many different locations, but this feature introduces several limitations: First, since there are no transactions, you have no guarantees that changes will occur together. Second, if you’re using a cluster of servers to allow for heavy traffic, there is no guarantee for consistency between those database replicas at any given moment (although presumably they will sync with each other fairly regularly). This means that if you write a change to your database and then immediately check for the updated data, you might end up making those requests to different servers and the update will appear to have failed. One way around this problem is to send a server-identifying key with the second request so that it will be routed to the server with your updated information.
Here is an example of why this matters:
When visiting certain websites or using certain apps, you’re required to agree to terms and conditions regarding use and privacy. If the server detects that you have not yet agreed to these terms, it will reroute you to the terms and conditions page. Upon clicking “agree”, the server stores that information in a database and allows you to access content. At this time, only one of several servers “knows” that you are allowed to use the app. If, by chance, your next request is routed to a different server, it will think you haven’t agreed to the terms and conditions yet and will reroute you to that page again.