The Problem with the Relational Database (Part 2) – Predictability
I met with a friend of mine in New York recently who runs a credit card processing end-point company. They specifically built their business around a non-relational database platform and feel they would have major issues had they chosen to build their business around a traditional relational database.
In this business they process hundreds of transactions a second, which means that every second hundreds of people are standing somewhere in the world at a checkout making a purchase. If a transaction is delayed, that is a lot of people left standing and waiting. Worse still, if a transaction is delayed too long it fails, and the customer not only has to retry it, they are likely embarrassed and probably somewhat annoyed.
The relational database certainly has the capability to scale to many hundreds of transactions a second. But the reasons for their move away from the relational database aren’t specifically scale related but instead come down to predictability and consistency.
Relational databases do not deliver predictable, consistent performance, for a number of reasons.
Firstly, every query uses a different amount of resources. A query could consume 1 or 2 I/Os, or 1 or 2 million I/Os, depending on how the query is written, what data is selected and factors such as how the database is indexed. Performance varies further with how the database is maintained (fragmentation). What makes matters more complex is that different predicate values can hit vastly different data distributions: the same query executed with different constants can have vastly different resource requirements.
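To make that concrete, here is a toy sketch (illustrative Python, not SQL Server itself, with a made-up "orders" table) of how the same lookup against a skewed column touches vastly different numbers of rows depending on the constant in the predicate:

```python
# Hypothetical skewed table: one mega-customer, many small ones.
from collections import Counter

rows = ["bigcorp"] * 1_000_000 + [f"cust{i}" for i in range(1000)]
hist = Counter(rows)

def rows_scanned(customer_id):
    """Rows an index seek on customer_id would have to read."""
    return hist.get(customer_id, 0)

print(rows_scanned("cust42"))   # 1 row  -> a couple of I/Os
print(rows_scanned("bigcorp"))  # 1,000,000 rows -> millions of I/Os
```

Same query shape, different constant, four or five orders of magnitude more work.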
Because every query has a different “footprint”, running a query in isolation does not provide indicative statistics on how that query will perform under concurrent load. In fact it becomes impossible to predict the exact execution duration of a relational database query, because its performance depends on what else is being executed at the exact moment it runs.
Cost Based Optimization
Because queries can have complex execution steps behind them with multiple paths to achieving the result, one neat thing that was added to the relational database was cost based optimization. This takes the control of “how” a query is resolved away from the developer and instead the relational database server works this out at runtime using a complex series of algorithms. This was hugely important for simplifying the use of relational databases. But it is bad for predictability.
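A minimal sketch of the idea (greatly simplified; real optimizers weigh many more factors, and the cost figures here are invented): the engine estimates a cost for each candidate plan from its statistics and picks the cheapest.

```python
TABLE_ROWS = 1_000_000

def plan_cost(plan, estimated_matching_rows):
    """Invented cost model for two candidate plans."""
    if plan == "index_seek":
        # roughly a few page reads per matching row via the index
        return estimated_matching_rows * 3
    if plan == "full_scan":
        # read every page once, regardless of selectivity
        return TABLE_ROWS // 100
    raise ValueError(plan)

def choose_plan(estimated_matching_rows):
    return min(["index_seek", "full_scan"],
               key=lambda p: plan_cost(p, estimated_matching_rows))

print(choose_plan(10))       # index_seek: few rows expected
print(choose_plan(500_000))  # full_scan: half the table expected
```

The fragility is visible even in this toy: if statistics are stale and 500,000 actual rows are estimated as 10, the optimizer happily picks the index seek, and the query suddenly runs orders of magnitude slower with no code change.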
Cost based optimization can get it wrong. It is much less common now than 10 years ago, but every DBA has still come across cases where the optimizer chooses the wrong “query plan” and the query execution takes a long time seemingly for no reason. Worse for predictability, cost based optimization is dynamic and looks at a lot of factors at run time to determine which plan is best. These factors can change, meaning that query performance can change (often without any external schema/code modifications). This dynamic behaviour can cause headaches for DBAs, who are often tasked with resolving performance issues when suddenly the database is “going slow”.
Ok let’s take a real world look at this problem. I have gathered some stats from a real, production (SQL Server) relational database. This database is around 500GB in size and processes ~700 transactions per second 24 x 7. This data is just a couple of minutes of captured activity.
The first graph I am showing here is the # of queries based on resource usage. To simplify things resource usage in my example = CPU TIME + # READS + # WRITES.
Because queries in relational databases can have vastly different resource usage profiles, we get a very wide distribution of resource usage by query (note the logarithmic scale).
We can see that most of the queries executing on this database use <50 “resources” (CPU + DISK) but at the same time a smaller number of queries can use anything up to 22750 x this. That is a massive distribution.
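Here is a hedged sketch of how such a graph could be produced (the numbers are simulated, not the real SQL Server capture): compute resource usage as CPU time + reads + writes per query, then count queries into logarithmic buckets, since the spread is far too wide for a linear scale.

```python
import math
import random

random.seed(1)

def resource_usage(cpu_time, reads, writes):
    # The metric from the text: CPU TIME + # READS + # WRITES
    return cpu_time + reads + writes

# Simulated workload: most queries are tiny, a few are enormous.
usages = [resource_usage(random.randint(1, 20),
                         random.randint(1, 30),
                         random.randint(0, 5))
          for _ in range(10_000)]
usages += [resource_usage(500, 400_000, 2_000) for _ in range(20)]

buckets = {}
for u in usages:
    b = 10 ** math.floor(math.log10(u))  # bucket: 1, 10, 100, 1000, ...
    buckets[b] = buckets.get(b, 0) + 1

for b in sorted(buckets):
    print(f"{b:>7}: {buckets[b]}")
```

Even this crude simulation reproduces the shape of the problem: almost everything lands in the smallest buckets, with a long, thin tail of queries thousands of times heavier.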
Problems with predictability start to occur when a number of high resource queries randomly end up running concurrently. This restricts the resources available to other queries, and therefore the execution times of other queries executing at that time are affected.
We can see this by looking at the execution duration of ONLY the queries that have used 50 resources (CPU+DISK) or less. They have a similar resource usage profile, so in theory they should all have a similar execution duration, right?
Unfortunately, no. This graph shows the execution times for only the queries that used 50 resources or less. While most of these queries executed in 100ms or less, many took a lot longer. In fact there was a wide distribution, up to 5297 x this base execution time (529 seconds). All these queries used roughly the same amount of server resources, yet their execution times were massively different.
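A toy model makes the mechanism obvious (hypothetical numbers, not taken from the capture): a query's duration depends not just on its own work, but on how many ways the server's capacity is being split and how long it spends blocked behind other queries at that moment.

```python
def duration_ms(own_work, concurrent_load, blocked_ms=0):
    """own_work: resources this query needs (same units as above).
    concurrent_load: how many queries the server's capacity is split across.
    blocked_ms: time spent waiting on locks held by other queries."""
    capacity_per_query = 100 / concurrent_load  # shared CPU/disk capacity
    return blocked_ms + own_work / capacity_per_query * 100

# Three runs of an identical 50-resource query at different moments:
print(duration_ms(50, concurrent_load=1))                  # quiet server
print(duration_ms(50, concurrent_load=40))                 # busy server
print(duration_ms(50, concurrent_load=2, blocked_ms=5000)) # blocked on a lock
```

Same query, same footprint; durations of 50ms, 2 seconds and over 5 seconds, purely because of what else happened to be running.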
The reasons for this could be many. Firstly, it could be resource contention: many of the slower queries could be executing at the same time as a number of high resource queries. Alternatively, it could be resource concurrency issues: these queries may be delayed because they are waiting on blocking queries to complete. Or it could be internal database management (indexes updating), or concurrent disk activity (log backups, check-pointing), and so on.
These are in fact all issues that DBAs deal with and investigate on a daily basis. The best you can do is try to optimize as many of the recurring queries as possible. Even then, it still isn't possible to accurately predict performance at run time, or to ensure consistency of execution time. This is bad news when you are trying to use the relational database as a platform for applications where consistency of performance is paramount.