The Problem with the Relational Database (Part 1) – The Deployment Model
This is the first detailed post in a series I am doing on the issues that exist today with the relational database. This first post covers the deployment model. It could be argued that this isn't directly related to the "relational database" and is really a problem with the implementation model. I disagree, because many characteristics of the relational database lead directly to the deployment model described here (we will explore these in later posts).
For most of my career I have worked with the enterprise and the databases in this environment. Over the years I have seen the number of databases increase dramatically, in line with the growth of data-centric applications. As a result, even medium-sized organizations often have dozens of physical database servers, and enterprise organizations often have hundreds of them, occasionally thousands. The number does, however, vary heavily by database platform, with SQL Server typically suffering the most sprawl of all the mainstream enterprise relational database platforms.
Problems arise when DBAs try to co-locate independent databases on a single server. These problems stem from how dynamic databases are, both in data volume and in query load, which makes managing capacity a complicated and time-consuming task. When relational databases share a server's resources, a small number of intensive queries can degrade the performance of many other queries running at the same time. Because of this, only a small number of databases typically share each server. For SQL Server in the enterprise, a ratio of around 10 databases per server is typical.
The brokenness of this model is easy to spot. First, inefficient use and poor distribution of resources are clear problems. While I am generalizing somewhat, in an organization with 100 database servers, 70% of those servers could often be vastly underutilized, 20% effectively used, and 10% heavily overutilized, with users suffering poor performance ("bottlenecks") as a result.
With this deployment model it isn't possible to take unused resources (CPU, memory, I/O bandwidth) from elsewhere in the organization and reapply them where they are needed, even with downtime, let alone in real time. Instead, new infrastructure investment is continually made to add capacity for the bottlenecked databases.
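To make the stranded-capacity point concrete, here is a minimal sketch using purely illustrative numbers. It assumes a fleet of 100 servers with equal, normalized capacity, split roughly along the 70% / 20% / 10% lines above, with hypothetical demand figures of 15%, 60% and 130% of capacity. The names `Server`, `fleet`, `siloed` and `pooled` are made up for this example, and the "pooled" case is an idealization of what the deployment model described here cannot actually do: let spare capacity on one server serve demand on another.

```python
from dataclasses import dataclass


@dataclass
class Server:
    capacity: float  # normalized resource units (CPU, memory, I/O lumped together)
    demand: float    # resource units the hosted databases want (hypothetical)


def fleet(n_under: int = 70, n_ok: int = 20, n_over: int = 10) -> list[Server]:
    """Build a hypothetical fleet; the utilization figures are assumptions."""
    return ([Server(1.0, 0.15) for _ in range(n_under)]   # vastly underutilized
            + [Server(1.0, 0.60) for _ in range(n_ok)]     # effectively used
            + [Server(1.0, 1.30) for _ in range(n_over)])  # demand exceeds capacity


def siloed(servers: list[Server]) -> tuple[float, float]:
    """Each database is capped by its own server, so spare capacity cannot move.
    Returns (idle units, unmet demand units)."""
    idle = sum(max(s.capacity - s.demand, 0.0) for s in servers)
    unmet = sum(max(s.demand - s.capacity, 0.0) for s in servers)
    return idle, unmet


def pooled(servers: list[Server]) -> tuple[float, float]:
    """Idealized pool: spare capacity anywhere can serve demand anywhere.
    Returns (idle units, unmet demand units)."""
    total_capacity = sum(s.capacity for s in servers)
    total_demand = sum(s.demand for s in servers)
    return (max(total_capacity - total_demand, 0.0),
            max(total_demand - total_capacity, 0.0))


if __name__ == "__main__":
    servers = fleet()
    idle, unmet = siloed(servers)
    print(f"siloed: {idle:.1f} units idle, {unmet:.1f} units of demand unmet")
    idle, unmet = pooled(servers)
    print(f"pooled: {idle:.1f} units idle, {unmet:.1f} units of demand unmet")
```

With these assumed numbers, the siloed fleet leaves about 67.5 units of capacity sitting idle while 3 units of demand go unmet on the overloaded servers; if the capacity could be pooled, the same total demand would fit with 64.5 units to spare. The usual fix in the siloed model is to buy more hardware for the bottlenecked 10%, even though the organization already owns far more than enough.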
A relational database is capped by the limits of the server on which it currently sits. A DBA monitors that server, trying to keep current query workloads running as efficiently as possible to avoid premature bottlenecks, while continually planning to stay one step ahead of growth in database requirements. This is a costly process, and one often not helped by the unpredictability of the relational database (which we will discuss later). Multiply this effort across the hundreds of servers described above and you can imagine it is a significant contributor to the total cost of ownership.
When you reach the limits of a single server, many database platforms offer few practical options for further scaling (distributed scaling being the obvious example, for reasons we will again address in a later post in this series). Too often, organizations with multi-million-dollar servers are forced to split workloads, move real-time operations to batch operations, replicate data for offline processing, and mandate specific times when users can run particularly intensive functions. Again, all this manual fiddling becomes a management nightmare and a significant overhead when you multiply it out.
This issue in isolation could potentially be addressed through technologies such as virtualization. While virtualization has yet to make a major impact on the way production databases are deployed in the enterprise, this may change in the future. However, as we delve further into the problems associated with the relational database, we will see that this is not the only issue we face in taking this technology forward.