Reply to The Future of the NoSQL, SQL, and RDBMS Markets
Conor O'Mahony over at IBM wrote a good post on a favorite topic of mine, “The Future of the NoSQL, SQL, and RDBMS Markets”. If this is of interest to you, I suggest you read his original post. I replied in the comments but thought I would also repost my reply here.
Hi Conor, I wish it was as simple as SQL & RDBMS is good for this and NoSQL is good for that. For me at least, the waters are much muddier than that.
The benefit of SQL & RDBMS is that its general purpose nature has meant it can be applied to a lot of problems, and because of its applicability it has become mainstream to the point that every developer on the planet can probably write basic SQL. And it is justified: there aren’t many data problems you can’t throw an RDBMS at and solve.
The problem with SQL & RDBMS, well essentially I see two. Firstly, distributed scale is a problem in a small number of cases. This can be solved by losing some of the generic nature of RDBMS and keeping SQL such as with MPP or attempts like Stonebraker’s NewSQL. The other way is to lose RDBMS and SQL altogether to achieve scale with alternative key/value methods such as Cassandra, HBase etc. But these NoSQL databases don’t seem to be the ones gaining the most traction. From my perspective, the most “popular” and fastest growing NoSQL databases tend to be those which aren’t entirely focused on pure scale but instead focus first on the development model, such as Couch and MongoDB. Which brings me to my second issue with SQL & RDBMS.
Without a doubt the way in which we build applications has changed dramatically over the last 20 years. We now see much greater application volumes, much smaller developer teams, shorter development timeframes and faster changing requirements. Much of what the RDBMS has offered developers (strong normalization, enforced integrity, strong data definition, documented schemas) has become less relevant to applications and developers. Today I would suspect most applications use a SQL database purely as an application-specific dumb datastore. Usually there aren’t multiple applications accessing that database, there aren’t lots of direct data imports/exports into other applications, there is no third party application reporting and there are no ad-hoc user queries; the data store is just a repository for a single application to retain data purely for the purpose of making that application function. Even several major ERP applications have fairly generic databases with soft schemas and without any form of constraints or referential integrity. This is just handled better, from a development perspective, in the code that populates it.
Now of course the RDBMS can meet this requirement, but the issue is that the cost of doing so is higher than it needs to be. People write code with classes; the RDBMS uses SQL. The translation between these two structures, the plumbing code, can in some cases be 50% or more of an application’s code base (be it hand-written code or code generated automatically by a modeling tool). Why write queries if you are just retrieving an entire row based on a key? Why have a strict data model if you are the only application using it and you maintain integrity in the code? Why should a change in requirements force you to go through the process of building a schema change script that has to be deployed in sync with the application version? Why have cost based optimization when all the data access paths are 100% known at the time of code compilation?
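To make the plumbing point a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Customer` class, the `customer` table, and the plain dict standing in for a key/value store are all hypothetical, and SQLite is used only because it ships with Python. The relational route needs a schema plus mapping code in both directions, while the key/value route just serializes the whole object under its key.

```python
import json
import sqlite3
from dataclasses import dataclass, asdict


@dataclass
class Customer:  # hypothetical application class
    id: int
    name: str
    email: str


# Relational route: a schema, plus hand-written mapping in both directions.
def sql_save(conn, c):
    conn.execute(
        "INSERT OR REPLACE INTO customer (id, name, email) VALUES (?, ?, ?)",
        (c.id, c.name, c.email),
    )


def sql_load(conn, customer_id):
    row = conn.execute(
        "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(*row) if row else None


# Key/value route: the whole object is stored and fetched by its key.
# A dict stands in for the store here; a real NoSQL client would differ.
def kv_save(store, c):
    store[f"customer:{c.id}"] = json.dumps(asdict(c))


def kv_load(store, customer_id):
    raw = store.get(f"customer:{customer_id}")
    return Customer(**json.loads(raw)) if raw else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
store = {}

alice = Customer(1, "Alice", "alice@example.com")
sql_save(conn, alice)
kv_save(store, alice)
```

Adding a field to `Customer` means touching the table definition and both SQL statements on the relational side, whereas the key/value functions would not need to change at all. That is the development-model appeal, though it also means nothing but the application code is guarding the shape of the data.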
Now I am still largely undecided on all of this. I get why NoSQL can be appealing. I get how it fits with today’s requirements. What I am unsure about is whether it is all very short-sighted. Applications being built today with NoSQL will themselves grow over time. What may start off today as simple gets/puts within a soft schema’d datastore may over time gain reporting or analytics requirements that were not expected when initial development began. What might have taken a simple SQL query to meet such a requirement in an RDBMS might now require data to be extracted into something else (maybe Hadoop, or MPP, or maybe just a simple SQL RDBMS) where it can be processed and loaded back into the NoSQL store in a processed form. It might make sense if you have huge volumes of data, but for the small scale web app this could be a lot of cost and overhead to summarize data for simple reporting needs.
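As a rough illustration of that reporting gap, the sketch below puts the same three made-up orders into SQLite and into a dict standing in for a key/value store, then asks for total spend per customer. The data, the `orders` table, and the `kv_report` helper are all hypothetical. In SQL the report is one declarative query; in the key/value case the application has to scan every value and aggregate by hand, which is the small-scale version of extracting the data into something else.

```python
import json
import sqlite3
from collections import defaultdict

# Made-up sample data for illustration only.
orders = [
    {"id": 1, "customer": "alice", "total": 30.0},
    {"id": 2, "customer": "bob", "total": 12.5},
    {"id": 3, "customer": "alice", "total": 20.0},
]

# RDBMS: the late-arriving reporting requirement is a single query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (:id, :customer, :total)", orders)
sql_report = dict(
    conn.execute("SELECT customer, SUM(total) FROM orders GROUP BY customer")
)

# Key/value store (a dict as a stand-in): there is no query language,
# so the report means reading every value and aggregating in code.
kv_store = {f"order:{o['id']}": json.dumps(o) for o in orders}


def kv_report(store):
    totals = defaultdict(float)
    for key, raw in store.items():
        if key.startswith("order:"):
            order = json.loads(raw)
            totals[order["customer"]] += order["total"]
    return dict(totals)
```

Both routes give the same totals here. The difference is who does the work: the database in one case, and a full scan written and maintained by the application team in the other.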
Of course this is all still evolving, and RDBMS and NoSQL vendors are both on some form of convergence path. We have already started hearing noises about RDBMS vendors looking to offer more NoSQL-like interfaces to the underlying data stores, as well as the NoSQL vendors looking to offer more SQL-like interfaces to their repositories. They will meet up eventually, but by then we will all be talking about something new like stream processing :)
Thanks Conor for the thought-provoking post.