July 11, 2009

Groovy Baby, Yeah

GroovyCorp

(yeah, this company is going to have to get used to the Austin Powers references.)

Groovy Corp put out a press release last night that starts the official launch of their SQL Switch relational database platform.

I have been speaking with Groovy for a few months, and while the press release is a bit noisy there is actually some interesting stuff in it.

First, an overview

  • They are an in-memory RDBMS
  • They have worked with Intel to architect from the ground up for large multi-processor concurrency
  • Initially they are launching as a multi-core appliance
  • They claim 200,000 SQL operations per second from a single box
  • They are proprietary (not built on MySQL or any other open source database) which means they have had a lot of control around their architecture
  • They are a pretty cool company with some interesting people


So one of their key claims to fame is that they can be a lot faster & cheaper than a traditional RDBMS.  In some cases significantly so; expect more on this.

But this is only part of the story.  The bit which has been getting them traction around Silicon Valley, and has rocketed them into relevance, has been their real time focus.  They have developed a push extension to SQL where the RDBMS actually pushes data out, as and when required, through the real time layers of the application stack (Ajax, XMPP etc).  This removes the need for the high volume database refreshing/polling queries that currently sit under everything that calls itself “real-time” when a RDBMS is involved.  If you want to know more about why, you may be interested in this post I wrote a little while back.

I do know that anyone who is big on the web and is focused on real time is checking them out.  While it is early days for them, I think they tick the unique and innovative boxes which are key requirements to staking some ground in the highly active database market.

I will follow with some more technical info.

UPDATE: If you are at the TechCrunch real time event today, check them out.  They are running the bar, their slogan for this is "the only thing refreshing is the beer".

July 09, 2009

The TPC Debate (yawn)

Recently, on a number of sites, the arguments for and against the TPC benchmarks have been debated, with these conversations on occasion descending into abuse thrown in both directions.

From a pure technical perspective, the TPC benchmarks make little sense and are probably not relevant to 99% of organizations looking to implement a database technology.  But as a tool for generating visibility, debate and improved public awareness of a vendor's technology they still have an impact.

This is marketing, pure and simple.  Having a great TPC result is akin to an author having a great review on Amazon.  Doesn’t mean it is relevant for you but if faced with a stack of titles you haven’t yet read you’ll probably look more closely at the ones you’ve heard something positive about.

TPC results are a tool for the marketing department, just the same as any other form of advertising.

There may be many technical purists who find this offensive.  But the reality is that the database business is a $20b+ a year industry and everyone is jostling for their position.  They will use all valid means to get their product in your mind when you are next making a decision around a database platform.

July 06, 2009

Positioning your Database Start Up for Data Warehousing

BI/Data Warehousing is an easier market to enter for new database platform vendors.  This is for a few reasons.  Firstly, most BI deployments are custom built projects for each organization.  This means the ability to pick and choose various layers of the stack is much greater. 

Secondly, BI/DW project success/failure metrics are often tied to database platform driven properties – performance, scalability, load times etc.  Straying outside any existing database platform “standards” to choose a platform that better meets key metrics is therefore more readily tolerated.

Thirdly, because the ratio of BI to OLTP is low, the associated impact of violating a corporate standard is much lower.  With OLTP applications typically deployed in the hundreds or the thousands within the enterprise, lack of firm standards could end up with dozens of different database platforms requiring operational support, spread across hundreds of systems.  On the other hand, violating the standard for a handful of DW systems is unlikely to turn into the management nightmare that would occur with the former situation.

The data warehouse database platform has been an area of heavy innovation and many newcomers have appeared over the last 5 years.  If you are going to enter this space you had better make sure you have your point of difference pitch really honed.

In addition you should:

  • Ensure you are supporting standard interfaces (OLE-DB or ODBC).  Being the greatest data warehouse platform that your customers can’t write reports for isn’t going to be a great sell.
  • Ensure you are providing a good standard set of tools.  Query tools, design tools, data loading/integration tools etc.  If you are small, getting any of the third party tool vendors to pay attention to you is going to be difficult.  Use compatible interfaces where you can to ease this, but also make sure you include your own support.

Strategies include the following.

#1 Be Bigger, Faster & Cheaper

At the top end of town, large data warehouses are getting larger.  Multi TB data warehouses are common; PB & multi PB are at the leading edge.  But at the same time the response time requirements are getting smaller.

Horizontally partitioned, distributed, highly scalable database platforms are the only way to fulfill these requirements.  Doing this on traditional platforms (Oracle, SQL Server, Teradata, DB2) can be difficult and/or costly.  If you can make it simpler, while being scalable, faster & cheaper (cheaper licenses, less hardware, less difficult to deploy & manage) you’ll have a good story to tell.

Netezza, Greenplum, Vertica & Aster Data are examples in this group.

#2 Be Smaller, Faster & Cheaper

At the other end of the spectrum there are organizations that just want their report queries to run faster.  They may not want to build a multi PB data warehouse; they may just have a few hundred GB of data and want simpler report style queries to run quickly.  Providing a simple, cheap database platform that is easy to implement and easy to migrate to means organizations can quickly start receiving bang for their buck.

Kickfire & Infobright are examples in this group.

#3 Be Specialized

Similar to what I spoke about in the Enterprise OLTP post, picking a specialization and focusing is always a good method for getting a foot inside the door (albeit in a limited initial capacity).  Focusing on a specialization and packaging up your database platform with pre-built tools applicable for that specialization can be a great way to outmaneuver the competitors.  Examples of such pre-packaged tools include reports, dashboards, alerts and other analysis targeted towards that specialization.  This can further save your customers time and money by removing the need to build such capabilities bespoke.

Tenbase & SenSage are examples in this group.

Positioning your Database Start Up for Enterprise OLTP

It is important to realize that there is less diversity in the enterprise OLTP market than at any point in the last 20 years.  Essentially this market has been boiled down to Oracle, SQL Server & DB2 (with few isolated exceptions).   Most new deployments are typically using one of the first two options.  The lack of diversity has created a stalemate or chicken & egg situation.  Enterprises now only want to install new applications that have been built to support Oracle or SQL Server.  This is what most of the enterprise application software vendors are supporting, so it gives them the ability to standardize their database platform across the enterprise.  And of course, this means that enterprise application software vendors largely now only want to invest in supporting these two platforms. 

Any new vendor coming into this market will find themselves stuck between the proverbial rock & hard-place.  Enterprises don’t want your new database platform because it violates their standards and few applications are supported on it, and enterprise application software vendors don’t want to invest in developing applications for your database platform as their enterprise customers aren’t interested in using it.

If you are pre-funding then you might find a focus on the enterprise hurts you rather than helps you.  While almost all of the database revenues are generated by the enterprise, the VC perspective seems to be that the enterprise sector is flooded and enterprises aren’t buying right now.


So what can you do?

#1 Launch a database platform that is so compelling that enterprise customers start demanding their application vendors support it.

This is a tough strategy.  Normally the enterprise does not demand vendors create applications for anything other than the main set of enterprise database platforms (Oracle or SQL Server and in more isolated cases for DB2 and MySQL).  Usually they are satisfied with a choice of these. 

For this to be of any success, what you are offering the customers has to be compelling, very compelling.  MySQL being free, for example, has not been compelling enough for many enterprise customers to demand it or enterprise application vendors to support it (I am talking about pure enterprise OLTP here.  This is not necessarily the case in other markets), unless combined with large volume deployment (see next point).

#2 Launch a database platform that is so compelling that application developers support it and then push their customers to install it

Again, doing this in a broad way is going to prove to be very difficult.  You need to win the enterprise application software vendors over in such a big way that they are willing to:

  • Invest development costs into supporting your database platform
  • Take the hit from the customers who won’t buy their product because they are not supporting the mainstream platforms
  • Spend a lot of time convincing their customers of the merits of your database platform

For this to happen you have to sell the enterprise application software vendors something they can’t do (or can’t do easily) on a traditional platform.  It is not enough to be as good as Oracle or SQL Server for these application vendors.  You need to create new opportunities for them that will help them win more business.  Some reasons why a development organization may decide to invest in your platform include:

  • Much shorter development timeframes via less code to write etc.  If you are a true RDBMS this is unlikely, but if you have a revolutionary interface you may have something here.
  • Much lower per unit license or manageability costs.  This is relevant for high volume database deployments (embedded, POS, mobile workforce).  Enterprises will also care less if the data is ultimately consolidated into a data mart on their preferred platform.

#3 Pick a high value niche and focus on fixing its problems

The “enterprise” is of course a generic term which describes most large corporate businesses; however, within this market you will find specialist sub-markets: telecom, mining, utilities, banks, financial markets etc.  If you can focus on a more specialist market and build a database platform product specifically targeted towards that market you may be able to gain the elusive “specialty vendor” gate pass into those relevant enterprises.  Again this is not something that can be done easily.  It will only work if the generic solution is leaving them exposed in some way, and your specialist solution doesn’t bring along its own insurmountable weaknesses.

MySQL Cluster is an example of a database platform product which has had some success at this.  MySQL Cluster started as a solution for the telecom industry.  It was focused on a specific niche and set about solving the “carrier grade” requirements of this market.  It was successful in this niche and because of that it is being pushed out further in those organizations.  While this is just a small niche, it has provided MySQL a foot in the door to the enterprise.

#4 Provide platform compatibility

The final way in which you can sell into the Enterprise OLTP market is by providing a compatible platform either cheaper or more scalable. 

This is also a difficult sell but you have two sub markets here.  Firstly, your customers in this space are enterprises which have developed their own apps.  You can try and convince them to take their existing skills, knowledge and investment and switch out their existing database platform for yours.  As they support the applications themselves, they can implement any workarounds if your compatibility isn’t 100% true.

Your second sub market is to sell your benefits to the application vendors themselves.  Your objective here is to get those application vendors to officially support your platform.  Doing so then allows the enterprise to consider switching out their existing database platform for yours.  If you don’t have the application vendor support then don’t even bother pushing this route.  It is very unlikely that an enterprise will move a production application to a stack that their vendor doesn’t officially support, no matter how much better your platform is.

To get the enterprise application software vendors onboard enough to officially support your platform, you need to convince them that they are losing sales of their applications due to limitations of their supported database platform (which you of course mitigate).  Perhaps this is due to very high license costs, poor performance or limited scalability.  If you are providing good compatibility and you are really solving some of these issues, the application vendor has a low risk but high reward proposition in front of them.  This may just be enough for them to stick their neck out and officially support your database platform.

EnterpriseDB's Postgres Plus is an example of this.  One of their main pushes is their Oracle compatibility & significant savings over a pure Oracle license.

If you can also bring additional benefits to the party (increased performance/scalability is always good) then this will strengthen your case further.

For any of the above to be successful you need to ensure you are supporting the development stack with appropriate interface libraries for all the common development platforms (.NET, Java etc).  Getting any third party tool vendors to support your database platform is highly unlikely at this stage of your lifecycle, so you will need to ensure you ship with basic tools for manageability and ad-hoc data access.  Performance profiling tools are also a must for any vendor targeting high workload customers.

Strategies that Won’t Work

Of course there are strategies that you should be aware of that won’t work.  Common ones attempted include the following.

#1 Just being Cheap

Free is good right?  Enterprise is trying to save money, right?  So the enterprise is more likely to choose a database platform that is free than one they have to pay for, right?  Well, no.  Not really.

Cost of ownership is what it is all about and this is made up of a lot more than just license costs.  Operational management, hardware, resourcing and training all contribute to the costs associated with running a large enterprise database environment.  To minimize these costs enterprises like to have consistency and standards across their environment.  This means minimizing the number of platforms, and selecting the platforms goes back to which applications are supported on which platforms.

If you are not compatible but are cheap, the cost of migrating existing applications to your platform must be considered.  You may calculate that your license costs will be $250k cheaper for enterprise X than the equivalent SQL Server licenses, for example.  However $250k doesn’t go far if they have to invest in the re-development & re-testing of their software applications to work on your platform.

One of the exceptions to this is when your platform is being deployed in large volumes (such as in POS or mobile scenarios).

#2 Just being Faster

We are 100 times faster than Oracle, so you should use us rather than Oracle.  Right?  Again, not really.

Being faster is good but on its own it is not enough to win you entry into the enterprise OLTP market.  If you don’t have any existing platform compatibility, you could very well be the fastest database that no application uses.  Unless you are using a strategy above AND being faster is part of that strategy (a great part of that strategy btw) then being faster is not of significant relevance to the enterprise.

There are loads of databases that are faster than SQL Server & Oracle for various workloads that aren’t used in the enterprise in any significant way.  Don’t get me wrong.  Fast is good.  But being fast on its own is not a strategy for enterprise OLTP.  You need to use your speed as part of your strategy, but also make sure you are pushing all the other right buttons necessary to see your database platform get a finger hold in the immense enterprise database platform market.

How to Position your Database Start Up

I have been speaking with a lot of new database vendors over the last 12 months and this has prompted me to revisit a post I wrote mid last year.  The basic premise of this post is that your strategy, and the group of people you’re selling to, largely depends on the market sector you are focusing on (Enterprise OLTP, BI/DW, Cloud & Web 2.0).

A database platform by itself is a largely pointless piece of software.  The only way value is produced from a database platform is through the applications that interact with it.  Therefore the only way to be a successful database platform is by making others successful and motivated to use your platform.

Ok, so as a database platform vendor how do you enter this market then?  Well, there are a few strategies.  Due to the length of this article I have broken it up into Enterprise OLTP, Enterprise Data Warehousing and Cloud & Web 2.0.

Amusing Database Videos

Oh my. This is just immensely funny & sad at the same time - Amusing Database Videos http://www.bigdatabaselist.com/wiki/Amusing_Database_Videos

July 03, 2009

Relational Databases Get a Hard Time

The NoSQL event has triggered a bit of a hard time for the RDBMS the last week.  I won’t add any commentary as this follows what I have been talking about for a while, but here are some of the links.  Most notable is Michael Stonebraker’s post on the ACM site.


June 21, 2009

Mass SQL Server 2000 -> 2008 Upgrades to come

Despite Microsoft’s considerable investment in SQL Server 2005, it never captured the strong market momentum that SQL Server 2000 did.  Essentially, it was a victim of its own previous success.  SQL Server 2005, for all its merits, was in a lot of cases not compelling enough to motivate organizations to upgrade existing installations.  New database application installations used 2005, which has seen SQL Server 2005 steadily grow its install base.  But here we are 4 years after release and most production databases still remain on SQL Server 2000 (fast approaching its 10 year anniversary).  My own data shows the breakdown as follows:

Production SQL Server Installs

  • 54% - SQL Server 2000
  • 45% - SQL Server 2005
  • 1% - SQL Server 2008

Those SQL Server 2000 installations are now quickly reaching end of life.  Their hardware is out of date (32-bit), and the database platform is now out of date.  Upgrades are becoming a forced requirement.  SQL Server 2005 will now be skipped, with upgrades going straight to SQL Server 2008.

With millions of production SQL Server 2000 installations still in existence, I see a strong period of growth for those making tools that assist with the upgrade or consolidation of SQL Server databases. 

June 15, 2009

The problem with the RDBMS (Part 3) – Let's Get Real

  • Introduction
  • The Problem with the Relational Database (Part 1) – The Deployment Model
  • The Problem with the Relational Database (Part 2) – Predictability

    The two primary trends in data management that have been happening for as long as I can remember are:

    1. The expectations of the volume of data we can produce and consume are growing rapidly
    2. The expected delay between data production and consumption is decreasing rapidly

    We have seen ‘typical’ data volumes of databases grow from MB through GB to a point currently where TB databases are common, and PB databases are the “big guys”.   But at the same time we have seen the expectations around the timeliness of response from these databases also change.  What used to be a monthly report became a weekly, then a daily and finally it is not uncommon to have near real-time expectations for databases in terms of data retrieval and analysis.  We have been on a continual path towards the point where data is consumed at the same moment in which it is created, either in raw form or in an aggregated or otherwise processed state. 

    At the other end of the application stack, our ability to move more data around faster has led to new styles of applications that provide users near immediate access to data as it is created.  Popular consumer web examples of such applications include Facebook, Twitter, FriendFeed etc.

    But at the moment these applications aren’t real time, they are near real time.  This means there is a delay of some form between data creation and consumption.  These delays may be very short or several minutes depending on the particular application and its current workload.  These delays may seem irrelevant for the above mentioned apps, but the difference between “near real-time” and “real-time” can have a significant impact on the application functionality.  I am sure we have all been frustrated when checking in at the airport and choosing a seat, only to get the “sorry that seat is no longer available” message once we click the ok button for our selection.

    The Problem with the RDBMS

    The problem with the traditional RDBMS is that it is not a real time system.  It is poll based.  This means a query is constructed, submitted and the results are returned to the application.  This itself may happen very quickly, maybe only a few ms to execute and receive a resultset.  However the problem is of course, the data is only “valid” for the exact moment when the query was executed.  From that moment onwards the data becomes stale and numerous changes could be happening on the data within the RDBMS while the extracted resultset is processed.

    NOTE: Yes I am aware that the disconnected approach is modern and a server side cursor approach used to be common.  We moved away from server side results processing for scalability purposes, but regardless, even with server side resultset processing you weren’t automatically updated when the data changed.

    Using my example above, while I am deciding if I want a window or an aisle, or if it is better to have a middle seat at the front of the plane or an aisle at the back, the underlying data set could be receiving numerous updates.  When I finally make my selection the dataset could be completely invalid, requiring me to start the whole process again.

    While this is a very simplistic example, the issue here is that the trend towards real-time in the user experience layer is not supported by the current interfacing mechanisms to a RDBMS.  While we are seeing AJAX etc being used to provide an interface which can update data in real time, underneath, that data is likely still being collected from polled queries running intermittently.
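To make the stale resultset problem concrete, here is a minimal in-memory sketch of the seat selection example.  It is not how any particular RDBMS or airline system works; the `SeatMap` class, its methods and the seat numbers are all made up for illustration.  It only contrasts a polled copy of the data with a copy that is kept current by push notifications.

```python
class SeatMap:
    """Hypothetical stand-in for a table of available seats."""

    def __init__(self, seats):
        self._available = set(seats)
        self._subscribers = []

    def snapshot(self):
        # Poll style: the caller gets a copy that is only valid right now.
        return set(self._available)

    def subscribe(self, callback):
        # Push style: the callback is invoked with each seat as it is booked.
        self._subscribers.append(callback)

    def book(self, seat):
        if seat not in self._available:
            return False
        self._available.remove(seat)
        for callback in self._subscribers:
            callback(seat)
        return True


def demo():
    seats = SeatMap(["12A", "12B", "12C"])

    polled_view = seats.snapshot()         # fetched once, never refreshed
    pushed_view = seats.snapshot()         # same starting point...
    seats.subscribe(pushed_view.discard)   # ...but updated on every booking

    seats.book("12A")                      # another passenger takes 12A
    return polled_view, pushed_view


if __name__ == "__main__":
    polled, pushed = demo()
    print("polled view still offers 12A:", "12A" in polled)
    print("pushed view still offers 12A:", "12A" in pushed)
```

The polled view keeps offering a seat that is already gone until the next query runs, which is exactly the window in which the “sorry that seat is no longer available” message appears.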

    Real time & Efficiency

    One solution to this problem may be simply to run our polling cycles at such a high rate that the difference between real-time and near real-time becomes indistinguishable.  This is possible but of course, it comes at a high cost in terms of impact on scalability.

    Let me use a fictitious example to highlight this.  Imagine a Twitter like messaging system.  This system is to provide a real time like experience to its users, so it sets a 2 second polling cycle for all client update queries.

    For the purpose of this example, let us assume that we have 1 million users.  Those 1 million users have different usage profiles; for this example let us assume that:

    • 50% of users get 1 message a day
    • 20% of users get 10 messages a day
    • 15% of users get 30 messages a day
    • 10% of users get 200 messages a day
    • 4% of users get 1000 messages a day
    • 1% of users get 5000 messages a day

    Ok, a couple more assumptions:

    • To poll and retrieve an empty result requires 5 “resources” (CPU, DISK, NETWORK)
    • To poll and retrieve a message requires 50 “resources” (CPU, DISK, NETWORK)

    Now let’s compare a system which polls the database every 2 seconds with an alternative system in which messages are pushed from the database to the client on creation.


    % User Base   Replies per day   Poll Resources    Push Resources   Push % of Poll
    50            1                 108,025,000,000   25,000,000       0.0%
    20            10                43,300,000,000    100,000,000      0.2%
    15            30                32,625,000,000    225,000,000      0.7%
    10            200               22,600,000,000    1,000,000,000    4.4%
    4             1000              10,640,000,000    2,000,000,000    18.8%
    1             5000              4,660,000,000     2,500,000,000    53.6%
    100 (total)                     221,850,000,000   5,850,000,000    2.6%

    With the above distributions we would see that a 2 second poll time would have a resource requirement equal to 38x that of a push based database.  This huge overhead is obviously going to be a significant limitation on the upper level of scalability possible.
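The arithmetic behind these figures can be sketched in a few lines of Python.  This assumes the costing model the numbers appear to follow: every poll is charged 5 resources, and each delivered message is charged a further 50, whether it arrives via a poll or a push.  The function and constant names are illustrative only.

```python
# Constants taken from the assumptions stated in the example.
USERS = 1_000_000
POLL_INTERVAL_SECONDS = 2
EMPTY_POLL_COST = 5      # "resources" charged for every poll
MESSAGE_COST = 50        # "resources" charged for every message delivered
POLLS_PER_DAY = 24 * 60 * 60 // POLL_INTERVAL_SECONDS  # polls per user per day

# (share of user base in percent, messages per user per day)
PROFILES = [(50, 1), (20, 10), (15, 30), (10, 200), (4, 1000), (1, 5000)]


def daily_costs(share_pct, messages_per_day):
    """Return (poll_resources, push_resources) per day for one user group."""
    users = USERS * share_pct // 100
    delivery = users * messages_per_day * MESSAGE_COST
    polling = users * POLLS_PER_DAY * EMPTY_POLL_COST
    # A polling system pays for every poll plus the deliveries;
    # a push system pays for the deliveries only.
    return polling + delivery, delivery


def totals():
    """Return (poll_resources, push_resources) summed over all user groups."""
    rows = [daily_costs(share, msgs) for share, msgs in PROFILES]
    return sum(row[0] for row in rows), sum(row[1] for row in rows)


if __name__ == "__main__":
    for share, msgs in PROFILES:
        poll, push = daily_costs(share, msgs)
        print(f"{share:>3}% {msgs:>5} {poll:>16,} {push:>14,} {push / poll:6.1%}")
    poll_total, push_total = totals()
    print(f"poll/push ratio: {poll_total / push_total:.1f}x")
```

Changing `POLL_INTERVAL_SECONDS` shows how sensitive the overhead is to the polling rate: the polling cost scales with the number of polls, while the push cost depends only on the number of messages.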

    So What to Do?

    I will really address the resolution path for the limitations of the RDBMS when I complete this series in my summing up post.  However specific to this issue, there are a couple of things happening which you should be aware of.

    Firstly, traditional RDBMS vendors are trying to shoehorn some form of push based results notifications into existing database platforms.  For example, SQL Server 2005 and above has query notifications, and Oracle & MySQL have something similar (please post in the comments).  Current implementations are rudimentary and not suitable for large scale deployment (meant more as a global cache “refresh” event than a user specific resultset update).

    Also to watch, there are a couple of startups which have identified the real-time trend that is happening in Silicon Valley, and have also identified that existing RDBMS’s aren’t going to be able to fulfill this trend in current form.  They are focusing on re-architecting the RDBMS to be push rather than pull based.  GroovyCorp with their SQL Switch product is an organization that I have been speaking to recently.  Groovy is the furthest down this particular road that I am aware of, with a real-time push based RDBMS being launched next month.
     


    June 11, 2009

    Graph Databases and the Future of Large-Scale Knowledge Management

    Todd Hoff has posted a link to a Los Alamos National Lab presentation on Graph Databases.  In this presentation they revisit the classic RDBMS vs Graph database debate.

    The Relational Database hasn’t maintained its dominance out of dumb luck.  Instead the RDBMS has consistently outperformed while providing the most general use capability of all the variety of platforms that have been available.  Many other approaches have been tried, often these have provided better object model integration (OODBMS) or better data model representation.  But when the rubber has hit the road they have failed on one or more of the key staples of a DBMS – performance, scalability, security, reliability, recoverability & ease of use.

    Right now there seems to be more focus and traction than ever before to get it right.  Graph databases are interesting and clearly have value in solving the hierarchical abstraction problem currently encountered when modeling such structures in the RDBMS.  In other aspects they do share some similarities with the hybrid DHTs.  I think a mix of the best of several approaches will be something interesting (of course it will have to perform extremely well and have great developer support).

    It’s such an interesting time to be in data management.

    © Tony Bain