I work in all markets of the database industry, from web & startup through to the largest and most established enterprises. And to be completely honest, the name Ingres has not come up in conversation very much at all. Ten years ago it came up more often, but recently not so much. But Ingres has been quietly ticking away. Despite being largely off the radar, they still have a sizable and loyal customer base, global offices and a focused & dedicated management team. And importantly, they have an open source business model which actually appears to be working.
I wrote last year that their "behind the scenes" status had the potential to change. Ingres had been very clever and worked out a partnership with Peter Boncz’s Vectorwise. And that relationship was promising big things for data analytics from a price/performance perspective. But at the time it was all promise, with little in the way of substance produced.
But that has been changing. A month or two back Ingres somewhat quietly launched their beta program for the Ingres Vectorwise technology. This technology, if you have not read about it before, combines an analytical column store and “vectorized processing” to give much greater throughput rates than previously possible on your existing hardware (Vectorwise is a single node solution, i.e. not MPP).
And I have started hearing feedback, and it is good. Very good. While Ingres Vectorwise isn’t fully baked yet, I have heard it is producing astounding performance results in early testing. In one case I heard of, a real-life production comparison test on under 10TB of data, Ingres Vectorwise smoked everything else they had tested. And they have tested a lot of different market-leading analytical platforms.
So I think this is the start of an Ingres comeback. Certainly anyone looking at analytical platforms for under 10TB of data will get a recommendation from me to at least look at Ingres Vectorwise. I am looking forward to seeing what 2010/2011 brings for them.
I have noticed a sharp change of focus in venture funding for data orientated companies over the last six months. Many VCs have lost some interest in funding data start-ups that are doing anything around relational data management. Instead the interest is in NoSQL technologies, from key/value stores through to Hadoop based data management layers.
I am highly supportive of the development, and therefore the funding, of a more diverse set of big data technologies than those based on the relational model alone. However, I also advise caution not to throw the baby out with the bathwater. Relational data management technologies continue to be a focus of innovation. There are companies working on game-changing steps forward which have relational underpinnings.
The relational model is going to continue to be the underlying model of most of the world’s structured data for the foreseeable future. Many opportunities for innovation exist, and will continue to exist, around this fundamental model.
A mindset that relational is yesterday’s technology and non-relational is tomorrow’s defies conventional wisdom and will lead to great opportunities being missed.
One of my favorite terms at the moment is “Big Data”. While all terms are by nature subjective, in this post I will try and explain what Big Data means to me.
So what is Big Data?
Big Data is the “modern scale” at which we are defining our data usage challenges. Big Data begins at the point where we need to seriously start thinking about the technologies used to drive our information needs.
While Big Data as a term seems to refer to volume, this isn’t the case. Many existing technologies have little problem physically handling large volumes (TB or PB) of data. Instead, Big Data challenges result from the combination of volume and the usage demands we place on that data. And those usage demands are nearly always tied to timeliness.
Big Data is therefore the push to utilize “modern” volumes of data within “modern” timeframes. The exact definitions are of course relative & constantly changing; right now we are somewhere along the path towards the end goal. That goal is, of course, the ability to handle an unlimited volume of data, processing all requests in real time.
So what are Big Data technologies?
More than at any point in the past, data related technologies are the focus of research & innovation. But Big Data challenges won’t be solved anytime soon by a single approach. Keeping in mind all the different platforms that Big Data is having an impact on (web, cloud, enterprise, mobile) combined with all the Big Data domain challenges (transaction processing, analytics, data mining, visualization) as well as many of the Big Data characteristic requirements (volume, timeliness, availability, consistency), it is easy to see how no single technology will provide a cover-all solution for the eclectic mix of needs. Instead, a broad set of technologies, each focused on meeting a specific set of needs, are improving our ability to manage data at scale.
A few common areas of innovation that I describe as Big Data technologies include: MPP Analytics, Cloud Data Services, Hadoop & Map/Reduce (and associated technologies such as HBase, Pig & Hive), In-Memory Databases, some Distributed NoSQL databases and some Distributed Transaction Processing databases.
So what is the point of Big Data?
Someone asked me if Big Data was just tools to “try and sell them more relevant crap they don’t want”. While up-sell & targeted advertising are two major uses of Big Data technologies, I hope that my work, and that of others in this field, results in achievements more significant than just these.
When describing the point of Big Data I like to think about how the Internet has changed my life in general. By having unlimited & timely access to information we are now better informed in all areas of our existence than ever before. However, we are now facing the problem that there is fast becoming too much data for us to digest in its raw form. To move forward in our understanding we will need to rely on technology to provide timely, summarized & relevant data across all aspects of our lives. This is what those working in Big Data are setting out to achieve.
Next year will be the start of much more difficult times for the existing MPP start-ups/early-stage companies (including Greenplum, Vertica, Netezza, XtremeData, Kognitio, Aster Data etc). This is because Microsoft introducing an MPP solution (Madison, now known as Parallel Data Warehouse) is the start of the commoditization of the technology and market. To understand this you need to understand the sales process for MPP. It goes something like this:
CIO: We need a data warehouse. What platform should we use?
DBA: We are an [Oracle | SQL Server] shop, so use that.
CIO: Ok.
Some time later…
CIO: Our data warehouse is very slow and people are complaining.
DBA: The server is too small, as you have loaded much more data than planned. We need a bigger box.
CIO: Ok.
Some time later…
CIO: Our data warehouse is slow again.
DBA: I know, but we have the biggest box we can get, we have tuned everything, and I am out of ideas.
CIO: Our data warehouse is slow.
Consultant: Yes, of course it is. You need to use an MPP platform.
CIO: We are an [Oracle | SQL Server] shop, so do these vendors have a solution?
Consultant: [Yes, but it will cost you | No].
CIO: What about [SQL Server | Oracle]?
Consultant: [No | Yes, but it will cost you].
CIO: What about Teradata?
Consultant: Yes, but it will cost you.
CIO: Oh. Any other options?
Consultant: Yes, there are a bunch of start-ups selling MPP solutions.
CIO: Which one is best?
Consultant: They are all good but all slightly different.
CIO: Ok, make a short list and we will do a proof of concept to see which platform does what we want at the price we want.
Some months later…
CIO: Congratulations [Vertica | Greenplum | Netezza | Aster Data | Kognitio], you have won our business.
The problem with this process for the existing MPP vendors is that much of the business now trickling down to them is going to be caught higher up by the sheer fact that Microsoft has MPP. This must be a big worry, and I think we will see some consolidation of MPP vendors before 2012.
As I have mentioned before, the MPP data warehouse space is quite full, with many new companies appearing over the last few years. The trick for the newer entrants, of course, is to differentiate themselves from the herd to overcome their lack of history and experience.
Aster Data has started to do this with the release of their v4.0 platform. They are now promoting their focus as being on “Big Data Applications” rather than the more generic Big Data Warehousing. This seems to have entailed a rethink about how they were positioning their in-database Map/Reduce functionality (which, for me at least, was obtuse in definition), and they are now marketing their in-engine code execution capabilities in a much clearer way. That is, they allow application logic to be pushed down into the MPP environment, making Aster Data an MPP Data Application Platform rather than just an MPP Database Platform. While this may largely just be a change in marketing and semantics (and a new logo), I do think it helps make Aster stand out and gives them a more distinctive go-to-market.
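To make the push-down idea concrete, here is a minimal Python sketch under stated assumptions. It is not Aster Data's API: the node lists, `run_on_node`, `pushed_down_query` and `page_views` are all hypothetical stand-ins. Each "node" holds only its own rows, the application logic runs where those rows live, and only small partial results travel back to be merged.

```python
from collections import Counter
from typing import Callable, Iterable

Row = dict  # hypothetical row type: one record on one MPP node


def run_on_node(rows: Iterable[Row], fn: Callable[[Row], Iterable[str]]) -> Counter:
    """Apply the application logic locally and return only a small partial result."""
    partial = Counter()
    for row in rows:
        partial.update(fn(row))
    return partial


def pushed_down_query(nodes: list[list[Row]], fn: Callable[[Row], Iterable[str]]) -> Counter:
    """Run fn on every node, then merge the partial results centrally."""
    total = Counter()
    for node_rows in nodes:  # a real MPP system would run these in parallel
        total += run_on_node(node_rows, fn)
    return total


def page_views(row: Row) -> list[str]:
    """Hypothetical application logic: emit one key per page visited in a session."""
    return row["pages"]


nodes = [
    [{"session": 1, "pages": ["home", "cart"]}],
    [{"session": 2, "pages": ["home"]}, {"session": 3, "pages": ["cart", "checkout"]}],
]
print(pushed_down_query(nodes, page_views))
```

The sequential loop only shows the data flow; the benefit in a real MPP system comes from the per-node work running in parallel next to the data.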
I have yet to look into the details of this, but in theory at least moving higher level application components down into the MPP environment would seem beneficial from a performance and robustness perspective. Interestingly, Teradata has recently been working with SAS to move parts of their analytics stack down into Teradata’s stack.
No specific announcements from Ingres, other than that I think the VectorWise work is progressing well.
To me Ingres is a bit of a dark horse. They are open source and doing reasonable revenues. And they are active in the enterprise market (something MySQL hasn’t really achieved). But they remain largely off the radar in commentary surrounding the DBMS industry.
My personal pick is that this will start to change during the second half of next year. I think several things happening in the market (Oracle’s eventual acquisition of MySQL being a major one) and some things happening internally (VectorWise being a major one) will help propel Ingres back into the RDBMS spotlight, especially in the enterprise.
VoltDB
It sounds like VoltDB is getting closer with some talk of being able to see an early version of the product soon.
VoltDB will be an interesting case to watch. VoltDB (Vertica’s “sister”) is a lightweight DBMS optimized for large scale transaction processing. I don’t know which bits of the architecture they are ok with people talking about yet, so I won’t go into detail on that. But regardless of the technology, VoltDB should be watched because of its transaction processing focus. Many analytics DBMS vendors have entered the market over the last few years, but few transaction processing alternatives have set up shop recently. This is for a few reasons, a major one being that the transaction processing market is such a tough nut to crack.
It sounds as if VoltDB has been bootstrapped with funding help coming from a company that is involved in the stock market. Certain areas of FSI obviously have “niches” that require high end distributed transaction processing, which is precisely where I am sure they will find their early traction. But what will be interesting is whether they can break out of this niche and start to engage the wider ISV community. The go-to-market will be very different, and much more difficult, than what they have seen with Vertica. But with luminaries like Stonebraker leading the way, who knows, they may make a dent.
The funny thing with Michael Stonebraker is that most of the companies or institutions he is involved with that I speak to say he is spending most of his time on "their" project. I am actually starting to doubt there is only one Michael Stonebraker and suspect cloning may somehow be involved…
IBM DB2
I spoke to IBM a few weeks back when they announced their DB2 PureScale technology. PureScale is actually quite exciting. But they chose to announce it around the time of Oracle OpenWorld, and press attention was largely drowned out by, among other things, Larry’s persistent bagging of IBM.
IBM DB2 PureScale is a technology solution which provides shared-disk clustering for DB2 on IBM Power Systems. New nodes can be added online (a traditional problem for shared-disk clustering), and node failures will not cause new requests to fail, as they will be transparently routed to other available nodes (although I believe in-progress transactions will fail). This is done using the hardware architecture of the Power Systems, and in a way that doesn’t require any application code changes.
On a different note, it seems part of IBM’s strategy for gaining customers from Oracle is to make DB2 more compatible with Oracle. They say imitation is the sincerest form of flattery, so I am not sure if IBM is paying Oracle a huge compliment here. But more seriously, my concern about this strategy is that I believe Oracle is very much aware of, and in control of, their wins & losses and can put in preventative measures to block any major hemorrhaging whenever they so desire. IBM, I don't think you want to put too much focus on chasing Oracle's cast-offs. DB2 is good in its own right, and you need to do a better job of showcasing the platform to ISVs if you want to retain your pride of place.
Although, at least this may allow ISVs to more easily support DB2 alongside Oracle.
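The routing behaviour described above can be sketched as a toy model. This is a hypothetical illustration, not how PureScale is implemented: `SharedDiskRouter` and its methods are made-up names. The key assumption is the shared-disk one, i.e. every node can reach all of the data, so a router can send any new request to any healthy node, and adding a node requires no data to be repartitioned.

```python
class SharedDiskRouter:
    """Toy request router for a shared-disk cluster (hypothetical model)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # every node can read the shared disks
        self.failed = set()
        self._next = 0            # round-robin cursor

    def add_node(self, node):
        """Online add: with shared disk, no data has to move to the new node."""
        self.nodes.append(node)

    def mark_failed(self, node):
        self.failed.add(node)

    def route(self):
        """Send the next new request to a healthy node, skipping failed ones."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node not in self.failed:
                return node
        raise RuntimeError("no healthy nodes available")


router = SharedDiskRouter(["n1", "n2", "n3"])
router.mark_failed("n2")
print([router.route() for _ in range(4)])  # ['n1', 'n3', 'n1', 'n3']
```

This only models new requests; consistent with the text, work already running on a node when it fails is not rescued here.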
XtremeData
XtremeData is yet another vendor to enter the MPP analytics space. XtremeData is worthy of note because their product is built upon their unique FPGA. Unlike other FPGAs I have seen, I understand that theirs plugs into a spare CPU socket in the server. The FPGA can then provide pushed-down data streaming operations at rates available to the CPU bus (instead of the PCI bus that some other FPGA approaches use). However, I haven’t yet seen any benchmark data showing what this translates into.
When I spoke to XtremeData their focus seemed to be very much on the very high end. Large deployments of many nodes, in many racks, handling many hundreds of TB (or PB). As I have spoken about before, the MPP space is very busy right now. Most of the companies are naturally focusing on the mid-range MPP needs, so maybe focusing on the very large end is a smart way to differentiate. This of course may change as they ramp up and I will be curious to see if there actually is a sustainable market at this very top end.
NoSQL
There has been a lot happening in the NoSQL technologies (Mongo, Cassandra, Voldemort etc), which I will comment on in other posts. But an annoying thing, which can sometimes happen with community open source initiatives, is that the level of infighting and bickering has been rising steadily. And this is not even over important technological decisions. For example, a lot of the bandwidth of the NoSQL mailing list has gone on debating what to call themselves (which degraded into personal attacks and name calling at one point): NoSQL versus many other names, and even what the definition of NoSQL is. This really highlights to me the importance of the commercial organizations surrounding this technology to keep providing the necessary beacons to focus on and move this initiative forward.
GoodData has launched, and they are providing a cloud based analytics platform for integration with online apps. They are starting with an initial focus on Salesforce data, but are working hard on expanding the list of ISVs who choose to provide their customers analytics via GoodData.
GoodData was started by “good guy” Czech serial entrepreneur Roman Stanek (NetBeans) and has just raised funds from Andreessen Horowitz and appointed Tim O’Reilly to the board. GoodData is interesting because it is simple, accessible and available on demand. It is still early days, but I think Roman is on to another winner here. I certainly recommend that any ISV building cloud based apps look at their platform.
Mark Logic
I was keen to learn more about Mark Logic as I didn’t understand their products in any detail. David and Ron were more than obliging, and I sat down with them last week for a run-through.
In short, I am impressed by the technology of Mark Logic. It is a database that uses XML as the schema data model and XQuery as the primary query language. But it is far more than an XML extension bolted on top of a traditional db engine (such as some of the XML capabilities in the more traditional RDBMS vendors). Internally Mark Logic has all the important DBMS components (query processor, indexing etc), but they are designed and optimized around the XML schema from the ground up. I also understand they have distributed multi-node capability, something which is still quite rare in the general purpose RDBMS world.
Mark Logic has a history in the content publishing market, as you would expect, because much “published” data is naturally represented in XML. I did sense the team at Mark Logic is keen to break away from this niche a little (while at the same time respecting that this will likely remain their primary market). Exactly how they go about this isn’t entirely clear to me, as the world has kind of moved on from the “XML for everything” excitement that existed in the early 2000s. There will be plenty of case-by-case requirements, but it is hard to drive business development in a piecemeal market. Still, publishing remains a clear staple and I am sure they can leverage this into a few more markets.
I did get somewhat excited when we were talking about serializing JSON in and out of Mark Logic. This is very topical in the web app market as we see a push towards client based web applications and web services dishing up JSON. But this is not necessarily a money spinner, as there are “free” offerings servicing this need already (CouchDB, MongoDB etc). I understand Mark Logic is proprietarily licensed, so it might be hard to gain traction here.
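The post doesn't say how Mark Logic maps JSON to XML, so the following is only a generic illustration of a JSON to XML round trip using Python's standard library. The `type` attribute convention, `json_to_xml` and `xml_to_json` are hypothetical, and the sketch assumes every JSON object key is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, tag="doc"):
    """Map a JSON value onto an XML element, recording its JSON type (hypothetical convention)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        elem.set("type", "object")
        for key, child in value.items():  # assumes keys are valid XML names
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        elem.set("type", "array")
        for item in value:
            elem.append(json_to_xml(item, "item"))
    else:
        elem.set("type", "scalar")
        elem.text = json.dumps(value)  # keeps strings, numbers, booleans and null distinct
    return elem


def xml_to_json(elem):
    """Invert json_to_xml using the recorded type attribute."""
    kind = elem.get("type")
    if kind == "object":
        return {child.tag: xml_to_json(child) for child in elem}
    if kind == "array":
        return [xml_to_json(child) for child in elem]
    return json.loads(elem.text)


doc = {"title": "Big Data", "tags": ["xml", "json"], "pages": 3, "draft": None}
xml_text = ET.tostring(json_to_xml(doc), encoding="unicode")
print(xml_to_json(ET.fromstring(xml_text)) == doc)  # True
```

Recording the JSON type explicitly is what makes the round trip lossless; without it, an XML reader cannot tell a one-element array from an object, or the string "3" from the number 3.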
Kognitio
I spoke briefly with Kognitio a couple of weeks back. I hear very little about Kognitio, so I was keen to speak to them about their progress. Kognitio is a UK based company that provides a data warehouse appliance; while it only launched in the US last year, it has a much longer history in the UK.
Kognitio seems to be taking a different approach to achieving growth than the one many of the US vendors are using. While most of the US companies are venture backed and are pushing hard to gain market share, Kognitio is privately backed and seems to be taking a slower and more methodical approach. This has obviously served them well in the UK, but it will be interesting to see how that plays out in the highly crowded, highly competitive US data warehousing scene. It may turn out to be a true test of who really does win out of the tortoise and the hare.
Infobright
The big news at Infobright is that Miriam is no longer CEO and has been replaced by a temporary CEO, board member Mark Burton. I spoke with Mark a couple of days ago, and the reasons cited were around future direction and the next stage in the company’s lifecycle etc. They are still sorting this all out and expect to be ready to start discussing their new direction in a few weeks. In saying that, when we spoke I got the feeling their positioning will still be very tied to the MySQL customer base, something I tend to disagree with. But it would be premature to speculate, so I will wait until further information is available.
Here is a summary of the key discussions I have had over the last month. Keep in mind, I’m no analyst. This is largely opinion based on various conversations I have had with the relevant companies (for analyst insight see Curt Monash).
Kickfire
I think Kickfire has been doing it a little tough lately. The difficulties of a startup launching a hardware appliance (and the associated logistics), combined with being too focused on the MySQL customer base, have impacted the growth of this interesting start-up. But they aren’t taking it lying down; they have adjusted their strategy and added a new appliance to the range. Kickfire now seems to have a stronger focus on the enterprise and has released a larger version of its appliance to provide a growth path. As I have said all along, the MySQL aspect of their product is interesting, but the solution as a whole is much more interesting and has much broader appeal than just the current MySQL customer base.
Flipping hardware appliances is a much tougher play than selling software-only solutions, partly because it is much more difficult for customers to get their hands on your stuff and have a play before they buy. Hopefully Kickfire has now mitigated most of these issues through their online, on-demand evaluation host. I haven’t yet played with this, but it is on my list of things to do over the coming month.
Kickfire’s enterprise strategy is just one of many that will be reinforced by an Oracle acquisition of Sun.
Greenplum
Greenplum has addressed a perceived chink in its armour with the release of its column store capability. Greenplum has taken the popular hybrid approach, which means you can decide on a case-by-case basis whether a particular table should be row or column orientated. But as Daniel points out, it is a storage-level-only solution. The storage-only approach brings just part of the benefit of columnar stores; to achieve the full benefit, the query execution engine needs to be aware of this layout (so features such as lightweight compression can be used effectively). But I am sure this is an area where Greenplum will make further improvements in the future.
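As a rough illustration of why execution-engine awareness matters, here is a small Python sketch using run-length encoding, one common lightweight column compression scheme (an assumption; the post doesn't say which schemes Greenplum uses). An engine that understands the encoding can evaluate a filter once per run, whereas a storage-only design effectively decompresses back to individual values first. The function names are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def count_equal_on_runs(runs, target):
    """Compression-aware filter: one comparison per run, not per row."""
    return sum(length for value, length in runs if value == target)


def count_equal_after_decompress(runs, target):
    """Storage-only style: expand back into rows, then compare every row."""
    rows = [value for value, length in runs for _ in range(length)]
    return sum(1 for value in rows if value == target)


# A sorted, low-cardinality column stored column-wise collapses into a few runs.
region = ["APAC"] * 4 + ["EMEA"] * 3 + ["US"] * 5
runs = rle_encode(region)
print(runs)                                        # [('APAC', 4), ('EMEA', 3), ('US', 5)]
print(count_equal_on_runs(runs, "EMEA"))           # 3 (three comparisons for 12 rows)
print(count_equal_after_decompress(runs, "EMEA"))  # 3 (twelve comparisons)
```

Both functions return the same answer; the difference is the amount of work, which grows with the number of runs in one case and the number of rows in the other.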
Groovy
Groovy has been working hard carving out its niche in the real time web data market. If you don’t recall, Groovy makes an in-memory RDBMS that has been extended to provide real time data streaming capabilities. Groovy has been positioning this into the large web properties who are working on creating new large scale, real time applications for their user base.
Aster Data
Aster has put out a number of announcements over the last month and I am trying to keep up. Firstly, they announced their tight integration with Hadoop. This integration is map-reduce on the outside of the Aster Data platform (which apparently they didn’t have already, although I think everyone assumed they did given their strong in-database map-reduce message). Aster has been banging the map-reduce drum for some time, and it is clearly the point of difference they are focusing on.
Aster also released version 4.0 of their platform a couple of days ago, and I was a bit surprised to see an email from them referring to their platform as “the World's First Massively Parallel Data-Application Server”. This seems to be a new name for the in-database map-reduce stuff; maybe, as an effort to differentiate themselves from the myriad of competitors in this space, they are trying to carve out a new category all for themselves. For me, the external map-reduce stuff makes sense, as I can see this being useful for data preparation on the way in to Aster and dissemination of data on its way out of Aster. But I still don’t have clear examples in my head of when their in-database map-reduce stuff is useful. I am sure it is, but I have a feeling it is valuable on a case-by-case basis, which is difficult to articulate, especially as a point of difference message. But I missed Curt’s map-reduce webinar (at the last minute), so maybe that would have shed some light. Anyway, they are running a webinar on this which you can register for here.
To me, Aster is more aggressively driving their platform into green fields, trying to leverage their technology to find new customers and new markets. Greenplum, on the other hand, is more ‘steady as she goes’, focusing on a more traditional and conservative enterprise data warehousing market (while still innovating ahead of the general purpose behemoths). The risks are on both sides. When trying to define a new market you risk not finding one, or finding one that is too small or “niche” to support your business. With the conservative approach you risk being lumped in with everyone else, and in data warehousing ‘everyone else’ is now quite a long list.
I was speaking with Michael Stonebraker this morning. I mentioned that lately many have been referencing comments he has made over the last couple of years, and that many had interpreted them as implying the RDBMS is “doomed”. Mike has been saying the same thing for years, but the current NoSQL movement seems to have picked up on it and is highlighting that one of the RDBMS's own pioneers is predicting its demise.
I asked Mike to clarify this. My interpretation of his response is as follows. I understand that he doesn’t believe the relational database itself is doomed. Instead, the current general purpose (GP) implementations, or “elephants” in his words, are out of date. By moving away from a historical GP function into something more specific in focus, either transaction processing or analytics, you can easily get a 50x performance improvement over a GP RDBMS. This doesn’t necessarily mean moving away from the “relational” nature, but instead changing some core design principles for how an RDBMS is implemented. It is this improvement factor that will see “new” specialist platforms overtake “old” general purpose platforms, gradually, over time. However, Mike also mentioned that the relational data model doesn’t make sense in a number of disciplines, particularly in the sciences, and alternative modeling paradigms will offer benefits to this market (hence his focus on SciDB). So while relational is a valid data model, other data models are also needed.
I have a similar position to Mike, but perhaps with a few differences.
- Firstly, I agree with the mantra that current GP RDBMS platforms provide only a “middle of the road” capability, and that we have gone too far in using a GP RDBMS for everything. However, I do believe there is a long term future for the GP RDBMS. A general purpose application requirement will continue to be well suited to a general purpose platform. With a specialist-only approach, a general purpose requirement may need both a specialist OLTP platform and a specialist analytics platform to provide the same capability.
- I agree that with an extreme requirement, either analytics or transaction processing, a specialist platform is well suited. But I don’t see the choices of just MPP or memory resident RDBMS as being a broad enough set. Apps that use a db just as a persistence cache will benefit from a high performing, scalable database platform with much tighter integration with the object model. I am not sure any of the current NoSQL platforms have it quite right yet, but when these guys eventually get together with the database guys and work on these things together, they may get there.
- I don’t think a 50x performance speed up on its own is enough to drive change in OLTP. I have written before how difficult it is to get into this market and how tight Oracle, Microsoft & IBM have this sewn up. But I don’t believe it is impossible, I think you just need to bring slam dunks on multiple fronts (performance just being one of them).
Anyway I feel like I am a bit of a broken record at the moment. I have been addressing the “is the RDBMS doomed” question a couple of times a day for some time. Time to focus on something else for a bit.
I was fortunate enough to speak with Marcin Zukowski earlier about VectorWise. If you missed it, VectorWise came out of stealth mode a day or two ago. They have announced a joint partnership with Ingres and essentially are claiming impressive analytic RDBMS performance gains on conventional hardware.
To start with, a key message that I think needs to be communicated here is that this is not a product announcement. Ingres and VectorWise have announced a partnership in which they of course plan to build products together; today those products are still in the works.
VectorWise is a spin out of CWI based on research that was undertaken by Marcin and others, research that centered on MonetDB. Explaining the essence of VectorWise is difficult because it is largely internal DBMS data storage & processing logic, but I will have a go.
The modern RDBMS is based around design principles that stem from general purpose OLTP roots and historical hardware architectures (this is partially true even for some of the newest analytic platforms). In a nutshell, these design principles focus on the fact that disk is slow & CPU is fast. Data is read off disk via seeks or partial scans and cached. Row-by-row (tuple-by-tuple) operators process that data, passing the outcome of each operator to the next as part of a query’s execution plan until ultimately producing the result.
Traditionally I/O is the main bottleneck, so to make the database faster you add more I/O bandwidth. Today, disk provisioning may be up to 100x the actual capacity needs, because many disks are necessary to achieve the I/O bandwidth required to provide performance for an analytical RDBMS implementation. Even though RDBMSs may parallelize query operators across cores, this typically works by partitioning data between cores, with each core still processing data on a tuple-by-tuple basis.
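A minimal Python sketch of this tuple-at-a-time (iterator, or "Volcano") model follows. It is a simplified illustration rather than any vendor's engine: `scan`, `select`, `project` and `total` are hypothetical operator names. The point is that every row travels through every operator individually, so the per-row overhead of each operator call is paid for every tuple.

```python
def scan(table):
    """Leaf operator: hand rows up the plan one at a time."""
    for row in table:
        yield row


def select(rows, predicate):
    """Filter operator: pulls one row, tests it, passes it on or drops it."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, column):
    """Projection operator: extracts one column value per row."""
    for row in rows:
        yield row[column]


def total(values):
    """Aggregate operator at the root of the plan."""
    result = 0
    for value in values:
        result += value
    return result


table = [
    {"region": "EMEA", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EMEA", "amount": 5},
]
# Plan for: SELECT SUM(amount) FROM table WHERE region = 'EMEA'
plan = project(select(scan(table), lambda r: r["region"] == "EMEA"), "amount")
print(total(plan))  # 15
```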
Conventional wisdom? Well, maybe. You see, disk is only really “slow” when it is doing random seeks. Give a disk something sequential to do, on the other hand, and things are very different. Modern disks are able to scan sequentially at around 150MB per second, so an array of 10 disks should be able to return sequentially read data at well over 1GB per second (10 × 150MB ≈ 1.5GB).
When it comes to databases, column based storage has been found to effectively structure data for a) high levels of compression and b) sequential access. VectorWise makes use of both of these properties to help it achieve high levels of sequential I/O. The problem now, however, is that disk may no longer be the bottleneck. While we can get 1GB a second sequentially off disk relatively easily & cheaply, processing tuple-by-tuple at this rate is very difficult. As it turns out, an RDBMS may only achieve a data processing rate of 50MB a second per CPU core. This makes CPU processing a big bottleneck for analytics data sets: assuming the above figures, we would need over 20 cores to keep up with 10 disks (and of course CPU cores don’t scale linearly).
If we step out of the database world for a moment into the world of high end computer games, or high end scientific processing, we find their use of current CPU technology is much more advanced than what we are used to. They are using CPU extensions (MMX, SSE, SSE2, SSE4.2 etc) to parallelize & pipeline computation within a CPU core, meaning they are processing orders of magnitude more instructions per core than a traditional RDBMS typically has been able to. The exact details are too low level to discuss here (many of the research papers are available online), but it is fair to say modern CPU architectures contain advanced features that to date haven’t been effectively exploited by database vendors.
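The back-of-envelope arithmetic in the last two paragraphs can be written out as follows. The figures (150MB/s per disk, 10 disks, 50MB/s per core) come from the text; the helper names are hypothetical, and both helpers assume perfectly linear scaling, which the text notes is optimistic for cores.

```python
import math

DISK_SEQ_MB_S = 150  # sequential read rate per disk (figure from the text)
NUM_DISKS = 10
CORE_MB_S = 50       # tuple-at-a-time processing rate per core (figure from the text)


def io_bandwidth_mb_s(disks: int, per_disk_mb_s: float) -> float:
    """Aggregate sequential bandwidth, assuming the array scales linearly."""
    return disks * per_disk_mb_s


def cores_to_keep_up(io_mb_s: float, per_core_mb_s: float) -> int:
    """Cores needed to consume the I/O stream, assuming perfect linear scaling."""
    return math.ceil(io_mb_s / per_core_mb_s)


io = io_bandwidth_mb_s(NUM_DISKS, DISK_SEQ_MB_S)
print(io)                                 # 1500
print(cores_to_keep_up(1000, CORE_MB_S))  # 20 (using the text's round 1GB/s)
print(cores_to_keep_up(io, CORE_MB_S))    # 30 (using the full 1.5GB/s)
```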
Enter VectorWise. Their aim is to marry storage technologies which allow high levels of sequential I/O with query processing logic which is designed for modern CPU architectures. Rather than processing tuple-by-tuple, they process “vectors”, groups of tuples, leveraging modern CPU extensions and high levels of on-chip cache to allow the CPU to achieve higher data processing throughput. The result is that instead of the 50MB a second of a tuple-by-tuple approach, VectorWise is able to achieve processing rates in the range of 500MB-1GB a second per core in some situations. This means processing rates of 8GB a second or more could be possible with relatively low end hardware.
“In some situations” is the key point to stress here; this obviously isn’t a blanket gain that applies to all analytic data sets, workloads and query requirements. Just what those situations are will be key to their technology’s success: how well it actually applies to real world data sets and queries. I wouldn’t expect to see too many specific examples of this until a product beta appears. But the theory is that VectorWise can offer high levels of processing capability with existing mainstream hardware. At this point VectorWise isn’t even focusing on MPP; instead they are single node focused. If their scalability claims pan out, you can imagine how this could allow a single node solution to be competitive with existing low to mid scale MPP solutions that are based on a more conventional query processing architecture.
This isn’t VectorWise’s only trick up their sleeve. They are also leveraging research around column based storage, compression, piggy-backed (shared) scans and so on. Much of the research that has been adopted by VectorWise is referenced from their web site.
VectorWise has impressive technology, so why partner with Ingres rather than a larger vendor (or go it alone)? Marcin offers a few reasons. Firstly, as academics they feel strongly that open source is cool, so this path was greatly preferred over a relationship with a non-open vendor. Secondly, Ingres will allow them to deliver their technology in an uncompromised fashion. Marcin mentioned that if they had partnered with one of the big three vendors, that vendor’s existing product strategies and investments would likely have meant their ideas could only be implemented in partial form. Ingres, on the other hand, is going to allow them more of a green field. And of course, a partnership with Ingres makes sense from a go-to-market perspective, as Ingres already has a worldwide reputation, a global customer base, sales & marketing capabilities etc.
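To show the shape of vector-at-a-time processing (not its speed: interpreted Python won't exhibit the SIMD and cache benefits described above), here is a hedged sketch. Columns are stored as contiguous arrays, each operator receives a whole vector of values per call and hands back a selection vector of matching positions, so per-call overhead is paid once per vector rather than once per tuple. `VECTOR_SIZE`, the integer region codes and all function names are assumptions for illustration, not VectorWise's actual design.

```python
from array import array

VECTOR_SIZE = 1024  # hypothetical batch size; real engines size vectors to fit CPU cache


def scan_vectors(column, size=VECTOR_SIZE):
    """Yield a column in fixed-size slices (vectors) rather than one value at a time."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_eq(vector, target):
    """Filter primitive: return a selection vector of matching positions."""
    return [i for i, v in enumerate(vector) if v == target]


def sum_selected(vector, selection):
    """Aggregate primitive: sum only the positions the filter selected."""
    return sum(vector[i] for i in selection)


def sum_where_eq(filter_col, value_col, target, size=VECTOR_SIZE):
    """SELECT SUM(value) WHERE filter = target, evaluated one vector at a time."""
    total, calls = 0, 0
    for f_vec, v_vec in zip(scan_vectors(filter_col, size), scan_vectors(value_col, size)):
        selection = select_eq(f_vec, target)     # one call per vector...
        total += sum_selected(v_vec, selection)  # ...not one call per tuple
        calls += 1
    return total, calls


# Columns stored separately as contiguous typed arrays (region is dictionary-encoded).
region = array("i", [0, 1, 0, 2] * 1000)  # 4,000 rows
amount = array("i", range(4000))
total, calls = sum_where_eq(region, amount, target=0)
print(calls)  # 4: four vectors of up to 1024 rows cover all 4,000 rows
```

In a compiled engine, the tight loops inside `select_eq` and `sum_selected` are what a compiler can map onto the SIMD instructions and on-chip cache the text describes.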
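The idea behind piggy-backed (shared) scans can be sketched simply: rather than each concurrent query reading the table separately, one pass over the data feeds every query that needs it. This is a simplified, hypothetical illustration (`shared_scan` and the query dictionary format are made up, and it is not a description of VectorWise's mechanism); it assumes all queries are known before the scan starts, whereas some real systems also let new queries attach to a scan that is already in progress.

```python
def shared_scan(table, queries):
    """Read the table once and feed each row to every registered query.

    `queries` maps a query name to (initial_state, step_function), where
    step_function(state, row) returns that query's updated running state.
    """
    states = {name: initial for name, (initial, _) in queries.items()}
    for row in table:  # a single pass over the data...
        for name, (_, step) in queries.items():
            states[name] = step(states[name], row)  # ...shared by every query
    return states


table = [
    {"region": "EMEA", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EMEA", "amount": 5},
]
queries = {
    "total_amount": (0, lambda acc, row: acc + row["amount"]),
    "emea_rows": (0, lambda acc, row: acc + (row["region"] == "EMEA")),
    "max_amount": (float("-inf"), lambda acc, row: max(acc, row["amount"])),
}
print(shared_scan(table, queries))
# {'total_amount': 22, 'emea_rows': 2, 'max_amount': 10}
```

Three queries are answered with one read of the table; with separate scans, the same data would be read three times.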
Marcin confirmed that Ingres have an exclusive license to their technology, and first option to acquire them for a certain period of time. This allows Ingres to really invest in the relationship without the fear of the carpet being pulled out from under them.
VectorWise are clearly applying innovative research to analytical RDBMS requirements. But as interesting as the technology sounds, the proof of the pudding will be how well these design principles translate to real-world analytical processing requirements in mainstream product form. This remains to be seen, but Ingres and their community clearly have high hopes.
VectorWise is clearly differentiated when compared with a traditional mainstream RDBMS running on mainstream hardware. However, in the current market we have lots of different approaches to the problems described. Kickfire, for example, uses its own SQL Chip processor to increase data processing rates, and other appliance vendors are using FPGAs etc for similar purposes. The relative effectiveness of these different approaches still needs to be examined; however, a mainstream hardware approach has obvious benefits.
Tony Bain is an expat Kiwi, Father, Entrepreneur, Angel Investor, Blogger, and occasional Writer for Read Write Web. He is a Director for RockSolid SQL and the founder of Tony Bain Group.