This was, in my opinion, a lazy article from Graeme Philipson:
Graeme has written an article on a new Netezza product after returning from a recent conference; however, the article muddies a number of fundamental issues:
“Database vendors such as Oracle and Sybase have worked with hardware companies to develop hybrid systems that use hardware to make databases run better, but these approaches have never been totally satisfactory, because they are still based on traditional DBMS technology.”
The RDBMS has, for the last 30 or so years, been the only viable storage and management platform for large volumes of structured data, and today this is still the case. The Netezza product described in Graeme's article is an RDBMS.
“Many systems built in this way can get hideously complex, with hundreds of tables. They can also have millions of records. The bigger they get, the harder it is to get at the data, and the longer it takes.”
No, not true. If you are aggregating or reporting over a large volume of data then yes, it takes longer to process a big set of data than a little one. But accessing data in a large database doesn't necessarily take longer than in a small database; that is what indexes are for. I know of billion-row tables where queries complete in milliseconds, on traditional RDBMS.
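The point about indexes can be shown concretely. A minimal sketch using SQLite (my own illustrative table and numbers, not from the article): a point lookup on an indexed column touches a handful of index pages regardless of table size, rather than scanning every row.

```python
import sqlite3

# Build a table with 200,000 rows; the same principle scales to billions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 ((i, i * 0.5) for i in range(200_000)))
conn.execute("CREATE INDEX idx_orders_id ON orders (id)")

# The planner uses the index for a point query: a B-tree descent of a few
# pages, not a scan of all 200,000 rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT amount FROM orders WHERE id = 123456"
).fetchone()
row = conn.execute(
    "SELECT amount FROM orders WHERE id = 123456"
).fetchone()
print(plan[3])  # plan detail mentions idx_orders_id
print(row)
```

Doubling the row count roughly adds one level to the B-tree, so lookup cost grows logarithmically, which is why "bigger database" does not automatically mean "slower access".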
“In the 1980's, the now-defunct British company ICL invented a thing called CAFS, which stood for content addressable file system. CAFS vastly improved query speeds by putting some of the database logic into the disk drive that held the data.”
CAFS became obsolete because server memory prices dropped, meaning a server's main memory could be used as the primary data cache. Most OLTP databases achieve over a 99% cache hit ratio, meaning physical I/O reads are infrequent, so the need for disk-level query acceleration was mitigated.
“Like Teradata, Netezza uses parallel processing”
Like Teradata, Oracle, DB2, Greenplum, Informix… most major RDBMS vendors do.
“it also uses field-programmable gate arrays (FPGAs) to perform much of the processing”
Great, this is the guts of what differentiates the product. From what I understand, Netezza is a self-contained unit (a data appliance) with a chip (an FPGA) on each disk that filters/aggregates data before it enters the I/O stream.
This is a cool technology. My point is that it is cool not for the reasons Graeme alludes to, but because it proactively avoids much of the I/O bottleneck (and, as an added benefit, reduces CPU load) when aggregating large volumes of data. There is also the filtering argument, which holds some water when you have a warehouse with lots of potential filter attributes and truly ad-hoc queries that make it difficult to index effectively. And sure, some data warehouse requirements will benefit from this.
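The idea of filtering and aggregating at the disk before data hits the I/O stream is essentially predicate and aggregate pushdown. A toy sketch of my own (an analogy, not Netezza's actual architecture): each "disk" reduces its rows to a partial result locally, so only tiny partials cross the interconnect instead of every row.

```python
def scan_naive(disks, predicate):
    # Traditional path: ship every row to the host, then filter and sum there.
    rows = [row for disk in disks for row in disk]
    return sum(r["amount"] for r in rows if predicate(r))

def scan_pushed_down(disks, predicate):
    # FPGA-style path: each disk filters and partially aggregates locally;
    # only one number per disk crosses the "I/O stream" to the host.
    partials = [sum(r["amount"] for r in disk if predicate(r))
                for disk in disks]
    return sum(partials)

# Rows 0..99 striped across 4 hypothetical disks, tagged with a region.
disks = [[{"region": i % 3, "amount": float(i)} for i in range(d, 100, 4)]
         for d in range(4)]
in_region_zero = lambda r: r["region"] == 0

print(scan_naive(disks, in_region_zero))        # 1683.0
print(scan_pushed_down(disks, in_region_zero))  # 1683.0, same answer
```

Both paths compute the same aggregate; the pushed-down version just moves drastically less data, which is the bottleneck the appliance is attacking.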
There are also a bunch of alternatives (OLAP as a different approach to the same issue, DataAllegro as a similar approach to the same issue) which also need to be considered.