
April 27, 2005


Luke Lonergan

Some smart person once said: "There are lies, damn lies and benchmarks..." or was that statistics, oh well ;-)

We commonly see speedups of 10 to 50 times compared with shared disk databases like Oracle and MySQL on real world applications. The reason is that our parallel query architecture uses all CPUs and all I/O channels on each host, and across multiple hosts, to answer each query. This architecture is similar to Teradata's and very different from Oracle's, which shares disk channels among cluster hosts. It is also very different from federated databases, in that the data distribution is done at a very low level and is invisible to the application. For more, see the research papers from DeWitt at UWisc: Gamma Paper
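To make the idea concrete, here is a minimal sketch of low-level data distribution with a query answered by every node at once. It is not Bizgres MPP code: the segment count, the CRC32 hash and the function names are all assumptions chosen for illustration, and threads stand in for separate hosts.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical number of hosts/segments


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Spread rows across segments; the application never sees this step."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Work done independently on one segment: scan its rows and aggregate."""
    return sum(amount for _, amount in segment_rows)


def parallel_total(segments):
    """Run the same partial aggregate on every segment, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        return sum(pool.map(local_sum, segments))


rows = [(f"order-{i}", i) for i in range(1000)]
segments = distribute(rows)
total = parallel_total(segments)
```

Each segment scans only its own share of the rows, so in a real system adding hosts adds both CPUs and I/O channels to every query.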

As you point out, availability is key: the more hosts you add, the greater the chance that some host will fail. We use host mirroring to ensure that uptime is continuous. Up to half of the hosts in the system can fail without downtime. This is built into the system: you get mirroring by choosing it when you create an instance, or you can add it later.
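One way to read the "up to half" figure is sketched below. It assumes hosts are mirrored in fixed pairs, which is an illustrative layout and not necessarily how the product places its mirrors; the function names are hypothetical.

```python
def mirror_pairs(num_hosts: int):
    """Pair hosts (0,1), (2,3), ... so each host's data is mirrored on its partner.

    The pairing scheme is an assumption made for this sketch.
    """
    if num_hosts % 2 != 0:
        raise ValueError("this sketch assumes an even number of hosts")
    return [(h, h + 1) for h in range(0, num_hosts, 2)]


def is_available(num_hosts: int, failed: set) -> bool:
    """The system stays up as long as no pair has lost both of its hosts."""
    return all(a not in failed or b not in failed
               for a, b in mirror_pairs(num_hosts))


# Half of an 8-host system fails, one host from each pair: still available.
half_down = is_available(8, {0, 2, 4, 6})

# Only two hosts fail, but they are partners: that data is unreachable.
pair_down = is_available(8, {0, 1})
```

Under this layout, half is the best case: it holds only when every failure lands in a different pair.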

WRT TPC-H, we routinely use the cases from that benchmark to prove out our scaling internally. To join the official benchmarking effort is another thing entirely. There are numerous tricks involved in obtaining fast numbers for the benchmark and the effort involved can be huge and specialized, so we avoid it. Instead we prefer to get real customer feedback like we got from O'Reilly here:

"Bizgres MPP's great performance and ability to scale to large data volumes is impressive and necessary as we acquire and analyze large and growing data sets. With Bizgres MPP, processes that used to take 10 hours now run in under 7 minutes."

That's close to a 100x improvement over MySQL (10 hours is 600 minutes, so finishing in under 7 minutes is a speedup of more than 85x). Of course MySQL can't run in parallel over lots of machines and CPUs, so you can't compare it on exactly the same hardware. However, the MySQL work was running on the fastest Opteron server that they could find, and we were running on 16 two-year-old Intel Xeon machines. Since then Roger and team have moved to running MPP on 4 dual Opteron servers with 64 disk drives, at a total cost of less than $50K.
