In my first post in this series I gave an overview of ParStream and their product.
In the second post I gave an overview of the Analytic Database Market from my perspective.
In this post I will briefly talk about how vendors are positioned and introduce a simple market segmentation.
Please note that this is not exhaustive, I've left off numerous vendors with perfectly respectable offerings that I didn't feel I could reasonably place.
The chart above gives you my view of the current Analytic Database market and how the various vendors are positioned. The X axis is log scale going from small data sizes (<100GB) to very large data sizes (~500TB). I have removed the scale because it based purely on my own impressions of common customer data sizes for that vendor based on published case studies and anecdotal information.
The Y axis is a log scale of the number of employees that a vendor's customers have. Employee size is highly variable for a given vendor but nonetheless each vendor seems to find a natural home in businesses of a certain size. Finally the size of each vendor's bubble represents the approximate $ cost per TB for their products (paid versions in the case of 'open core' vendors). Pricing information is notoriously difficult to come across so again this is very subjective but I have first hand experience with a number of these so it's not a stab in the dark.
SMP Defenders: Established vendors with large bases operating on SMP platforms
Teradata+Aster: The top of the tree. Big companies, big data, big money.
MPP Upstarts: Appliance – maybe, Columnar – maybe, Parallelism – always.
Open Source Upstarts: Columnar databases, smaller businesses, free to start.
Hadoop+Hive: The standard bearer for MapReduce. Big Data, small staff.
SMP > MPP Inflection
Still with me? Good, let's look at the notable segments of the market. First, there is a clear inflection point between the big single server (SMP) databases and the multi-server parallelised (MPP) databases. This point moves forward a little every year but not enough to keep up with the rising tide of data. For many years Teradata owned the MPP approach and charged a handsome rent. In the previous decade a bevy of new competitors jumped into the space with lower pricing and now the SMP old guard getting into MPP offerings, e.g., Oracle Exadata and Microsoft SQL Server PDW.
Teradata's Diminished Monopoly
Teradata have not their lost grip on the high end however. They maintain a near monopoly on data warehouse implementations in the very largest companies with the largest volumes of 'traditional' DW data (customers & transactions). Even Netezza has failed to make large dent into Teradata's customers. Perhaps there are instances of Teradata being displaced by Netezza; however I have never actually heard of one. There are 2 vendors who have a publicised history of being 'co-deployed' with Teradata: Greenplum and Aster Data. Greenplum's performance reputation is mixed and it was acquired last year by EMC. Aster's performance reputation is solid at very large scales and their SQL/MapReduce offering has earned them a lot of attention. It's no surprise that Teradata decided to acquire them.
The DBA Inflection Point
The other inflection point in this market happens when the database becomes complex enough to need full time babysitting, e.g., a Database Administrator. This gets a lot less attention than SMP>MPP because it's very difficult to prove. Nevertheless word gets around fairly quickly about the effort required to keep a given product humming along. It's no surprise that vendors of notoriously fiddly products sell them primarily to large enterprises where the cost of employing such an expensive specimen as a collection of DBAs is not an issue.
Small, Simple & Open(ish)
Smaller businesses, if they really need an analytic DB, choose products that have a reputation for being usable by technically inclined end users without a DBA. Recent columnar database vendors fall into this end of the spectrum, especially those that target the MySQL installed base. It's not that a DBA is completely unnecessary, simply that you can take a project a long way without one.
MapReduce: Reduced to Hadoop
Finally we have those customers in smaller businesses (or perhaps government or universities) who need to analyse truly vast quantities of data with the minimum amount of resource. In the past it was literally impossible for them to do this; they were forced to rely on gross simplifications. Now though we have the MapReduce concept of processing data in fairly simple steps, in parallel across a numerous cheap machines. In many ways this is MPP minus the database, sacrificing the convenience of SQL and ACID reliability for pure scale. Hadoop has become the face of MapReduce and is effectively the SQL of MapReduce, creating a common API that alternative approaches can offer to minimise adoption barriers. 'Hadoop compatible' is the Big Data buzz phrase for 2011.