In this blog post, I discuss a (very) high-level process for designing a Hadoop- and HBase-based system. Since SQL-based solutions are what people are most familiar with, I will start by discussing how things would be designed in a relational manner and then talk about how the NoSQL solution differs. This seems to be the norm when discussing NoSQL solutions.
Software revolves around a few operations, events, business elements, and their interconnections. Let's call them the basic elements of software. The key to software design is where you start (which element) and how you grow from there. It's not just about code reuse; rather, it is about artifact reuse, i.e. reuse of these basic elements.
In data analytics, incremental processing of aggregates is very important. When we want to serve real-time data, we cannot rescan both the old data and the newly added data every time to calculate the overall aggregate. This makes incremental processing the first priority for real-time data analytics, which in turn requires processing over a structured dataset. In this article I will try to compare some of the MPP (Massively Parallel Processing) architectures in this light.
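To make the idea of incremental aggregation concrete, here is a minimal Python sketch. It is not tied to any particular MPP system; the `RunningAggregate` class and its method names are hypothetical, chosen only for illustration. The point is that we keep a small running state (count and sum) and fold each new batch into it, rather than rescanning the old data.

```python
from dataclasses import dataclass


@dataclass
class RunningAggregate:
    """Hypothetical running state for an incrementally maintained mean."""
    count: int = 0
    total: float = 0.0

    def add_batch(self, values):
        # Fold only the new values into the state; old rows are never re-read.
        for v in values:
            self.count += 1
            self.total += v

    def merge(self, other):
        # Combine two partial aggregates, e.g. computed on different nodes.
        return RunningAggregate(self.count + other.count,
                                self.total + other.total)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


agg = RunningAggregate()
agg.add_batch([10, 20, 30])  # historical data, processed once
agg.add_batch([40])          # newly arrived data, folded in on its own
print(agg.count, agg.total, agg.mean)
```

Sum and count can be merged this way because they are associative and commutative, so partial results from different batches or nodes can be combined in any order. Aggregates such as an exact median do not decompose so easily, which is one reason the structure of the dataset matters.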