• post-spark

    The case for Spark

    I have been following Big Data, and Hadoop in particular, for the past four years. A lot has changed in that time. I fell into this because I was looking for the next big wave of technologies, especially on the backend, which was my forte at the time. I started looking into […]

    Continue reading
  • post-kafka

    Apache Kafka

    This week I want to discuss an up-and-coming topic: Apache Kafka. For a long time, streaming data into Hadoop was not considered relevant because Hadoop meant batch MapReduce. In quite a few respects it is still batch, though there are several efforts to make it faster for interactive […]

    Continue reading
  • post-hbase-paxos

    HBase & Paxos

    First, Facebook has been championing HBase for a long time, but recently they have evolved HBase into HydraBase. They give several reasons for this. Behind some of their issues was the fact that HBase relied on ZooKeeper for failover. ZooKeeper is built on Zab, a Paxos-style consensus algorithm. Consensus is a protocol whereby […]

    Continue reading
  • post-featured-hadoopworld-strata

    Can I get that “Spark” to go?

    Wow, Hadoop World has come and gone and, as predicted, Spark was the main topic on everyone’s mind. Just to give you a hint of the various Spark topics: Paxata announces its Adaptive Data Preparation product built atop Spark. ClearStory Data announces Collaborative Storyboards, also built atop Spark. GraphLab announces its tool GraphLab […]

    Continue reading