Big Data Technology Explained, Part 3: Other Big Data Tech

In the final part of this series, Interset's Rob Sader explains additional technologies that companies are using to drive big data programs.



In my previous posts on big data, we looked at some of the core technology tools that companies all over the world use today to drive their big data programs. We covered the Hadoop ecosystem and a few of the projects that have become common in commercial use cases, such as HBase, Hive, Impala, Storm, and Spark.

However, the big data world is not limited to Apache Hadoop. Dozens of other tools, frameworks, platforms, and applications have been developed over just the past five years that drive real value for organizations. Take a look at the chart below and you can see how crowded the field is. For this post, I will dig into just a few of them, the ones where I have seen real value generated by the folks I work with every day.

As I said, many companies are making a play in the big data arena. I think there are still plenty of opportunities to build more useful applications, but that is for another post down the road.

Out of the many dozens of companies on this graphic, I want to call out three.

Elastic:

Formerly known as Elasticsearch, Elastic is an open-source indexing and search tool. Similar to Apache Solr, companies use Elastic to take documents, chunks of data, or even individual log events, index them, make them searchable, and then purge them when space is needed. To me, the beauty of the Elastic platform is not just the search engine itself, but the other tools built around it that improve the user experience. In particular, Kibana is a great tool that sits on top of Elastic and makes it very easy to find the data you are looking for.
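As a rough illustration, here is a minimal sketch of indexing and searching a log event with the official Elasticsearch Python client (syntax follows recent versions of the client). The host, index name, and field names are assumptions made up for the example, not anything specific to a real deployment.

```python
# Minimal sketch: index a log event in Elasticsearch, then search it back.
# Assumes a local Elasticsearch node and the official Python client
# (pip install elasticsearch); index and field names are made up.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local node

# Index a single log event as a JSON document.
es.index(
    index="app-logs",
    document={
        "timestamp": "2016-03-01T12:00:00Z",
        "host": "web-01",
        "message": "user login failed",
    },
)

# Force a refresh so the new document is immediately searchable.
es.indices.refresh(index="app-logs")

# Full-text search over the message field.
results = es.search(
    index="app-logs",
    query={"match": {"message": "login"}},
)

for hit in results["hits"]["hits"]:
    print(hit["_source"]["message"])
```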

NiFi:

When it comes to the world of big data, almost nothing is more important than being able to move data easily and quickly from one place to another. And not just move it, but move it securely, at scale, and with the ability to recover if something goes wrong in transit. This is where Apache NiFi shines. NiFi has been adopted with incredible speed by some of the largest companies in the world to fill the gap they all have in moving data more effectively around their organizations.
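NiFi flows themselves are built in its web UI rather than in code, but a flow is easy to feed programmatically. Below is a minimal sketch that posts a record to a hypothetical flow whose entry point is a ListenHTTP processor; the port and base path are assumptions chosen for this example, set by whoever builds the flow.

```python
# Minimal sketch: hand a record to an Apache NiFi flow over HTTP.
# Assumes a flow whose entry point is a ListenHTTP processor listening
# on port 8081 with base path "contentListener"; both values are
# assumptions for this example.
import json
import requests

record = {
    "event": "order_created",
    "order_id": "12345",
    "amount": 42.50,
}

resp = requests.post(
    "http://localhost:8081/contentListener",  # assumed ListenHTTP endpoint
    data=json.dumps(record),
    headers={"Content-Type": "application/json"},
)

# ListenHTTP acknowledges once the content is captured as a FlowFile;
# from there NiFi's repositories handle delivery, retry, and provenance.
resp.raise_for_status()
print("accepted:", resp.status_code)
```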

Neo4j:

As we move toward a world based more on relationships and networks, graph databases become more important. That is what Neo4j is all about. Over the next few years, every company, whether they like it or not, will need to start connecting the dots between their customers, partners, suppliers, and so on. Doing this with traditional relational databases is extremely difficult, and apart from Spark's graph processing, there are not many tools within the common frameworks that make graph analysis practical. So I believe we will begin to see real growth in this area, and it is worth keeping an eye on for new, more user-friendly solutions.
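To make the "connecting the dots" idea concrete, here is a minimal sketch using the official Neo4j Python driver and a Cypher query. The connection details, node labels, and property names are assumptions for the example.

```python
# Minimal sketch: model a customer/supplier relationship in Neo4j and query it.
# Assumes a local Neo4j instance and the official Python driver
# (pip install neo4j); credentials, labels, and properties are made up.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Create two nodes and the relationship (edge) that connects them.
    session.run(
        """
        MERGE (c:Customer {name: $customer})
        MERGE (s:Supplier {name: $supplier})
        MERGE (c)-[:BUYS_FROM]->(s)
        """,
        customer="Acme Corp",
        supplier="Globex",
    )

    # Traverse the graph: which suppliers is each customer connected to?
    result = session.run(
        "MATCH (c:Customer)-[:BUYS_FROM]->(s:Supplier) "
        "RETURN c.name AS customer, s.name AS supplier"
    )
    for row in result:
        print(row["customer"], "->", row["supplier"])

driver.close()
```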

This post wraps up the series on big data technology. Again, my goal was not to go toe-to-toe with all of the architects of the world on what big data technology really is or how it works. My goal was to give business teams just enough detail about this technology to make more informed decisions with their internal and external technology partners for their big data programs.

Rob Sader is VP of Business Development at Interset.