Big Data is being used everywhere. Kai Wähner mentioned a couple of examples in his talk “Big Data beyond Hadoop – How to integrate ALL your Data” (slides on slideshare):
- Rackspace wanted to know from which part of the world customers logged in
- T-Mobile USA is using Big Data to monitor customer transactions and interactions to predict customer defections
- Macy’s monitors websites of competitors so they can match any price reduction per city
Anyone else getting worried by these “success stories”? How do you feel, as a mobile customer, knowing that your mobile company is trying to prevent you from leaving? How about using Big Data to notice bad customer service and avoid making customers unhappy in the first place? How do Macy’s competitors feel about this “monitoring”?
Anyway …
One great point was “Silence the HiPPOs” (highest-paid person’s opinion). With the ability “to interpret unimaginable large data stream, the gut feeling is no longer justified!”
Why Big Data? The three V’s: Volume, Velocity, Variety. But don’t forget the fourth: Value (slide 8).
Before you can start analysis, you need to get the data from somewhere. That usually means integration of a foreign system (reading the data), manipulation of the data (like string to int or date conversion, etc.) and filtering (duplicates, importance, …). See slide 9.
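To make those three steps concrete, here is a minimal sketch in Python. It is not from the talk: the CSV layout, the column names (`customer_id`, `login_date`, `country`) and the function name `load_records` are all assumptions made up for illustration. It reads the data, converts strings to int and date, and filters out malformed rows and duplicates.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List


def load_records(raw_csv: str) -> List[Dict]:
    """Read rows, convert types, and drop malformed rows and duplicates."""
    seen = set()
    records = []
    # Step 1: integration (here simply reading CSV text from a foreign system)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            # Step 2: manipulation (string to int, string to date, normalising case)
            record = {
                "customer_id": int(row["customer_id"]),
                "login_date": datetime.strptime(row["login_date"], "%Y-%m-%d").date(),
                "country": row["country"].strip().upper(),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            # Poor-quality row: skip it instead of loading it
            continue
        # Step 3: filtering (drop exact duplicates after normalisation)
        key = (record["customer_id"], record["login_date"], record["country"])
        if key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records


# Hypothetical sample input: one duplicate and one row with a broken date
SAMPLE = """customer_id,login_date,country
1,2013-04-01,de
1,2013-04-01,DE
2,not-a-date,us
3,2013-04-02,us
"""

if __name__ == "__main__":
    for rec in load_records(SAMPLE):
        print(rec)
```

With the sample above, the duplicate login and the row with the unparseable date are dropped, so only two records remain. A real project would have to decide whether to log or quarantine rejected rows instead of silently skipping them.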
Beware that Big Data is no silver bullet. If you have a gigantic amount of data with poor quality, that will just give you huge problems.
When planning a Big Data project, begin with a business opportunity (slide 22). Choose the right data (don’t just import everything because you might need it), combine different sources and use easy tooling (slide 26).
Be wary of ETL tools. The network will quickly become your bottleneck.
For the actual implementation, he suggested looking at Apache Camel (slide 34) as a pure integration framework and Talend Open Studio (slide 56) as an example of an integration suite.