Presentation by Big Data Evangelist Dai Clegg (IBM): 'Babies, Buses and Movies; some examples of the value in big data analytics' at the Almere DataCapital Big Data Analytics seminar, 14 June, in Almere.
Here is another example of something the University of Southern California's Annenberg School of Communication did with the BigSheets technology in the IBM Big Data platform. USC Annenberg created the Film Forecaster tool and used it to correctly predict 2011's summer blockbusters by scraping Twitter and analyzing the tweets against a simple lexicon of words that indicate a positive or negative outlook for a movie. They made quite an impact, since this very solution was featured on ABC News (a national news network in the USA). More striking is the quote: the application was built by a communications Master's student who learned BigSheets in a day.
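To make the Film Forecaster idea concrete, here is a minimal sketch of lexicon-based sentiment scoring over tweets. The lexicon words and example tweets are invented for illustration; they are not USC Annenberg's actual lexicon or data, and the real tool was built in BigSheets rather than code.

```python
import re

# Toy lexicon: +1 for words signalling a positive showing, -1 for negative.
# (Invented example words, not the Film Forecaster's real lexicon.)
LEXICON = {
    "awesome": 1, "amazing": 1, "love": 1, "hilarious": 1,
    "boring": -1, "terrible": -1, "awful": -1, "skip": -1,
}

def score_tweet(text: str) -> int:
    """Sum the lexicon scores of every word appearing in the tweet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def forecast(tweets: list[str]) -> float:
    """Average sentiment across all tweets mentioning a movie."""
    return sum(score_tweet(t) for t in tweets) / len(tweets)

tweets = [
    "This movie looks awesome, I love the trailer",
    "Hilarious and amazing, can't wait",
    "Boring sequel, skip it",
]
print(forecast(tweets))  # positive on balance, so forecast a strong opening
```

A positive average across a large tweet sample would flag a likely blockbuster; the real value of BigSheets was that this kind of analysis needed no programming at all.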
This picture is a little simplistic for two reasons. First, it gives pre-eminence to Netezza. That is because Netezza's simplicity, performance and agile support for ad-hoc analysis make it the default proposition for an analytic warehouse in a greenfield situation (though this is not necessarily true if there is an existing commitment to Power or to DB2). Secondly, it does not recognise the differentiation between exploratory analysis and repeated analysis.

If you are doing exploratory analysis of relational (i.e. structured) data, Netezza is the better platform; it thrives on ad-hoc analysis and has very rich tooling (INZA, SPSS etc.) for analytics. Clearly, exploratory analysis of unstructured data belongs on BigInsights. Exploratory analysis of something in between (e.g. CDRs) could be done on Netezza, but if the data is not already being loaded (and even in a Netezza customer the raw xDRs are probably not loaded into the warehouse), then exploration in a low-cost Hadoop grid makes tons of sense. We have at least one customer use case of this, where, once the analysis was repeatable, it was implemented in Netezza. But there are also use cases where the repeated analysis remains in BigInsights, exploiting its differentiating enterprise readiness.
If it's data in motion (remember the babies being monitored), it has to be real-time; it has to be Streams. That's the easy one. If it's unstructured data at rest, the best place to start is BigInsights, though you may load data into the relational warehouse subsequently for further insight. If it's relational data, it's unlikely you are going to move it to Hadoop. If it's semi-structured, you have a choice, and you'll be influenced by these other development factors:

- It may be that an organization has already developed a map-reduce solution that delivers high-value analysis of data unloaded from the corporate EDW. Is the right answer to say 'great, now you know the solution, re-code it in SQL using in-database analytics and implement it on your warehouse'? Maybe a better answer is to implement BigInsights to enterprise-harden the Hadoop environment and run the application as is, but with production-grade reliability and supportability.
- It may be that the volume is so huge that a data warehouse can't handle it, and certainly can't handle it economically (think Vestas).
- It may be better to go to the platform with more of the appropriate analytic skills or other development resources available.
- It may be that the customer wants to build their capability in Hadoop because they will have more challenging use cases later that will be clear-cut BigInsights use cases.
- It may be that the customer just wants to experiment cheaply and quickly (though actually that's more a BigInsights Basic Edition use case; we'll be looking to enterprise-harden it later).

But remember, these are influencers, not deciders. IBMers can adapt to whatever best matches the customer's needs, because of the comprehensive nature of our big data portfolio.
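The first-cut routing logic described above can be sketched as a simple function. This is an illustrative heuristic only: the product names follow the text, the `structure` categories are my simplification, and the semi-structured case deliberately returns no single answer because the development factors listed above decide it.

```python
def recommend_platform(in_motion: bool, structure: str) -> str:
    """First-cut platform choice.

    structure: one of 'unstructured', 'semi-structured', 'relational'.
    """
    if in_motion:
        # Data in motion has to be real-time: InfoSphere Streams.
        return "InfoSphere Streams"
    if structure == "unstructured":
        # Unstructured data at rest: start with BigInsights.
        return "BigInsights"
    if structure == "relational":
        # Relational data is unlikely to move to Hadoop.
        return "Netezza warehouse"
    # Semi-structured: no single answer; weigh the factors listed above.
    return "BigInsights or Netezza warehouse"

print(recommend_platform(True, "unstructured"))    # InfoSphere Streams
print(recommend_platform(False, "semi-structured"))
```

The point of the closing paragraph stands: these rules are influencers, not deciders, and a real engagement weighs skills, volume, economics and the customer's roadmap before picking a platform.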