2. Session #1
• Introduction
• What is Big Data
• Big Data vs. BI
• Big Data from Past to Future
• Big Data Common Use Cases
To Complete Later …
3.
4.
5. • Gartner is an American information technology research and
advisory firm
• The “hype cycle” is a conceptual framework for understanding how
technologies move from initial invention to widespread application.
• The path is simple: whenever a new technology comes along, it usually
gets hyped to the point of inflating expectations about how much it will
revolutionize your life; then reality sinks in and we are all
disillusioned by the unfulfilled promises, after which it finally rises to a
6. • V3 Model from Gartner
• Variety
• Structured, semi-structured and unstructured data
• Unstructured example: emails contain the communication patterns of
successful projects
• Most of this data already belongs to organizations, but it is sitting
there unused, which is why Gartner calls it dark data
• Velocity
• It is frequently equated to real-time analytics
• The concept of “Analytics in motion” vs. “Analytics at rest”
• Volume
• Just imagine that EVERY SENSOR PRODUCES DATA.
• Big data sizes are a constantly moving target, as of 2012 ranging
from a few dozen terabytes to many petabytes of data in a single
data set. (Wikipedia)
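The difference between "analytics at rest" and "analytics in motion" can be illustrated with a minimal sketch. The function names and the sample readings are hypothetical and only serve the illustration: the first function queries data that is already stored, the second updates its answer as each reading arrives.

```python
def average_at_rest(stored_readings):
    """Analytics at rest: the full data set is stored first, then queried."""
    return sum(stored_readings) / len(stored_readings)


def average_in_motion(stream):
    """Analytics in motion: refresh the answer as each reading arrives."""
    count = 0
    total = 0.0
    for reading in stream:
        count += 1
        total += reading
        yield total / count


# Hypothetical sensor readings, used only for illustration.
readings = [20.0, 22.0, 24.0]

batch_result = average_at_rest(readings)
running_results = list(average_in_motion(iter(readings)))
```

The streaming version never needs the whole data set in memory, which is why velocity is usually discussed together with real-time analytics.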
7. • Using Big Data doesn’t mean getting rid of Business
Intelligence
• Rather, Business Intelligence is part of Big Data
analytics
8. • Extracting click stream data that records
every gesture, click and movement made
on a web site
• Doing performance analytics and
optimization on transactional database
cluster log files, to find where
small optimizations can be made on correlated
activities across separate servers.
• Web Analytics
• Using log files generated from network
element managers (of big voice and/or
data networks) to automatically deduce
changes in network inventory and
network topology dynamically.
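As a rough illustration of the click stream use case, the sketch below counts clicks per page. The record layout (session, event type, page) and the sample events are assumptions made for this example, not a real log format.

```python
from collections import Counter

# Hypothetical click stream records: (session_id, event_type, page).
events = [
    ("s1", "click", "/home"),
    ("s1", "scroll", "/home"),
    ("s2", "click", "/pricing"),
    ("s1", "click", "/pricing"),
]


def clicks_per_page(records):
    """Count only 'click' events, grouped by the page they occurred on."""
    return Counter(
        page for _, event_type, page in records if event_type == "click"
    )


page_clicks = clicks_per_page(events)
```

At real click stream volumes the same grouping would run on a distributed engine rather than in one process; the logic stays the same.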
10. • What customers are saying about you
and your competition
• How sentiment impacts the decisions you
are making and the way your company engages
• The effectiveness and receptiveness of your marketing
campaigns
• The value of these data is maximized when the social
media analytics are related back to data inside your enterprise
• IBM's big data system for this purpose is “Cognos Consumer
Insights (CCI)”
11. • A company decides to offer coupons to individuals based
on their location and other characteristics and also
monitors how successful the campaign is.
• The following steps outline how the process works. The
next figure relates each step of the flow to the relevant
architecture components.
• We are considering a telecommunications solution so the
architecture is modified to reflect data sources relevant to
a telecommunications environment.
12.
13. 1. Data from various sources are collected. In this case, we assume social media
data, customer loyalty data, web log data that indicates how the users
interacted with company sites, customer's location data, and customer profile
data that the company has about the customer.
2. The preceding data goes through an extract-transform-load process in the
appropriate ETL tool if necessary. In many cases, the data can be loaded as is.
3. The data is stored into the appropriate repository according to whether it is
structured or unstructured data.
4. Entity resolution tools can process this data further to provide a
complete user profile. This profile gives a complete view of the user that is
based on the different sources that are described in step 1. For example, the
profile links customer loyalty data to how the customer interacts with the
website, so you know what this customer bought and what the customer might
be interested in buying.
5. Appropriate predictive models are created from the customer profile
information and location information. These models can determine users'
movement habits, where users hang out, who they hang out with, and more
details to better segment the target market.
6. Appropriate campaigns are created in the campaign management system for
the target market segment that includes the marketing channel and message
for each channel.
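Steps 1 to 5 above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the field names and sample records are hypothetical, and real entity resolution tools match records that lack a shared key, whereas this sketch simply joins on an assumed common `customer_id`.

```python
from collections import defaultdict

# Hypothetical source records that already share a customer_id.
loyalty = [{"customer_id": 1, "purchases": ["phone case"]}]
web_logs = [
    {"customer_id": 1, "viewed": ["headphones"]},
    {"customer_id": 2, "viewed": ["tablet"]},
]
locations = [
    {"customer_id": 1, "area": "downtown"},
    {"customer_id": 2, "area": "airport"},
]


def build_profiles(*sources):
    """Merge records from every source into one profile per customer."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            fields = {k: v for k, v in record.items() if k != "customer_id"}
            profiles[record["customer_id"]].update(fields)
    return dict(profiles)


def segment(profiles, area):
    """Pick the customers whose profile places them in the given area."""
    return sorted(cid for cid, p in profiles.items() if p.get("area") == area)


profiles = build_profiles(loyalty, web_logs, locations)
downtown_segment = segment(profiles, "downtown")
```

The segment rule here is a plain filter; in the flow described above it would be replaced by the predictive models built in step 5.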
14.
15. 1. The location data is obtained and processed as a stream in
real time.
2. Real-time analytics is performed on the data to determine
whether to send a coupon to this customer. This step invokes
the predictive models in real time and receives a score to
determine whether the customer falls within a target
segment.
3. If the customer falls within the target segment, the campaign
management system determines what message to send and
a coupon is sent through the appropriate channel. Examples
of channels include mobile, social media, and web.
4. The real-time data is stored in the appropriate repository for
future historical analysis.
5. Feedback indicates whether the customer accepted the
coupon.
6. The models are continuously refined based on the success
of the campaign.
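The real-time flow above can be sketched as a single event handler. Everything here is hypothetical: the `score` function is a hand-written stand-in for a predictive model, and the threshold, field names and coupon message are assumptions for illustration only.

```python
def score(event, profile):
    """Hypothetical stand-in for a predictive model; higher means a better fit."""
    value = 0.0
    if event["area"] == profile.get("favorite_area"):
        value += 0.6
    if profile.get("loyalty_member"):
        value += 0.3
    return value


def handle_location_event(event, profiles, history, threshold=0.5):
    """Score one location event, decide on a coupon, and store the event."""
    profile = profiles.get(event["customer_id"], {})
    event_score = score(event, profile)
    coupon = None
    if event_score >= threshold:
        coupon = {
            "customer_id": event["customer_id"],
            "channel": profile.get("channel", "mobile"),
            "message": "10% off nearby",
        }
    # Keep every event for later historical analysis and model refinement.
    history.append({**event, "score": event_score, "coupon_sent": coupon is not None})
    return coupon


profiles = {
    1: {"favorite_area": "downtown", "loyalty_member": True, "channel": "web"},
    2: {"favorite_area": "airport", "loyalty_member": True},
}
history = []
coupon_1 = handle_location_event({"customer_id": 1, "area": "downtown"}, profiles, history)
coupon_2 = handle_location_event({"customer_id": 2, "area": "downtown"}, profiles, history)
```

Customer feedback (step 5) would be joined to the stored history records, which is what allows the models to be refined in step 6.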
16. • Used at British Telecom; a
similar system was created at
Etisalat Egypt by E///
• Root Cause Analysis –
finding which device is
responsible for an occasional
flood of alarms
• Short-Term Fault
Prediction – predicting which
device will fail in the next 15
minutes
• Long-Term Anomaly
Detection – detecting unusual
trends in the network
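A very crude version of the root cause idea is to count alarms per device inside the flood window and flag the noisiest one. This is only an illustrative heuristic with hypothetical device names and timestamps; it is not the method used in the systems mentioned above.

```python
from collections import Counter

# Hypothetical alarm log: (timestamp_in_seconds, device_id).
alarms = [
    (0, "router-a"),
    (1, "switch-b"),
    (2, "router-a"),
    (3, "router-a"),
    (400, "switch-b"),
]


def likely_flood_source(alarm_log, window_start, window_end):
    """Return the device that raised the most alarms inside the time window."""
    counts = Counter(
        device for ts, device in alarm_log if window_start <= ts < window_end
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


suspect = likely_flood_source(alarms, 0, 60)
```

A production system would also use the network topology, since a failing upstream device often causes alarms on the devices behind it.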
17. • A typical oil drilling platform has 20,000 to 40,000 sensors
• Only 5–10 percent of these data are actively used
• The wind turbine placement problem uses a very large
amount of environmental data (e.g. temperature, humidity,
pressure)
• Every smart meter produces several readings per hour
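To get a feel for the volume, here is a back-of-the-envelope calculation. Both input numbers are assumptions chosen for illustration (4 readings per hour, one million meters); they do not come from the slides.

```python
READINGS_PER_HOUR = 4   # assumption: one reading every 15 minutes
METERS = 1_000_000      # assumption: size of a hypothetical meter fleet


def readings_per_year(readings_per_hour, meters=1):
    """Readings produced in a 365-day year by the given number of meters."""
    return readings_per_hour * 24 * 365 * meters


per_meter = readings_per_year(READINGS_PER_HOUR)
fleet_total = readings_per_year(READINGS_PER_HOUR, METERS)
```

Under these assumptions one meter yields 35,040 readings a year and the whole fleet about 35 billion.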