1.0: History of Big Data
Big data is a relative term describing when the data in an
organization is to be stored and managed by timely decision
Time            Data Generation     Processing
Modern times    User generated      using servers)
Recently        System generated
Big data generated by user and system are
Traditional Data    Big Data
Finances            Audio and Videos
Stock Recording     3D Models
Personnel Files     Simulation
Big Data represents the way this information is
analysed to open up opportunities.
A structure is needed to parse the data, separate out
the unwanted parts, and find the useful threads that
uncover opportunities.
4 V’s of Big Data
Volume: vast amounts of data generated every
Velocity: speed at which newly generated data moves
Variability: messiness or trustworthiness of the data. It
also means an inconsistent data flow with periodic peaks.
Variety: the different types of data we can now use.
From classifying big data to choosing a big data solution
● Defining a logical
● atomic patterns for big data solutions
● to use for big data
● Choosing a solution pattern for a big data solution
● Determining the viability of a business problem for a big
● Selecting the right products to implement a big data
Mappers and Reducers
Map-Reduce job =
- Map function (input -> key-value pairs) +
- Reduce function (key and list of values -> output).
Map(): a procedure (method) that performs filtering and sorting.
Reduce(): a method that performs a summary operation.
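The map and reduce roles above can be illustrated with a small in-memory sketch. This is not Hadoop code: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for this example, and the grouping loop only stands in for the shuffle step that a real framework performs across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: summarize all values collected for one key."""
    return key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Run a map-reduce job in a single process (illustration only)."""
    # Shuffle step: group every mapper output by its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # One reduce call per key; keys are sorted for a stable result.
    return [reduce_fn(key, groups[key]) for key in sorted(groups)]


counts = run_job(["big data", "big networks"], map_fn, reduce_fn)
print(counts)
```

Word counting is used here only because it is the smallest job that exercises both functions; any per-key summary fits the same shape.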
NATURAL JOIN - MAPPING
Join of R(A,B) with S(B,C) is the set of tuples (a,b,c)
such that (a,b) is in R and (b,c) is in S.
Mappers need to send R(a,b) and S(b,c) to the same reducer, so they
can be joined there.
Mapper output: key = B-value, value = relation and other component (A
or C).
R(1,2) -> (2,(R,1))
R(4,2) -> (2,(R,4))
S(2,3) -> (2,(S,3))
S(5,6) -> (5,(S,6))
There is a reducer for each key.
Every key-value pair generated by any mapper is sent to the
reducer for its key.
The Reduce Function for Join
Given key b and a list of values that are either
(R, a_i) or (S, c_j), output each triple
(a_i, b, c_j).
- Thus, the number of outputs made by a
reducer is the product of the number of R’s on
the list and the number of S’s on the list.
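The join described above can be sketched in a few lines of Python, using the same four tuples as the mapping example. The function names (`map_r`, `map_s`, `reduce_join`, `natural_join`) are hypothetical, and the dictionary stands in for the framework's routing of each key to its reducer.

```python
from collections import defaultdict
from itertools import product


def map_r(a, b):
    """R(a, b) -> key b, value tagged with the relation name."""
    return b, ("R", a)


def map_s(b, c):
    """S(b, c) -> key b, value tagged with the relation name."""
    return b, ("S", c)


def reduce_join(b, values):
    """Pair every R value with every S value that shares key b."""
    a_values = [x for tag, x in values if tag == "R"]
    c_values = [x for tag, x in values if tag == "S"]
    # Output size is len(a_values) * len(c_values).
    return [(a, b, c) for a, c in product(a_values, c_values)]


def natural_join(r_tuples, s_tuples):
    groups = defaultdict(list)  # stands in for routing by key
    for a, b in r_tuples:
        key, value = map_r(a, b)
        groups[key].append(value)
    for b, c in s_tuples:
        key, value = map_s(b, c)
        groups[key].append(value)
    result = []
    for b in sorted(groups):
        result.extend(reduce_join(b, groups[b]))
    return result


joined = natural_join([(1, 2), (4, 2)], [(2, 3), (5, 6)])
print(joined)  # [(1, 2, 3), (4, 2, 3)]
```

The reducer for key 5 receives only (S,6), so it produces zero outputs, which matches the product rule: 0 R's times 1 S.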
Network Resources Related to Big Data
The network's capability to absorb and transfer big
data traffic is made up of six elements:
2. Network delay
4. Data delivery accuracy
Network Monitoring of Big Data
● Most monitoring systems deal with major changes,
failures, configuration data, and traffic reporting.
● The monitoring function itself is a producer of big
data. Therefore, the network data needs to be analyzed
with big data applications.
● Traffic trends, where applications are located, what
caused the traffic, and what network resources are
available to effectively carry the traffic are all part of
the network big data information.
Network Monitoring Strategies
● Ensure that your monitoring tools collect the network information with
enough granularity to produce detailed statistical representations.
● You will need a dashboard that continuously provides alerts and alarms
when traffic changes occur that are outside acceptable limits.
● Create short-term reports rapidly so that traffic changes that could impair
the network operation can be discovered as soon as possible.
● If a cloud service is employed, do you have the traffic data from the cloud
delivered in real time so you can make decisions before a problem worsens?
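One way to picture the alerting strategy above is a rolling check on traffic samples. The sketch below is an assumption-laden illustration, not a feature of any particular monitoring tool: `find_alerts`, the window of 5 samples, and the tolerance of 3 standard deviations are all hypothetical choices, and the traffic figures are made up for the example.

```python
from statistics import mean, stdev


def find_alerts(samples, window=5, tolerance=3.0):
    """Return (index, value) for samples outside the acceptable band.

    The band is the mean of the previous `window` samples,
    plus or minus `tolerance` times their standard deviation.
    """
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        center = mean(history)
        spread = stdev(history)
        if abs(samples[i] - center) > tolerance * spread:
            alerts.append((i, samples[i]))
    return alerts


# Hypothetical link utilisation in Mbit/s, one sample per interval.
traffic_mbps = [100, 102, 98, 101, 99, 100, 250, 101]
alerts = find_alerts(traffic_mbps)
print(alerts)  # [(6, 250)]
```

Finer-grained samples make the band more meaningful, which is why the first strategy asks for enough granularity; a short window also keeps the check cheap enough to feed a dashboard continuously.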
Benefits of Big Data Network Monitoring
1. Load balancing
2. Data filtering
3. Real-time data analysis
4. Managing virtual resources
Big data for network design
Big data for network management
Big data for network resource optimization
Big data for network security and privacy
Big data for network economics and pricing
Big data for network performance evaluation
Parallel and distributed algorithms for Big Data
Targeted marketing and
Using 'tracking cookies', Facebook can collect
information about each website you are
It is possible to accurately predict a range of
highly sensitive personal attributes simply by
analysing the ‘Likes’
Network Security & Big Data
Software-Defined Networking (SDN)-based
controllers and Big Data analytics within and
about the data network
Analyzes network security attacks and potential
risks immediately, which prevents security
E.g.: behavior analysis software to prevent the
misuse of crucial data.
Network partitioning is crucial in setting up big data
Heavy demands from applications do not impact other
Prepare now for big data scalability later
Yahoo is running more than 42,000 nodes in its big
data environment; in 2013, the average number of
nodes in a big data cluster was just over 100.
Big data helps better analysis and market
Helps develop better logistics and accuracy in
systems, and reduces redundancy.
The characteristic 4 V’s support the
management and utilization of massive data.
Editor's Notes
It's the information owned by a company, obtained and processed through new techniques to produce value in the best way possible.
A problem is broken down into parts that can be solved concurrently.
Each part is further broken down into instructions.
Instructions execute simultaneously over multiple processors.
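The decomposition described in these notes can be sketched as follows. The names `split`, `solve_part`, and `parallel_sum_of_squares` are hypothetical, and a thread pool is used only to keep the example self-contained; in CPython, CPU-bound work of this kind would need `ProcessPoolExecutor` (same interface) to actually run on multiple processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Break the problem into roughly equal, independent chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def solve_part(chunk):
    """Solve one part on its own; no chunk depends on another."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    # Each chunk is handed to a worker; the partial results are combined.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(solve_part, chunks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(1, 11)))
print(total)  # 385
```

The approach works because the parts are independent: each worker needs only its own chunk, and the final combination step is a simple sum.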