Big Data (4 Vs, history, concepts, algorithms): analysis and applications
Big Data
By: Yash Bheda (1524008)
Janhavi Jaltare (1524011)
Krisha Udani()
Binal Savla (1524003)
Table of Contents
Topics
History of Big Data
Big Data
Architecture for Network
Network Analysis Algorithm
Big Data Network Analysis
Network Application
Summary
1.0: History of Big Data
Big data is a relative term: it describes the point at which the data in an
organization has to be stored and managed so that it supports timely decision
making.
Time          Data generation           Processing
Initially     Employee-generated data   Single processor
Modern times  User-generated data       Parallel processing (multiple processors using servers)
Recently      System-generated data
Contents
Big data generated by users and systems is
mostly unstructured.
Traditional data: documents, finances, stock records, personnel files
Big data: photographs, audio and video, 3D models, simulations, location data
Big Data
Big data represents the way this information is
analysed to help open up opportunities.
There is a strong need for a structure that can parse the data,
filter out what is unwanted, and find the useful threads
that uncover opportunities.
Input information -> New processing techniques -> Better results
4 Vs of Big Data
Volume: the vast amounts of data generated every
second.
Velocity: the speed at which new data is generated and moves
around.
Variability: inconsistency in the data flow, with periodic peaks. Here it also
covers the messiness or trustworthiness of the data, which is often listed separately as veracity.
Variety: the different types of data we can now use.
From classifying big data to choosing a big data solution
● Defining a logical architecture
● Understanding atomic patterns for big data solutions
● Understanding composite patterns to use for big data solutions
● Choosing a solution pattern for a big data solution
● Determining the viability of a business problem for a big data solution
● Selecting the right products to implement a big data solution
Mappers and Reducers
A MapReduce job =
- Map function (input -> key-value pairs) +
- Reduce function (key and list of values -> output).
Map(): a procedure (method) that performs filtering and sorting.
Reduce(): a method that performs a summary operation.
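To make the Map and Reduce roles concrete, here is a minimal single-process sketch of a word-count job. It is an illustration only, not how a real cluster framework is implemented: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, and the grouping step that a framework would perform across machines is simulated with an in-memory dictionary.

```python
from collections import defaultdict


def map_words(line):
    """Map: turn one input record into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce: summarize the list of values collected for one key."""
    return key, sum(values)


def run_job(records, mapper, reducer):
    """Hypothetical driver: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Keys are processed in sorted order so the output is deterministic.
    return [reducer(key, values) for key, values in sorted(groups.items())]


lines = ["big data is big", "data moves fast"]
word_counts = dict(run_job(lines, map_words, reduce_counts))
print(word_counts)
```

The mapper only sees one record at a time and the reducer only sees one key at a time, which is what lets a real framework spread both phases over many machines.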
Natural Join: Mapping
The join of R(A,B) with S(B,C) is the set of tuples (a,b,c) such that (a,b) is in R and (b,c) is in S.
Mappers need to send R(a,b) and S(b,c) to the same reducer, so they
can be joined there.
Mapper output: key = B-value, value = relation name and the other component (A
or C).
- Example: R(1,2) -> (2,(R,1))
S(2,3) -> (2,(S,3))
Mapping Tuples
R(1,2) -> Mapper for R(1,2) -> (2,(R,1))
R(4,2) -> Mapper for R(4,2) -> (2,(R,4))
S(2,3) -> Mapper for S(2,3) -> (2,(S,3))
S(5,6) -> Mapper for S(5,6) -> (5,(S,6))
Grouping Phase
There is a reducer for each key.
Every key-value pair generated by any mapper is sent to the
reducer for its key.
The Reduce Function for Join
Given key b and a list of values that are either
(R, a_i) or (S, c_j), output each triple
(a_i, b, c_j).
- Thus, the number of outputs made by a
reducer is the product of the number of R's on
the list and the number of S's on the list.
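The map, grouping, and reduce phases of the join can be sketched end to end on the example tuples from the slides. This is a minimal in-memory sketch under stated assumptions: the function names (`map_r`, `map_s`, `group_by_key`, `reduce_join`) are hypothetical, and the grouping that a framework would do across reducers is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import product


def map_r(a, b):
    """Mapper for a tuple R(a, b): key on the B-value, tag the value with 'R'."""
    return b, ("R", a)


def map_s(b, c):
    """Mapper for a tuple S(b, c): key on the B-value, tag the value with 'S'."""
    return b, ("S", c)


def group_by_key(pairs):
    """Grouping phase: collect every value under the key it was emitted with."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(b, values):
    """Reducer for key b: pair every R-side a_i with every S-side c_j."""
    r_side = [v for tag, v in values if tag == "R"]
    s_side = [v for tag, v in values if tag == "S"]
    return [(a, b, c) for a, c in product(r_side, s_side)]


R = [(1, 2), (4, 2)]
S = [(2, 3), (5, 6)]

mapped = [map_r(a, b) for a, b in R] + [map_s(b, c) for b, c in S]
groups = group_by_key(mapped)
joined = [t for b, values in sorted(groups.items()) for t in reduce_join(b, values)]
print(joined)
```

Key 2 receives two R values and one S value, so its reducer emits 2 x 1 = 2 triples. Key 5 receives only an S value, so its reducer emits nothing.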
Network Resources Related to Big Data
The network's capability to absorb and transfer big
data traffic is made up of six elements:
1. Bandwidth
2. Network delay
3. Security
4. Data delivery accuracy
5. Availability
6. Resiliency
Network Monitoring of Big Data
● Most monitoring systems deal with major changes,
failures, configuration data, and traffic reporting.
● The monitoring function itself is a producer of big
data. Therefore, the network data needs to be analyzed
with big data applications.
● Traffic trends, where applications are located, what
caused the traffic, and what network resources are
available to effectively carry the traffic are all part of
the network big data information.
Network Monitoring Strategies
● Ensure that your monitoring tools collect the network information with
enough granularity to produce detailed statistical representations.
● You will need a dashboard that continuously provides alerts and alarms
when traffic changes occur that are outside acceptable limits.
● Create short-term reports rapidly so that traffic changes that could impair
the network operation can be discovered as soon as possible.
● If a cloud service is employed, do you have the traffic data from the cloud
delivered in real time so you can make decisions before a problem worsens?
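As one possible reading of the alerting strategy above, the sketch below flags traffic samples that fall outside an acceptable band around a recent baseline. Everything here is an assumption made for illustration: the function name `find_traffic_alerts`, the window of 5 samples, the tolerance of 3 standard deviations, the `min_spread` floor, and the sample figures, which are invented values in Mbps. A production monitoring tool would define its own limits.

```python
from statistics import mean, stdev


def find_traffic_alerts(samples, window=5, tolerance=3.0, min_spread=1.0):
    """Return (index, value) for each sample outside the acceptable band.

    The band is the mean of the preceding `window` samples, plus or minus
    `tolerance` standard deviations. `min_spread` keeps the band from
    collapsing to zero width when the baseline is perfectly flat.
    """
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        center = mean(baseline)
        spread = max(stdev(baseline), min_spread)
        if abs(samples[i] - center) > tolerance * spread:
            alerts.append((i, samples[i]))
    return alerts


# Invented per-interval traffic readings with one sudden spike.
traffic_mbps = [100, 102, 98, 101, 99, 100, 250, 101, 99]
alerts = find_traffic_alerts(traffic_mbps)
print(alerts)
```

A short window keeps the check responsive, which matches the goal of discovering harmful traffic changes as soon as possible. Note that the spike itself enters later baselines and widens the band for a few samples.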
Benefits of Big Data Network Monitoring
1. Load balancing
2. Data Filtering
3. Real-time data analysis
4. Managing Virtual resources
Network Applications
Big data for network design
Big data for network management
Big data for network resource optimization
Big data for network security and privacy
Big data for network economics and pricing
Big data for network performance evaluation
Parallel and distributed algorithms for Big Data
Online services
Netflix compares variants of its show
banners and shows each customer the one that
appeals to them
Targeted marketing and advertising
Using 'tracking cookies', Facebook can collect
information about each website you
visit
It is possible to accurately predict a range of
highly sensitive personal attributes simply by
analysing a user's 'Likes'
Network Security & Big Data
Software-Defined Networking (SDN)-based
controllers and big data analytics within and
about the data network
This combination analyzes network security attacks and potential
risks immediately, which helps prevent security
breaches.
E.g., behavior analysis software to prevent the
misuse of crucial data.
Implementation
Network partitioning is crucial in setting up big data
environments.
It ensures that heavy demands from applications do not impact other
mission-critical workloads.
Prepare now for big data scalability later.
Yahoo is running more than 42,000 nodes in its big
data environment; in 2013, the average number of
nodes in a big data cluster was just over 100.
Summary
Big data enables better analysis and market
prediction.
It helps develop better logistics and accuracy in
systems and reduces redundancy.
The four characteristic Vs support the
management and utilization of massive data.
Editor's Notes
It's the information owned by a company, obtained and processed through new techniques to produce value in the best way possible.
A problem is broken down into parts that can be solved concurrently.
Each part is further broken down into instructions.
Instructions execute simultaneously over multiple processors.
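The decomposition described in these notes can be sketched as follows: a problem (summing squares) is split into parts, each part is handed to a worker, and the partial results are combined. This is a sketch under stated assumptions. The function names are hypothetical, and a thread pool stands in for multiple processors to keep the example small. In CPython, CPU-bound work usually needs `ProcessPoolExecutor` from the same module, which offers the same `map` interface, to actually run on several processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(numbers, parts):
    """Break the problem into at most `parts` contiguous chunks."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def sum_of_squares(chunk):
    """Solve one part independently of the others."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, workers=4):
    """Run the parts concurrently on a worker pool, then combine the results."""
    chunks = split_into_parts(numbers, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


total = parallel_sum_of_squares(list(range(1, 101)))
print(total)
```

The parts are independent, so the order in which workers finish does not affect the combined result.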