Q 1. (a) Explain the Stream Data Model Architecture with a neat diagram.
In analogy to a database-management system, we can view a stream processor as a kind of
data-management system, the high-level organization of which is suggested in Fig.
Any number of streams can enter the system. Each stream can provide elements at its own
schedule; they need not have the same data rates or data types, and the time between elements
of one stream need not be uniform. The fact that the rate of arrival of stream elements is not
under the control of the system distinguishes stream processing from the processing of data
that goes on within a database-management system. The latter system controls the rate at
which data is read from the disk, and therefore never has to worry about data getting lost as it
attempts to execute queries. Streams may be archived in a large archival store, but we assume
it is not possible to answer queries from the archival store. It could be examined only under
special circumstances using time-consuming retrieval processes. There is also a working store,
into which summaries or parts of streams may be placed, and which can be used for answering
queries. The working store might be disk, or it might be main memory, depending on how fast
we need to process queries. But either way, it is of sufficiently limited capacity that it cannot
store all the data from all the streams.
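The split between the archival store and the working store can be illustrated with a minimal sketch. The class name `StreamProcessor`, the method names, and the capacity of 3 are assumptions made for illustration; they do not come from any particular stream-processing system.

```python
from collections import deque


class StreamProcessor:
    """Toy stream processor with an archival store and a bounded working store."""

    def __init__(self, working_capacity):
        # Archival store: keeps everything, but is not used to answer queries.
        self.archive = []
        # Working store: limited capacity, so the oldest elements are dropped.
        self.working_store = deque(maxlen=working_capacity)

    def receive(self, element):
        """Accept one element; the processor does not control the arrival rate."""
        self.archive.append(element)
        self.working_store.append(element)

    def query_average(self):
        """Answer a query from the working store only."""
        if not self.working_store:
            return None
        return sum(self.working_store) / len(self.working_store)


processor = StreamProcessor(working_capacity=3)
for reading in [1, 2, 3, 4, 5]:
    processor.receive(reading)

recent = list(processor.working_store)  # only the last 3 elements survive
average = processor.query_average()
print(recent, average)
```

Here the working store holds only the most recent elements, so the query reflects a summary of the stream rather than the whole stream.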
2. What is a Bloom filter? Determine the probability of a false positive in a Bloom filter.
A Bloom filter consists of:
1. An array of n bits, initially all 0’s.
2. A collection of hash functions h1, h2, . . . , hk. Each hash function maps “key” values
to n buckets, corresponding to the n bits of the bit-array.
3. A set S of m key values.
The purpose of the Bloom filter is to allow through all stream elements whose keys are in S,
while rejecting most of the stream elements whose keys are not in S.
The model to use is throwing darts at targets. Suppose we have x targets and y darts. Any dart
is equally likely to hit any target. After throwing the darts, how many targets can we expect to
be hit at least once?
• The probability that a given dart will not hit a given target is (x − 1)/x.
• The probability that none of the y darts will hit a given target is ((x − 1)/x)^y.
• We can write this expression as (1 − 1/x)^(x(y/x)).
• Using the approximation (1 − ε)^(1/ε) = 1/e for small ε, we conclude that the probability
that none of the y darts hit a given target is e^(−y/x).
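To connect the dart model to the Bloom filter, take the n bits as the targets (x = n) and the k hash values of each of the m keys as the darts (y = km). A given bit then stays 0 with probability about e^(−km/n), and a key outside S is a false positive only if all k of its bits are 1, which gives approximately (1 − e^(−km/n))^k. The sketch below is one possible illustration of this. The class `BloomFilter`, the use of salted SHA-256 as the hash family, and the sample keys are assumptions, not part of the original answer.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: n bits, k hash functions derived from salted SHA-256."""

    def __init__(self, n_bits, k):
        self.n_bits = n_bits
        self.k = k
        self.bits = [0] * n_bits  # array of n bits, initially all 0

    def _positions(self, key):
        # Hypothetical hash family: salt the key with the function index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True for every key in S; occasionally True for keys not in S.
        return all(self.bits[pos] for pos in self._positions(key))


def false_positive_rate(n_bits, m_keys, k):
    """Approximate false-positive probability (1 - e^(-km/n))^k."""
    return (1.0 - math.exp(-k * m_keys / n_bits)) ** k


members = [f"user{i}@example.com" for i in range(100)]  # illustrative set S
bloom = BloomFilter(n_bits=1000, k=3)
for key in members:
    bloom.add(key)

all_members_pass = all(bloom.might_contain(key) for key in members)
fp_one_hash = false_positive_rate(8e9, 1e9, 1)
fp_two_hashes = false_positive_rate(8e9, 1e9, 2)
print(all_members_pass, round(fp_one_hash, 4), round(fp_two_hashes, 4))
```

With 8 billion bits and 1 billion keys, the formula gives roughly 0.1175 for one hash function and roughly 0.0489 for two, so in this setting a second hash function lowers the false-positive rate.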
3. Explain the Girvan-Newman algorithm. Detect communities for the following graph using the
Girvan-Newman algorithm (edge betweenness is marked in the graph).
• To find the betweenness of the edges, we need to count the shortest paths that go
through each of the edges.
• The Girvan-Newman algorithm visits each node X once and computes the number of
shortest paths from X to each of the other nodes that go through each of the edges.
• The algorithm begins by performing a breadth-first search (BFS) of the graph, starting
at the node X.
• Edges that go between nodes at the same level can never be part of a shortest path
from X.
• Edges between levels are called DAG edges. Each DAG edge will be part of at least one
shortest path from the root X.
• To complete the betweenness calculation, we have to repeat this calculation for every
node as the root and sum the contributions. Since every shortest path is then counted twice,
once from each of its endpoints, we divide the sums by 2.
• After the calculations, the following graph shows the final betweenness values:
• We can cluster by taking the edges in order of increasing betweenness and adding them to the
graph one at a time.
• Alternatively, we can start from the full graph and remove the edge with the highest
betweenness to split it into clusters.
• In the example graph, we remove edge BD to get two communities as follows:
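The BFS-based betweenness calculation can be sketched as follows. The figure with the example graph is not reproduced in this text, so the seven-node graph below (nodes A to G, in which BD is the bridge between two groups) is an assumed stand-in, and the function names are illustrative.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Sum shortest-path credit over every root, then halve (each path is seen twice)."""
    total = defaultdict(float)
    for root in adj:
        level, paths = {root: 0}, {root: 1}
        parents = defaultdict(list)
        order = []
        queue = deque([root])
        while queue:  # BFS from the root, counting shortest paths to each node
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    paths[v] = 0
                    queue.append(v)
                if level[v] == level[u] + 1:  # DAG edge u -> v
                    paths[v] += paths[u]
                    parents[v].append(u)
        credit = {u: 1.0 for u in order}
        for v in reversed(order):  # bottom-up: split credit among DAG parents
            for u in parents[v]:
                share = credit[v] * paths[u] / paths[v]
                credit[u] += share
                total[tuple(sorted((u, v)))] += share
    return {edge: value / 2 for edge, value in total.items()}


def communities(adj, removed):
    """Connected components after deleting one edge."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in group:
                continue
            group.add(u)
            stack.extend(v for v in adj[u] if tuple(sorted((u, v))) != removed)
        seen |= group
        result.append(sorted(group))
    return result


# Assumed example graph; BD is the only edge joining {A, B, C} and {D, E, F, G}.
graph = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}
betweenness = edge_betweenness(graph)
top_edge = max(betweenness, key=betweenness.get)
groups = communities(graph, top_edge)
print(top_edge, betweenness[top_edge], groups)
```

For this assumed graph the edge with the highest betweenness is BD, and removing it leaves the two groups {A, B, C} and {D, E, F, G}.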
4. Define PageRank. Calculate the PageRank for the following graph.
5. Explain the Flajolet-Martin algorithm. Perform FM for the stream 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1, …
The Flajolet-Martin algorithm approximates the number of unique objects in a stream or a
database in one pass. If the stream contains n elements with m of them unique, this algorithm
runs in O(n) time and needs O(log(m)) memory.
Algorithm:
1. Create a bit vector (bit array) of sufficient length L, such that 2^L > n, the number
of elements in the stream. Usually a 64-bit vector is sufficient since 2^64 is quite
large for most purposes.
2. The i-th bit in this vector/array represents whether we have seen a hash function value
whose binary representation ends in 0^i (i zeros). So initialize each bit to 0.
3. For each stream element a, compute the hash value h(a) and let r(a) be the number of
trailing 0s in its binary representation. Set bit r(a) of the vector to 1.
4. Let R = max( r(a) ) over all elements seen. Estimate the number of distinct elements
as 2^R.
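Example: S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
h(x) = (6x + 1) mod 5
Assume |b| = 5, i.e. hash values are written as 5-bit strings.
h(1) = 2 = 00010, r = 1; h(2) = 3 = 00011, r = 0; h(3) = 4 = 00100, r = 2; h(4) = 0 = 00000, r = 5
R = max( r(a) ) = 5
So the estimated no. of distinct elements is N = 2^R = 2^5 = 32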
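The worked example can be checked with a short sketch. The function names are illustrative, and the convention that a hash value of 0 counts as `width` trailing zeros is an assumption chosen to match R = 5 above.

```python
def trailing_zeros(value, width):
    """Number of trailing 0 bits; a value of 0 is treated as `width` zeros (assumption)."""
    if value == 0:
        return width
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count


def h(x):
    """Hash function from the example."""
    return (6 * x + 1) % 5


def flajolet_martin(stream, hash_fn, width):
    """Return (R, 2**R), where R is the longest tail of zeros seen in the stream."""
    max_tail = 0
    for element in stream:
        max_tail = max(max_tail, trailing_zeros(hash_fn(element), width))
    return max_tail, 2 ** max_tail


stream = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]
R, estimate = flajolet_martin(stream, h, width=5)
true_distinct = len(set(stream))
print(R, estimate, true_distinct)
```

The stream actually contains 4 distinct values, so the estimate of 32 is far off here. It is driven entirely by h(4) = 0 being counted as five trailing zeros, which shows how coarse a single-hash estimate can be.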
6. Write pseudocode for the PageRank calculation using MapReduce. What is the role of combiners
in performing the PageRank calculation?
Combiners: (2 Marks)
There are two reasons the basic MapReduce PageRank iteration might not be adequate:
1. We might wish to add terms for v′_i, the i-th component of the result vector v′, at the
Map tasks. This improvement is the same as using a combiner, since the Reduce
function simply adds terms with a common key. Recall that for a MapReduce
implementation of matrix-vector multiplication, the key is the value of i for which a
term m_ij v_j is intended.
2. We might not be using MapReduce at all, but rather executing the iteration step at a
single machine or a collection of machines.
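One PageRank iteration in MapReduce style, with a combiner that adds terms for the same key i inside each Map task, might be sketched as below. This is a single-process simulation, not code for a real MapReduce framework. The function names, the four-page graph, the split into two Map tasks, and the taxation parameter `beta` are all assumptions for illustration.

```python
from collections import defaultdict


def map_task(entries, v):
    """Map: for each matrix entry (i, j, m_ij), emit the pair (i, m_ij * v_j)."""
    for i, j, m_ij in entries:
        yield i, m_ij * v[j]


def combine(pairs):
    """Combiner: add terms with the same key i before they leave the Map task."""
    partial = defaultdict(float)
    for i, term in pairs:
        partial[i] += term
    return partial.items()


def reduce_task(terms, beta, n):
    """Reduce: sum the terms for one key and apply the taxation step."""
    return beta * sum(terms) + (1 - beta) / n


def pagerank_iteration(chunks, v, beta):
    """Simulate one iteration v' = beta * M v + (1 - beta) / n."""
    n = len(v)
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for entries in chunks:  # each chunk plays the role of one Map task
        for i, partial_sum in combine(map_task(entries, v)):
            grouped[i].append(partial_sum)
    return [reduce_task(grouped[i], beta, n) for i in range(n)]


# Assumed graph: A->B,C,D; B->A,D; C->A; D->B,C (pages indexed 0..3).
# Entries are (row i, column j, m_ij), where m_ij = 1 / out-degree of page j.
chunk_1 = [(1, 0, 1 / 3), (2, 0, 1 / 3), (3, 0, 1 / 3), (0, 1, 1 / 2), (3, 1, 1 / 2)]
chunk_2 = [(0, 2, 1.0), (1, 3, 1 / 2), (2, 3, 1 / 2)]

v0 = [0.25, 0.25, 0.25, 0.25]
v1 = pagerank_iteration([chunk_1, chunk_2], v0, beta=1.0)
v1_taxed = pagerank_iteration([chunk_1, chunk_2], v0, beta=0.8)
print(v1, v1_taxed)
```

Because the combiner collapses all terms for a key into one partial sum per Map task, fewer key-value pairs need to be passed to the Reduce tasks.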
7. Explain CURE clustering algorithm with an example.
The CURE (Clustering Using Representatives) algorithm is a large-scale clustering algorithm
in the point-assignment class, which assumes a Euclidean space. It does not assume anything
about the shape of clusters; they need not be normally distributed, and can even have strange
bends, S-shapes, or even rings.
Instead of representing clusters by their centroid, it uses a collection of representative points,
as the name implies.
The CURE algorithm is divided into two phases:
1. Initialization in CURE
2. Completion of the CURE Algorithm
Initialization in CURE:
1. Take a small sample of the data and cluster it in main memory. In principle, any
clustering method could be used, but as CURE is designed to handle oddly shaped
clusters, it is often advisable to use a hierarchical method in which clusters are merged
when they have a close pair of points.
2. Select a small set of points from each cluster to be representative points. These points
should be chosen to be as far from one another as possible, using the same method that
picks well-separated initial points for K-means.
3. Move each of the representative points a fixed fraction of the distance between its
location and the centroid of its cluster. Perhaps 20% is a good fraction to choose. Note
that this step requires a Euclidean space, since otherwise, there might not be any notion
of a line between two points.
Completion of the CURE Algorithm:
The next phase of CURE is to merge two clusters if they have a pair of representative points,
one from each cluster, that are sufficiently close. The user may pick the distance that defines
“close.” This merging step can repeat until there are no more sufficiently close clusters.
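The representative-point steps and the merge test can be sketched on a small example. The helper names, the two square clusters, and the merge thresholds are assumptions for illustration. The initial clustering of the sample is taken as given rather than computed.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def pick_representatives(points, k):
    """Greedily pick up to k points that are as far from one another as possible."""
    reps = [points[0]]
    while len(reps) < min(k, len(set(points))):
        candidates = [p for p in points if p not in reps]
        # Choose the point whose nearest already-chosen representative is farthest away.
        reps.append(max(candidates, key=lambda p: min(math.dist(p, r) for r in reps)))
    return reps


def shrink(reps, center, fraction=0.2):
    """Move each representative a fixed fraction of the way toward the centroid."""
    return [
        tuple(r_i + fraction * (c_i - r_i) for r_i, c_i in zip(rep, center))
        for rep in reps
    ]


def should_merge(reps_a, reps_b, threshold):
    """Merge if some pair of representatives, one per cluster, is close enough."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b) <= threshold


# Assumed sample clusters (already produced by some in-memory clustering).
cluster_a = [(0, 0), (0, 4), (4, 0), (4, 4)]
cluster_b = [(10, 0), (10, 4), (14, 0), (14, 4)]

reps_a = shrink(pick_representatives(cluster_a, 4), centroid(cluster_a))
reps_b = shrink(pick_representatives(cluster_b, 4), centroid(cluster_b))

merge_tight = should_merge(reps_a, reps_b, threshold=5.0)
merge_loose = should_merge(reps_a, reps_b, threshold=7.0)
print(reps_a, merge_tight, merge_loose)
```

After shrinking by 20%, the closest representatives of the two clusters are about 6.8 apart, so whether the clusters merge depends on the distance the user picks for "close".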

Bigdata analytics
