Data mining with big data

Presented by:
Dinesh Chandra Yenduri
Reg. no.: y15mc24095

Abstract
Big Data concern large-volume, complex,
growing data sets with multiple, autonomous
sources. With the fast development of
networking, data storage, and data collection
capacity, Big Data are now rapidly expanding in
all science and engineering domains, including the
physical, biological, and biomedical sciences.

Outline
• Introduction
• What is Data Mining With Big Data
• How To Produce The Big Data
• Big Data Characteristics
• 4Vs Big Data
• Hadoop System Architecture
• Hadoop Framework
• Data Mining Challenges With Big Data
• Big Data Challenges and Solutions
• Advantages
• Conclusion
• References

Introduction
• The volume of business data worldwide, across all
companies, doubles every 1.2 years (previously every 1.5 years)
• Every day, 2.5 quintillion bytes of data are produced, and more
than 90 percent of all data were produced within the past two
years
• Facebook processes 10 TB of data every day; Twitter processes
7 TB
• On 4 October 2012, the first presidential debate between
President Barack Obama and Governor Mitt Romney
triggered more than 10 million tweets within 2 hours
• Examples: Boeing jets, scientific data, sensor data,
Internet data

What is Data Mining With Big Data

Big Data Characteristics
• Data has grown tremendously
• Big Data starts with large-volume, heterogeneous,
autonomous sources in a distributed and
decentralized system

4Vs Big Data
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
Veracity
• Data authenticity

How To Manage The Big Data
• By using Hadoop
• It is an open-source framework
• It provides a distributed file system (HDFS)
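
Hadoop's processing layer follows the MapReduce model: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. The sketch below imitates those three steps in plain Python on a single machine. It does not use Hadoop's API, and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = []
    # On a cluster, each document could be mapped on a different node.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    docs = ["big data mining", "data mining with big data"]
    print(word_count(docs))
```

On a real cluster the map and reduce steps run on many nodes at once, and the framework handles the shuffle and the storage of intermediate results.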

Data Mining Challenges With Big Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
• Big Data Mining Algorithm

Big Data Mining Platform
• Data are typically too large to fit into
main memory
• Parallel programming is needed to carry out
the mining process
• A Big Data processing framework relies on
cluster computers with a high-performance
computing platform spanning a large number of
computing nodes
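
When a data set is too large for main memory, one common pattern is to read it in chunks, summarize each chunk independently, and merge the small summaries. A minimal sketch, assuming the data arrive as a stream of numbers: `chunked`, `summarize`, `merge`, and `parallel_stats` are hypothetical helper names, and the thread pool merely stands in for the worker nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(stream, size):
    """Yield successive lists of at most `size` items from an iterable."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def summarize(chunk):
    """Reduce one chunk to a small summary: (count, total, maximum)."""
    return len(chunk), sum(chunk), max(chunk)


def merge(summaries):
    """Combine per-chunk summaries into one global summary."""
    count = total = 0
    maximum = float("-inf")
    for c, t, m in summaries:
        count += c
        total += t
        maximum = max(maximum, m)
    return count, total, maximum


def parallel_stats(stream, chunk_size=1000, workers=4):
    """Summarize chunks on a pool of workers, then merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = pool.map(summarize, chunked(stream, chunk_size))
        return merge(summaries)
```

For example, `parallel_stats(range(1, 10001))` yields the count, sum, and maximum of the numbers 1 to 10000, from which the mean follows as `total / count`. Note that `Executor.map` submits every chunk up front, so a genuinely out-of-core job would also need to bound the number of chunks in flight.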

Big Data Mining Platform (Cont…)
• Big Data mining offers opportunities to go
beyond traditional relational databases to rely
on less structured data: weblogs, social media,
e-mail, sensors, and photographs that can be
mined for useful information

Big Data Semantics and Application
Knowledge
The two most important issues in this area:
1) Data sharing and privacy
2) Domain and application knowledge

Data sharing and privacy
• Information sharing is an ultimate goal for all
systems involving multiple parties
• Two common approaches to protecting privacy are:
1) Restrict access to the data, such as by adding
certification or access control to the data
entries, so that sensitive information is accessible
to a limited group of users only
2) Anonymize data fields so that sensitive
information cannot be pinpointed to an
individual record
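
Both approaches can be sketched in a few lines. The example below is illustrative only: the record fields, the role names, and the helpers `can_read` and `anonymize` are assumptions, not part of the presentation. Replacing an identifier with a salted hash is strictly pseudonymization, and on its own it may not be enough to prevent re-identification.

```python
import hashlib

# Assumed access-control list: which roles may read which fields.
ALLOWED_FIELDS = {
    "physician": {"patient_id", "age", "diagnosis"},
    "analyst": {"age", "diagnosis"},
}


def can_read(role, field):
    """Approach 1: restrict access so only permitted roles see a field."""
    return field in ALLOWED_FIELDS.get(role, set())


def anonymize(record, salt):
    """Approach 2: hide the identifier and coarsen the exact age."""
    raw = (salt + record["patient_id"]).encode("utf-8")
    digest = hashlib.sha256(raw).hexdigest()
    decade = (record["age"] // 10) * 10
    return {
        "patient_id": digest[:12],        # salted hash replaces the real ID
        "age": f"{decade}-{decade + 9}",  # exact age becomes a 10-year band
        "diagnosis": record["diagnosis"],
    }
```

With this sketch, an `analyst` can read `age` but not `patient_id`, and a record with age 37 is released with the age band `30-39`.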

Domain and application knowledge
• Domain and application knowledge provides
essential information for designing Big Data
mining algorithms and systems
• The domain and application knowledge can also
help design achievable business objectives by
using Big Data analytical techniques

Big Data Mining Algorithm
I. Local Learning and Model Fusion for
Multiple Information Sources
II. Mining from Sparse, Uncertain, and
Incomplete Data
III. Mining Complex and Dynamic Data

Local Learning and Model Fusion for Multiple
Information Sources
Because Big Data applications feature
autonomous sources and decentralized control,
aggregating distributed data sources at a
centralized site for mining is systematically
prohibitive due to the potential transmission
cost and privacy concerns
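
One way to mine distributed sources without moving the raw data is to let each site compute a compact local summary and have a central site fuse the summaries into a global model. The sketch below does this for a one-variable linear regression. The choice of model and the names `local_stats`, `fuse`, and `fit_line` are assumptions made for illustration, not the specific method of the presentation.

```python
def local_stats(points):
    """Run at each site: summarize local (x, y) pairs as five numbers."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return n, sx, sy, sxx, sxy


def fuse(stats_list):
    """Run at the central site: add the site summaries element-wise."""
    return tuple(sum(values) for values in zip(*stats_list))


def fit_line(stats):
    """Least-squares slope and intercept from the fused summary."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


if __name__ == "__main__":
    site_a = [(0, 1), (1, 3)]  # data held at one source
    site_b = [(2, 5), (3, 7)]  # data held at another source
    fused = fuse([local_stats(site_a), local_stats(site_b)])
    print(fit_line(fused))
```

Only five numbers per site cross the network, and the fused fit is the same line that a fit on the pooled data would give, so neither transmission cost nor raw-data exposure grows with the size of each source.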

Mining from Sparse, Uncertain, and
Incomplete Data
• Sparse, uncertain, and incomplete data are
defining features of Big Data applications
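
A common first step with incomplete data is imputation, that is, filling missing values with an estimate. The sketch below uses the column mean, which is only one simple choice among many. `impute_mean` is a hypothetical helper, and `None` is assumed to mark a missing value.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's known values."""
    columns = list(zip(*rows))
    means = []
    for column in columns:
        known = [value for value in column if value is not None]
        # A column with no known values cannot be imputed; it stays None.
        means.append(sum(known) / len(known) if known else None)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]
```

Mean imputation keeps every record usable but shrinks the variance of the filled column, so it is best treated as a baseline.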

Mining Complex and Dynamic Data
• The rise of Big Data is driven by the rapid
increase of complex data and by changes in their
volume and nature
• Documents posted on WWW servers, Internet
backbones, social networks, communication
networks, transportation networks, and so
on all feature dynamic data
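
Because dynamic data keep changing, a model built once quickly goes stale. One simple way to follow a stream is to keep statistics over a sliding window of the most recent items. A minimal sketch, assuming events arrive one at a time: the class name `SlidingWindowCounter` and the window size are illustrative.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count item frequencies over the most recent `size` events."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, item):
        """Add a new event and forget the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.size:
            oldest = self.window.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def most_common(self, k=1):
        """Return the k most frequent items in the current window."""
        return self.counts.most_common(k)
```

Older events drop out automatically, so the reported frequencies track what the stream looks like now rather than over its whole history.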

Big Data Challenges and Solutions
• Location of Big Data sources: Big Data are
commonly stored in different locations
• Volume of Big Data: the size of Big Data
grows continuously
• Hardware resources: limited RAM capacity
• Privacy
• Domain knowledge
• Getting meaningful information

Solutions
• Parallel programming
• An efficient computing platform does not
rely on centralized data storage;
instead, data are distributed across
large-scale storage
• Restricting access to the data

Advantages
• Fast response
• Extracts useful information
• Predicts required information from large amounts of
data
• Presents better results in the form of
visualizations

Conclusion
Big Data is an emerging trend, and the need for
Big Data mining is arising in all science and
engineering domains. With Big Data
technologies, we will hopefully be able to provide
the most relevant and accurate social sensing
feedback to better understand our society in
real time.

