Data Mining With Big Data presents an overview of data mining techniques for large and complex datasets. It discusses how big data is produced and its characteristics, including volume, velocity, variety, and veracity. The document outlines challenges of big data mining such as platform and algorithm design, and solutions like distributed computing and privacy controls. Hadoop is presented as a framework for managing big data using its distributed file system and processing capabilities. The presentation concludes that big data technologies can provide more relevant insights by analyzing large and dynamic data sources.
1. Data Mining
With Big Data
Presented By:
Dinesh chandra yenduri
Rg.no : y15mc24095
2. Abstract
Big Data concern large-volume, complex,
growing data sets with multiple, autonomous
sources. With the fast development of
networking, data storage, and the data collection
capacity, Big Data are now rapidly expanding in
all science and engineering domains, including
physical, biological and biomedical sciences.
3. Outline
• Introduction
• What is Data Mining With Big Data
• How To Produce The Big Data
• Big Data Characteristics
• 4Vs Big Data
• Hadoop System Architecture
• Hadoop Framework
• Data Mining Challenges With Big Data
• Big Data Challenges and solution
• Advantages
• Conclusion
• References
4. Introduction
• The volume of business data worldwide, across all
companies, doubles every 1.2 years (was 1.5 years)
• About 2.5 quintillion bytes of data are produced daily, and more
than 90 percent of all data was produced within the past two
years
• Facebook processes 10 TB of data every day; Twitter processes 7
TB
• On 4 October 2012, the first presidential debate between
President Barack Obama and Governor Mitt Romney
triggered more than 10 million tweets within 2 hours
• Examples: Boeing jets, scientific data, sensor data,
Internet data
7. Big Data Characteristics
• Data has grown
tremendously.
• Big Data starts with
large-volume,
heterogeneous,
autonomous sources
under distributed and
decentralized control
8. 4Vs Big Data
Volume
• Data quantity
Velocity
• Data Speed
Variety
• Data Types
Veracity
• Data authenticity
9. How To Manage The Big Data
• By using Hadoop
• It is an open-source framework
• It provides a distributed file system (HDFS)
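The processing model that Hadoop popularized, MapReduce, can be illustrated without a cluster. The sketch below is plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample blocks are hypothetical, and a list of strings stands in for file blocks that a distributed file system would spread across nodes.

```python
from collections import defaultdict

def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(blocks):
    pairs = []
    for block in blocks:  # on a cluster, each block would be mapped on a different node
        pairs.extend(map_phase(block))
    return reduce_phase(shuffle(pairs))

# Hypothetical sample input: three "blocks" of one file.
blocks = ["big data mining", "big data platform", "data privacy"]
counts = word_count(blocks)
```

Because each block is mapped independently, the map step can run wherever the block is stored, and only the small (word, count) pairs need to move between machines.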
12. Data Mining Challenges With Big Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
• Big Data Mining Algorithm
13. Big Data Mining Platform
• Data are typically too large to fit into
main memory
• Parallel programming is needed to carry out
the mining process
• A Big Data processing framework relies on
cluster computers, with a high-performance
computing platform spread over a large number of
computing nodes
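One way to picture the points above is to split the data into partitions, compute a small summary of each partition in parallel, and merge the summaries. The sketch below is an illustration under stated assumptions, not a production design: threads stand in for cluster nodes, the toy input is held in memory, and the helper names (partition, partial_stats, merge, mean_and_variance) are hypothetical. On a real platform each partition would be read from distributed storage by the node that processes it.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(values, size):
    """Split the data into fixed-size partitions."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def partial_stats(chunk):
    """Summary of one partition: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))

def merge(a, b):
    """Combine two summaries; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_and_variance(values, size=3, workers=4):
    total = (0, 0, 0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stats in pool.map(partial_stats, partition(values, size)):
            total = merge(total, stats)
    n, s, sq = total
    mean = s / n
    variance = sq / n - mean * mean  # population variance
    return mean, variance

mean, variance = mean_and_variance(list(range(1, 11)))
```

Only three numbers per partition reach the merging step, so the full dataset never has to be gathered in one machine's memory.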
14. Big Data Mining Platform (Cont…)
• Big Data mining offers opportunities to go
beyond traditional relational databases to rely
on less structured data: weblogs, social media,
e-mail, sensors, and photographs that can be
mined for useful information
15. Big Data Semantics and Application
Knowledge
The two most important issues in this area:
1) Data sharing and privacy
2) Domain and application knowledge
16. Data sharing and privacy
• Information sharing is an ultimate goal for all
systems involving multiple parties
• Two common approaches are:
1) Restrict access to the data, such as adding
certification or access control to the data
entries, so that sensitive information is accessible
to a limited group of users only
2) Anonymize data fields so that sensitive
information cannot be pinpointed to an
individual record
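Both approaches can be sketched in a few lines. Everything specific here is an assumption made for illustration: the field names, the role names, the sample record and the salt are hypothetical. The second function uses a salted hash plus coarser age and ZIP values, which is closer to pseudonymization with generalization than to a formal anonymity guarantee.

```python
import hashlib

SENSITIVE_FIELDS = {"diagnosis"}   # hypothetical sensitive field
AUTHORIZED_ROLES = {"physician"}   # hypothetical authorized group

def restrict(record, role):
    """Approach 1: hide sensitive fields from users outside the authorized group."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

def anonymize(record, salt):
    """Approach 2: replace the identifier and coarsen fields that could single out a person."""
    out = dict(record)
    digest = hashlib.sha256((salt + record["name"]).encode("utf-8")).hexdigest()
    out["name"] = digest[:12]                 # pseudonym instead of the real name
    decade = record["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"     # age range instead of exact age
    out["zip"] = record["zip"][:3] + "**"     # keep only the ZIP prefix
    return out

# Hypothetical sample record.
record = {"name": "Alice Example", "age": 34, "zip": "52240", "diagnosis": "flu"}
public_view = restrict(record, role="analyst")
anonymous = anonymize(record, salt="demo-salt")
```

The two approaches are complementary: access control decides who may see a field at all, while anonymization changes what a shared copy of the data reveals.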
17. Domain and application knowledge
• Domain and application knowledge provides
essential information for designing Big Data
mining algorithms and systems
• The domain and application knowledge can also
help design achievable business objectives by
using Big Data analytical techniques
18. Big Data Mining Algorithm
I. Local Learning and Model Fusion for
Multiple Information Sources
II. Mining from Sparse, Uncertain, and
Incomplete Data
III. Mining Complex and Dynamic Data
19. Local Learning and Model Fusion for Multiple
Information Sources
Because Big Data applications feature
autonomous sources and decentralized control,
aggregating distributed data sources at a
centralized site for mining is systematically
prohibitive due to the potential transmission
cost and privacy concerns
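A minimal sketch of the idea, assuming a nearest-centroid classifier as the model (a choice made for this illustration, not one the slides prescribe): each source summarizes its own data locally, and only those summaries are sent to a coordinator and fused. The function names and sample data are hypothetical.

```python
def local_model(samples):
    """Run at each source: per-class count and feature sums. Raw samples stay on site."""
    model = {}
    for features, label in samples:
        count, sums = model.get(label, (0, [0.0] * len(features)))
        model[label] = (count + 1, [s + x for s, x in zip(sums, features)])
    return model

def fuse(models):
    """Run at the coordinator: merge local summaries into global class centroids."""
    merged = {}
    for model in models:
        for label, (count, sums) in model.items():
            if label in merged:
                c, s = merged[label]
                merged[label] = (c + count, [a + b for a, b in zip(s, sums)])
            else:
                merged[label] = (count, list(sums))
    return {label: [x / c for x in s] for label, (c, s) in merged.items()}

def predict(centroids, features):
    """Assign the class whose fused centroid is closest."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical data held at two autonomous sites.
site_a = [([1.0, 1.0], "low"), ([9.0, 9.0], "high")]
site_b = [([2.0, 2.0], "low"), ([8.0, 10.0], "high"), ([0.0, 0.0], "low")]
centroids = fuse([local_model(site_a), local_model(site_b)])
label = predict(centroids, [7.0, 7.0])
```

For this particular model the fused centroids equal the ones a centralized computation would produce, yet each site transmits only one count and one sum vector per class.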
20. Mining from Sparse, Uncertain, and
Incomplete Data
• Sparse, uncertain, and incomplete data are
defining features for Big Data applications
21. Mining Complex and Dynamic Data
• The rise of Big Data is driven by the rapid
increase of complex data and their changes in
volume and nature
• Documents posted on WWW servers, Internet
backbones, social networks, communication
networks, transportation networks, and so
on all feature dynamic data
23. Big Data Challenges and solution
• Location of Big Data sources: Big Data are
commonly stored in different locations
• Volume of Big Data: the size of Big Data
grows continuously
• Hardware resources: RAM capacity
• Privacy
• Domain knowledge
• Getting meaningful information
24. Solution
• Parallel programming
• An efficient computing platform
does not rely on centralized data storage;
instead, the data are
distributed across large-scale storage
• Restricting access to the data
25. Advantages
• Fast response
• Extraction of useful information
• Prediction of required information from large amounts of
data
• Delivers better results in the form of
visualizations
26. Conclusion
Big Data is an emerging trend, and the need for
Big Data mining is arising in all science and
engineering domains. With Big Data
technologies, we will hopefully be able to provide
the most relevant and most accurate social sensing
feedback to better understand our society in
real time.
27. References
• R. Ahmed and G. Karypis, “Algorithms for Mining
the Evolution of Conserved Relational States in
Dynamic Networks,” Knowledge and Information
Systems, vol. 33, no. 3, pp. 603-630, Dec. 2012.
• M.H. Alam, J.W. Ha, and S.K. Lee, “Novel
Approaches to Crawling Important Pages Early,”
Knowledge and Information Systems, vol. 33, no. 3,
pp. 707-734, Dec. 2012.
• S. Aral and D. Walker, “Identifying Influential and
Susceptible Members of Social Networks,” Science,
vol. 337, pp. 337-341, 2012.