BIG DATA-Seminar Report
ABSTRACT
Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data
curation, search, sharing, storage, transfer, visualization, and information privacy.
The term often refers simply to the use of predictive analytics or certain other
advanced methods to extract value from data, and seldom to a particular size of
data set. Greater accuracy in big data may lead to more confident decision making,
and better decisions can mean greater operational efficiency, cost reductions and
reduced risk.
Analysis of data sets can find new correlations, to "spot business trends,
prevent diseases, combat crime and so on." Scientists, practitioners of media and
advertising and governments alike regularly meet difficulties with large data sets in
areas including Internet search, finance and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, and biological and environmental
research.
Work with big data is necessarily uncommon; most analysis is of "PC size"
data, on a desktop PC or notebook that can handle the available data set. Relational
database management systems and desktop statistics and visualization packages
often have difficulty handling big data. The work instead requires "massively
parallel software running on tens, hundreds, or even thousands of servers". What is
considered "big data" varies depending on the capabilities of the users and their
tools, and expanding capabilities make Big Data a moving target. Thus, what is
considered to be "Big" in one year will become ordinary in later years. "For some
organizations, facing hundreds of gigabytes of data for the first time may trigger a
need to reconsider data management options. For others, it may take tens or
hundreds of terabytes before data size becomes a significant consideration."
CONTENTS
1. Introduction
2. What is Big Data
3. The 3 V's of Big Data
   3.1. Volume
   3.2. Variety
   3.3. Velocity
4. Types of Big Data
   4.1. Structured
   4.2. Unstructured
   4.3. Semi-structured
5. Why is Big Data Important?
6. Big Data Architecture
   6.1. Introduction
   6.2. Data Sources
   6.3. Data Storage
   6.4. Real-time Message Ingestion
   6.5. Batch Processing
   6.6. Stream Processing
   6.7. Analytical Data Store
   6.8. Analytics and Reporting
   6.9. Orchestration
   6.10. Challenges in Designing Big Data Architecture
        6.10.1. Data Quality
        6.10.2. Scaling
        6.10.3. Security
        6.10.4. Choosing the Technology Set
        6.10.5. Paying Loads of Money
   6.11. Benefits of Big Data Architecture
7. Big Data Technologies
8. Applications
   8.1. Tracking Customer Spending Habits and Shopping Behaviour
   8.2. Recommendation
   8.3. Smart Traffic System
   8.4. Secure Air Traffic System
   8.5. Auto-Driving Car
   8.6. Virtual Personal Assistant Tool
   8.7. IoT
   8.8. Education Sector
   8.9. Energy Sector
   8.10. Media and Entertainment Sector
9. Challenges in Big Data
   9.1. Data Complexity
   9.2. Computational Complexity
   9.3. System Complexity
10. Advantages and Disadvantages
11. Conclusion
1. INTRODUCTION
Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data
curation, search, sharing, storage, transfer, visualization, and information privacy.
The term often refers simply to the use of predictive analytics or certain other
advanced methods to extract value from data, and seldom to a particular size of
data set. Greater accuracy in big data may lead to more confident decision making,
and better decisions can mean greater operational efficiency, cost reductions and
reduced risk.
Analysis of data sets can find new correlations, to "spot business trends,
prevent diseases, combat crime and so on." Scientists, practitioners of media and
advertising and governments alike regularly meet difficulties with large data sets in
areas including Internet search, finance and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, and biological and environmental
research. Data sets grow in size in part because they are increasingly being
gathered by cheap and numerous information-sensing mobile devices, aerial
(remote sensing), software logs, cameras, microphones, radio-frequency
identification (RFID) readers, and wireless sensor networks.
The world's technological per-capita capacity to store information has
roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes
(2.5×10^18 bytes) of data were created every day. The challenge for large enterprises is
determining who should own big data initiatives that straddle the entire
organization. Work with big data is necessarily uncommon; most analysis is of
"PC size" data, on a desktop PC or notebook that can handle the available data set.
Relational database management systems and desktop statistics and visualization
packages often have difficulty handling big data. The work instead requires
"massively parallel software running on tens, hundreds, or even thousands of
servers". What is considered "big data" varies depending on the capabilities of the
users and their tools, and expanding capabilities make Big Data a moving target.
Thus, what is considered to be "Big" in one year will become ordinary in later
years. "For some organizations, facing hundreds of gigabytes of data for the first
time may trigger a need to reconsider data management options. For others, it may
take tens or hundreds of terabytes before data size becomes a significant
consideration."
2. WHAT IS BIG DATA
In its purest form, Big Data describes the massive volume of both structured
and unstructured data that is so large it is difficult to process using traditional
techniques. So Big Data is just what it sounds like: a whole lot of data.
The concept of Big Data is a relatively new one, and it represents both the
increasing amount and the varied types of data that are now being collected.
Proponents of Big Data often refer to this as the “datification” of the world. As
more and more of the world’s information moves online and becomes digitized, it
means that analysts can start to use it as data. Things like social media, online
books, music, videos and the increased amount of sensors have all added to the
astounding increase in the amount of data that has become available for analysis.
Everything you do online is now stored and tracked as data. Reading a book
on your Kindle generates data about what you’re reading, when you read it, how
fast you read it and so on. Similarly, listening to music generates data about what
you’re listening to, when how often and in what order. Your smart phone is
constantly uploading data about where you are, how fast you’re moving and what
apps you’re using.
3. THE 3 V'S OF BIG DATA
Big data analytics can be a difficult concept to grasp, especially with
the vast varieties and amounts of data today. To make sense of the concept, experts
have broken it down into three simple segments, the three big V's of data: volume,
variety, and velocity.
3.1.Volume
Volume refers to the quantity of data generated and stored by a Big Data
system. Here lies the essential value of Big Data sets: with so much data available
there is huge potential for analysis and pattern finding to an extent unavailable to
human analysis or traditional computing techniques.
Given the size of Big data sets, analysis cannot be performed by traditional
computing resources. Specialized Big Data processing, storage and analytical tools
are needed. To this end, Big Data has underpinned the growth of cloud computing,
distributed computing and edge computing platforms, as well as driving the
emerging fields of machine learning and artificial intelligence.
Example: Just think of all the emails, Twitter messages, photos, video clips and
sensor data that we produce and share every second. It is not about terabytes, but
zettabytes or brontobytes of data. On Facebook alone, users send 10 billion
messages and upload 350 million new pictures each and every day. If all the data
generated in the world between the beginning of time and the year 2000 were
added up, the same amount is now generated every minute! This increasingly
makes data sets too large to store and analyze using traditional database
technology. With big data technology, one can now store and use these data sets
with the help of distributed systems, where parts of the data are stored in different
locations, connected by networks and brought together by software.
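The distributed-storage idea described above can be illustrated with a toy sketch. The node names and round-robin placement below are purely illustrative, not how any real distributed file system assigns data:

```python
# Toy sketch of distributed storage: a data set is split into chunks assigned
# to different "nodes" (locations), and software reassembles the full set.
def partition(records, nodes):
    """Assign each record to a node round-robin, like a naive sharding scheme."""
    placement = {node: [] for node in nodes}
    for i, record in enumerate(records):
        placement[nodes[i % len(nodes)]].append(record)
    return placement

def gather(placement):
    """Reassemble the full data set from all nodes."""
    return [r for chunk in placement.values() for r in chunk]

placement = partition(list(range(10)), ["node-a", "node-b", "node-c"])
full = sorted(gather(placement))
```

Real systems such as HDFS add replication and failure handling on top of this basic split-and-reassemble pattern.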
3.2. Variety
The Internet of Things is characterized by a huge variety of data types. Data
varies in its format and the degree to which it is structured and ready for
processing. With data typically accessed from multiple sources and systems, the
ability to deal with variability in data is an essential feature of Big Data solutions.
Because Big Data is often unstructured or, at best, semi-structured, one of the key
challenges is the task of standardizing and streamlining data. Products like Open
Automation Software specialise in smoothing out big data by rendering it in an
open format ready for consumption by other systems.
Example: In the past, the focus was on structured data that neatly fits into tables
or relational databases, such as financial data (for example, sales by product or
region). In fact, an estimated 80 percent of the world's data is now unstructured and
therefore cannot easily be put into tables or relational databases: think of photos,
video sequences or social media updates. With big data technology, one can now
harness different types of data, including messages, social media conversations,
photos, sensor data, and video or voice recordings, and bring them together with
more traditional, structured data.
3.3. Velocity
The growth of global networks and the spread of the Internet of Things in
particular mean that data is being generated and transmitted at an ever increasing
pace. Much of this data needs to be analyzed in real time, so it is critical that
systems are able to cope with the speed and volume of data generated.
Systems must be robust and scalable and employ technologies specifically
designed to handle the data rate and protect the integrity of high-speed, real-time
data, such as advanced caching and buffering. Big Data systems rely on
networking features that can handle huge data throughputs while maintaining the
integrity of real-time and historical data.
Example: Just think of social media messages going viral in minutes, the speed at
which credit card transactions are checked for fraudulent activities, or the
milliseconds it takes trading systems to analyze social media networks to pick up
signals that trigger decisions to buy or sell shares. Big data technology now allows
us to analyze data while it is being generated, without ever putting it into
databases.
4. TYPES OF BIG DATA
Following are the types of Big Data:
Structured
Unstructured
Semi-structured
4.1.Structured
Any data that can be stored, accessed and processedin the form of fixed
format is termed as a 'structured' data. Over the period of time, talent in computer
science has achieved greater success in developing techniques for working with
such kind of data (where the format is well known in advance) and also deriving
value out of it. However, nowadays, we are foreseeing issues when a size of such
data grows to a huge extent, typical sizes are being in the rage of multiple
zettabytes.
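A minimal sketch of structured data in a relational table, using Python's built-in SQLite; the table name, columns and rows are illustrative:

```python
import sqlite3

# Structured data: a fixed, well-known format (columns and types declared
# up front), so standard queries work directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("laptop", "east", 1200.0), ("phone", "east", 800.0),
                  ("laptop", "west", 1100.0)])
# Because the schema is fixed in advance, aggregation is straightforward.
total_east = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'").fetchone()[0]
```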
4.2.Unstructured
Any data with unknown form or the structure is classified as unstructured
data. In addition to the size being huge, un-structured data poses multiple
challenges in terms of its processing for deriving value out of it. A typical example
of unstructured data is a heterogeneous data source containing a combination of
simple text files, images, videos etc. Now day organizations have wealth of data
available with them but unfortunately, they don'tknow how to derive value out of
it since this data is in its raw form or unstructured format.
4.3.Semi-structured
Semi-structured data can contain both the forms of data. We can see semi-
structured data as a structured in form but it is actually not defined with e.g. a table
definition in relational DBMS. Example of semi-structured data is a data
represented in an XML
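The XML example can be made concrete with a short snippet using Python's standard library; the document below is made up, and note that the two records carry different fields, which is exactly what makes the data semi-structured:

```python
import xml.etree.ElementTree as ET

# Semi-structured data: tags and nesting give structure, but there is no
# schema fixed in advance, so records may have different fields.
doc = """
<customers>
  <customer><name>Asha</name><email>asha@example.com</email></customer>
  <customer><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(doc)
names = [c.findtext("name") for c in root.findall("customer")]
emails = [c.findtext("email") for c in root.findall("customer")]  # missing -> None
```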
5. WHY IS BIG DATA IMPORTANT?
The importance of big data does not revolve around how much data a
company has but how a company utilises the collected data. Every company uses
data in its own way; the more efficiently a company uses its data, the more
potential it has to grow. The company can take data from any source and analyse it
to find answers which will enable:
Cost Savings :
Some tools of Big Data like Hadoop and Cloud-Based Analytics can bring
costadvantages to business when large amounts of data are to be stored and these
tools also help in identifying more efficient ways of doing business.
Time Reductions :
The high speed of tools like Hadoop and in-memory analytics can easily
identify new sources of data, which helps businesses analyze data immediately
and make quick decisions based on the learnings.
Understand the market conditions :
By analyzing big data you can get a better understanding of current market
conditions. For example, by analyzing customers’ purchasing behaviors, a
company can find out which products are sold the most and produce products
according to this trend. By this, it can get ahead of its competitors.
Control online reputation:
Big data tools can do sentiment analysis. Therefore, you can get feedback
about who is saying what about your company. If you want to monitor and
improve the online presence of your business, then, big data tools can help in all
this.
Using Big Data Analytics to Boost Customer Acquisition and Retention:
The customer is the most important asset any business depends on. There is
no single business that can claim success without first having to establish a solid
customer base. However, even with a customer base, a business cannot afford to
disregard the high competition it faces. If a business is slow to learn what
customers are looking for, then it is very easy to begin offering poor-quality
products. In the end, loss of clientele will result, and this creates an adverse overall
effect on business success. The use of big data allows businesses to observe
various customer related patterns and trends. Observing customer behaviour is
important to trigger loyalty.
Using Big Data Analytics to Solve Advertiser Problems and Offer Marketing
Insights
Big data analytics can help change all business operations. This includes the
ability to match customer expectations, change the company's product line and of
course ensure that the marketing campaigns are powerful.
Big Data Analytics as a Driver of Innovation and Product Development
Another huge advantage of big data is the ability to help companies innovate
and redevelop their products.
6. BIG DATA ARCHITECTURE
Introduction
Big Data architecture is a system used for ingesting, storing, and processing vast
amounts of data (known as Big Data) that can be analyzed for business gains. It is
a blueprint of a big data solution based on the requirements and infrastructure of
business organizations. A robust architecture saves the company money. It helps
them to predict future trends and improves decision making.
It is designed for handling:
Batch processing of big data sources.
Real-time processing of big data.
Predictive analytics and machine learning.
Data Sources:
Data sources govern Big Data architecture. It involves all those sources from
where the data extraction pipeline gets built. Data Sources are the starting point of
the big data pipeline. Data arrives through multiple sources, including relational
databases, sensors, company servers, IoT devices, static files generated from apps
(such as Windows logs), and third-party data providers. This data can be batch data
or real-time data. Big Data architecture is designed in such a way that it handles
this vast amount of data.
Data Storage:
Data Storage is the receiving end for Big Data. Data Storage receives data of
varying formats from multiple data sources and stores them. It even changes the
format of the data received from data sources depending on the system
requirements. For example, Big Data architecture stores unstructured data in
distributed file storage systems like HDFS or a NoSQL database. It stores
structured data in an RDBMS.
Real-time Message Ingestion:
We need to build a mechanism in our Big Data architecture that captures and
stores the real-time data consumed by stream-processing consumers: in essence, a
datastore into which new messages are dropped. Many solutions require a
message-based ingestion store that acts as a message buffer and supports
scale-based processing. Such stores provide reliable delivery along with other
message-queuing semantics.
It may include options like Apache Kafka, Azure Event Hubs, Apache Flume,
etc.
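The buffering role of an ingestion store can be sketched with a toy in-memory queue. This is not Kafka or Event Hubs; real systems add durability, partitioning and delivery guarantees, and the class and message fields below are invented for illustration:

```python
from collections import deque

# Toy message buffer: producers append messages; stream-processing consumers
# drain them later at their own pace, decoupling the two sides.
class MessageBuffer:
    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        self._queue.append(message)

    def consume(self, max_messages):
        """Take up to max_messages from the front of the buffer."""
        batch = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch

buf = MessageBuffer()
for i in range(5):
    buf.publish({"sensor": "s1", "reading": i})
first_batch = buf.consume(3)  # the consumer can safely lag behind the producer
```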
Batch Processing:
The architecture requires a batch processing system for filtering,
aggregating, and processing data which is huge in size for advanced analytics.
These are generally long-running batch jobs that involve reading the data from the
data storage, processing it, and writing the outputs to new files. The most
commonly used solution for Batch Processing is Apache Hadoop.
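The map/shuffle/reduce pattern at the heart of Hadoop-style batch jobs can be sketched in a few lines. A real job runs each phase in parallel across many nodes; the single-process version below only shows the data flow, with made-up documents:

```python
from collections import defaultdict

# Minimal word-count sketch of the MapReduce pattern.
def map_phase(documents):
    """Map: emit (word, 1) for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big data systems", "data"]
counts = reduce_phase(shuffle(map_phase(docs)))
```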
Stream Processing:
There is a small difference between stream processing and real-time message
ingestion. Stream processing handles all streaming data, which occurs in windows
or streams, and then writes the data to the output sink. Tools include Apache
Spark, Storm, Apache Flink, etc.
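The windowing idea can be shown with a toy tumbling-window aggregation. Engines like Spark Streaming or Flink do this continuously and in parallel; the event data here is invented:

```python
# Tumbling windows: events are grouped into fixed, non-overlapping time
# windows and an aggregate per window is written to the output sink.
def tumbling_window_sums(events, window_seconds):
    """events: (timestamp_seconds, value) pairs; returns {window_start: sum}."""
    sums = {}
    for ts, value in events:
        window_start = ts - (ts % window_seconds)
        sums[window_start] = sums.get(window_start, 0) + value
    return sums

events = [(0, 1), (3, 2), (7, 4), (11, 8)]
sink = tumbling_window_sums(events, window_seconds=5)  # windows [0,5), [5,10), [10,15)
```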
Analytical Data Store:
After processing data, we need to bring data in one place so that we can
accomplish an analysis of the entire data set. The analytical data store is important
as it stores all our processed data in one place, making analysis comprehensive. It is
optimized mainly for analysis rather than transactions. It can be a relational
database or cloud-based data warehouse depending on our needs.
Analytics and Reporting:
After ingesting and processing data from varying data sources, we require a
tool for analyzing the data. For this, there are many data analytics and visualization
tools that analyze the data and generate reports or a dashboard. Companies use
these reports for making data-driven decisions.
Orchestration:
Moving data through these systems requires orchestration in some form of
automation. Ingesting data, transforming the data, moving data in batches and
stream processes, then loading it to an analytical data store, and then analyzing it to
derive insights must be in a repeatable workflow. This allows us to continuously
gain insights from our big data.
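The repeatable workflow described above can be sketched as a chain of steps, each consuming the previous step's output. Real orchestrators (Airflow-style tools) add scheduling, retries and monitoring; the function names and data below are illustrative:

```python
# A minimal ingest -> transform -> load -> analyze pipeline, runnable as one
# repeatable workflow.
def ingest():
    return [" 5", "3 ", "bad", "8"]          # raw records from some source

def transform(raw):
    """Clean and type the raw records, discarding malformed ones."""
    return [int(x) for x in (s.strip() for s in raw) if x.isdigit()]

def load(clean):
    """Load the cleaned data into an 'analytical store' (a dict here)."""
    return {"analytical_store": clean}

def analyze(store):
    """Derive an insight (here, the mean) from the analytical store."""
    values = store["analytical_store"]
    return sum(values) / len(values)

def run_pipeline():
    return analyze(load(transform(ingest())))

insight = run_pipeline()  # the whole workflow can be re-run on a schedule
```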
Challenges in Designing Big Data Architecture
Data Quality:
Data quality is a challenge when working with multiple data sources. The
architecture must ensure data quality: data formats must match, there must be no
duplicate data, and no data must be missed. The architecture must be designed in
such a way that it analyses and prepares the data before bringing it together with
other data for analysis.
Scaling:
Big Data architecture must be designed in such a way that it can scale up
when the need arises. Otherwise, the system performance can degrade
significantly.
Security:
Data security is the most crucial part and the biggest challenge when
dealing with big data. Hackers and fraudsters may try to add their own fake data or
skim companies' data for sensitive information. Cybercriminals could easily mine
company data if companies do not encrypt the data, secure the perimeters, and
work to anonymize the data by removing sensitive information.
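One common mitigation mentioned above, anonymizing sensitive fields, can be sketched with salted hashing. The salt value and record fields below are made up; in practice the salt would be a securely stored secret:

```python
import hashlib

SALT = b"example-secret-salt"  # illustrative only; keep the real salt secret

def pseudonymize(value):
    """Replace a sensitive identifier with a stable, non-reversible pseudonym."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()

record = {"customer_id": "alice@example.com", "amount": 42.0}
safe_record = {"customer_id": pseudonymize(record["customer_id"]),
               "amount": record["amount"]}
# Records can still be joined on the pseudonym without exposing the raw id.
```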
Choosing the Technology Set:
There are many tools and technologies, each with pros and cons, for big data
analytics, such as Apache Hadoop, Spark, Cassandra, and Hive. Choosing the right
technology set is difficult. Companies must determine whether they need Spark or
whether the speed of Hadoop MapReduce is enough. They must also know whether
to store data in Cassandra, HDFS, or HBase.
Paying Loads of Money:
Big data architecture entails lots of expenses. During architecture design,
the big data company must account for hardware expenses, new-hire expenses,
electricity expenses, whether the needed framework is open source or not, and
many more.
Benefits of Big Data Architecture
1. Reducing costs: Big data technologies such as Apache Hadoop significantly
reduce storage costs.
2. Improving decision making: The streaming component of a Big Data
architecture enables companies to make decisions in real time.
3. Future trends prediction: Big Data analytics helps companies to predict future
trends by analyzing big data from multiple sources.
4. Creating new products: Companies can understand customers' requirements
by analyzing their previous purchases and create new products accordingly.
7. BIG DATA TECHNOLOGIES
Big data requires exceptional technologies to efficiently process large quantities of
data within tolerable elapsed times. A 2011 McKinsey report suggests suitable
technologies include A/B testing, crowdsourcing, data fusion and integration,
genetic algorithms, machine learning, natural language processing, signal
processing, simulation, time series analysis and visualisation. Multidimensional big
data can also be represented as tensors, which can be more efficiently handled by
tensor-based computation, such as multilinear subspace learning. Additional
technologies being applied to big data include massively parallel-processing (MPP)
databases, search-based applications, data mining, distributed file systems,
distributed databases, cloud based infrastructure (applications, storage and
computing resources) and the Internet.
Some but not all MPP relational databases have the ability to store and
manage petabytes of data. Implicit is the ability to load, monitor, backup, and
optimize the use of the large data tables in the RDBMS.
DARPA’s Topological Data Analysis program seeks the fundamental
structure of massive data sets and in 2008 the technology went public with the
launch of a company called Ayasdi.
The practitioners of big data analytics processes are generally hostile to
slower shared storage, preferring direct-attached storage (DAS) in its various forms
from solid state drive (SSD) to high capacity SATA disk buried inside parallel
processing nodes. The perception of shared storage architectures—Storage area
network (SAN) and Network-attached storage (NAS) —is that they are relatively
slow, complex, and expensive. These qualities are not consistent with big data
analytics systems that thrive on system performance, commodity infrastructure,
and low cost.
Real or near-real time information delivery is one of the defining
characteristics of big data analytics. Latency is therefore avoided whenever and
wherever possible. Data in memory is good; data on a spinning disk at the other
end of a FC SAN connection is not. The cost of a SAN at the scale needed for analytics
applications is very much higher than other storage techniques.
There are advantages as well as disadvantages to shared storage in big data
analytics, but big data analytics practitioners as of 2011 did not favour it.
8. APPLICATIONS
In today's world, there is a lot of data. Big companies utilize those data
for their business growth. By analyzing this data, useful decisions can be made
in various cases, as discussed below:
Tracking Customer Spending Habits and Shopping Behaviour:
In big retail stores (like Amazon, Walmart, Big Bazaar, etc.) the management
team has to keep data on customers' spending habits (which products customers
spend on, which brands they prefer, how frequently they spend), shopping
behaviour, and customers' most-liked products (so that they can keep those
products in the store). Based on data about which products are searched for or
sold the most, the production or stocking rate of those products is set.
The banking sector uses its customers' spending-behaviour data to offer a
particular customer a discount or cashback on a product they like when it is
bought with the bank's credit or debit card. In this way, banks can send the right
offer to the right person at the right time.
Recommendation:
By tracking customers' spending habits and shopping behaviour, big retail
stores provide recommendations to customers. E-commerce sites like Amazon,
Walmart and Flipkart do product recommendation. They track what products a
customer searches for and, based on that data, recommend that type of product to
the customer. As an example, suppose a customer searches for a bed cover on
Amazon. Amazon then has data suggesting that the customer may be interested in
buying a bed cover, so the next time that customer visits a Google page,
advertisements for various bed covers will be shown. Thus, an advertisement for
the right product can be sent to the right customer. YouTube likewise recommends
videos based on the types of videos a user has previously liked or watched, and
shows advertisements relevant to the content of the video being watched. For
example, if someone is watching a tutorial video on big data, an advertisement for
another big data course may be shown during that video.
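A very simple version of such a recommender counts which products appear together in the same browsing sessions. Real e-commerce systems use far richer signals; the session data below is made up:

```python
from collections import defaultdict

# Toy co-occurrence recommender: products seen together in sessions are
# suggested together.
def build_cooccurrence(sessions):
    co = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for a in session:
            for b in session:
                if a != b:
                    co[a][b] += 1
    return co

def recommend(co, product, k=2):
    """Top-k products most often seen alongside `product` (ties broken by name)."""
    ranked = sorted(co[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]

sessions = [["bed cover", "pillow", "sheet"],
            ["bed cover", "pillow"],
            ["sheet", "curtain"]]
suggestions = recommend(build_cooccurrence(sessions), "bed cover")
```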
Smart Traffic System:
Data about traffic conditions on different roads is collected through cameras
placed beside roads and at the entry and exit points of the city, and from GPS
devices placed in vehicles (Ola and Uber cabs, etc.). All such data are analyzed,
and jam-free or less-jammed, faster routes are recommended. In this way, a smart
traffic system can be built in a city through big data analysis. A further benefit is
that fuel consumption can be reduced.
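The routing step can be sketched with Dijkstra's shortest-path algorithm over congestion-weighted travel times. The junctions and minute values below are invented; a real system would refresh these weights continuously from camera and GPS feeds:

```python
import heapq

# Least-congested route: classic Dijkstra over a weighted road graph.
def least_congested_route(graph, start, goal):
    """graph: {node: [(neighbor, minutes), ...]}; returns (minutes, path)."""
    heap = [(0, start, [start])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, minutes in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + minutes, nxt, path + [nxt]))
    return float("inf"), []

roads = {"A": [("B", 10), ("C", 3)],
         "C": [("B", 2), ("D", 8)],
         "B": [("D", 4)]}
minutes, route = least_congested_route(roads, "A", "D")
```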
Secure Air Traffic System:
Sensors are present at various places on an aircraft (such as the propellers).
These sensors capture data like flight speed, moisture, temperature, and other
environmental conditions. Based on analysis of such data, environmental
parameters within the aircraft are set and adjusted.
By analyzing the aircraft's machine-generated data, it can be estimated how long
the machine can operate flawlessly and when it should be replaced or repaired.
Auto-Driving Car:
Big data analysis helps drive a car without human intervention. Cameras and
sensors placed at various spots on the car gather data such as the size of
surrounding cars and obstacles and the distance to them. These data are analyzed,
and calculations such as how many degrees to turn, what the speed should be,
and when to stop are carried out. These calculations allow actions to be taken
automatically.
Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools (like Siri on Apple
devices, Cortana on Windows, and Google Assistant on Android) answer the
various questions asked by users. These tools track the user's location, local time,
season, other data related to the question asked, and so on, and analyze all such
data to provide an answer. As an example, suppose a user asks "Do I need to take
an umbrella?" The tool collects data like the user's location and the season and
weather conditions at that location, then analyzes these data to conclude whether
there is a chance of rain, and provides the answer.
IoT:
Manufacturing companies install IoT sensors in machines to collect
operational data. By analyzing such data, it can be predicted how long a machine
will work without problems and when it will require repair, so that the company
can act before the machine develops serious issues or breaks down completely.
Thus, the cost of replacing the whole machine can be saved. In the healthcare
field, big data is making a significant contribution. Using big data tools, data
regarding patient experience is collected and used by doctors to give better
treatment. IoT devices can sense symptoms of a probable coming disease in the
human body and prevent it by enabling advance treatment.
IoT sensors placed near patients and newborn babies constantly track health
conditions like heart rate and blood pressure. Whenever any parameter crosses
the safe limit, an alarm is sent to a doctor, so that they can take steps remotely
very soon.
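The monitoring loop can be sketched as a simple range check per vital sign. The limits below are example values for illustration only, not medical guidance:

```python
# Toy patient-monitoring check: a reading outside its safe range raises an
# alert that could be forwarded to remote staff.
SAFE_RANGES = {"heart_rate": (50, 120), "systolic_bp": (90, 140)}

def check_vitals(reading):
    alerts = []
    for metric, value in reading.items():
        low, high = SAFE_RANGES[metric]
        if not (low <= value <= high):
            alerts.append(f"{metric}={value} outside safe range {low}-{high}")
    return alerts

alerts = check_vitals({"heart_rate": 130, "systolic_bp": 110})
```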
Education Sector:
Organizations conducting online educational courses utilize big data to
find candidates interested in a course. If someone searches for YouTube tutorial
videos on a subject, then online or offline course providers for that subject send
that person ads about their courses.
Energy Sector:
Smart electric meters read consumed power every 15 minutes and send the
readings to a server, where the data are analyzed to estimate at what times of day
the power load is lowest across the city. Using this system, manufacturing units
and householders are advised to run their heavy machines during the night-time
hours when the power load is low, to enjoy lower electricity bills.
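The server-side analysis can be sketched by aggregating the 15-minute readings per hour and picking the lowest-load hour. The (hour, kilowatt) readings below are invented:

```python
from collections import defaultdict

# Aggregate smart-meter readings by hour and find the off-peak hour.
def lowest_load_hour(readings):
    """readings: (hour_of_day, kilowatts) pairs from many meters."""
    load = defaultdict(float)
    for hour, kw in readings:
        load[hour] += kw
    return min(load, key=load.get)

readings = [(18, 5.0), (18, 6.2), (2, 0.8), (2, 0.6), (9, 3.1)]
off_peak = lowest_load_hour(readings)  # the cheapest hour to run heavy machines
```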
Media and Entertainment Sector:
Media and entertainment service providers like Netflix, Amazon Prime and
Spotify analyze data collected from their users. Data such as what types of videos
or music users watch or listen to most and how long users spend on the site are
collected and analyzed to set the next business strategy.
9. CHALLENGES IN BIG DATA
There are many challenges in harnessing the potential of big data today,
ranging from the design of processing systems at the lower layer to analysis means
at the higher layer, as well as a series of open problems in scientific research.
Among these challenges, some are caused by the characteristics of big data, some,
by its current analysis models and methods, and some, by the limitations of current
data processing systems. In this section, we briefly describe the major issues and
challenges.
Data complexity
The study of data-complexity metrics is an emergent area in the field of data
mining, focused on the analysis of several data set characteristics to extract
knowledge from them. This information is used to support the selection of the
proper classification algorithm.
Computational complexity
Three of the key features of big data, namely, multisources, huge volume,
and fast-changing, make it difficult for traditional computing methods (such as
machine learning, information retrieval, and data mining) to effectively support the
processing, analysis and computation of big data. Such computations cannot
simply rely on past statistics, analysis tools, and iterative algorithms used in
traditional approaches for handling small amounts of data. New approaches will
need to break away from assumptions made in traditional computations based on
independent and identical distribution of data and adequate sampling for
generating reliable statistics. When solving problems involving big data, we will
need to re-examine and investigate its computability, computational complexity,
and algorithms. New approaches for big data computing will need to address big
data-oriented, novel and highly efficient computing paradigms, provide innovative
methods for processing and analyzing big data, and support value-driven
applications in specified domains. New features in big data processing, such as
insufficient samples, open and uncertain data relationships, and unbalanced
distribution of value density, not only provide great opportunities, but also pose
grand challenges, to studying the computability of big data and the development of
new computing paradigms.
System complexity
Big data processing systems suitable for handling a diversity of data types
and applications are the key to supporting scientific research of big data. For data
of huge volume, complex structure, and sparse value, processing is confronted
by high computational complexity, long duty cycles, and real-time requirements.
These requirements not only pose new challenges to the design of system
architectures, computing frameworks, and processing systems, but also impose
stringent constraints on their operational efficiency and energy consumption. The
design of system architectures, computing frameworks, processing modes, and
benchmarks for highly energy-efficient big data processing platforms is the key
issue to be addressed in system complexity. Solving these problems can lay the
principles for designing, implementing, testing, and optimizing big data processing
systems. Their solutions will form an important foundation for developing
hardware and software system architectures with energy-optimized and efficient
distributed storage and processing.
10. ADVANTAGES AND DISADVANTAGES
Advantages of Big Data:
➨Big data analysis derives innovative solutions. Big data analysis helps in
understanding and targeting customers. It helps in optimizing business processes.
➨It helps in improving science and research.
➨It improves healthcare and public health with the availability of patient records.
➨It helps in financial trading, sports, polling, security/law enforcement, etc.
➨Anyone can access vast information via surveys and get an answer to any
query.
➨New additions are made every second.
➨One platform can carry unlimited information.
Disadvantages of Big Data:
➨Traditional storage can cost a lot of money to store big data.
➨Lots of big data is unstructured.
➨Big data analysis can violate principles of privacy.
➨It can be used for manipulation of customer records.
➨It may increase social stratification.
➨Big data analysis is not useful in the short run. It needs to be analyzed over a
longer duration to leverage its benefits.
➨Big data analysis results are sometimes misleading.
➨Speedy updates in big data can mismatch real figures.
11. CONCLUSION
The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software have produced a unique moment in the history
of data analysis. The convergence of these trends means that we have the
capabilities required to analyze astonishing data sets quickly and cost-effectively
for the first time in history. These capabilities are neither theoretical nor trivial.
They represent a genuine leap forward and a clear opportunity to realize enormous
gains in terms of efficiency, productivity, revenue, and profitability. The Age of
Big Data is here, and these are truly revolutionary times if both business and
technology professionals continue to work together and deliver on the promise.