ABSTRACT
Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data
curation, search, sharing, storage, transfer, visualization, and information privacy.
The term often refers simply to the use of predictive analytics or certain other
advanced methods to extract value from data, and seldom to a particular
size of data set. Accuracy in big data may lead to more confident decision making,
and better decisions can mean greater operational efficiency, cost reductions and
reduced risk.
Analysis of data sets can find new correlations, to "spot business trends,
prevent diseases, combat crime and so on." Scientists, practitioners of media and
advertising and governments alike regularly meet difficulties with large data sets in
areas including Internet search, finance and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, and biological and environmental
research.
Work with big data is necessarily uncommon; most analysis is of "PC size"
data, on a desktop PC or notebook that can handle the available data set. Relational
database management systems and desktop statistics and visualization packages
often have difficulty handling big data. The work instead requires "massively
parallel software running on tens, hundreds, or even thousands of servers". What is
considered "big data" varies depending on the capabilities of the users and their
tools, and expanding capabilities make Big Data a moving target. Thus, what is
considered to be "Big" in one year will become ordinary in later years. "For some
organizations, facing hundreds of gigabytes of data for the first time may trigger a
need to reconsider data management options. For others, it may take tens or
hundreds of terabytes before data size becomes a significant consideration."
CONTENTS
1. What is Big Data
2. 3 V's of Big Data
2.1. Volume
2.2. Variety
2.3. Velocity
3. Types of Big Data
3.1. Structured
3.2. Unstructured
3.3. Semi-structured
4. Why Big Data is Important
5. Big Data Architecture
5.1. Introduction
5.2. Data Source
5.3. Data Storage
5.4. Stream Processing
5.5. Analytical Data Store
5.6. Analytics and Reporting
5.7. Orchestration
5.8. Challenges in Designing Big Data Architecture
5.8.1. Data Quality
5.8.2. Scaling
5.8.3. Security
5.8.4. Choosing Technology set
5.8.5. Paying loads of money
5.9. Benefits of Big Data Architecture
6. Big Data Technologies
7. Applications
7.1.1. Tracking Customer Spending Habit
7.1.2. Shopping Behaviour
7.1.3. Recommendation
7.1.4. Smart Traffic System
7.1.5. Secure Air Traffic System
7.1.6. Auto Driving Car
7.1.7. Virtual Personal Assistant Tool
7.1.8. IoT
7.1.9. Education Sector, Energy Sector
7.1.10. Media and Entertainment Sector
8. Challenges in Big Data
8.1.1. Data complexity
8.1.2. Computational complexity
8.1.3. System complexity
9. Advantages and Disadvantages
10. Conclusion
1. INTRODUCTION
Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture, data
curation, search, sharing, storage, transfer, visualization, and information privacy.
The term often refers simply to the use of predictive analytics or certain other
advanced methods to extract value from data, and seldom to a particular size of
data set. Accuracy in big data may lead to more confident decision making, and
better decisions can mean greater operational efficiency, cost reductions and
reduced risk.
Analysis of data sets can find new correlations, to "spot business trends,
prevent diseases, combat crime and so on." Scientists, practitioners of media and
advertising and governments alike regularly meet difficulties with large data sets in
areas including Internet search, finance and business informatics. Scientists
encounter limitations in e-Science work, including meteorology, genomics,
connectomics, complex physics simulations, and biological and environmental
research. Data sets grow in size in part because they are increasingly being
gathered by cheap and numerous information-sensing mobile devices, aerial
(remote sensing), software logs, cameras, microphones, radio-frequency
identification (RFID) readers, and wireless sensor networks.
The world's technological per-capita capacity to store information has
roughly doubled every 40 months since the 1980s; as of 2012, 2.5
exabytes (2.5×10^18 bytes) of data were created every day. The challenge for
large enterprises is determining who should own big data initiatives that
straddle the entire organization. Work with big data is necessarily uncommon;
most analysis is of "PC size" data, on a desktop PC or notebook that can handle
the available data set.
Relational database management systems and desktop statistics and visualization
packages often have difficulty handling big data. The work instead requires
"massively parallel software running on tens, hundreds, or even thousands of
servers". What is considered "big data" varies depending on the capabilities of the
users and their tools, and expanding capabilities make Big Data a moving target.
Thus, what is considered to be "Big" in one year will become ordinary in later
years. "For some organizations, facing hundreds of gigabytes of data for the first
time may trigger a need to reconsider data management options. For others, it may
take tens or hundreds of terabytes before data size becomes a significant
consideration."
2. WHAT IS BIG DATA
In its purest form, Big Data is used to describe the massive
volume of both structured and unstructured data that is so large it is difficult to
process using traditional techniques. So Big Data is just what it sounds like: a
whole lot of data.
The concept of Big Data is a relatively new one, and it represents both the
increasing amount and the varied types of data that are now being collected.
Proponents of Big Data often refer to this as the “datification” of the world. As
more and more of the world’s information moves online and becomes digitized,
analysts can start to use it as data. Things like social media, online
books, music, videos and the increased number of sensors have all added to the
astounding increase in the amount of data that has become available for analysis.
Everything you do online is now stored and tracked as data. Reading a book
on your Kindle generates data about what you’re reading, when you read it, how
fast you read it and so on. Similarly, listening to music generates data about what
you’re listening to, when, how often and in what order. Your smartphone is
constantly uploading data about where you are, how fast you’re moving and what
apps you’re using.
3. 3 V’S OF BIG DATA
Big data analytics can be a difficult concept to grasp, especially with
the vast varieties and amounts of data today. To make sense of the concept, experts
have broken it down into three simple segments. These three segments are the three
big V’s of data: variety, velocity, and volume.
3.1. Volume
Volume refers to the quantity of data generated and stored by a Big Data
system. Here lies the essential value of Big Data sets – with so much data available
there is huge potential for analysis and pattern finding to an extent unavailable to
human analysis or traditional computing techniques.
Given the size of Big data sets, analysis cannot be performed by traditional
computing resources. Specialized Big Data processing, storage and analytical tools
are needed. To this end, Big Data has underpinned the growth of cloud computing,
distributed computing and edge computing platforms, as well as driving the
emerging fields of machine learning and artificial intelligence.
Example: Just think of all the emails, Twitter messages, photos, video clips and
sensor data that we produce and share every second. It is not about terabytes, but
zettabytes or brontobytes of data. On Facebook alone, users send 10 billion messages
per day and upload 350 million new pictures each and every day. The amount of
data generated in the world between the beginning of time and the year
2000 is said to be the same amount that is now generated every minute! This
increasingly makes data sets too large to store and analyze using traditional database
technology. With big data technology, one can now store and use these data sets
with the help of distributed systems, where parts of the data are stored in different
locations, connected by networks and brought together by software.
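The idea of storing parts of a data set in different locations and bringing them together in software can be illustrated with a minimal sketch. It assumes a simple hash-partitioning scheme; the names `node_for`, `partition` and `lookup` are hypothetical, and each "node" is just an in-memory dictionary rather than a real server.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records: dict[str, str], num_nodes: int) -> list[dict[str, str]]:
    """Spread records over num_nodes dictionaries (stand-ins for servers)."""
    nodes: list[dict[str, str]] = [{} for _ in range(num_nodes)]
    for key, value in records.items():
        nodes[node_for(key, num_nodes)][key] = value
    return nodes


def lookup(nodes: list[dict[str, str]], key: str):
    """Find a record by asking only the node that should hold it."""
    return nodes[node_for(key, len(nodes))].get(key)


records = {f"user{i}": f"message {i}" for i in range(100)}
nodes = partition(records, num_nodes=4)
print(lookup(nodes, "user7"))
```

Real distributed stores add replication and rebalancing on top of this basic routing step; those are left out of the sketch.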
3.2. Variety
The Internet of Things is characterized by a huge variety of data types. Data
varies in its format and the degree to which it is structured and ready for
processing. With data typically accessed from multiple sources and systems, the
ability to deal with variability in data is an essential feature of Big Data solutions.
Because Big Data is often unstructured or, at best, semi-structured, one of the key
challenges is the task of standardizing and streamlining data. Products like Open
Automation Software specialise in smoothing out big data by rendering data
in an open format ready for consumption by other systems.
Example: In the past, the focus was on structured data that neatly fits into tables
or relational databases, such as financial data (for example, sales by product or
region). In fact, 80 percent of the world’s data is now unstructured and therefore
can’t easily be put into tables or relational databases; think of photos, video
sequences or social media updates. With big data technology, one can now harness
different types of data including messages, social media conversations, photos,
sensor data, and video or voice recordings and bring them together with more
traditional, structured data.
3.3. Velocity
The growth of global networks and the spread of the Internet of Things in
particular means that data is being generated and transmitted at an ever increasing
pace. Much of this data needs to be analyzed in real time, so it is critical that
systems are able to cope with the speed and volume of data generated.
Systems must be robust and scalable and employ technologies specifically
designed to protect the integrity of high-speed, real-time data and to handle its
rate, such as advanced caching and buffering technologies. Big Data systems rely on
networking features that can handle huge data throughputs while maintaining the
integrity of real-time and historical data.
Example: Just think of social media messages going viral in minutes, the speed at
which credit card transactions are checked for fraudulent activities or the
milliseconds it takes trading systems to analyze social media networks to pick up
signals that trigger decisions to buy or sell shares. Big data technology now allows
us to analyze the data while it is being generated without ever putting it into
databases.
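Analyzing data while it is being generated, without first storing it, can be sketched with a running statistic. The example below is a minimal illustration that assumes numeric readings arriving one at a time; `running_mean` is a hypothetical helper, not part of any streaming framework.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, keeping only two numbers in memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no past values need to be stored.
        mean += (value - mean) / count
        yield mean


# Each reading updates the result immediately as it arrives.
for current in running_mean([2.0, 4.0, 6.0]):
    print(current)
```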
4. TYPES OF BIG DATA
Following are the types of Big Data:
• Structured
• Unstructured
• Semi-structured
4.1. Structured
Any data that can be stored, accessed and processed in a fixed
format is termed 'structured' data. Over time, computer
science has achieved great success in developing techniques for working with
such data (where the format is well known in advance) and deriving
value from it. However, issues are now foreseen as the size of such
data grows to a huge extent, with typical sizes in the range of multiple
zettabytes.
4.2. Unstructured
Any data with unknown form or structure is classified as unstructured
data. In addition to its huge size, unstructured data poses multiple
challenges in terms of processing it to derive value. A typical example
of unstructured data is a heterogeneous data source containing a combination of
simple text files, images, videos, etc. Nowadays organizations have a wealth of data
available to them but, unfortunately, they don't know how to derive value from
it since this data is in its raw, unstructured form.
4.3. Semi-structured
Semi-structured data can contain both forms of data. Semi-structured data
appears structured in form, but it is not actually defined by, for example, a table
definition in a relational DBMS. An example of semi-structured data is data
represented in an XML file.
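The three types can be contrasted with a small sketch. It uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the table name, XML tags and sample text are invented for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: the schema (columns and types) is declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES (?, ?, ?)", ("bed cover", "north", 19.5))
structured_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Semi-structured: tags describe the data, but no table definition constrains it.
xml_doc = "<sale><product>bed cover</product><note>gift wrap</note></sale>"
root = ET.fromstring(xml_doc)
semi_structured = {child.tag: child.text for child in root}

# Unstructured: free text with no fields at all; meaning must be extracted.
review = "Loved the bed cover, arrived in two days."
mentions_product = "bed cover" in review.lower()

print(structured_total, semi_structured, mentions_product)
```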
WHY BIG DATA IS IMPORTANT
The importance of big data does not revolve around how much data a
company has but how a company utilises the collected data. Every company uses
data in its own way; the more efficiently a company uses its data, the more
potential it has to grow. The company can take data from any source and analyse it
to find answers which will enable:
Cost Savings:
Some Big Data tools, like Hadoop and cloud-based analytics, can bring
cost advantages to a business when large amounts of data are to be stored, and these
tools also help in identifying more efficient ways of doing business.
Time Reductions:
The high speed of tools like Hadoop and in-memory analytics makes it easy to
identify new sources of data, which helps businesses analyze data immediately
and make quick decisions based on the learnings.
Understand the market conditions:
By analyzing big data you can get a better understanding of current market
conditions. For example, by analyzing customers’ purchasing behaviors, a
company can find out which products sell the most and produce products
according to this trend. In this way, it can get ahead of its competitors.
Control online reputation:
Big data tools can do sentiment analysis. Therefore, you can get feedback
about who is saying what about your company. If you want to monitor and
improve the online presence of your business, big data tools can help with
this.
Using Big Data Analytics to Boost Customer Acquisition and Retention:
The customer is the most important asset any business depends on. There is
no single business that can claim success without first having to establish a solid
customer base. However, even with a customer base, a business cannot afford to
disregard the high competition it faces. If a business is slow to learn what
customers are looking for, then it is very easy to begin offering poor quality
products. In the end, loss of clientele will result, and this creates an adverse overall
effect on business success. The use of big data allows businesses to observe
various customer-related patterns and trends. Observing customer behaviour is
important for building loyalty.
Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing
Insights
Big data analytics can help change all business operations. This includes the
ability to match customer expectations, change the company’s product line and, of
course, ensure that marketing campaigns are powerful.
Big Data Analytics as a Driver of Innovations and Product Development
Another huge advantage of big data is the ability to help companies innovate
and redevelop their products.
BIG DATA ARCHITECTURE
Introduction
Big Data architecture is a system used for ingesting, storing, and processing vast
amounts of data (known as Big Data) that can be analyzed for business gains. It is
a blueprint of a big data solution based on the requirements and infrastructure of
business organizations. A robust architecture saves the company money. It helps
them to predict future trends and improves decision making.
It is designed for handling:
• Batch processing of big data sources.
• Real-time processing of big data.
• Predictive analytics and machine learning.
Data sources:
Data sources govern Big Data architecture. They include all the sources from
which the data extraction pipeline is built, and they are the starting point of
the big data pipeline. Data arrives from multiple sources including relational
databases, sensors, company servers, IoT devices, static files generated from apps
such as Windows logs, third-party data providers, etc. This data can be batch data
or real-time data. Big Data architecture is designed in such a way that it handles
this vast amount of data.
Data Storage:
Data Storage is the receiving end for Big Data. Data Storage receives data of
varying formats from multiple data sources and stores them. It even changes the
format of the data received from data sources depending on the system
requirements. For example, Big Data architecture stores unstructured data in
distributed file storage systems like HDFS or in a NoSQL database. It stores
structured data in an RDBMS.
Real-time Message Ingestion:
We need to build a mechanism in our Big Data architecture that captures and
stores real-time data to be consumed by stream processing consumers. It can simply
be a data store where new messages are dropped into a folder. However, a
number of solutions require a message-based ingestion store
that acts as a message buffer and supports scale-out processing, providing
reliable delivery along with other message queuing semantics.
Options include Apache Kafka, Azure Event Hubs, Apache Flume,
etc.
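The role of a message buffer between producers and stream consumers can be sketched as follows. This is an in-memory stand-in under simplifying assumptions (single process, no persistence); the class `MessageBuffer` is hypothetical and does not reflect the API of Kafka, Event Hubs or Flume.

```python
from collections import deque


class MessageBuffer:
    """Toy ingestion store: producers publish, consumers pull batches in order."""

    def __init__(self, capacity: int) -> None:
        self._queue: deque[str] = deque()
        self._capacity = capacity

    def publish(self, message: str) -> bool:
        """Accept a message, or return False if the buffer is full (back-pressure)."""
        if len(self._queue) >= self._capacity:
            return False
        self._queue.append(message)
        return True

    def consume(self, max_messages: int) -> list[str]:
        """Remove and return up to max_messages, oldest first."""
        batch = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch


buffer = MessageBuffer(capacity=2)
accepted = [buffer.publish(m) for m in ("reading-1", "reading-2", "reading-3")]
print(accepted, buffer.consume(max_messages=5))
```

The buffer lets producers and consumers run at different speeds; a production system would also persist messages and track what each consumer has already read.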
Batch Processing:
The architecture requires a batch processing system for filtering,
aggregating, and processing data which is huge in size for advanced analytics.
These are generally long-running batch jobs that involve reading the data from the
data storage, processing it, and writing outputs to the new files. The most
commonly used solution for Batch Processing is Apache Hadoop.
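The filter, aggregate and write steps of a batch job can be sketched on a small scale. The function below is an illustrative assumption rather than Hadoop code: `batch_total_by_region` is a hypothetical name, and the input is an in-memory list standing in for files read from data storage.

```python
from collections import defaultdict


def batch_total_by_region(rows, min_amount=0.0):
    """Filter out rows below min_amount, then sum the amounts per region."""
    totals = defaultdict(float)
    for region, amount in rows:
        if amount >= min_amount:       # filtering step
            totals[region] += amount   # aggregation step
    return dict(totals)


rows = [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", -1.0)]
print(batch_total_by_region(rows))
```

A framework such as Hadoop distributes the same kind of work across many machines; the logic per record stays similar.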
Stream Processing:
There is a small difference between stream processing and real-time message
ingestion. Stream processing handles streaming data, processing it in windows
or as continuous streams, and then writes the results to an output sink. Tools
include Apache Spark, Storm, Apache Flink, etc.
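Processing a stream in windows can be illustrated with fixed-size (tumbling) windows. The sketch assumes events arrive as `(timestamp_seconds, key)` pairs; `tumbling_window_counts` is a hypothetical helper and not an API of Spark, Storm or Flink.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Group events into fixed, non-overlapping time windows and count keys in each."""
    windows = {}
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


events = [(1, "click"), (4, "click"), (11, "view"), (12, "click")]
for start, counts in sorted(tumbling_window_counts(events, 10).items()):
    print(start, dict(counts))
```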
Analytical Data Store:
After processing data, we need to bring the data into one place so that we can
analyze the entire data set. The analytical data store is important
because it stores all our processed data in one place, making analysis
comprehensive. It is optimized mainly for analysis rather than transactions. It can
be a relational database or a cloud-based data warehouse, depending on our needs.
Analytics and Reporting:
After ingesting and processing data from varying data sources, we require a
tool for analyzing the data. For this, there are many data analytics and visualization
tools that analyze the data and generate reports or a dashboard. Companies use
these reports for making data-driven decisions.
Orchestration:
Moving data through these systems requires orchestration in some form of
automation. Ingesting data, transforming the data, moving data in batches and
stream processes, then loading it to an analytical data store, and then analyzing it to
derive insights must be in a repeatable workflow. This allows us to continuously
gain insights from our big data.
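A repeatable workflow of this kind can be sketched as an ordered list of named steps. The step names and the lambda bodies below are illustrative assumptions; real orchestration tools add scheduling, retries and monitoring.

```python
from typing import Callable

Step = Callable[[list], list]


def run_pipeline(data: list, steps: list[tuple[str, Step]]) -> tuple[list, list[str]]:
    """Run each step in order, passing its output to the next, and log record counts."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(f"{name}: {len(data)} records")
    return data, log


steps = [
    ("ingest", lambda rows: [r.strip() for r in rows]),
    ("transform", lambda rows: [r.lower() for r in rows if r]),
    ("load", lambda rows: sorted(set(rows))),
]

result, log = run_pipeline([" B ", "a", "", "A"], steps)
print(result)
print(log)
```

Because the steps are data, the same pipeline can be rerun unchanged on every new batch.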
Challenges in Designing Big Data Architecture
Data Quality:
Data quality is a challenge when working with multiple data sources. The
architecture must ensure data quality: data formats must match, there must be no
duplicate data, and no data may be missing. The architecture must be designed in
such a way that it analyzes and prepares the data before bringing it together with
other data for analysis.
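The two checks named above, no missing data and no duplicates, can be sketched as a preparation step. The function `check_quality` and the field names are hypothetical, and records are assumed to be plain dictionaries.

```python
def check_quality(records, required_fields):
    """Split records into clean ones and rejected ones with a reason."""
    seen = set()
    clean, rejected = [], []
    for record in records:
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        key = tuple(record.get(f) for f in required_fields)
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif key in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(key)
            clean.append(record)
    return clean, rejected


records = [
    {"id": 1, "name": "a"},
    {"id": 1, "name": "a"},   # exact repeat of the first record
    {"id": 2, "name": ""},    # required field left empty
]
clean, rejected = check_quality(records, ["id", "name"])
print(clean)
print([reason for _, reason in rejected])
```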
Scaling:
Big Data architecture must be designed in such a way that it can scale up
when the need arises. Otherwise, the system performance can degrade
significantly.
Security:
Data security is the most crucial part and the biggest challenge when
dealing with big data. Hackers and fraudsters may try to add their own fake data or
skim companies’ data for sensitive information. Cybercriminals could easily mine
company data if companies do not encrypt the data, secure the perimeters, and
work to anonymize the data by removing sensitive information.
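One common way to hide sensitive values before analysis is keyed hashing. The sketch below uses Python's standard `hmac` and `hashlib` modules; the field names and the key are placeholders. Strictly speaking this is pseudonymization rather than full anonymization or encryption, so it illustrates only one part of the measures listed above.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed SHA-256 digest; the same input gives the same token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Return a copy of the record with sensitive fields replaced by tokens."""
    return {
        field: pseudonymize(str(value), secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }


key = b"replace-with-a-real-secret"  # placeholder key, for illustration only
record = {"email": "a@example.com", "amount": 10}
safe = anonymize_record(record, {"email"}, key)
print(safe["amount"], len(safe["email"]))
```

Because equal inputs map to equal tokens, records can still be joined or counted per customer without exposing the original value.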
Choosing Technology set:
There are many tools and technologies for big data analytics, each with pros and
cons, like Apache Hadoop, Spark, Cassandra, Hive, etc. Choosing the right
technology set is difficult. Companies must know whether they need Spark
or whether the speed of Hadoop MapReduce is enough. They must also know whether to
store data in Cassandra, HDFS, or HBase.
Paying loads of money:
Big data architecture entails a lot of expense. During architecture design,
a company must account for hardware expenses, new-hire expenses,
electricity expenses, whether the needed framework is open source or not, and more.
Benefits of Big Data Architecture
1. Reducing costs: Big data technologies such as Apache Hadoop can significantly
reduce storage costs.
2. Improving decision making: The streaming component of a Big Data architecture
enables companies to make decisions in real time.
3. Predicting future trends: Big Data analytics helps companies to predict future
trends by analyzing big data from multiple sources.
4. Creating new products: Companies can understand customers’
requirements by analyzing their previous purchases and create new products
accordingly.
BIG DATA TECHNOLOGIES
Big data requires exceptional technologies to efficiently process large quantities of
data within tolerable elapsed times. A 2011 McKinsey report suggests suitable
technologies include A/B testing, crowdsourcing, data fusion and integration,
genetic algorithms, machine learning, natural language processing, signal
processing, simulation, time series analysis and visualisation. Multidimensional big
data can also be represented as tensors, which can be more efficiently handled by
tensor-based computation, such as multilinear subspace learning. Additional
technologies being applied to big data include massively parallel-processing (MPP)
databases, search-based applications, data mining, distributed file systems,
distributed databases, cloud-based infrastructure (applications, storage and
computing resources) and the Internet.
Some but not all MPP relational databases have the ability to store and
manage petabytes of data. Implicit is the ability to load, monitor, backup, and
optimize the use of the large data tables in the RDBMS.
DARPA’s Topological Data Analysis program seeks the fundamental
structure of massive data sets and in 2008 the technology went public with the
launch of a company called Ayasdi.
The practitioners of big data analytics processes are generally hostile to
slower shared storage, preferring direct-attached storage (DAS) in its various forms
from solid state drive (SSD) to high capacity SATA disk buried inside parallel
processing nodes. The perception of shared storage architectures—Storage area
network (SAN) and Network-attached storage (NAS) —is that they are relatively
slow, complex, and expensive. These qualities are not consistent with big data
analytics systems that thrive on system performance, commodity infrastructure,
and low cost.
Real or near-real time information delivery is one of the defining
characteristics of big data analytics. Latency is therefore avoided whenever and
wherever possible. Data in memory is good; data on spinning disk at the other end
of a FC SAN connection is not. The cost of a SAN at the scale needed for analytics
applications is very much higher than other storage techniques.
There are advantages as well as disadvantages to shared storage in big data
analytics, but big data analytics practitioners as of 2011 did not favour it.
APPLICATIONS
In today’s world, there is a lot of data. Big companies utilize this data
for their business growth. By analyzing this data, useful decisions can be made
in various cases, as discussed below:
Tracking Customer Spending Habits and Shopping Behavior:
In big retail stores (like Amazon, Walmart, Big Bazaar, etc.), the management
team has to keep data on customers’ spending habits (which products customers
spend on, which brands they prefer, how frequently they spend), shopping
behavior, and customers’ most liked products (so that they can keep those products
in the store). The production or collection rate of a product is then fixed based on
data about which products are searched for or sold the most.
The banking sector uses data on its customers’ spending behavior so that
it can offer a particular customer a discount or cashback for buying a favored
product with the bank’s credit or debit card. In this
way, banks can send the right offer to the right person at the right time.
Recommendation:
By tracking customers’ spending habits and shopping behavior, big retail stores
provide recommendations to customers. E-commerce sites like Amazon,
Walmart and Flipkart do product recommendation. They track what product a
customer is searching for and, based on that data, recommend that type of product
to that customer. As an example, suppose a customer searched for a bed cover on
Amazon. Amazon then has data indicating that the customer may be interested in
buying a bed cover. The next time that customer visits a Google page,
advertisements for various bed covers will be shown. Thus, advertisements for the
right product can be sent to the right customer. YouTube also recommends videos
based on the types of video a user previously liked or watched. Based on the content
of the video the user is watching, relevant advertisements are shown while the video
is running. As an example, suppose someone is watching a tutorial video on big
data; then advertisements for other big data courses will be shown during that video.
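The search-based recommendation described above can be sketched with a simple category match. This is a deliberately naive illustration and not how any named retailer works; the function `recommend` and the catalog contents are invented.

```python
from collections import Counter


def recommend(search_history, catalog, limit=3):
    """Suggest unseen products from the categories the customer searched most.

    catalog maps product name -> category.
    """
    category_counts = Counter(catalog[p] for p in search_history if p in catalog)
    seen = set(search_history)
    candidates = [p for p in catalog if p not in seen]
    # Most-searched categories first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda p: (-category_counts[catalog[p]], p))
    return [p for p in candidates if category_counts[catalog[p]] > 0][:limit]


catalog = {
    "bed cover": "bedding",
    "pillow": "bedding",
    "blanket": "bedding",
    "phone": "electronics",
}
print(recommend(["bed cover"], catalog))
```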
Smart Traffic System:
Data about traffic conditions on different roads is collected through
cameras placed beside the road and at the entry and exit points of the city, and
through GPS devices placed in vehicles (Ola, Uber cabs, etc.). All such data is
analyzed, and jam-free or less congested, less time-consuming routes are
recommended. In this way, a smart traffic system can be built in the city through
big data analysis. A further benefit is that fuel consumption can be reduced.
Secure Air Traffic System:
Sensors are present at various places on an aircraft (such as the propeller). These
sensors capture data like flight speed, moisture, temperature and other
environmental conditions. Based on analysis of such data, environmental
parameters within the aircraft are set and adjusted.
By analyzing an aircraft’s machine-generated data, it can be estimated how long a
machine can operate flawlessly and when it should be replaced or repaired.
Auto Driving Car:
Big data analysis helps drive a car without human intervention. Cameras and
sensors placed at various spots on the car gather data like the size of
surrounding cars, obstacles, the distance to them, etc. This data is
analyzed, and then various calculations are carried out, such as how many degrees
to turn, what the speed should be, when to stop, etc. These calculations help the
car take action automatically.
Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools (like Siri on Apple
devices, Cortana on Windows, Google Assistant on Android) answer
the various questions asked by users. The tool tracks the user’s location,
local time, season, other data related to the question asked, etc. By analyzing all
such data, it provides an answer. As an example, suppose a user asks “Do I
need to take an umbrella?” The tool collects data like the user’s location and the
season and weather conditions at that location, analyzes this data to conclude
whether there is a chance of rain, and then provides the answer.
IoT:
Manufacturing companies install IoT sensors in machines to collect
operational data. By analyzing such data, it can be predicted how long a machine
will work without any problem and when it will require repair, so that the company
can take action before the machine develops many issues or breaks down
completely. Thus, the cost of replacing the whole machine can be saved.
In the healthcare field, big data is making a significant contribution. Using big
data tools, data regarding patient experience is collected and used by doctors to
give better treatment. IoT devices can sense symptoms of a probable oncoming
disease in the human body and help prevent it through early treatment.
IoT sensors placed near a patient or a new-born baby constantly keep track of
various health conditions like heart rate, blood pressure, etc. Whenever any
parameter crosses the safe limit, an alarm is sent to a doctor, so that they can take
steps remotely very quickly.
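The safe-limit alarm can be sketched as a range check on each sensor reading. The limits below are illustrative placeholders and not clinical guidance; `SAFE_LIMITS` and `out_of_range` are hypothetical names.

```python
# Illustrative limits only; real thresholds would be set by medical staff.
SAFE_LIMITS = {"heart_rate": (50, 120), "systolic_bp": (90, 140)}


def out_of_range(reading: dict, limits: dict = SAFE_LIMITS) -> list[str]:
    """Return one alert message for every parameter outside its safe range."""
    alerts = []
    for name, (low, high) in limits.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(f"{name}={value} outside {low}-{high}")
    return alerts


print(out_of_range({"heart_rate": 80, "systolic_bp": 120}))
print(out_of_range({"heart_rate": 130}))
```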
Education Sector:
Organizations that conduct online educational courses use big data to
find candidates interested in a course. If someone searches for a YouTube
tutorial video on a subject, then online or offline course providers for
that subject send that person online ads about their courses.
Energy Sector:
Smart electric meters read consumed power every 15 minutes and send this
data to a server, where it is analyzed to estimate the
times of day when the power load is lowest across the city. With this system,
manufacturing units or householders are advised to run
their heavy machines at night, when the power load is lower, to reduce their
electricity bills.
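Finding the low-load time of day from 15-minute meter readings can be sketched as a per-hour aggregation. The input format `(hour_of_day, kwh)` and the function name `lowest_load_hour` are assumptions made for this example.

```python
from collections import defaultdict


def lowest_load_hour(readings):
    """Sum 15-minute readings per hour of day and return the hour with the least load."""
    totals = defaultdict(float)
    for hour, kwh in readings:
        totals[hour] += kwh
    return min(totals, key=totals.get)


# Four readings per hour: light use at 02:00, heavy use at 19:00.
readings = [(2, 0.5)] * 4 + [(19, 3.0), (19, 2.5), (19, 3.0), (19, 2.5)]
print(lowest_load_hour(readings))
```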
Media and Entertainment Sector:
Media and entertainment service providers like Netflix, Amazon
Prime and Spotify analyze data collected from their users. Data such as what types
of video and music users watch or listen to most, how long users spend
on the site, etc. is collected and analyzed to set the next business strategy.
CHALLENGES IN BIG DATA
There are many challenges in harnessing the potential of big data today,
ranging from the design of processing systems at the lower layer to analysis means
at the higher layer, as well as a series of open problems in scientific research.
Among these challenges, some are caused by the characteristics of big data, some
by its current analysis models and methods, and some by the limitations of current
data processing systems. In this section, we briefly describe the major issues and
challenges.
Data complexity
The study of data complexity metrics is an emergent area in the field of data
mining and is focused on the analysis of several data set characteristics to extract
knowledge from them. This information is used to support the selection of the proper
classification algorithm.
Computational complexity
Three of the key features of big data, namely, multisources, huge volume,
and fast-changing, make it difficult for traditional computing methods (such as
machine learning, information retrieval, and data mining) to effectively support the
processing, analysis and computation of big data. Such computations cannot
simply rely on past statistics, analysis tools, and iterative algorithms used in
traditional approaches for handling small amounts of data. New approaches will
need to break away from assumptions made in traditional computations based on
independent and identical distribution of data and adequate sampling for
generating reliable statistics. When solving problems involving big data, we will
need to re-examine and investigate its computability, computational complexity,
and algorithms. New approaches for big data computing will need to address big
data-oriented, novel and highly efficient computing paradigms, provide innovative
methods for processing and analyzing big data, and supportvalue-driven
applications in specified domains. New features in big data processing, such as
insufficient samples, open and uncertain data relationships, and unbalanced
19
distribution of value density, not only provide great opportunities, but also pose
grand challenges, to studying the computability of big data and the development of
new computing paradigms.
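To make the sampling concern concrete, the sketch below shows reservoir sampling, a standard one-pass technique for drawing a uniform random sample from a stream whose length is not known in advance. The report does not mention this technique; it is included here only as one example of an algorithm designed for data that cannot be held in memory. The function name and parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace an existing slot with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir

# Sample 5 values from a large range without materializing it in memory.
sample = reservoir_sample(range(1_000_000), k=5, seed=42)
```

The memory used depends only on `k`, not on the length of the stream, which is why approaches of this kind suit fast-changing, high-volume data.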
System complexity
Big data processing systems suitable for handling a diversity of data types
and applications are the key to supporting scientific research of big data. For data
of huge volume, complex structure, and sparse value, processing is confronted
by high computational complexity, long duty cycles, and real-time requirements.
These requirements not only pose new challenges to the design of system
architectures, computing frameworks, and processing systems, but also impose
stringent constraints on their operational efficiency and energy consumption. The
design of system architectures, computing frameworks, processing modes, and
benchmarks for highly energy-efficient big data processing platforms is the key
issue to be addressed in system complexity. Solving these problems can lay down the
principles for designing, implementing, testing, and optimizing big data processing
systems. Their solutions will form an important foundation for developing
hardware and software system architectures with energy-optimized and efficient
distributed storage and processing.
ADVANTAGES AND DISADVANTAGES
Advantages of Big Data:
➨Big data analysis derives innovative solutions. Big data analysis helps in
understanding and targeting customers. It helps in optimizing business processes.
➨It helps in improving science and research.
➨It improves healthcare and public health through the availability of patient records.
➨It helps in financial trading, sports, polling, security/law enforcement, etc.
➨Anyone can access vast amounts of information via surveys and deliver an answer to
any query.
➨New data is added every second.
➨A single platform can carry unlimited information.
Disadvantages of Big Data:
➨Storing big data on traditional storage can cost a lot of money.
➨Much of big data is unstructured.
➨Big data analysis can violate principles of privacy.
➨It can be used for manipulation of customer records.
➨It may increase social stratification.
➨Big data analysis is not useful in the short run. Data needs to be analyzed over a
longer duration to leverage its benefits.
➨Big data analysis results are sometimes misleading.
➨Speedy updates in big data can lead to mismatches with real figures.
CONCLUSION
The availability of Big Data, low-cost commodity hardware, and new information
management and analytic software have produced a unique moment in the history
of data analysis. The convergence of these trends means that we have the
capabilities required to analyze astonishing data sets quickly and cost-effectively
for the first time in history. These capabilities are neither theoretical nor trivial.
They represent a genuine leap forward and a clear opportunity to realize enormous
gains in terms of efficiency, productivity, revenue, and profitability. The Age of
Big Data is here, and these are truly revolutionary times if both business and
technology professionals continue to work together and deliver on the promise.

BIG DATA-Seminar Report

  • 1. ABSTRACT Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other certain advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making. And better decisions can mean greater operational efficiency, costreductions and reduced risk. Analysis of data sets can find new correlations, to "spotbusiness trends, prevent diseases, combat crime and so on." Scientists, practitioners of media and advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebookthat can handle the available data set. Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "Forsome organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. Forothers, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
  • 2. 2 CONTENTS 1. What is Big Data………………………………………………………………………... 2. 3v’S of Big Data………………………………………………………………………… 2.1. Volume………………………………………………………………………………. 2.2. Variety………………………………………………………………………………… 2.3. Velocity……………………………………………………………………………. 3. Types of Big Data……………………………………………………………………… 3.1. Structured…………………………………………………………………………….. 3.2. Unstructured…………………………………………………………………………. 3.3. Semi-structured……………………………………………………………………… 4. Why Big Data important…………………………………………………………… 5. Big Data Architecture………………………………………………………………………… 5.1. Introduction……………………………………………………………………………. 5.2. Data Source………………………………………………………………………………. 5.3. Data Storage……………………………………………………………………………… 5.4. Stream Processing………………………………………………………………………… 5.5. Analytical Data Store…………………………………………………………………….. 5.6. Analytics and Reporting………………………………………………………………… 5.7. Orchestration……………………………………………………………………………… 5.8. Challenges in Designing Big Data Architecture……………………………………… 5.8.1. Data Quality………………………………………………………………….. 5.8.2. Scaling……………………………………………………………………….. 5.8.3. Security………………………………………………………………………….. 5.8.4. Choosing Technology set……………………………………………………… 5.8.5. Paying loads of money……………………………………………………….. 5.9. Benefits of Big Data Architecture…………………………………………………….. 6. Big Data Technologies……………………………………………………………………. 7. Applications……………………………………………………………………………. 7.1.1. Tracking Customer Spending Habit………………………………………………... 7.1.2. Shopping Behaviour…………………………………………………………….. 7.1.3. Recommendation……………………………………………………………….. 7.1.4. Smart Traffic System……………………………………………………………. 7.1.5. Secure Air Traffic System………………………………………………………. 7.1.6. Auto Driving Car………………………………………………………………….. 7.1.7. Virtual Personal Assistant Tool……………………………………………………. 7.1.8. IoT…………………………………………………………………………………. 7.1.9. Education Sector Energy Sector……………………………………..................... 7.1.10. Media and Entertainment Sector…………………………………………………. 8. Challenges in Big Data………………………………………………………………………. 8.1.1. 
Data complexity……………………………………………………………………. 8.1.2. Computational complexity………………………………………………………… 8.1.3. System complexity………………………………………………………………… 9. Advantages and Disadvantages…………………………………………………………. 10. Conclusion…………………………………………………………………………………….
  • 3. 3 1.INTRODUCTION Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or other certain advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making. And better decisions can mean greater operational efficiency, costreductions and reduced risk. Analysis of data sets can find new correlations, to "spotbusiness trends, prevent diseases, combat crime and so on." Scientists, practitioners of media and advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, finance and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research. Data sets grow in size in part because they are increasingly being gathered by cheap and numerous information-sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensornetworks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 exabytes (2.5×1018) of data were created; The challenge for large enterprises is determining who should own big data initiatives that straddle the entire organization. Work with big data is necessarily uncommon; most analysis is of "PC size" data, on a desktop PC or notebookthat can handle the available data set. Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. 
The work instead requires "massively parallel software running on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make Big Data a moving target. Thus, what is considered to be "Big" in one year will become ordinary in later years. "Forsome organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. Forothers, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
  • 4. 4 2.WHAT IS BIG DATA Big Data? In it’s purest form, Big Data is used to describe the massive volume of both structured and unstructured data that is so large it is difficult to process using traditional techniques. So Big Data is just what it sounds like — a whole lot of data. The conceptof Big Data is a relatively new one and it represents both the increasing amount and the varied types of data that is now being collected. Proponents of Big Data often refer to this as the “datification” of the world. As more and more of the world’s information moves online and becomes digitized, it means that analysts can start to use it as data. Things like social media, online books, music, videos and the increased amount of sensors have all added to the astounding increase in the amount of data that has become available for analysis. Everything you do online is now stored and tracked as data. Reading a book on your Kindle generates data about what you’re reading, when you read it, how fast you read it and so on. Similarly, listening to music generates data about what you’re listening to, when how often and in what order. Your smart phone is constantly uploading data about where you are, how fast you’re moving and what apps you’re using.
  • 5. 5 3.3V’S OF BIG DATA Big data analytics can be a difficult conceptto grasp onto, especially with the vast varieties and amounts of data today. To make sense of the concept, experts broken it down into 3 simple segments. These three segments are the three big V’s of data: variety, velocity, and volume. 3.1.Volume Volume refers to the quantity of data generated and stored by a Big Data system.Here lies the essential value of Big Data sets – with so much data available there is huge potential for analysis and pattern finding to an extent unavailable to human analysis or traditional computing techniques. Given the size of Big data sets, analysis cannot be performed by traditional computing resources. Specialized Big Data processing, storage and analytical tools are needed. To this end, Big Data has underpinned the growth of cloud computing, distributed computing and edge computing platforms, as well as driving the emerging fields of machine learning and artificial intelligence. Example: Just think of all the emails, Twitter messages, photos, video clips and sensordata that we produceand share every second. It is not about terabytes, but zettabytes or brontobytes of data. On Facebookalone one send 10 billion messages per day,billion times and upload 350 million new pictures each and every day. If all the data generated in the world between the beginning of time and the year 2000, it is the same amount it is now generated every minute! This increasingly makes data sets too large to store and analyze using traditional database technology. With big data technology, one can now store and use these data sets with the help of distributed systems, where parts of the data is stored in different locations, connected by networks and brought together by software 3.2.Variety The Internet of Things is characterized by a huge variety of data types. Data varies in its format and the degree to which it is structured and ready for
  • 6. 6 processing.With data typically accessed from multiple sources and systems, the ability to deal with variability in data is an essential feature of Big Data solutions. Because Big Data is often unstructured or, at best, semi-structured one of the key challenges is the task of standardizing and streamlining data.Products like Open Automation Software specialise in smoothing out your big data by rendering data in an open format ready for consumption by other systems. Example: Just think of social media messages going viral in minutes, the speed at which credit card transactions are checked for fraudulent activities or the milliseconds it takes trading systems to analyze social media networks to pick up signals that trigger decisions to buy or sell shares. Big data technology now allows us to analyze the data while it is being generated without ever putting it into databases. 3.1.Velocity The growth of global networks and the spread of the Internet of Things in particular means that data is being generated and transmitted at an ever increasing pace.Much of this data needs to be analyzed in real time so it is critical that systems are able to copewith the speed and volume of data generated. Systems must be robust and scalable and employ technologies specifically designed to protect the integrity of high speed and realtime data. handle the rate such as advanced caching and buffering technologies.Big Data systems rely on networking features that can handle huge data throughputs while maintaining the integrity of real time and historical data Example:- In the past, the focus was on structured data that neatly fits into tables or relational databases such as financial data (for example, sales by productor region). In fact, 80 percent of the world’s data is now unstructured and therefore can’t easily be put into tables or relational databases—think of photos, video sequences or social media updates. 
With big data technology, one can now harness differed types of data including messages, social media conversations, photos, sensordata, and video or voice recordings and bring them together with more traditional, structured data.
  • 7. 7 4.TYPES OF BIG DATA Following are the types of Big Data:  Structured  Unstructured  Semi-structured 4.1.Structured Any data that can be stored, accessed and processedin the form of fixed format is termed as a 'structured' data. Over the period of time, talent in computer science has achieved greater success in developing techniques for working with such kind of data (where the format is well known in advance) and also deriving value out of it. However, nowadays, we are foreseeing issues when a size of such data grows to a huge extent, typical sizes are being in the rage of multiple zettabytes. 4.2.Unstructured Any data with unknown form or the structure is classified as unstructured data. In addition to the size being huge, un-structured data poses multiple challenges in terms of its processing for deriving value out of it. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos etc. Now day organizations have wealth of data available with them but unfortunately, they don'tknow how to derive value out of it since this data is in its raw form or unstructured format. 4.3.Semi-structured Semi-structured data can contain both the forms of data. We can see semi- structured data as a structured in form but it is actually not defined with e.g. a table definition in relational DBMS. Example of semi-structured data is a data represented in an XML
  • 8. 8 WHY BIG DATA IMPORTANT.? The importance of big data does not revolve around how much data a company has but how a company utilises the collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. The company can take data from any source and analyse it to find answers which will enable: Cost Savings : Some tools of Big Data like Hadoop and Cloud-Based Analytics can bring costadvantages to business when large amounts of data are to be stored and these tools also help in identifying more efficient ways of doing business. Time Reductions : The high speed of tools like Hadoop and in-memory analytics can easily identify new sources ofdata which helps businesses analyzing data immediately and make quick decisions based on the learnings. Understand the market conditions : By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers’ purchasing behaviors, a company can find out the products that are sold the most and produceproducts according to this trend. By this, it can get ahead of its competitors. Control online reputation: Big data tools can do sentiment analysis. Therefore, you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, then, big data tools can help in all this. Using Big Data Analytics toBoost Customer Acquisition andRetention:
  • 9. 9 The customer is the most important asset any business depends on. There is no single business that can claim success without first having to establish a solid customer base. However, even with a customer base, a business cannot afford to disregard the high competition it faces. If a business is slow to learn what customers are looking for, then it is very easy to begin offering poorquality products. In the end, loss of clientele will result, and this creates an adverse overall effect on business success. The use of big data allows businesses to observe various customer related patterns and trends. Observing customer behaviour is important to trigger loyalty. Using Big Data Analytics toSolve Advertisers ProblemandOffer Marketing Insights Big data analytics can help change all business operations. This includes the ability to match customer expectation, changing company’s productline and of courseensuring that the marketing campaigns are powerful. Big Data Analytics As aDriver of Innovations and Product Development Another huge advantage of big data is the ability to help companies innovate and redevelop their products.
  • 10. BIG DATA ARCHITECTURE

Introduction

Big Data architecture is a system used for ingesting, storing, and processing vast amounts of data (known as Big Data) that can be analyzed for business gains. It is a blueprint of a big data solution based on the requirements and infrastructure of the business organization. A robust architecture saves the company money. It helps the company predict future trends and improves decision making. It is designed for handling:

➨ Batch processing of big data sources.
➨ Real-time processing of big data.
➨ Predictive analytics and machine learning.

Data sources: Data sources govern Big Data architecture. They comprise all the sources from which the data extraction pipeline is built, and they are the starting point of the big data pipeline. Data arrives from multiple sources, including relational databases, sensors, company servers, IoT devices, static files generated by applications (such as Windows logs), third-party data providers, etc. This data can be batch data or real-time data. Big Data architecture is designed to handle this vast amount of data.

  • 11. Data Storage: Data Storage is the receiving end for Big Data. It receives data in varying formats from multiple data sources and stores it. It may even change the format of the data received, depending on the system requirements. For example, Big Data architecture stores unstructured data in distributed file storage systems such as HDFS or in a NoSQL database. It stores structured data in an RDBMS.

Real-time Message Ingestion: The architecture needs a mechanism that captures and stores real-time data so that stream processing consumers can consume it. In its simplest form this is a data store in which new messages are dropped into a folder. Many solutions, however, need a message-based ingestion store that acts as a message buffer and supports scale-out processing, providing reliable delivery along with other message-queuing semantics. Options include Apache Kafka, Azure Event Hubs, Apache Flume, etc.

Batch Processing: The architecture requires a batch processing system for filtering, aggregating, and otherwise processing very large data sets for advanced analytics. These are generally long-running batch jobs that read the data from data storage, process it, and write the output to new files. The most commonly used solution for batch processing is Apache Hadoop.

Stream Processing: There is a slight difference between stream processing and real-time message ingestion. Ingestion captures and buffers the incoming messages, whereas stream processing handles the streaming data in windows or streams and then writes the results to the output sink. Options include Apache Spark, Storm, Apache Flink, etc.
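The windowing idea behind stream processing (group events into windows, aggregate, write to a sink) can be sketched without any framework. The following is a minimal illustration in plain Python, not how Spark, Storm, or Flink implement it. The event layout `(timestamp_seconds, key, value)`, the function name `tumbling_window_sums`, and the 60-second window are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_seconds=60):
    """Group (timestamp, key, value) events into fixed, non-overlapping
    windows and sum the values per key within each window."""
    windows = defaultdict(lambda: defaultdict(float))
    for timestamp, key, value in events:
        # An event belongs to the window starting at the largest multiple
        # of window_seconds that does not exceed its timestamp.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += value
    # The "output sink" here is simply a sorted list of result rows.
    return [
        (window_start, key, total)
        for window_start in sorted(windows)
        for key, total in sorted(windows[window_start].items())
    ]


# Hypothetical sensor readings: (timestamp in seconds, sensor id, value)
events = [(0, "s1", 1.0), (30, "s1", 2.0), (59, "s2", 5.0), (60, "s1", 4.0)]
for row in tumbling_window_sums(events):
    print(row)
```

This sketch reads a finite list and returns all windows at the end; a real stream processor would instead emit each window as it closes and would also have to deal with late or out-of-order events.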
  • 12. Analytical Data Store: After processing the data, we need to bring it into one place so that the entire data set can be analyzed. The analytical data store is important because it holds all the processed data in one place, making the analysis comprehensive. It is optimized mainly for analysis rather than transactions. It can be a relational database or a cloud-based data warehouse, depending on our needs.

Analytics and Reporting: After ingesting and processing data from varying data sources, we need a tool for analyzing it. Many data analytics and visualization tools are available that analyze the data and generate reports or dashboards. Companies use these reports to make data-driven decisions.

Orchestration: Moving data through these systems requires orchestration through some form of automation. Ingesting data, transforming it, moving it through batch and stream processes, loading it into an analytical data store, and then analyzing it to derive insights must form a repeatable workflow. This allows us to gain insights from our big data continuously.

Challenges in Designing Big Data Architecture

Data Quality: Data quality is a challenge when working with multiple data sources, and the architecture must ensure it. Data formats must match, there must be no duplicate data, and no data may be missed. The architecture must be designed so that it analyses and prepares the data before combining it with other data for analysis.

Scaling: Big Data architecture must be designed so that it can scale up when the need arises. Otherwise, system performance can degrade significantly.
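The data quality requirements listed above (matching formats, no duplicates, no missing data) can be illustrated with a small validation step run before records are merged. This is only a sketch under stated assumptions: records are plain dictionaries, and the function name `check_quality` and the field names in the usage example are hypothetical.

```python
def check_quality(records, required_fields, key_field):
    """Split records into clean rows and rejected rows with a reason.

    A record is rejected if a required field is absent or empty, or if
    another record with the same key_field value was already accepted.
    """
    required = list(required_fields)
    if key_field not in required:
        # The key itself must be present to detect duplicates.
        required.append(key_field)

    clean, rejected, seen_keys = [], [], set()
    for record in records:
        missing = [f for f in required if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record[key_field] in seen_keys:
            rejected.append((record, "duplicate"))
        else:
            seen_keys.add(record[key_field])
            clean.append(record)
    return clean, rejected


# Hypothetical order records arriving from two sources.
records = [
    {"id": 1, "amount": 10},
    {"id": 1, "amount": 10},    # same key as the first record
    {"id": 2, "amount": None},  # missing value
    {"id": 3, "amount": 7},
]
clean, rejected = check_quality(records, ["id", "amount"], "id")
print(len(clean), "clean;", [reason for _, reason in rejected])
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to check afterwards that no data was lost.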
  • 13. Security: Data security is the most crucial part and the biggest challenge when dealing with big data. Hackers and fraudsters may try to add their own fake data or skim a company's data for sensitive information. Cybercriminals can easily mine company data if companies do not encrypt it, secure the perimeter, and anonymize it to remove sensitive information.

Choosing Technology set: There are many tools and technologies for big data analytics, each with its pros and cons, such as Apache Hadoop, Spark, Cassandra, Hive, etc. Choosing the right technology set is difficult. Companies must know whether they need Spark or whether the speed of Hadoop MapReduce is enough. They must also know whether to store data in Cassandra, HDFS, or HBase.

Paying loads of money: Big data architecture entails considerable expense. During architecture design, the company must account for hardware expenses, new-hire expenses, electricity expenses, whether the required framework is open source, and more.

Benefits of Big Data Architecture

1. Reducing costs: Big data technologies such as Apache Hadoop significantly reduce storage costs.
2. Improving decision making: The streaming component of a Big Data architecture enables companies to make decisions in real time.
3. Future trends prediction: Big Data analytics helps companies predict future trends by analyzing big data from multiple sources.
4. Creating new products: Companies can understand customers' requirements by analyzing their previous purchases and create new products accordingly.
  • 14. BIG DATA TECHNOLOGIES

Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. A 2011 McKinsey report suggests that suitable technologies include A/B testing, crowdsourcing, data fusion and integration, genetic algorithms, machine learning, natural language processing, signal processing, simulation, time series analysis and visualisation. Multidimensional big data can also be represented as tensors, which can be handled more efficiently by tensor-based computation, such as multilinear subspace learning. Additional technologies being applied to big data include massively parallel-processing (MPP) databases, search-based applications, data mining, distributed file systems, distributed databases, cloud-based infrastructure (applications, storage and computing resources) and the Internet.

Some but not all MPP relational databases have the ability to store and manage petabytes of data. Implicit is the ability to load, monitor, back up, and optimize the use of the large data tables in the RDBMS.

DARPA's Topological Data Analysis program seeks the fundamental structure of massive data sets, and in 2008 the technology went public with the launch of a company called Ayasdi.

Practitioners of big data analytics are generally hostile to slower shared storage, preferring direct-attached storage (DAS) in its various forms, from solid state drives (SSD) to high-capacity SATA disks buried inside parallel processing nodes. The perception of shared storage architectures, namely storage area network (SAN) and network-attached storage (NAS), is that they are relatively slow, complex, and expensive. These qualities are not consistent with big data analytics systems, which thrive on system performance, commodity infrastructure, and low cost.

Real or near-real-time information delivery is one of the defining characteristics of big data analytics. Latency is therefore avoided whenever and wherever possible.
Data in memory is good; data on a spinning disk at the other end of an FC SAN connection is not. The cost of a SAN at the scale needed for analytics applications is very much higher than that of other storage techniques. There are advantages as well as disadvantages to shared storage in big data analytics, but big data analytics practitioners as of 2011 did not favour it.
  • 15. APPLICATION

In today's world there is a lot of data, and big companies use it for their business growth. By analyzing this data, useful decisions can be made in various cases, as discussed below.

Tracking Customer Spending Habit, Shopping Behavior: In big retail stores (such as Amazon, Walmart, Big Bazaar, etc.) the management team has to keep data on customers' spending habits (which products customers spend on, which brands they wish to spend on, how frequently they spend), their shopping behavior, and their most-liked products (so that those products can be kept in the store). The production or procurement rate of a product is fixed based on data about which product is searched for or sold most. The banking sector uses data on customers' spending behavior so that it can offer a particular customer a discount or cashback for buying a product they like with the bank's credit or debit card. In this way, it can send the right offer to the right person at the right time.

Recommendation: By tracking customers' spending habits and shopping behavior, big retail stores provide recommendations to customers. E-commerce sites such as Amazon, Walmart, and Flipkart recommend products. They track what product a customer is searching for and, based on that data, recommend that type of product to the customer. As an example, suppose a customer searches for a bed cover on Amazon. Amazon now has data indicating that the customer may be interested in buying a bed cover. The next time that customer goes to any Google page, advertisements for various bed covers will be shown. Thus, an advertisement for the right product can be sent to the right customer. YouTube also recommends videos based on the types of video the user has previously liked and watched. Relevant advertisements are shown while a video is playing, based on the content of the video the user is watching.
As an example, suppose someone is watching a tutorial video on big data; an advertisement for another big data course will then be shown during that video.
  • 16. Smart Traffic System: Data about traffic conditions on different roads is collected through cameras placed beside the road and at the entry and exit points of the city, and through GPS devices placed in vehicles (Ola, Uber cabs, etc.). All this data is analyzed, and jam-free or less congested routes that take less time are recommended. In this way, a smart traffic system can be built in the city through big data analysis. A further benefit is that fuel consumption can be reduced.

Secure Air Traffic System: Sensors are present at various places on an aircraft (such as the propeller). These sensors capture data such as flight speed, moisture, temperature, and other environmental conditions. Based on analysis of this data, environmental parameters within the aircraft are set and varied. By analyzing the aircraft's machine-generated data, it can be estimated how long a machine can operate flawlessly and when it should be replaced or repaired.

Auto Driving Car: Big data analysis helps drive a car without human intervention. Cameras and sensors are placed at various spots on the car and gather data such as the size of surrounding cars and obstacles, the distance from them, etc. This data is analyzed, and various calculations are carried out, such as how many degrees to turn, what the speed should be, and when to stop. These calculations help the car take action automatically.

Virtual Personal Assistant Tool: Big data analysis helps virtual personal assistant tools (such as Siri on Apple devices, Cortana on Windows, and Google Assistant on Android) answer the various questions asked by users. The tool tracks the user's location, local time, season, other data related to the question asked, etc. By analyzing all this data, it provides an answer. As an example, suppose a user asks "Do I need to take an umbrella?" The tool collects data such as the user's location and the season and weather conditions at that location, analyzes this data to conclude whether there is a chance of rain, and then provides the answer.
  • 17. IoT: Manufacturing companies install IoT sensors in machines to collect operational data. By analyzing this data, it can be predicted how long a machine will work without any problem and when it will require repair, so that the company can take action before the machine develops many issues or breaks down completely. Thus, the cost of replacing the whole machine can be saved. In the healthcare field, big data is making a significant contribution. Using big data tools, data about patient experience is collected and used by doctors to give better treatment. IoT devices can sense symptoms of a probable oncoming disease in the human body so that it can be prevented by giving treatment in advance. IoT sensors placed near a patient or a new-born baby constantly keep track of various health conditions such as heart rate, blood pressure, etc. Whenever any parameter crosses the safe limit, an alarm is sent to a doctor, who can then take steps remotely very quickly.

Education Sector: Organizations that conduct online educational courses use big data to find candidates interested in those courses. If someone searches for a YouTube tutorial video on a subject, then online or offline course providers for that subject send that person online advertisements about their courses.

Energy Sector: Smart electric meters read the power consumed every 15 minutes and send the readings to a server, where the data is analyzed to estimate the time of day at which the power load is lowest across the city. With this system, manufacturing units and householders can be advised to run their heavy machines at night, when the power load is lower, and so enjoy a lower electricity bill.

Media and Entertainment Sector: Media and entertainment service providers such as Netflix, Amazon Prime, and Spotify analyze data collected from their users.
Data such as which types of video and music users watch or listen to most and how long they spend on the site is collected and analyzed to set the next business strategy.
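The smart-meter analysis described in the Energy Sector example above can be sketched as a simple aggregation: sum the 15-minute readings per hour of day and pick the hour with the lowest total. This is a toy illustration only; the reading layout `(hour_of_day, minute, kwh)`, the function name `lowest_load_hour`, and the sample values are assumptions, not data from any real utility.

```python
from collections import defaultdict


def lowest_load_hour(readings):
    """Return the hour of day (0-23) with the smallest total consumption.

    readings: iterable of (hour_of_day, minute, kwh) tuples collected
    from many meters. Raises ValueError if readings is empty.
    """
    totals = defaultdict(float)
    for hour, _minute, kwh in readings:
        totals[hour] += kwh
    # min() over the dictionary keys, compared by their summed load.
    return min(totals, key=totals.get)


# Hypothetical 15-minute readings from a few meters.
readings = [
    (2, 0, 0.1), (2, 15, 0.1),    # night: low load
    (9, 0, 0.6),                  # morning
    (18, 0, 1.5), (18, 15, 1.2),  # evening peak
]
print(lowest_load_hour(readings))
```

At city scale the same aggregation would run as a batch job over the stored readings rather than over an in-memory list.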
  • 18. CHALLENGES IN BIG DATA

There are many challenges in harnessing the potential of big data today, ranging from the design of processing systems at the lower layer to analysis methods at the higher layer, as well as a series of open problems in scientific research. Some of these challenges are caused by the characteristics of big data, some by its current analysis models and methods, and some by the limitations of current data processing systems. In this section, we briefly describe the major issues and challenges.

Data complexity

The study of data complexity metrics is an emergent area in the field of data mining and focuses on the analysis of several data set characteristics in order to extract knowledge from them. This information is used to support the selection of the proper classification algorithm.

Computational complexity

Three of the key features of big data, namely multiple sources, huge volume, and fast change, make it difficult for traditional computing methods (such as machine learning, information retrieval, and data mining) to effectively support the processing, analysis and computation of big data. Such computations cannot simply rely on the past statistics, analysis tools, and iterative algorithms used in traditional approaches for handling small amounts of data. New approaches will need to break away from the assumptions made in traditional computations, which are based on independent and identically distributed data and on adequate sampling for generating reliable statistics. When solving problems involving big data, we will need to re-examine and investigate its computability, computational complexity, and algorithms. New approaches for big data computing will need to address big data-oriented, novel and highly efficient computing paradigms, provide innovative methods for processing and analyzing big data, and support value-driven applications in specified domains.
New features in big data processing, such as insufficient samples, open and uncertain data relationships, and unbalanced distribution of value density, not only provide great opportunities but also pose grand challenges to studying the computability of big data and to the development of new computing paradigms.

  • 19. System complexity

Big data processing systems suitable for handling a diversity of data types and applications are the key to supporting scientific research on big data. For data of huge volume, complex structure, and sparse value, processing is confronted by high computational complexity, long duty cycles, and real-time requirements. These requirements not only pose new challenges to the design of system architectures, computing frameworks, and processing systems, but also impose stringent constraints on their operational efficiency and energy consumption. The design of system architectures, computing frameworks, processing modes, and benchmarks for highly energy-efficient big data processing platforms is the key issue to be addressed in system complexity. Solving these problems can lay down the principles for designing, implementing, testing, and optimizing big data processing systems. Their solutions will form an important foundation for developing hardware and software system architectures with energy-optimized and efficient distributed storage and processing.
  • 20. ADVANTAGE AND DISADVANTAGE

Advantages of Big Data:
➨ Big data analysis derives innovative solutions. It helps in understanding and targeting customers and in optimizing business processes.
➨ It helps in improving science and research.
➨ It improves healthcare and public health through the availability of patient records.
➨ It helps in financial trading, sports, polling, security/law enforcement, etc.
➨ Anyone can access vast information via surveys and deliver an answer to any query.
➨ Additions are made every second.
➨ One platform carries unlimited information.

Disadvantages of Big Data:
➨ Traditional storage can cost a lot of money when used to store big data.
➨ Much big data is unstructured.
➨ Big data analysis violates principles of privacy.
➨ It can be used for manipulation of customer records.
➨ It may increase social stratification.
➨ Big data analysis is not useful in the short run. Data needs to be analyzed over a longer duration to leverage its benefits.
➨ Big data analysis results are sometimes misleading.
➨ Speedy updates in big data can cause mismatches with real figures.
  • 21. CONCLUSION

The availability of Big Data, low-cost commodity hardware, and new information management and analytic software have produced a unique moment in the history of data analysis. The convergence of these trends means that we have the capabilities required to analyze astonishing data sets quickly and cost-effectively for the first time in history. These capabilities are neither theoretical nor trivial. They represent a genuine leap forward and a clear opportunity to realize enormous gains in terms of efficiency, productivity, revenue, and profitability. The Age of Big Data is here, and these are truly revolutionary times if both business and technology professionals continue to work together and deliver on the promise.