What is Data?
• The quantities, characters, or symbols on which operations are
performed by a computer, which may be stored and
transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
What is Big Data?
• Before explaining what Big Data is, let me first tell you what it is not!
• The most common myth associated with Big Data is that it is just about the size or volume of data.
• In fact, it is not only about the “big” amounts of data being collected.
• Big Data refers to the large amounts of data that pour in from various sources in different formats.
• Huge amounts of data were being stored in databases even before, but because of the varied nature of Big Data, traditional relational database systems are incapable of handling it.
• Big data refers to the large, diverse sets of information
that grow at ever-increasing rates.
• It encompasses the volume of information, the velocity or
speed at which it is created and collected, and the variety
or scope of the data points being covered.
• Big data often comes from multiple sources and arrives in
multiple formats.
• To really understand big data, it’s helpful to have some
historical background. Here is Gartner’s definition, circa
2001 (which is still the go-to definition): Big data is data
that contains greater variety arriving in increasing volumes
and with ever-higher velocity.
• Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them.
The History of Big Data
• Although the concept of big data itself is relatively new, the origins of large data sets go back to the 1960s and '70s, when the world of data was just getting started with the first data centers and the development of the relational database.
• Around 2005, people began to realize just how much data users generated through Facebook, YouTube, and other online services.
• Hadoop (an open-source framework created specifically to store and analyze big data sets) was developed that same year.
• NoSQL databases also began to gain popularity during this time.
• The development of open-source frameworks such as Hadoop was essential for the growth of big data, because they make big data easier to work with and cheaper to store.
• In the years since then, the volume of big data has
skyrocketed.
• Users are still generating huge amounts of data, but it’s not just humans who are doing it.
• With the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance.
• The emergence of machine learning has produced still more data.
• While big data has come far, its usefulness is only just beginning.
What is Big Data?
Big Data is also data, but of a huge size. The term describes a collection of data that is huge in volume and yet growing exponentially with time.
In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.
Examples of Big Data
Following are some examples of Big Data:
1. The New York Stock Exchange generates about one terabyte of new trade data per day.
2. Social Media
Statistics show that more than 500 terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated from photo and video uploads, message exchanges, comments, and so on.
3. A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, data generation reaches many petabytes.
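The figures in these examples can be turned into rough rates and totals. The sketch below is a back-of-the-envelope calculation under stated assumptions: decimal units (1 PB = 1,000 TB, 1 TB = 1,000,000 MB), a 24-hour average for the daily volumes (real trading or upload traffic peaks at certain hours), and a hypothetical fleet of two-engine aircraft on 2-hour flights, 5,000 flights per day. Only the 1 TB/day, 500 TB/day and 10 TB per 30 minutes figures come from the examples above.

```python
# Rough rates and totals for the Big Data examples above.
# Decimal (SI) units are assumed: 1 PB = 1,000 TB and 1 TB = 1,000,000 MB.

TB_PER_PB = 1_000
MB_PER_TB = 1_000_000
SECONDS_PER_DAY = 86_400


def average_ingest_mb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume (TB/day) into an average rate (MB/s) over 24 hours."""
    return tb_per_day * MB_PER_TB / SECONDS_PER_DAY


def daily_engine_data_pb(tb_per_half_hour: float,
                         flight_hours: float,
                         engines_per_flight: int,
                         flights_per_day: int) -> float:
    """Estimate jet-engine data generated per day, in petabytes."""
    half_hours = flight_hours * 2  # number of 30-minute intervals per flight
    tb_per_flight = tb_per_half_hour * half_hours * engines_per_flight
    return tb_per_flight * flights_per_day / TB_PER_PB


if __name__ == "__main__":
    print(f"NYSE: {average_ingest_mb_per_s(1):.1f} MB/s")        # NYSE: 11.6 MB/s
    print(f"Facebook: {average_ingest_mb_per_s(500):.1f} MB/s")  # Facebook: 5787.0 MB/s
    # Hypothetical fleet: 2 engines per flight, 2-hour flights, 5,000 flights/day.
    print(f"Jet engines: {daily_engine_data_pb(10, 2, 2, 5000):.0f} PB/day")  # Jet engines: 400 PB/day
```

Even with these modest hypothetical inputs, the jet-engine total lands in the hundreds of petabytes per day, which is why such data cannot simply be loaded onto a single machine.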
Characteristics of Big Data (the Five V's)
1. Volume
• With big data, you’ll have to process high volumes of low-density, unstructured data.
• This can be data of unknown value, such as Twitter data feeds, click streams on a webpage or a mobile app, or data from sensor-enabled equipment.
• For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
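One common way to cope with high volume is to stream records one at a time instead of loading a whole data set into memory. The sketch below counts page views in a clickstream this way. The tab-separated `user_id<TAB>page` line format and the sample data are hypothetical; in practice the same loop would read from a large file opened with `open()`, or the work would be split across a cluster by a framework such as Hadoop.

```python
import io
from collections import Counter
from typing import Iterable, TextIO


def count_page_views(lines: Iterable[str]) -> Counter:
    """Count views per page while holding only one line in memory at a time."""
    views: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip malformed records instead of failing the whole job
        _user_id, page = parts
        views[page] += 1
    return views


if __name__ == "__main__":
    # Hypothetical clickstream; a real one would come from open(path).
    sample: TextIO = io.StringIO("u1\t/home\nu2\t/cart\nu1\t/home\n\nbad-line\n")
    print(count_page_views(sample).most_common())  # [('/home', 2), ('/cart', 1)]
```

Because the function accepts any iterable of lines, memory use depends on the number of distinct pages, not on the size of the input.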
2. Velocity
• Velocity is the fast rate at which data is received and (perhaps) acted on.
• Normally, the highest-velocity data streams directly into memory rather than being written to disk.
• Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.
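The idea of acting on data in memory as it arrives can be illustrated with a sliding-window counter. This is a minimal sketch, not a production stream processor: the `SlidingWindowCounter` class, the 60-second window and the timestamps are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Keep a running count of events seen in the last `window_seconds`.

    Events are held in memory only while they are inside the window, so the
    stream never has to be written to disk to answer "how many recently?".
    Assumes events arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def add(self, timestamp: float) -> int:
        """Record an event and return the count inside the window (t - window, t]."""
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        while self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for t in [0, 10, 30, 65, 100]:
        print(t, counter.add(t))  # counts: 1, 2, 3, 3, 2
```

A `deque` is used because appending on the right and evicting from the left are both constant-time operations, which keeps per-event work small even at high arrival rates.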
3. Variety
• Variety refers to the many types of data that are available.
• Traditional data types were structured and fit neatly in a relational database.
• With the rise of big data, data comes in new unstructured types. Unstructured and semi-structured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
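To make the difference between structured, semi-structured and unstructured data concrete, the sketch below normalizes one temperature reading from each kind of source into a common record. The field names (`device`, `temp_c`), the nested `reading` key and the sample inputs are hypothetical; real unstructured data such as audio, video or free text usually needs far heavier preprocessing than the simple regular expression used here.

```python
import csv
import io
import json
import re
from typing import Optional


def from_csv(text: str) -> list[dict]:
    """Structured: fixed columns, as in a relational table."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"device": row["device"], "temp_c": float(row["temp_c"])}
            for row in reader]


def from_json(text: str) -> list[dict]:
    """Semi-structured: nested keys, and some fields may be missing."""
    records = json.loads(text)
    return [{"device": r["device"], "temp_c": r.get("reading", {}).get("temp_c")}
            for r in records]


def from_text(text: str) -> Optional[dict]:
    """Unstructured: meaning must be extracted (here, with a regex)."""
    match = re.search(r"(\w+) reported (-?\d+(?:\.\d+)?) degrees", text)
    if match is None:
        return None  # nothing recognizable in this text
    return {"device": match.group(1), "temp_c": float(match.group(2))}


if __name__ == "__main__":
    print(from_csv("device,temp_c\nsensor1,21.5\n"))
    print(from_json('[{"device": "sensor2", "reading": {"temp_c": 19.0}}]'))
    print(from_text("At noon sensor3 reported 23.4 degrees."))
```

The structured parser relies on a known schema, the semi-structured one has to tolerate missing keys, and the unstructured one can fail outright, which mirrors why the extra preprocessing mentioned above is needed.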
4. Value
• Data has intrinsic value, but it’s of no use until that value is discovered.
5. Veracity (Truth)
• How truthful is your data, and how much can you rely on it?
• Veracity also refers to the inconsistency that data can show at times, which hampers the ability to handle and manage the data effectively.
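Veracity problems are often caught with simple automated checks before the data is analyzed. The sketch below flags missing values, implausible readings and duplicate records. The record layout, the `id`/`temp_c` field names and the -50 to 60 degrees Celsius plausibility range are hypothetical choices for illustration, not a standard.

```python
from typing import Iterable


def find_issues(records: Iterable[dict],
                low: float = -50.0,
                high: float = 60.0) -> list[tuple[int, str]]:
    """Return (record index, problem) pairs for records that look untrustworthy."""
    issues: list[tuple[int, str]] = []
    seen_ids: set = set()
    for i, rec in enumerate(records):
        rec_id = rec.get("id")
        temp = rec.get("temp_c")
        if rec_id is None or temp is None:
            issues.append((i, "missing value"))
        elif not (low <= temp <= high):
            issues.append((i, "out of range"))
        if rec_id is not None:
            if rec_id in seen_ids:
                issues.append((i, "duplicate id"))
            seen_ids.add(rec_id)
    return issues


if __name__ == "__main__":
    data = [
        {"id": 1, "temp_c": 21.5},
        {"id": 2, "temp_c": None},
        {"id": 3, "temp_c": 999.0},
        {"id": 1, "temp_c": 21.5},
    ]
    print(find_issues(data))
    # [(1, 'missing value'), (2, 'out of range'), (3, 'duplicate id')]
```

Reporting problems instead of silently dropping records keeps the decision about how much to trust the data with the analyst.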