This document discusses Big Data and provides definitions and examples. It defines Big Data as very large and loosely structured data sets that are difficult to process using traditional database and software techniques. Examples of Big Data sources include social networks and machine-to-machine data. The document also discusses Hadoop and NoSQL databases as tools for managing and analyzing Big Data, and provides examples of companies using these technologies.
4. BigData Definition(1)
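What is BigData?
- Big Data (BD), definition 1 (McKinsey, 2011): [definition text lost in extraction]
- Big Data, definition 2 (IDC, 2011): [definition text lost in extraction]
- Coverage cited: Economist (2010.05), Gartner (2011.03), McKinsey (2011.05)
- Data sources named: SNS, M2M
- Related issue named: Information silo
Source: Big Data, (KT )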
5. BigData Definition(2)
Very large, distributed aggregations of loosely structured data:
- Petabytes/exabytes of data
- Millions/billions of people
- Billions/trillions of records
- Loosely structured and often distributed data
- Flat schemas with few complex interrelationships
- Often involving time-stamped events
- Often made up of incomplete data
- Often including connections between data elements that must be probabilistically inferred
Applications that involve Big-data can be:
- Transactional (e.g., Facebook, PhotoBox), or
- Analytic (e.g., ClickFox, Merced Applications).
http://wikibon.org/wiki/v/Enterprise_Big-data
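The characteristics listed on this slide (flat schemas, time-stamped events, incomplete data) can be illustrated with a small record type. This is only a sketch: the `Event` class, its field names, and the sample values are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """A flat, time-stamped record; optional fields model incomplete data."""
    timestamp: datetime
    user_id: str
    action: str
    url: Optional[str] = None       # hypothetical field, may be missing
    referrer: Optional[str] = None  # hypothetical field, may be missing


def missing_fields(event: Event) -> List[str]:
    """Return the names of fields that carry no value for this event."""
    return [name for name, value in vars(event).items() if value is None]


# Made-up sample event with two fields left unset.
sample = Event(datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc), "u1", "like")
print(missing_fields(sample))
```

The record has no nested structure or foreign keys, which is one way to read "flat schemas with few complex interrelationships".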
6. Big-data Analytics Complements Data Warehouse
Traditional Data Warehouse
- Complete record from transactional system
- All data centralized
- Analytics designed against stable environment
- Many reports run on a production basis
Big-data Analytic Environment
- Data from many sources inside and outside of organization (including traditional DW)
- Data often physically distributed
- Need to iterate on the solution to test/improve models
- Large-memory analytics also part of iteration
- Every iteration usually requires complete reload of information
http://wikibon.org/wiki/v/Enterprise_Big-data
7. Facebook Social plug-in
Processes over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds.
Diagram labels: Transactional, Feedback, Analytic
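As a rough check on the throughput figures quoted for this slide (over 20 billion events per day, 200,000 events per second), the arithmetic below assumes events are spread evenly over a day. That assumption is a simplification and is not stated in the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

events_per_day = 20_000_000_000  # "over 20 billion events per day" (from the slide)
quoted_rate = 200_000            # "200,000 events per second" (from the slide)

# Average rate implied by the daily total, assuming a uniform load.
average_rate = events_per_day / SECONDS_PER_DAY

# Daily total implied by the quoted per-second rate.
implied_daily_total = quoted_rate * SECONDS_PER_DAY

print(f"average rate: {average_rate:,.0f} events/s")
print(f"quoted rate over one day: {implied_daily_total:,} events")
```

Under that assumption the daily total averages out to roughly 231,000 events per second, so the quoted 200,000 reads as a rounded order-of-magnitude figure.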
8. BigData
[Figure: processing pipeline with stages Collecting, Store, Analysis, Reporting/Searching]
Labels in the figure: SNS; Robot, RSS Reader, OpenAPI, Data Aggregator; ETL; Repository (DBMS, NoSQL); Index; Sentimental Analysis, Clustering, Classification, Indexing; User Define Query Script
9. Twitter : backtype
- Distribute tweets randomly on multiple queues
- Workers share the load of schemifying tweets
- Workers schemify tweets and append to Hadoop
- Workers choose queue to enqueue to using hash/mod of URL
- All updates for same URL guaranteed to go to same worker
- Workers update statistics on URLs by incrementing counters in Cassandra
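The hash/mod routing step on this slide can be sketched as follows. This is a minimal in-memory illustration, not the pipeline's actual code: the worker count, the use of CRC32 as the hash, the function names, and the sample URLs are all assumptions, and a plain `Counter` stands in for the Cassandra counters.

```python
import zlib
from collections import Counter, defaultdict

NUM_WORKERS = 4  # hypothetical number of queues/workers


def queue_for_url(url: str, num_queues: int = NUM_WORKERS) -> int:
    """Pick a queue index from a stable hash of the URL, modulo the queue count."""
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(url.encode("utf-8")) % num_queues


def route(updates, num_queues: int = NUM_WORKERS):
    """Group (url, count) updates by their destination queue."""
    queues = defaultdict(list)
    for url, count in updates:
        queues[queue_for_url(url, num_queues)].append((url, count))
    return queues


def apply_updates(queue):
    """Stand-in for one worker incrementing per-URL counters."""
    counters = Counter()
    for url, count in queue:
        counters[url] += count
    return counters


# Made-up updates: the same URL appears twice and must reach the same worker.
updates = [
    ("http://a.example/x", 1),
    ("http://b.example/y", 1),
    ("http://a.example/x", 1),
]
queues = route(updates)
totals = {index: apply_updates(items) for index, items in queues.items()}
```

Because the queue index depends only on the URL, every update for a given URL lands on one worker, so that worker can increment the URL's counter without coordinating with the others.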
10. BigData
Architectural Requirements?
Scalability
- Scale-out
- Elasticity
Reliability
- Hadoop
Flexibility
- Easy to add analysis rules
- Support for various data formats
Latency
- Real time, Near Real time, Batch
High Throughput
- Global web-scale traffic
- ~ /sec
Other labels on the slide: Component; IBM, HP, Oracle; BI/DW