2. Overview architecture
[Diagram: overview architecture. Component labels: Web Apps; Web Service APIs; Mobile Apps; 4. Front-end; SSO; User ranking; 1. Core User; User Data Storage; Real-time Notification; News Feed; 2. User Activity System; User Activity Storage; 3. Others; Real-time Chat; Search System; Suggestion System; 3. Big Data System; Big Data Storage; …; External Apps; Service Data; User; Administrator]
5. Definitions
• Wiki. Big data is a broad term for data sets so large or complex that traditional data
processing applications are inadequate. Challenges include analysis, capture,
curation, search, sharing, storage, transfer, visualization, and information privacy. The
term often refers simply to the use of predictive analytics or other certain advanced
methods to extract value from data, and seldom to a particular size of data set.
• Intel. Big data opportunities emerge in organizations generating a median of 300
terabytes of data a week. The most common forms of data analyzed in this way are
business transactions stored in relational databases, followed by documents, e-mail,
sensor data, blogs, and social media.
• Microsoft. “Big data is the term increasingly used to describe the process of applying
serious computing power—the latest in machine learning and artificial intelligence—
to seriously massive and often highly complex sets of information.”
• Oracle. Big data is the derivation of value from traditional relational database-driven
business decision making, augmented with new sources of unstructured data.
6. Definitions
• Gartner. The increasing size of data, the increasing rate at which it is produced, and the
increasing range of formats and representations employed. The report behind this definition
predated the term “big data” but proposed a three-fold definition encompassing the “three Vs”:
Volume, Velocity and Variety. This idea has since become popular and sometimes includes a
fourth V: Veracity, to cover questions of trust and uncertainty.
11. Machine learning
Applications:
• Data analysis: stock market, financial market, user action …
• Weather forecast
• Natural Language Processing
• Search engine
12. Machine learning
Methods:
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
• Data mining
• Data exploration
[Figure: a DATA SET grouped into CLUSTERS, and a DATA SET assigned to CLASSES; group labels 1 2 3 4 and x y z]
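To make the "data set to clusters" case concrete, here is a minimal sketch of unsupervised learning using k-means (Lloyd's algorithm) in plain Python. The slides do not name a clustering algorithm, so k-means is an assumption chosen for illustration, and the function name `kmeans`, its parameters, and the toy points are all hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters; no labels are used (unsupervised)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist2(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids

# Toy data set (made up): two visibly separate groups of points.
data_set = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
            (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
clusters, centers = kmeans(data_set, k=2)
print(clusters)  # one cluster index per point
```

The cluster indices themselves are arbitrary; what matters is which points end up sharing an index. Supervised learning differs in that the class of each training point is given up front.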
13. Machine learning
Supervised learning application
• Classify data set
Supervised learning algorithms
1. Decision tree
2. Neural network
3. Naive Bayes classifier
…
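As an illustration of one algorithm from the list, the following is a minimal sketch of a Naive Bayes classifier for categorical features, with add-one (Laplace) smoothing. The function names `train_naive_bayes` and `predict`, the feature values, and the class labels are hypothetical; the slides give no implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (made up): features are (outlook, temperature).
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["go_out", "go_out", "stay_in", "stay_in", "go_out", "stay_in"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))   # go_out
print(predict(model, ("rainy", "cool")))  # stay_in
```

The "naive" part is the assumption that features are independent given the class, which is why the per-feature log-probabilities can simply be added.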
18. Machine learning
Classifying Steps (n-fold cross-validation):
for( i=0; i<n; i++ ){
1. Split data set into n folds
• Training set: n-1 folds
• Test set: 1 fold (fold i)
• n=10 is a common choice
2. Training
→ Model
3. Test
→ Error rate
}
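The loop above can be sketched in Python as n-fold cross-validation. This is an illustrative sketch, not the author's code: `cross_validate` and `train_nearest_neighbour` are hypothetical names, the 1-nearest-neighbour model is only a stand-in for whatever classifier is being evaluated, and averaging the per-fold error rates is an assumption about how the results are combined.

```python
import random

def cross_validate(samples, labels, train_fn, n=10, seed=0):
    """Return the mean error rate over n folds."""
    if n > len(samples):
        raise ValueError("need at least one sample per fold")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n] for i in range(n)]  # n roughly equal parts
    error_rates = []
    for i in range(n):
        # 1. Split: fold i is the test set, the other n-1 folds train.
        test_idx = folds[i]
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        # 2. Training -> model
        model = train_fn([samples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        # 3. Test -> error rate on the held-out fold
        errors = sum(model(samples[j]) != labels[j] for j in test_idx)
        error_rates.append(errors / len(test_idx))
    return sum(error_rates) / n

def train_nearest_neighbour(train_x, train_y):
    """Stand-in model: 1-nearest-neighbour on 1-D inputs."""
    def model(x):
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[nearest]
    return model

# Toy data set (made up): values below 10 are "low", the rest "high".
samples = [float(v) for v in range(20)]
labels = ["low" if v < 10 else "high" for v in samples]
error = cross_validate(samples, labels, train_nearest_neighbour, n=10)
print(f"mean error rate: {error:.2f}")
```

Every sample is used for testing exactly once and for training n-1 times, so the averaged error rate uses all of the data without ever testing a model on samples it was trained on.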
19. Advances
• Build training set
• Reinforcement learning and applications
• Machine learning algorithms