Talk presented at BRACIS 2012. A discussion about the Grand Challenges in Natural Computing Research and two real-world applications, one in Social Media Mining and another in E-Commerce.
2012: Natural Computing - The Grand Challenges and Two Case Studies
1. Natural Computing: The Grand Challenges and Two Case Studies
Leandro Nunes de Castro, Lnunes@mackenzie.br, @lndecastro
Computing and Informatics Faculty & Graduate Program in Electrical Engineering, Natural Computing Laboratory (LCoN), www.mackenzie.br/lcon.html
2. Summary
•Natural Computing
–An Overview
–The Grand Challenges in Natural Computing Research
•Case Studies
–Social Media Mining
–Mining Association Rules for Recommender Systems
•Discussion
3. Natural Computing: An Overview*
* de Castro, L. N. (2007), “Fundamentals of Natural Computing: An Overview”, Physics of Life Reviews, 4(1), pp. 1-36.
4. Computing: Yesterday, Today and Tomorrow*
•1940s: Study of automatic computing;
•1950s: Study of information processing;
•1960s: Study of phenomena surrounding computers;
•1970s: Study of what can be automated;
•1980s: Study of computation;
•2000s: Study of information processes, both natural and artificial.
* Denning, P. (2008), “Computing Field: Structure”, In B. Wah (Ed.), Wiley Encyclopedia of Computer Science and Engineering, Wiley Interscience.
5. Since the early days of computer science, in the 1940s, researchers have been interested in drawing parallels with natural phenomena and in designing computational models and abstractions of them.
6. The Grand Challenges (GCs)
The GCs aim at defining research questions that are likely to be important in the long term, identifying and characterizing potential grand research problems. These may lead to projects capable of producing major scientific advances, with practical applications for society and technology. The emphasis is on advancing science, with a vision that goes beyond specific projects, clear and objective criteria for evaluating success, and great ambition.
7. Natural Computing: The Old View
[Diagram: Natural Computing, covering Theoretical Works and Empirical Works, with the branches Mathematical Models, Bioinspiration, Computational Synthesis of Natural Phenomena, and Computing with Natural Materials.]
8. Natural Computing: The New Perspective
[Diagram: Natural Computing, with the branches Computer Modeling of Nature, Nature-Inspired Computing, Computer Synthesis of Natural Phenomena, and Computing with New Materials.]
Natural computing is a science concerned with the investigation and design of information processing in natural and computational systems.
9. Natural Computing: The Grand Challenges*
* de Castro, L. N.; Xavier, R. S.; Pasti, R.; Maia, R. D.; Szabo, A.; Ferrari, D. G. (2012), “The Grand Challenges in Natural Computing Research: The Quest for a New Science”, Int. J. Nat. Comp. Res., 2(4), p. 16.
11. GC 1: How can Natural Computing be transposed into a transdisciplinary context?
[Diagram: Natural Computing linked to Biology, Physics, Chemistry and Computer Science.]
12. “Computer science differs from physics in that it is not actually a science. It does not study natural objects. Neither is it mathematics. It’s like engineering – about getting something to do something, rather than dealing with abstractions”.*
“Biology is today an information science”**
* Feynman, R. P. (1996), The Feynman Lectures on Computation, A. J. G. Hey and R. W. Allen (Eds.), Reading, MA: Addison-Wesley.
** Denning, P. J. (Ed.) (2001), The Invisible Future: The Seamless Integration of Technology in Everyday Life, McGraw-Hill.
13. GC 2: What is the role of Natural Computing in this Informational Natural Sciences Era?
Overcoming this challenge will bring two important benefits to Computing and Nature:
• A Rethinking (and probably Redesign) of Computing
• A New Form of Interacting With and Using Nature
14. Natural systems are open systems that communicate with their environment and exhibit complex, emergent behavior. Complex biological systems must be modeled as self-referential, self-organizing, and auto-generative systems whose computational behavior goes far beyond the Turing machine/von Neumann (TM/VN) paradigm. Such a system restructures itself through an inseparable hardware-software interaction: the hardware defines the software, and the software defines the hardware.
15. Are there standards to design (engineer) natural computing systems?*
GC 3: To what degree is defining standards for the engineering of Natural Computing systems a limiting factor for the creative development of the field?
* Brueckner, S. A.; Serugendo, G. D. M.; Karageorgos, A.; Nagpal, R., (2005), Engineering Self-Organizing Systems, Lecture Notes in Artificial Intelligence, 3464, Springer.
* de Castro, L. N. (2001), Immune Engineering: Development and Application of Computational Tools Inspired by Artificial Immune Systems, Ph. D. Thesis presented at the Computer and Electrical Engineering School, Unicamp, Brazil.
* Fernandez-Marquez, J. L.; Serugendo, G. D. M.; Montagna, S.; Viroli, M.; Arcos, J. L. (2012), “Description and Composition of Bio-Inspired Design Patterns: A Complete Overview”, Natural Computing, Online, DOI 10.1007/s11047-012-9324-y.
* Nagpal, R.; Mamei, M. (2004), “Engineering Amorphous Computing Systems”, Multiagent Systems, Artificial Societies, and Simulated Organizations, 11, Part V, pp. 303-320.
18. Data and Social Media
•110 billion minutes spent in social networks (Nielsen, 2011)
•13 years = 50 million people (Alé, 2012)
•9 months = 100 million users (Alé, 2012)
•250 million tweets/day (Datasift, 2012)
19. Research Focus
•Qualitative analysis of tweets.
•Methodology based on text mining, natural language processing and ontologies for Sentiment Analysis (SA).
•Word Sense Disambiguation (WSD).
[Diagram: Social Media Analysis Tool; Text Mining, NLP, Web Semantics; Context: Twitter.]
20. Twitter Features
•Social media and microblog.
•Messages (tweets) of up to 140 characters.
•Stimulates simultaneous activities.
•Informal: allows the creation of new terms, slang, mixed languages and irony.
21. Unstructured Data Analysis
Text Mining: semi- or unstructured data. Data Mining: structured data.
•Tokenization
•Stopword removal
•Stemming
•Representation
•Term (feature) selection
•Association
•Classification
•Clustering
•APIs
•Crawlers
•Confusion matrix
•Accuracy
•Precision
•Recall
•F-measure
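The first preprocessing steps listed above (tokenization, stopword removal, stemming and a bag-of-words representation) can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list is a tiny hypothetical one, and the "stemmer" is a crude suffix stripper standing in for a real stemmer (such as Porter for English or RSLP for Portuguese).

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; a real system
# would use a complete list for the target language (e.g. Portuguese).
STOPWORDS = {"a", "o", "e", "de", "the", "is", "are", "and", "of", "to"}

# Crude suffix stripping (an assumption, not a real stemming algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOPWORDS]


def stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Strip only if a stem of at least 3 characters remains.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    return [stem(t) for t in remove_stopwords(tokenize(text))]


def term_frequencies(text: str) -> Counter:
    """Bag-of-words representation: term -> number of occurrences."""
    return Counter(preprocess(text))
```

For example, `preprocess("Watching the show and laughing, the jokes are great!")` yields `['watch', 'show', 'laugh', 'joke', 'great']`. Term selection and the mining tasks (association, classification, clustering) would operate on representations such as these counts.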
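22. Text Analysis: Vector Space Model
$$
\begin{array}{c|cccc}
 & t_1 & t_2 & \cdots & t_c \\ \hline
d_1 & w_{11} & w_{12} & \cdots & w_{1c} \\
d_2 & w_{21} & w_{22} & \cdots & w_{2c} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_N & w_{N1} & w_{N2} & \cdots & w_{Nc}
\end{array}
$$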
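A sketch of how the N × c document-term matrix of the vector space model could be built. The slide does not say how the weights w_ij are computed, so this assumes the common tf-idf scheme, w_ij = tf_ij · log(N / df_j), and adds cosine similarity as one typical way to compare document vectors; both choices are illustrative, not the talk's method.

```python
import math
from collections import Counter


def build_vsm(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary (t_1..t_c) and the N x c weight matrix W.

    Assumes tf-idf weighting: w_ij = tf_ij * log(N / df_j), where tf_ij
    counts term j in document i and df_j counts documents containing j.
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    df = Counter(term for doc in docs for term in set(doc))
    matrix = []
    for doc in docs:
        tf = Counter(doc)  # missing terms count as 0
        matrix.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, matrix


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two document vectors (0.0 if one is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

With the toy documents `[["chaves", "porta", "abrir"], ["chaves", "kiko", "sbt"], ["hugo", "venezuela"]]`, the first two rows share the weighted term "chaves" and so have positive cosine similarity, while the third row is orthogonal to both.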
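23. [Diagram, in Portuguese: concept graph for the polysemic word "Chaves", with three senses.
Objeto (object): Entrar (enter), Trancar (lock), Porta (door), Molho (bunch, of keys), Guardar (put away), Abrir (open).
Pessoa (person): Presidente (president), Ditador (dictator), Hugo, Venezuela.
TV: SBT, Madruga, Kiko, Chiquinha, Bruxa do 71, Girafales.]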
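The concept graph for "Chaves" suggests a simple way to choose a sense: count how many of each sense's linked concepts appear in the tweet and pick the sense with the largest overlap. This Lesk-style overlap is a swapped-in simplification, not the talk's CBWSD method (which relies on the OntoGeneral and OntoSpecific semantic graphs). The sense labels and word sets are hypothetical, transcribed from the diagram.

```python
import re
from typing import Optional

# Hypothetical sense inventory transcribed from the "Chaves" diagram:
# each sense is described by the concepts linked to it in the graph.
SENSES: dict[str, set[str]] = {
    "object (keys)": {"objeto", "entrar", "trancar", "porta", "molho", "guardar", "abrir"},
    "person": {"pessoa", "presidente", "ditador", "hugo", "venezuela"},
    # "bruxa" stands for the multi-word concept "Bruxa do 71".
    "TV show": {"tv", "sbt", "madruga", "kiko", "chiquinha", "bruxa", "girafales"},
}


def disambiguate(text: str, senses: dict[str, set[str]] = SENSES) -> Optional[str]:
    """Return the sense whose concepts overlap most with the text.

    Returns None when no concept of any sense occurs in the text. Ties are
    broken by dictionary order (the first sense with the maximal score).
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {sense: len(tokens & concepts) for sense, concepts in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For instance, "Esqueci as chaves, não consigo abrir a porta" overlaps the object sense on "abrir" and "porta", while "Chaves no SBT com o Kiko" points to the TV show.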
24. Research Focus
Sentiment Analysis:
•Text classification based on the author’s opinion.
Word Sense Disambiguation:
•Polysemic word: a word with different meanings in different contexts.
•Word Sense Disambiguation: assigning the appropriate meaning to each polysemic word in a text.
•In WSD, words are classified according to a predefined set of meanings.
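25. Interest Measures
Confusion matrix (rows: correct class; columns: predicted class):
Correct \ Predicted | Positive | Negative
Positive | TP | FN
Negative | FP | TN
$$\mathrm{TPR} = \frac{TP}{P} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{N} = \frac{FP}{FP + TN}$$
$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\mathrm{Pr} = \frac{TP}{TP + FP}, \qquad \mathrm{Re} = \frac{TP}{TP + FN}$$
$$\mathrm{Precision} = \frac{|\mathrm{Relevant} \cap \mathrm{Retrieved}|}{|\mathrm{Retrieved}|}, \qquad \mathrm{Recall} = \frac{|\mathrm{Relevant} \cap \mathrm{Retrieved}|}{|\mathrm{Relevant}|}$$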
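The measures above can be computed directly from the four cells of the confusion matrix. The slide does not define the F-measure; this sketch assumes the balanced F-measure (F1), the harmonic mean of precision and recall, and computes the measures for the positive class only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # positive examples predicted as positive
    fn: int  # positive examples predicted as negative
    fp: int  # negative examples predicted as positive
    tn: int  # negative examples predicted as negative

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def tpr(self) -> float:
        """True positive rate, TP / P with P = TP + FN."""
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        """False positive rate, FP / N with N = FP + TN."""
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Recall of the positive class equals the TPR.
        return self.tpr()

    def f_measure(self) -> float:
        # Assumed balanced F-measure (F1): harmonic mean of Pr and Re.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r > 0 else 0.0
```

With the counts reported for the run without the neutral class (TP = 2877, FN = 0, FP = 133, TN = 164), this gives precision ≈ 0.9558, recall = 1, FPR ≈ 0.4478 and F ≈ 0.9774, in line with the positive-class figures in the results tables.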
26. Case Study: Social TV
Context-Based Word Sense Disambiguation (CBWSD):
•Polysemic words: e.g. Chaves, Estrelas, Na Brasa, Agora é tarde.
•Context (semantic graph): OntoGeneral; OntoSpecific.
•Classification based on the semantic graph.
Sentiment analysis based on Emoticons, Ontologies and Natural Computing:
•The classifier needs to be trained.
•Emoticon: graphic representation of a facial expression, e.g. :) :( :| :D
•Ontology: concepts and their relations within a domain.
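Since the classifier needs to be trained, one common way to use emoticons is as weak labels: tweets with an unambiguous emoticon get a provisional polarity that can serve as training data. The polarity assigned to each emoticon below is an illustrative assumption, not taken from the original work.

```python
from typing import Optional

# Assumed polarity of the emoticons shown on the slide (illustrative only).
EMOTICON_POLARITY = {
    ":)": "positive",
    ":D": "positive",
    ":(": "negative",
    ":|": "neutral",
}


def label_by_emoticons(tweet: str) -> Optional[str]:
    """Return a weak sentiment label derived from the emoticons in a tweet.

    Returns None when the tweet has no known emoticon, or when its
    emoticons disagree (e.g. both :) and :( appear).
    """
    found = {polarity for emo, polarity in EMOTICON_POLARITY.items() if emo in tweet}
    return found.pop() if len(found) == 1 else None
```

Tweets left unlabeled (None) would then be classified by the trained model rather than by the emoticon rule.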
27. Case Study: Social TV
Materials and Methods: CBWSD
•Tweets about “Agora é tarde”:
–Total: 6030 tweets.
–Period: 6-7 July 2012 (24 hours).
•Generation of the semantic graph.
28. Partial Results
Without the Neutral Class
Confusion matrix (rows: correct class; columns: predicted class):
Correct \ Predicted | Positive | Negative
Positive | 2877 | 0
Negative | 133 | 164
Measure | Result
ACC | 0.9580
TPR | 1
FPR | 0.4478
Measure | Positive | Negative
Precision | 0.9558 | 0.0544
Recall | 1 | 0.5521
F-measure | 0.9774 | 0.0991
Total: 142766 ms; per tweet: 36 ms
Neutral as Positive
Confusion matrix (rows: correct class; columns: predicted class):
Correct \ Predicted | Positive | Negative
Positive | 5015 | 33
Negative | 133 | 164
Measure | Result
ACC | 0.9689
TPR | 0.9934
FPR | 0.4478
Measure | Positive | Negative
Precision | 0.9741 | 0.0318
Recall | 0.9934 | 0.5521
F-measure | 0.9837 | 0.0602
Total: 118310 ms; per tweet: 30 ms
30. Association Rules
•Discovery of association relations between items (attributes) in transactional databases.
[Figure: example transaction database with items such as Milk, Bread, Cereals, Butter, Biscuit, Chocolate, Coffee, Eggs, Sugar, Yogurt and Sweetener.]
31. Association Rules
•Given a set of transactions, where each transaction is a set of items, an association rule is a rule X ⇒ Y in which X and Y are itemsets.
•Concepts:
–Coverage or support: proportion of transactions for which the rule is correct, i.e., that contain both its antecedent and its consequent.
–Accuracy or confidence: proportion of the transactions to which the rule applies (those containing the antecedent) that it predicts correctly (those also containing the consequent).
$$\mathrm{support}(A \Rightarrow B) = P(A \cup B) = \frac{\text{freq. of } A \text{ and } B}{\text{total of } T}$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{freq. of } A \text{ and } B}{\text{freq. of } A}$$
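A small worked example of support and confidence, with hypothetical counts: suppose T has 5 transactions, 4 of them contain A, and 2 contain both A and B. Then

$$\mathrm{support}(A \Rightarrow B) = \frac{2}{5} = 0.4, \qquad \mathrm{confidence}(A \Rightarrow B) = \frac{2}{4} = 0.5$$

so the rule holds in 40% of all transactions, and in half of the transactions to which it applies.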
32. The problem of mining association rules consists of finding all the rules that satisfy a minimum support and a minimum confidence.
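The mining problem can be sketched with the classic level-wise (Apriori-style) search: first find all itemsets whose support reaches the minimum, then split each frequent itemset into antecedent and consequent and keep the rules whose confidence reaches the minimum. This is a standard exhaustive technique shown for illustration; it is not the evolutionary or immune approach evaluated in the talk, and the transactions in the usage note are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    singletons = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in singletons if support(s, transactions) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent


def association_rules(transactions, min_support, min_confidence):
    """All rules X => Y (X, Y non-empty, disjoint) meeting both thresholds.

    Returns tuples (X, Y, support, confidence).
    """
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                x = frozenset(antecedent)
                # x is frequent because every subset of a frequent set is.
                conf = sup / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, itemset - x, sup, conf))
    return rules
```

For example, with four hypothetical transactions {Milk, Bread, Cereals, Butter}, {Milk, Biscuit, Cereals, Chocolate}, {Bread, Coffee, Eggs, Sugar} and {Bread, Coffee, Yogurt, Sweetener}, a minimum support of 0.5 and a minimum confidence of 0.8 yield Milk ⇒ Cereals, Cereals ⇒ Milk and Coffee ⇒ Bread; Bread ⇒ Coffee is discarded because its confidence is only 2/3.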
33. Evolutionary Design of ARs
•Approaches:
–Pittsburgh: each individual represents the whole set of rules.
–Michigan: each individual represents a single rule, and the whole population composes the set of rules.
•Encoding scheme (two bits per attribute):
Attribute: A  B  C  D  E  F  G  H
Gene:      11 00 01 10 00 11 10 00
–00: antecedent
–11: consequent
–01 or 10: not part of the rule
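Decoding a Michigan-style individual under this two-bits-per-attribute scheme is straightforward. The function name and the use of a bit string plus a string of attribute names are illustrative choices.

```python
def decode_rule(chromosome: str, attributes: str) -> tuple[list[str], list[str]]:
    """Decode a rule encoded with two bits per attribute.

    Gene values: "00" -> antecedent, "11" -> consequent,
    "01" or "10" -> attribute not part of the rule.
    """
    if len(chromosome) != 2 * len(attributes):
        raise ValueError("need exactly two bits per attribute")
    antecedent, consequent = [], []
    for i, attr in enumerate(attributes):
        gene = chromosome[2 * i: 2 * i + 2]
        if gene == "00":
            antecedent.append(attr)
        elif gene == "11":
            consequent.append(attr)
        # "01" and "10" leave the attribute out of the rule.
    return antecedent, consequent
```

Applied to the example individual 11 00 01 10 00 11 10 00 over attributes A to H, `decode_rule("1100011000111000", "ABCDEFGH")` returns `(['B', 'E', 'H'], ['A', 'F'])`, i.e. the rule B, E, H ⇒ A, F. An individual with an empty antecedent or consequent does not encode a valid rule, so a fitness function would presumably need to penalize it.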
34. Interest Measures and Operators
•Comprehensibility:
$$C_1(R) = \frac{\log(1 + |C|)}{\log(1 + |A \cup C|)}, \qquad C_2(R) = \log(1 + |C|) + \log(1 + |A \cup C|)$$
•Interestingness:
$$I(R) = \frac{|A \cup C|}{|A|} \cdot \frac{|A \cup C|}{|C|} \cdot \left(1 - \frac{|A \cup C|}{|D|}\right)$$
•Operators:
–Binary encoding allows the use of standard operators, such as single-point mutation and crossover.
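A sketch of the three measures for a rule A ⇒ C. The slide does not define the cardinalities, so this follows a reading common in genetic association-rule mining (an assumption): in the comprehensibility measures, |C| and |A ∪ C| count the items in the consequent and in the whole rule; in the interestingness measure, |A|, |C| and |A ∪ C| count the transactions containing the antecedent, the consequent and both, and |D| is the number of transactions.

```python
import math


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def comprehensibility_c1(antecedent, consequent):
    # Assumed: |C| = items in the consequent, |A ∪ C| = items in the rule.
    return math.log(1 + len(consequent)) / math.log(1 + len(antecedent | consequent))


def comprehensibility_c2(antecedent, consequent):
    return math.log(1 + len(consequent)) + math.log(1 + len(antecedent | consequent))


def interestingness(antecedent, consequent, transactions):
    # Assumed: cardinalities are transaction counts; |D| = len(transactions).
    n_a = count(antecedent, transactions)
    n_c = count(consequent, transactions)
    if n_a == 0 or n_c == 0:
        return 0.0  # the rule never applies or never concludes
    n_ac = count(antecedent | consequent, transactions)
    n_d = len(transactions)
    return (n_ac / n_a) * (n_ac / n_c) * (1 - n_ac / n_d)
```

For a rule Coffee ⇒ Bread over four hypothetical transactions in which Coffee occurs twice, Bread three times and both together twice, I(R) = (2/2)(2/3)(1 − 2/4) = 1/3, and C1(R) = log 2 / log 3.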
35. Algorithms Evaluated
Evolutionary:
procedure [P] = eGA(pc,pm,pe,D)
  initialize P
  t := 1
  f := evaluate(P,D)
  P := select(P,f,pe)
  while not_stopping_criterion do
    P := reproduce(P,f,pc)
    P := variate(P,pm)
    f := evaluate(P,D)
    P := select(P,f,pe)
    t := t + 1
  end while
end procedure
Immune:
procedure [P] = CLONALG1-2(D,max_it,n1,n2)
  initialize P
  t := 1
  while t <= max_it do
    f := evaluate(P)
    P1 := select(P,n1,f)**
    C := clone(P1,f)
    C := mutate(C,f)
    f1 := evaluate(C)
    P1 := select(C,n1,f1)
    P := replace(P,n2)
    t := t + 1
  end while
end procedure
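A runnable sketch of a CLONALG-style loop for bit-string individuals, following the immune procedure's steps (select the n1 best, clone, hypermutate, re-select, replace the n2 worst by random antibodies). It is a simplified illustration, not the exact CLONALG variants evaluated in the talk: the clone count, the rank-based mutation rate and the parameters beta, pop_size and seed are assumptions.

```python
import random


def clonalg(fitness, n_bits, pop_size=20, n1=5, n2=2, beta=1.0,
            max_it=50, seed=0):
    """Simplified CLONALG-style optimizer for bit strings (maximization).

    n1: number of best antibodies selected for cloning per iteration.
    n2: number of worst antibodies replaced by random ones per iteration.
    beta, pop_size and seed are illustrative assumptions.
    """
    if pop_size <= n1 + n2:
        raise ValueError("pop_size must exceed n1 + n2")
    rng = random.Random(seed)

    def random_antibody():
        return [rng.randint(0, 1) for _ in range(n_bits)]

    pop = [random_antibody() for _ in range(pop_size)]
    for _ in range(max_it):  # runs while t <= max_it
        pop.sort(key=fitness, reverse=True)   # evaluate and rank
        selected, rest = pop[:n1], pop[n1:]   # select the n1 best
        improved = []
        for rank, parent in enumerate(selected):
            # Better-ranked antibodies receive more clones ...
            n_clones = max(1, int(beta * pop_size / (rank + 1)))
            # ... and a lower per-bit mutation rate (rank-based
            # approximation of affinity-inverse hypermutation).
            rate = (rank + 1) / n_bits
            clones = [[bit ^ 1 if rng.random() < rate else bit for bit in parent]
                      for _ in range(n_clones)]
            # Re-selection: keep the best of the parent and its clones.
            improved.append(max(clones + [parent], key=fitness))
        # Replace the n2 worst antibodies (end of the sorted list) at random.
        rest = rest[:len(rest) - n2] + [random_antibody() for _ in range(n2)]
        pop = improved + rest
    return max(pop, key=fitness)
```

For example, `clonalg(sum, n_bits=20)` maximizes the number of 1 bits. Because each parent competes with its own clones, the best antibody found so far is never lost; in a rule-mining setting, the fitness would instead combine measures such as support, confidence, comprehensibility and interestingness.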
39. The Past and Present
•Focus on:
–Designing novel nature-inspired algorithms.
–Synthesizing natural phenomena.
–Using natural materials for computing.
•The real-world applications are unquestionable, but the field seems to be stuck with the same types of algorithms.
•Researchers are making efforts to investigate and formalize information processing in natural and computational systems.*
* Zenil, H. (Ed.) (2012), A Computable Universe: Understanding Computation & Exploring Nature as Computation, World Scientific.
40. And the Future?
•Grand Challenges for the field:
–Transforming Natural Computing into a Transdisciplinary Discipline.
–Unveiling and Harnessing Information Processing in Natural Systems.
–Engineering Natural Computing Systems.