2011 04 troussov_graph_basedmethods-weakknowledge
Natalia Ostapuk
1.
Graph-based methods to exploit “weak” knowledge
Alexander Troussov, Ph.D., IBM Dublin Software Lab
16th of April 2011, Mathlingvo Seminar, St. Petersburg State University, Russia
© 2011 Alexander Troussov
2.
About AT
– IBM Ireland Center for Advanced Studies: Chief Scientist
– IBM LanguageWare group: the Architect
– National Geophysical Data Center, Boulder, CO, USA: Visiting scientist; fuzzy-logic-based search engine for searching large databases when the exact search parameters are hard to define
– Observatoire de la Côte d’Azur, Nice, France: Visiting scientist; numerical simulation in stochastic physics
– Institute of Physics of the Earth (Russian Academy of Sciences) and the International Institute for Earthquake Prediction Theory and Mathematical Geophysics, Moscow, Russia: Lead Researcher; R&D in geophysics and geoinformatics
– System programming at the Institute of Precise Mechanics, Moscow
– PhD in Mathematics from Lomonosov Moscow State University
3.
Natural Language Understanding is Inferencing (?)
From a computational point of view, natural language understanding is inferencing
– A text which mentions Malahide is probably about Canada (??)
Malahide (Canada 2006 Census population 8,828) is a township in Elgin County, Ontario, Canada
Source: Troussov et al. MITACS, Canada, 2010
4.
Inferencing
Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing but the truth”
– Malahide, Co. Dublin
– Malahide is a township in Elgin County, Ontario, Canada
– Paradis Gisenyi Malahide is a hotel in Rwanda
Solution (Troussov et al. MITACS, Canada, 2010): propagation from multiple concepts. For instance, the activation propagation is seeded at two nodes in a geographical taxonomy, Malahide (Ontario) and Malahide (Co. Dublin), as well as at the other concepts mentioned in the text
• A text which mentions Malahide and Europe is a little more likely to be about Ireland than about Canada
• A text which mentions Malahide and Clontarf is more likely to be about Ireland than about Canada
• …
• A cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf and Donabate is almost certainly about Dublin
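The multi-seed idea on this slide can be sketched in a few lines of Python. The toy taxonomy, the even split of the seed between readings, and the single decay factor are illustrative assumptions, not data or parameters from the cited work.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child -> parent (PART-OF) links.
PART_OF = {
    "Malahide (Co. Dublin)": "Dublin",
    "Clontarf": "Dublin",
    "Donabate": "Dublin",
    "Dublin": "Ireland",
    "Malahide (Ontario)": "Elgin County",
    "Elgin County": "Canada",
}

# An ambiguous surface form maps to several candidate concepts.
CANDIDATES = {
    "Malahide": ["Malahide (Co. Dublin)", "Malahide (Ontario)"],
    "Clontarf": ["Clontarf"],
    "Donabate": ["Donabate"],
}


def spread_upwards(mentions, decay=0.5):
    """Seed every candidate concept of every mention, then push the
    activation up the PART-OF chain, decaying it at each step."""
    activation = defaultdict(float)
    for mention in mentions:
        candidates = CANDIDATES[mention]
        for concept in candidates:
            signal = 1.0 / len(candidates)  # split the seed between readings
            node = concept
            while node is not None:
                activation[node] += signal
                signal *= decay
                node = PART_OF.get(node)
    return dict(activation)


alone = spread_upwards(["Malahide"])
in_context = spread_upwards(["Malahide", "Clontarf", "Donabate"])
```

With "Malahide" alone, Ireland and Canada end up equally activated; adding the unambiguous Dublin place names tips the balance towards Ireland.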
5.
Knowledge, Lexico-Semantic Resource
Text Relevancy
6.
Text – Semantic Network
[Figure: term mentions in the TEXT are mapped to concepts in the NETWORK OF CONCEPTS, where the “focus” concept is found]
7.
NLU as inferencing
The concept of a car is relevant to a text.
Car IS-A “on-land travel” (?)
Therefore “on-land travel” is somewhat relevant to the text, …
8.
Text – Semantic Network
[Figure repeated from slide 6]
9.
Demo
– 2 1 Spreading Activation.pdf
10.
Agenda
Introduction
Building Semantic Model
SA
Research Challenges
– Why SA
– Reliability of inferencing
– What is the purpose of graph operations
Centrality, network flow methods
Zoo of algorithms
Nepomuk Recommender
11.
Text – Semantic Network
[Figure repeated from slide 6]
12.
Spreading Activation Methods
13.
There is an increased need for a new generic and formal understanding of spreading activation as a class of algorithms rather than a particular algorithm with many parameters
Spreading activation (also known as spread of activation) is a method for searching associative networks, neural networks or semantic networks. The method is based on the idea of quickly spreading an associative relevancy measure over the network.
Our goal is to give an expanded introduction to the method. We will demonstrate, in sufficient detail, that this method can be applied to very diverse problems and applications.
We present the method as a general framework. First we will present this method as a very general class of algorithms on large (or very large) so-called multidimensional networks, which will serve as a mathematical model.
Source: Troussov, Levner, Bogdan, Judge, Botvich “Spreading activation methods”
14.
We present spreading activation in a generic form, as a set of methods suitable for mining multidimensional networks with oriented weighted links.
These graph-mining methods might produce results similar to those which might be achieved by soft clustering and fuzzy inferencing.
The input object is a function on the nodes of the network, and the spread of activation is a technique which provides “spreading” of this function through the network links. The result of the spreading activation is a new function on the nodes. The properties of that function strongly depend on the original function and the parameters of the spreading activation.
For instance, when the underlying network is a network of ontological concepts, the parameters governing the spread might be chosen in such a way that allows “smoothing” of the original function and interpreting the resulting function as “conceptual” summaries of the initial non-zero valued nodes.
15.
Origin of Spreading Activation Methods
In neurophysiology, interactions between neurons are modeled by way of activation which propagates from one neuron to another via connections called synapses, which transmit information using chemical signals.
The first spreading activation models were used in cognitive psychology to model the processes of memory retrieval (Collins, A.M. & Loftus, E.F., 1975; Anderson, J., 1983).
This framework was later exploited in Artificial Intelligence (AI) as a processing framework for semantic networks and ontologies, and applied to Information Retrieval (Crestani, F., 1997; Aleman-Meza, Halaschek, Arpinar, & Sheth, 2003; Rocha, C., Schwabe, D. & Poggi de Aragao, M., 2004; …) as the result of direct transfer of information retrieval ideas from cognitive sciences to AI.
16.
Notation
A multidimensional network can be modeled as a directed graph, which is a pair G = (V, E) where
– V is the set of vertices v_i
– E is the set of edges e_j (although in oriented graphs edges are referred to as arcs)
– init: E → V is the mapping which provides the initial nodes of arcs
– term: E → V is the mapping which provides the terminal nodes of arcs
– imp is the importance value of arcs and nodes. For instance, imp(v), where the node v is a geographical location, might be the population; imp(e) might be the number of phone calls from person init(e) to person term(e)
– w are the “weights”, for instance the sigmoidal function of imp. w(e_j) = 0 means that the arc e_j is effectively ignored; w(e_j) = 1 means that the activation of init(e_j) strongly affects the activation of term(e_j). For instance, when the nodes represent “words”, synonym links might be assigned the value 1
– F is the “activation” function, usually a real-valued function on the nodes of the network, written F(v) for a node v
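One possible way to hold this notation in code is sketched below. The class names, the `kind` field for link types, and the sigmoid's `midpoint` and `steepness` parameters are assumptions made for illustration; the slides only say that w may be a sigmoidal function of imp.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    init: str          # initial node of the arc, init(e)
    term: str          # terminal node of the arc, term(e)
    kind: str          # link type, e.g. "phoned" or "synonym"
    imp: float = 1.0   # importance, e.g. number of phone calls


def weight(arc, midpoint=5.0, steepness=1.0):
    """w(e) as a sigmoid of imp(e); midpoint and steepness are assumed."""
    return 1.0 / (1.0 + math.exp(-steepness * (arc.imp - midpoint)))


@dataclass
class Network:
    nodes: set = field(default_factory=set)    # V
    arcs: list = field(default_factory=list)   # E

    def add_arc(self, init, term, kind, imp=1.0):
        self.nodes.update((init, term))
        arc = Arc(init, term, kind, imp)
        self.arcs.append(arc)
        return arc

    def outgoing(self, v):
        return [e for e in self.arcs if e.init == v]

    def incoming(self, v):
        return [e for e in self.arcs if e.term == v]


net = Network()
rare = net.add_arc("alice", "bob", "phoned", imp=1)
frequent = net.add_arc("alice", "carol", "phoned", imp=40)
```

Here a single phone call gives a weight close to 0 (the arc is nearly ignored), while forty calls give a weight close to 1.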
17.
Generic description of spreading activation methods (SAM) framework
1. Initialisation
Sets the parameters of the algorithm, the network, and the initial activation function F as a list of non-zero valued nodes V_n
2. Iterations (each iteration is one pulse of SAM)
– a. List Expansion: the list is expanded to include neighbors (both neighbors reached by following outgoing links and neighbors which have links to the nodes in the list). Newly added nodes receive a zero level of activation
– b. Recomputation: the value at each node in the list is recomputed based on the values of the function on the nodes which have links to the given node, and on the types of connections
– c. List Purging: nodes with values less than a threshold are excluded from the list
– d. Conditions Check: decide whether to break the iterations, for example on reaching a maximum number of iterations
3. Output
The list of nodes (with the values of the function after the spread of activation), ranked according to their F values
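The framework above can be sketched as a short Python function. This is one possible instance, not the author's implementation: the `(init, term, weight)` arc triples, the update rule F_new(v) = F(v) + Input(v), and the default parameter values are assumptions chosen for the illustration.

```python
from collections import defaultdict


def spreading_activation(arcs, seed, pulses=3, threshold=0.01, beta=1.0):
    """arcs: iterable of (init, term, weight); seed: {node: activation}."""
    out_arcs = defaultdict(list)
    in_arcs = defaultdict(list)
    for init, term, w in arcs:
        out_arcs[init].append((term, w))
        in_arcs[term].append((init, w))

    # 1. Initialisation: the list of non-zero valued nodes.
    F = {v: a for v, a in seed.items() if a != 0}

    # 2d. Conditions check: here simply a fixed number of pulses.
    for _ in range(pulses):
        # 2a. List expansion: neighbours in both directions start at zero.
        for v in list(F):
            for u, _w in out_arcs[v] + in_arcs[v]:
                F.setdefault(u, 0.0)

        # 2b. Recomputation: F_new(v) = F(v) + Input(v).
        F_new = {}
        for v in F:
            incoming = 0.0
            for u, w in in_arcs[v]:
                if u in F:
                    incoming += F[u] / len(out_arcs[u]) ** beta * w
            F_new[v] = F[v] + incoming

        # 2c. List purging: drop nodes below the threshold.
        F = {v: a for v, a in F_new.items() if a >= threshold}
        if not F:
            break

    # 3. Output: nodes ranked by their activation.
    return sorted(F.items(), key=lambda item: item[1], reverse=True)


arcs = [
    ("car", "on-land travel", 0.8),
    ("bus", "on-land travel", 0.8),
    ("on-land travel", "travel", 0.5),
]
ranked = spreading_activation(arcs, {"car": 1.0, "bus": 1.0}, pulses=2)
```

In this toy run the shared parent "on-land travel" collects activation from both seeds and ends up ranked first, which is the "focus concept" behaviour described earlier in the talk.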
18.
Generic description of recomputation phase
We have the list of nodes V_n.
1. Input/Output Through Links Computation
– For each node v we compute the input signal to each arc e such that init(e) = v. When the signal (“activation”) passes through a link e, the activation usually experiences decay by a factor w(e)
2. Input/Output of Node Activation
– Before the pulse, the node v has the activation level F(v)
• Through incoming links v gets more activation
• By dissipating the activation through outgoing links, the node v might lose activation
3. Computation of the New Level of Activation
– A new value F(v) is computed based on F(v), Input(v), and Output(v)
19.
Generic description of recomputation phase
1. Input/Output Through Links Computation
For each node v we compute the input signal to each arc e such that init(e) = v. This computation can be based on the value F(v), the outdegree of the node, etc. For instance, if the node v has n outgoing arcs of the same type, each arc e might get the input signal:
I(e) = F(init(e)) · (1 / outdegree(v)**beta)
where beta might be equal to 1. It could also be less than one, in which case the node v will propagate more activation to its neighbors than it has.
When the signal (“activation”) passes through a link e, the activation usually experiences decay by a factor w(e):
O(e) = I(e) · w(e)
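The two link formulas can be checked numerically with a small sketch. The adjacency layout `{node: [(term, w), ...]}` and the function name are hypothetical; the arithmetic follows I(e) and O(e) as written on the slide.

```python
def link_signals(F, out_arcs, v, beta=1.0):
    """Return {arc: (I(e), O(e))} for every arc e with init(e) = v.

    F        -- {node: activation}
    out_arcs -- {node: [(term, w), ...]}, an assumed adjacency layout
    """
    arcs = out_arcs.get(v, [])
    signals = {}
    for term, w in arcs:
        i_e = F[v] * (1.0 / len(arcs) ** beta)  # I(e) = F(init(e)) / outdegree(v)**beta
        o_e = i_e * w                           # O(e) = I(e) * w(e)
        signals[(v, term)] = (i_e, o_e)
    return signals


F = {"hub": 1.0}
out_arcs = {"hub": [("a", 1.0), ("b", 0.5), ("c", 0.5), ("d", 0.0)]}

conserving = link_signals(F, out_arcs, "hub", beta=1.0)
amplifying = link_signals(F, out_arcs, "hub", beta=0.5)
```

With beta = 1 the four input signals sum to F(hub) = 1; with beta = 0.5 they sum to 2, so the node sends out more activation than it holds, as the slide notes. The arc with w = 0 passes nothing on.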
20.
Generic description of input/output phase
2. Input/Output of Node Activation
Before the pulse, the node v has the activation level F(v).
Through incoming links v gets more activation:
Input(v) = Σ O(e) for all links e such that init(e) ∈ V_n, term(e) = v.
By dissipating the activation through outgoing links, the node v might lose activation:
Output(v) = Σ I(e) for all links e such that init(e) = v, term(e) ∈ V_n
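As a small illustration of the two sums, the sketch below assumes the link signals I(e) and O(e) have already been computed and are stored on each arc as a dictionary; that storage layout and the sample numbers are invented for the example.

```python
def node_flows(v, active, arcs):
    """Input(v) and Output(v), restricted to the current node list V_n.

    arcs   -- list of dicts with keys init, term, I, O (signals precomputed)
    active -- the set of nodes currently in the list V_n
    """
    input_v = sum(e["O"] for e in arcs
                  if e["term"] == v and e["init"] in active)
    output_v = sum(e["I"] for e in arcs
                   if e["init"] == v and e["term"] in active)
    return input_v, output_v


arcs = [
    {"init": "a", "term": "v", "I": 0.5, "O": 0.4},
    {"init": "b", "term": "v", "I": 1.0, "O": 0.5},
    {"init": "x", "term": "v", "I": 1.0, "O": 1.0},  # x is outside V_n: ignored
    {"init": "v", "term": "c", "I": 0.3, "O": 0.2},
    {"init": "v", "term": "y", "I": 0.3, "O": 0.2},  # y is outside V_n: ignored
]
active = {"a", "b", "v", "c"}
input_v, output_v = node_flows("v", active, arcs)
```

Note the asymmetry in the definitions: Input(v) sums the decayed signals O(e) arriving at v, while Output(v) sums the undecayed signals I(e) leaving it.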
21.
Generic description of recomputation phase
3. Computation of the New Level of Activation
A new value F(v) is computed based on F(v), Input(v), and Output(v), for example
Fnew(v) = F(v) + Input(v)
22.
SAM and Methods of Numerical Simulation in Physics
Spreading activation algorithms were introduced in the 1990s; however, the same iterative methods were used long before in numerical simulation in physics, mechanics, chemistry and engineering sciences.
The major distinctions between these algorithms and what is now called spreading activation are:
– a) in physics such algorithms usually work on a regular mesh (so that the local topology of the graph is encoded into the formulas of the recomputation stage)
– b) in physics initial conditions, or initial activation, are usually assigned to all nodes of the mesh, so algorithms for efficient graph traversal are not needed. For instance, steps 2a (List Expansion) and 2c (List Purging) in the generic description of the SAM framework might be skipped.
For instance, one-dimensional heat transfer equations might be numerically simulated on a one-dimensional mesh by iterative methods. On each iteration the recomputation stage is based on the formula below:
Fnew(v) = ( F(RightNeighbor(v)) + F(LeftNeighbor(v)) ) / 2
Using a different formula, one can simulate the behavior of an oscillating string (although this will require storing three values at each node: the position, mass and velocity of the material point corresponding to the node).
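The heat-transfer recomputation can be tried directly on a list of numbers. The sketch below is a minimal version of that averaging step; holding the two end nodes at fixed temperatures is an assumed boundary condition, since the slide does not specify one.

```python
def heat_pulse(F):
    """One recomputation pass on a 1-D mesh: every interior node takes the
    average of its left and right neighbours. End nodes are held fixed
    (an assumption made for this sketch)."""
    new = list(F)
    for i in range(1, len(F) - 1):
        new[i] = (F[i - 1] + F[i + 1]) / 2
    return new


def simulate(F, pulses):
    for _ in range(pulses):
        F = heat_pulse(F)
    return F


one_step = heat_pulse([0.0, 0.0, 4.0, 0.0, 0.0])

rod = [100.0] + [0.0] * 10   # 11 nodes: hot left end, cold right end
after = simulate(rod, 2000)
```

No list expansion or purging is needed because every mesh node is active from the start. After many pulses the rod settles into a straight-line temperature profile between the two fixed ends.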
23.
SAM and Methods of Numerical Simulation in Physics
Using the same iterative algorithm, with one set of parameters one can emulate heat transfer; with another set of parameters the same algorithm will show us the behavior of oscillating strings.
But the phenomena of heat propagation and string oscillation are quite different (for instance, heat propagation might lead to “thermal death”, the state of equilibrium where the level of activation is the same for all nodes, while oscillation might continue forever).
Our illustrations concern only the basics, while real modeling might be much more complicated. For instance, heat transfer might lead to combustion, where after reaching some level of activation a node generates more “heat” than it gets from neighboring nodes.
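To make the contrast with heat transfer concrete, here is a hedged sketch of the string case on the same kind of 1-D mesh. The update rule (a semi-implicit Euler step with a neighbour-difference force), the stiffness, the time step and the initial shape are all assumptions chosen for illustration; the slides only state that position, mass and velocity are stored per node.

```python
import math


def string_pulse(pos, vel, mass, stiffness=1.0, dt=0.1):
    """One pulse for a string with fixed ends. Each node stores a position,
    a velocity and a mass; the force comes from the two neighbours."""
    n = len(pos)
    new_vel = list(vel)
    for i in range(1, n - 1):
        force = stiffness * (pos[i - 1] + pos[i + 1] - 2 * pos[i])
        new_vel[i] = vel[i] + dt * force / mass[i]
    # Positions move with the freshly updated velocities.
    new_pos = [p + dt * v for p, v in zip(pos, new_vel)]
    return new_pos, new_vel


n = 11
pos = [math.sin(math.pi * i / (n - 1)) for i in range(n)]  # initial bowed shape
vel = [0.0] * n
mass = [1.0] * n

midpoint_history = []
for _ in range(400):
    pos, vel = string_pulse(pos, vel, mass)
    midpoint_history.append(pos[n // 2])
```

Unlike the heat example, the midpoint does not settle: it keeps swinging between roughly +1 and -1 for as long as the loop runs.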
25.
Spreading Activation as a Graphmining Technique
The technique of SAM is quite polymorphic. On this slide we interpret the results of spreading activation in terms of graph mining.
– First of all, one can think that after running SAM the most activated nodes will be those nodes which get the activation from multiple sources or, in other words, those nodes which minimize the “distance” to the nodes which were initially activated. Therefore these nodes might be considered as potential centroids of strong clusters induced by the initial activation. Since partitioning of the nodes according to these clusters is not immediately available (and is not needed in many applications), SAM algorithms might be considered as methods of soft clustering.
– On the other hand, the most activated nodes are those nodes which are connected to the initial conditions by particular types of directed links (arcs with large weights). Therefore we might consider SAM as an efficient scheme for computing fuzzy inferencing. For such applications, replacing a single-valued function F by a vector function might be useful.
We conclude by noting that SAM algorithms might be used for soft clustering and fuzzy inferencing on networks.
26.
[Diagram: nodes Γαλλία, Παρίσι, Ναπολέων, Αλέξανδρος]
Node types: People, Geographical artifacts
Relations
• Friends
• Part of, Instance of, Subclass
• Created
27.
[Diagram: nodes France, Russia, Paris, Moscow, Napoleon, Alexander, Borodino, Kutuzov, Meeting: Battle of Austerlitz, Meeting: Battle of Borodino, Project: Invasion of Russia]
28.
Diagram on the previous slide …
What does it represent?
How can it be used?
29.
[Diagram: nodes France, Russia, Paris, Moscow, Napoleon, Alexander, Borodino, Kutuzov, Meeting: Battle of Austerlitz, Meeting: Battle of Borodino, Project: Invasion of Russia]
How could this diagram be used?
1. A network flow process could show the nodes most relevant to the pair “Napoleon” & “Meeting”
- Selection WHO: whom to invite
- Other nodes: explain recommendations
2. When Napoleon opens an email or a web page containing W&P, he will be advised that the content of this resource is relevant to his project “Invasion of Russia”
30.
Diagram on the previous slide … What does it represent?
Data from Facebook, data from Napoleon’s Lotus Notes calendar, the structure of a Wiki, a network of collocations or relations between the entities in W&P, …
– The proliferation of Web 2.0 and Enterprise 2.0 technologies has led to the emergence of massive networks connecting people and various digital artifacts. These networks can be treated as “weak” knowledge, which nevertheless might be used for recommendations and even for such traditional applications as knowledge-based text processing
Or an instantiation of an ontology related to W&P by Leo Tolstoy
– In which case we would probably know that Napoleon is the emperor of France, Paris is the capital (not an instantiation of a subclass) of France, etc.
An ontology provides conceptualization and allows inferencing, but these advantages per se are useless without tedious manual work to encode the rules for using this additional knowledge. The knowledge encoded in the topology of the multidimensional network, by contrast, is ready to use, provided that the methods are tolerant to errors and inconsistencies in the data, i.e. the methods are methods of “soft mathematics”: fuzzy inferencing, soft clustering, …
31.
Social Context = Knowledge?
A New Mathematical Model of Horse Racing: assume, without loss of generality, that each horse in the race is modelled by a wooden ball of radius $R_i$.
A horse = a ball? ☺
32.
Representing social context as knowledge allows us to benefit from the experience of knowledge-based applications.
33.
For instance, the social context modeled as a network is not much different from the semantic networks formed from concepts represented in ontologies, and it is possible to use such networks for knowledge-based text processing.
Representing social context as knowledge allows us to draw on experience from such a mature R&D area as knowledge-based text processing.
34.
How to model the social context
– As multidimensional networks: the primary source is network models of instantiations of techno-social systems
– As “knowledge”: represented as objects, clauses, XML, graphs, or some combination of these
35.
The primary source: network models of techno-social systems
Log files of techno-social systems (like Facebook or IBM’s Lotus Connections) keep track of who did what.
Triples could be aggregated into a network.
[Diagram edge labels: Invited, Joined, Created]
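The aggregation of log triples into a network can be sketched as follows. This is a minimal illustration, assuming log entries of the form (actor, action, object); the entries, the names and the `aggregate` helper are hypothetical and are not taken from any real system's log format.

```python
from collections import defaultdict

# Hypothetical log entries in (actor, action, object) form.
# The names and actions are illustrative assumptions only.
LOG = [
    ("alice", "created", "meeting:kickoff"),
    ("alice", "invited", "bob"),
    ("bob", "joined", "meeting:kickoff"),
    ("alice", "invited", "bob"),  # a repeated event strengthens the same link
    ("carol", "joined", "community:nlp"),
]


def aggregate(triples):
    """Aggregate (subject, relation, object) triples into a weighted network.

    Returns a dict mapping each directed node pair (subject, object) to a
    dict of relation -> number of times that relation was observed.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for subject, relation, obj in triples:
        edges[(subject, obj)][relation] += 1
    return edges


network = aggregate(LOG)
for (source, target), relations in network.items():
    print(source, "->", target, dict(relations))
```

Keeping the relation type on each edge is what makes the network multidimensional: people, meetings and communities live in one graph, and new kinds of nodes or relations can be merged in without changing the structure.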
36.
Examples of Graph Models: Folksonomies
– Tripartite hypergraph
Social bookmarking systems (Del.icio.us, …)
– Where to keep my bookmarks?
– Users (actors), resources, tags
In social bookmarking systems users describe bookmarks by keywords called tags. The structure behind these social systems, called folksonomies, can be viewed as a tripartite hypergraph of actor, tag and resource nodes.
– Three types of first-class citizens, and hyperplanes
– If the hyperplanes are made from rubber, they can be shrunk to a node, so the hyperplanes will also be first-class citizens
Advantages of the network models (see next slide)
– Extensibility
– Ease of merging heterogeneous information
Source: Hypergraphs: see Jäschke et al., "Logsonomy - A Search Engine Folksonomy", ICWSM 2008, AAAI Press (2008)
37.
Inferencing: “soft methods” could provide reliable inferencing
For instance, the social context modeled as a network is not much different from the semantic networks formed from concepts represented in ontologies, and it is possible to use such networks for knowledge-based text processing.
Representing social context as knowledge allows us to draw on experience from such a mature R&D area as knowledge-based text processing.
38.
Natural Language Understanding is Inferencing (?)
From a computational point of view, natural language understanding is inferencing
– A text which mentions Malahide is probably about Canada (??)
Malahide (Canada 2006 Census population 8,828) is a township in Elgin County, Ontario, Canada
Source: Troussov et al., MITACS, Canada, 2010
39.
Inferencing
Terms are ambiguous, and our knowledge is never “the truth, the whole truth, and nothing but the truth”
– Malahide, Co. Dublin
– Malahide is a township in Elgin County, Ontario, Canada
– Paradis Gisenyi Malahide is a hotel in Rwanda
Solution (Troussov et al., MITACS, Canada, 2010): propagation from multiple concepts. For instance, the activation propagation is seeded at two nodes in a geographical taxonomy, Malahide (Ontario) and Malahide (Co. Dublin), as well as at the other concepts mentioned in the text
• A text which mentions Malahide and Europe is a little more likely to be about Ireland than about Canada
• A text which mentions Malahide and Clontarf is more likely to be about Ireland than about Canada
• …
• A cohesive, coherent text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf and Donabate is almost certainly about Dublin
Such a rapid “phase transition” from uncertainty to certainty is similar to the transition at the percolation threshold
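Propagation from multiple seeds can be illustrated with a small sketch. It is not the algorithm of Troussov et al.; it is a simplified spreading activation under stated assumptions: a hypothetical toy taxonomy (`EDGES`), a decay factor of 0.5, three propagation steps, and no normalisation by node degree.

```python
from collections import defaultdict

# Hypothetical fragment of a geographical taxonomy (an assumption for this sketch).
EDGES = [
    ("World", "Europe"), ("Europe", "Ireland"), ("Ireland", "Dublin"),
    ("Dublin", "Malahide (Co. Dublin)"), ("Dublin", "Clontarf"),
    ("Dublin", "Donabate"),
    ("World", "North America"), ("North America", "Canada"),
    ("Canada", "Ontario"), ("Ontario", "Malahide (Ontario)"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def spread(graph, seeds, steps=3, decay=0.5):
    """Propagate activation from the seed nodes.

    Each step passes decay * value to every neighbour of an active node.
    Returns the total activation accumulated at each node.
    """
    total = defaultdict(float)
    frontier = dict(seeds)
    for node, value in frontier.items():
        total[node] += value
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, value in frontier.items():
            for neighbour in graph[node]:
                nxt[neighbour] += decay * value
        for node, value in nxt.items():
            total[node] += value
        frontier = nxt
    return total


graph = build_graph(EDGES)

# The ambiguous term "Malahide" seeds both of its candidate nodes.
ambiguous = {"Malahide (Co. Dublin)": 1.0, "Malahide (Ontario)": 1.0}
only_malahide = spread(graph, ambiguous)

# Adding an unambiguous Dublin place name tips the balance.
with_clontarf = spread(graph, {**ambiguous, "Clontarf": 1.0})

print(only_malahide["Ireland"], only_malahide["Canada"])
print(with_clontarf["Ireland"], with_clontarf["Canada"])
```

With "Malahide" alone the two readings stay tied in this toy graph; one additional seed near Dublin is enough to rank Ireland above Canada, and further Dublin place names would widen the gap.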
40.
From Uncertainty to Certainty in Inferencing: phase transitions as a function of seed size, in analogy to those in percolation
In (semantic) networks with high local density, the reliability of inferencing from a single concept is almost never sufficient. Reliability can be low when inferencing starts from a small number of seed concepts, but inferencing becomes very reliable once the number of initial seed concepts reaches a certain level (which could be explained by combinatorics)
[Chart: reliability of inferencing vs. number of nodes in the seed]
41.
And could be explained by combinatorics
[Figure: a graph showing the approximate probability of at least two people sharing a birthday amongst a certain number of people.]
In probability theory, the birthday problem, or birthday paradox, pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 366 (ignoring February 29 births). But perhaps counter-intuitively, 99% probability is reached with a mere 57 people, and 50% probability with 23 people.
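The figures quoted above can be checked with a few lines of code. This is a minimal sketch, assuming 365 equally likely birthdays; the function name is chosen for this example.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday,
    assuming `days` equally likely birthdays."""
    if people > days:
        return 1.0  # pigeonhole principle
    p_all_distinct = 1.0
    for k in range(people):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (10, 23, 57, 366):
    print(n, round(shared_birthday_probability(n), 4))
```

The probability climbs steeply over a short range of group sizes, which is the same kind of rapid transition that the preceding slides describe for the reliability of inferencing as the seed grows.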
42.
Simulation
The network (such as a taxonomy of geographical locations) is a tree of 20,000 nodes. The text is modeled as a list of 100 terms, each of which is ambiguous and could be mapped to 8 network nodes. When such a mapping happens, we consider that the node (the geographical location represented by the node) could be relevant to the text. We are looking for clusters, i.e. groups of N nodes, each of which is mentioned in the text, such that the graph distance between each pair of nodes in the cluster is less than three. Such graph structures can occur by chance for small N (N = 1 or 2), but their probability sharply decreases to zero for bigger N; correspondingly, our certainty that the graph structure signifies the topicality of the text increases to 1.0
– A text which mentions Malahide, Mulhuddart, Lansdowne, Clontarf and Donabate is almost certainly about Dublin (Ireland)
Source: F. Darena and A. Troussov 2010
43.
Processes in Networks
How do we study the Earth?
– By looking at the results of the propagation of waves through the Earth
[Figure: propagation of a seismic wave in the ground and the effect of the presence of a land mine]
Similarly, one can study networks by network flow methods: introducing processes where something is flowing from node to node across the edges
44.
Processes
– Used goods - trail
– Money - walk
– Gossip - replication rather than transference (trails rather than walks)
– E-mail - diffusion by replication
– Attitudes - spread through replication rather than transfer
– Infection - spreads like gossip, but does not re-infect
– Packages - usually the shortest route possible
– Relevancy in semantic networks
– Trust - shortest path or volume?
45.
46.
We are talking about the consumability of centrality measurements produced by network flow methods like these (DEMO)
47.
Key difference between SNA and other approaches to social science
The social sciences usually focus on the attributes of individual actors
48.
Key difference between SNA and other approaches to social science
SNA focuses on relationships between actors
“Social network analysis reflects a shift from the individualism common in the social sciences towards a structural analysis”. Garton et al., Studying Online Social Networks
Structuralism is an approach to the human sciences that attempts to analyze a specific field (for instance, mythology) as a complex system of interrelated parts (linguists Roman Jakobson and Nikolai Trubetzkoy, anthropologist Lévi-Strauss) ~ Complex systems
Sociogram:
– Jacob Levy Moreno (1889-1974) was a leading Austrian-American psychiatrist and psychosociologist, thinker and educator, the founder of psychodrama, and the foremost pioneer of group psychotherapy. Among Moreno’s primary contributions to sociometrics was the sociogram. The sociogram is a method of representing individuals as points on graphs and using lines and arcs to represent the relationships between the individuals.
Graphics from Prof. Hendrik Speck's tutorial at the 5th Karlsruhe Symposium for Knowledge Management in Theory and Praxis, 2007
49.
Prominence
The study of the structural properties of networks and their interplay with the processes taking place on the network has been one of the main problems of recent years in the field of complex network analysis
A primary use of graph theory in social network analysis is to identify “important” actors. Centrality and prestige concepts seek to quantify graph-theoretic ideas about an individual actor’s prominence within a network by summarizing structural relations among the graph nodes.
An actor’s prominence reflects its greater visibility to the other network actors (an audience). An actor’s prominent location takes account of the direct sociometric choices made and choices received (outdegrees and indegrees), as well as the indirect ties with other actors.
The two basic prominence classes:
– Centrality: the actor has high involvement in many relations, regardless of send/receive directionality (volume of activity)
– Prestige: the actor receives many directed ties, but initiates few relations (popularity > extensivity)
Source: Wasserman & Faust, “Social Network Analysis” (W&F)
50.
Centrality: Eigenvector Centrality
Bonacich power centrality, which generalises eigenvector centrality, was introduced by Phillip Bonacich in 1987
“Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an SNA concept - Bonacich Power Centrality.
– Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her social contacts. Bonacich formalized this mathematically:
$c_i = B\,(c_1 R_{i1} + c_2 R_{i2} + \dots + c_n R_{in})$,
where $c_i$ is the person in question, $B$ is the magnitude of the effect, and $R_{ij}$ is the strength of the relationship between the person in question, $i$, and each of the other people, $j$, under consideration. If $B = 1$, the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a social network analyst appears to me to have been the first to think up the concept”.
Solomon Messing http://www.stanford.edu/~messing/RforSNA.html
51.
Centrality and the network flow methods
Most centrality measures are based on the network flow process, “that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges” (Borgatti and Everett, 2006)
We interpret this “something” as a relevancy measure; for instance, the initial seed input value which shows the nodes of interest in the network. Propagating the relevancy measure through outgoing links allows us to compute the relevancy measure for other network nodes and dynamically rank these nodes according to it.
The same paradigm could be used to address centrality measurements in social network analysis. Centralisation of the network can be achieved when we assume that all the nodes are equally important, and iteratively recompute the relevancy measure based on the connections between nodes.
52.
Master Equation, Numerical Solution
Bonacich Power Centrality, Eigenvector Centrality, Google’s PageRank
– “Google's workhorse search engine ranking algorithm, PageRank, is actually a variant on an SNA concept - Bonacich Power Centrality. Bonacich (1987) hypothesized that someone's power in society depends on the power of his or her social contacts. Bonacich formalized this mathematically:
$c_i = B\,(c_1 R_{i1} + c_2 R_{i2} + \dots + c_n R_{in})$,
where $c_i$ is the person in question, $B$ is the magnitude of the effect, and $R_{ij}$ is the strength of the relationship between the person in question, $i$, and each of the other people, $j$, under consideration. If $B = 1$, the formula becomes eigenvector centrality, of which PageRank is a variant. Now, Page, et al. (1998) do not cite Bonacich, I am not claiming that they stole the idea - I am merely stating that a social network analyst appears to me to have been the first to think up the concept”.
Solomon Messing http://www.stanford.edu/~messing/RforSNA.html
53.
Master Equation, Numerical Solution, Computation
The master equation easily leads us to a numerical solution
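One common way to get from an equation of the Bonacich form, c = B·R·c, to numbers is power iteration. The sketch below is an illustration under assumptions, not the computation used in the talk: the relationship matrix `R` is a hypothetical four-person network, and rescaling the vector at every step plays the role of the factor B.

```python
def power_centrality(R, tol=1e-10, max_iter=1000):
    """Iterate c <- R c with rescaling until the vector stops changing.

    R is a square matrix (list of lists) of relationship strengths.
    Assumes a connected, non-bipartite network, so the iteration converges.
    Scores are scaled so that the largest one equals 1.
    """
    n = len(R)
    c = [1.0] * n  # start from "all nodes are equally important"
    for _ in range(max_iter):
        new = [sum(R[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = max(new)
        new = [value / norm for value in new]
        if max(abs(a - b) for a, b in zip(new, c)) < tol:
            return new
        c = new
    return c


# Hypothetical network: persons 0, 1, 2 form a triangle; person 3 knows only person 2.
R = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

scores = power_centrality(R)
print([round(s, 3) for s in scores])
```

Person 2, who is connected to everybody, gets the top score; person 3 scores lowest but still benefits from being tied to the best-connected contact, which is exactly the "power depends on the power of your contacts" idea.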
54.
It is great to have “the right master equation”!
What is the shape of a hanging chain when supported at its ends and acted on only by its own weight?
– Plotting geometric arrangements and forces acting on small segments of the chain
– Integrating the results
55.
It is great to have “the right master equation”!
What is the shape of a hanging chain when supported at its ends and acted on only by its own weight?
• Galileo: “This chain will assume the form of a parabola”, $y = x^2$
– Plotting geometric arrangements and forces acting on small segments of the chain
– Integrating the results
56.
It is great to have “the right master equation”!
What is the shape of a hanging chain when supported at its ends and acted on only by its own weight?
• Galileo: “This chain will assume the form of a parabola”, $y = x^2$
• But the shape is different: $y = \frac{a}{2}\left(e^{x/a} + e^{-x/a}\right)$, which was established later by applying calculus
– Plotting geometric arrangements and forces acting on small segments of the chain
– Integrating the results
“In 1669, Jungius disproved Galileo's claim that the curve of a chain hanging under gravity would be a parabola (MacTutor Archive). The curve is also called the alysoid and chainette. The equation was obtained by Leibniz, Huygens, and Johann Bernoulli in 1691 in response to a challenge by Jakob Bernoulli”. http://mathworld.wolfram.com/Catenary.html
[Figures: Leibniz's solution is on the left; Huygens's illustration is on the right.]
57.
“Plotting geometric arrangements and forces acting on small segments” evolved into
– Finite difference method
• In mathematics, finite-difference methods are numerical methods for approximating the solutions to differential equations using finite difference equations to approximate derivatives.
– Stencil
• In mathematics, especially the areas of numerical analysis concentrating on the numerical solution of partial differential equations, a stencil is a geometric arrangement of a nodal group that relates to the point of interest by using a numerical approximation routine. Stencils are the basis for many algorithms to numerically solve partial differential equations.
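A three-point stencil is enough to test a candidate curve against the hanging-chain equation. The sketch below rests on assumptions not stated on the slides: the standard catenary differential equation $y'' = \frac{1}{a}\sqrt{1 + y'^2}$, the parameter a = 1, and a parabola chosen to match the catenary at its lowest point. Central differences stand in for the derivatives.

```python
import math


def catenary(x, a=1.0):
    """The catenary y = (a / 2) * (e^(x/a) + e^(-x/a))."""
    return (a / 2.0) * (math.exp(x / a) + math.exp(-x / a))


def parabola(x, a=1.0):
    """A parabola with the same height and curvature at x = 0 (an assumption
    made for a fair comparison; Galileo's claim is only that the shape is a parabola)."""
    return a + x * x / (2.0 * a)


def residual(curve, x, a=1.0, h=1e-3):
    """How far `curve` is from satisfying y'' = sqrt(1 + y'^2) / a at x,
    using the three-point stencil (x - h, x, x + h)."""
    first = (curve(x + h) - curve(x - h)) / (2.0 * h)
    second = (curve(x + h) - 2.0 * curve(x) + curve(x - h)) / (h * h)
    return second - math.sqrt(1.0 + first * first) / a


print("catenary residual:", residual(catenary, 1.0))
print("parabola residual:", residual(parabola, 1.0))
```

The catenary's residual is zero up to discretisation error, while the parabola misses the equation by a clearly visible margin away from the lowest point.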
58.
Numerical Solution, NO Master Equation
“Integrating” evolved into …
– Well, in financial mathematics solutions are tuned on “stencils”. Numerical solutions are known. The master equation is not known, and is not interesting to know.
“Master equation is not known”: this is OK.
– But we need to be aware of emergent effects in complex systems: learning how to do something right on a small scale doesn’t necessarily imply that we’ll do the right things on a bigger scale
59.
Leibniz, Huygens, and Johann Bernoulli knew geometry and mechanics.
We don't know the “geometry” and “mechanics” of techno-social systems (and we don’t even know the “geometry” and “mechanics” of semantic networks, social networks, …), but we can create small “nodal arrangements” modeling multidimensional networks (for instance, folksonomies)
Apply known and novel numerical algorithms and utilize state-of-the-art knowledge to decide which algorithms provide better results.
The next step: to check whether the good properties of the numerical solutions on the micro-level hold true on the meso-level
Source: Troussov at MITACS Workshop in Vancouver, Canada, 2010
60.
Recommender systems and global/local ranking
Link analysis is frequently employed for ranking and navigation
Graph-based recommender systems should recommend “important” objects (nodes, links, subgraphs) which are also
– Close enough to the initial points of interest (query, focus, initial seed) (for instance, in physical space)
Global ranking ~ PageRank
Breadth-first search (BFS)?
Local ranking!?
Recommending a suitable restaurant near New York's 9th Avenue (next slide), or the music you might like, the advertisement you should see, etc.
61.
Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/
62.
Global Ranking (like Google’s PageRank): a view of the network from an external point, the modern, “Copernican” approach
Source: NOAA
63.
Local Ranking
– is needed for recommenders
– should rely on an ego-centered, Ptolemaic view (actually, poly-centered, see next slide)
Ego-centered or “personal” networks provide Ptolemaic views of networks from the perspective of the persons (egos) at their centers.
Graphics: http://strangemaps.wordpress.com/2007/02/07/72-the-world-as-seen-from-new-yorks-9th-avenue/
64.
Poly-Centric
In physical space, navigation is from one point to another.
In applications to virtual spaces, navigation is not simply browsing from a single object to another, but dealing with several objects at the same time. For instance, to get better results in Google we add terms, we remove terms, …
To compute the recommendation “whom to invite to the meeting”, one can start navigation from two objects: the user whom the recommendation is for and the meeting in question
65.
Local Ranking: proximity and importance in one step
Graph-based recommender systems should recommend “important” objects (nodes) which are also located close to the initial points of interest (query, initial seed)
One of the leading approaches in recommenders is: results of global ranking (link analysis) are “filtered” according to their proximity to the query
In this paper we introduce novel algorithms which could replace the two-step procedure mentioned above with one step: local ranking, which simultaneously computes proximity and importance
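To make the idea of a single-step, poly-centric local ranking concrete, here is a sketch that uses a well-known stand-in technique, random walk with restart (personalized PageRank), rather than the novel algorithms the slide refers to. The graph, the node names and the parameter values are assumptions made for this example.

```python
def local_rank(graph, seeds, restart=0.15, iterations=50):
    """Random walk with restart from a set of seed nodes.

    graph: dict mapping each node to a list of neighbours
           (every node is assumed to have at least one neighbour).
    seeds: the points of interest; the walk restarts from them uniformly.
    Returns a score per node that mixes importance with proximity to the seeds.
    """
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    score = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: restart * seed_mass[n] for n in graph}
        for node, neighbours in graph.items():
            share = (1.0 - restart) * score[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        score = nxt
    return score


# Hypothetical network around a user and a meeting. "far1" has the most links
# overall, but it lies further from the two points of interest.
graph = {
    "user": ["colleague", "meeting"],
    "meeting": ["user", "colleague", "expert"],
    "colleague": ["user", "meeting"],
    "expert": ["meeting", "far1"],
    "far1": ["expert", "far2", "far3", "far4"],
    "far2": ["far1"],
    "far3": ["far1"],
    "far4": ["far1"],
}

scores = local_rank(graph, seeds={"user", "meeting"})
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node:10s} {value:.3f}")
```

Starting from two seeds at once reflects the poly-centric view: the well-connected but distant node `far1` ends up below `colleague`, who is close to both the user and the meeting.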
66.
Web and Communities
Communities in the social sciences: a tribe learning to survive, a group of engineers working on similar problems, …
Communities in computer science: any empirically found group of people
“Recent advances in digital technologies invite consideration of organizing as a process that is accomplished by global, flexible, adaptive, and ad hoc networks that can be created, maintained, dissolved, and reconstituted with remarkable alacrity”. Prof. N. Contractor
67.
Community detection … but what is a community?
Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes.
– Communities easily overlap: multiple membership and fuzzy belonging
At the same time, some communities SHOULD be kept separate
– Remember “Strange Case of Dr Jekyll and Mr Hyde” (Robert Louis Stevenson, 1886).
• How Google had failed to understand an essential property of real-world social networks
• So by testing their social service inside a single context (Google employees only), the developers failed to notice that in real life, people participate in multiple contexts (family, work, friends, etc.) that they work actively to keep separate. The reasons for wanting to keep these groups separate can range from wanting to keep an illicit affair secret from your spouse to political activists in oppressive regimes wanting to keep certain connections secret from the government. Another important reason to keep our communities separate is that we often play different roles and communicate differently
http://www.iq.harvard.edu/blog/netgov/2010/03/worlds_colliding.html
68.
New methods for community detection are needed
Multiple membership
– Are you Russian? Yes. Are you Irish? Yes. Are you a mathematician? Yes. Are you a practitioner? Yes. …
Fuzzy belonging
– We don’t know the social structures behind on-line “communities”: members of an on-line community don’t necessarily have the sense of identity that members of real-life social communities have; on-line communities could be project teams or networks of knowledge, …
High performance and scalability (agglomerative, local, …)
– Clustering as simple partitioning is ruled out because of multiple membership
– Clustering as partitioning is not possible in real time for many business applications
• IBM Intranet: 400K employees, 10K on-line communities (the biggest has 23K members), ...
Contextualisation of community detection
– Collaborative filtering systems provide recommendations based on the detection of like-minded users. But the user of a techno-social system whom the prediction is for could be a “Mathematician”, “Irish”, etc., or a kind of Dr. Jekyll / Mr. Hyde person, etc. (see next slide)
69.
An example of clustering around a node using propagation
70.
71.
Future work in local dynamic clustering
Troussov et al., “Vectorised Spreading Activation”, 2010, theorize that the future development of spreading activation (SA) methods might be driven by “physics-inspired” and “logic-inspired” algorithms
– SA algorithms have roots in the numerical simulation of various physics phenomena, particularly by finite difference methods.
– On the other hand, the iterative procedure of SA is essentially the same as the procedure that determines the new state of a cell in cellular automata such as Conway’s Game of Life. Although cellular automata usually operate on rectangular (cubic, etc.) grids, the extension to arbitrary networks is feasible.
~ Marker propagation, MajorClust, Chinese Whispers graph clustering algorithm, …
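The remark that cellular automata extend to arbitrary networks can be made concrete with a short sketch. The update step below works on any adjacency map; the rectangular grid is just one graph among many. The helper names and the 5×5 example are assumptions for this illustration, and the code is not taken from the cited paper.

```python
def step(graph, alive, rule):
    """One synchronous update on an arbitrary network.

    The next state of each node depends only on its own state and on the
    states of its neighbours, as in a cellular automaton.
    """
    return {
        node
        for node in graph
        if rule(node in alive, sum(1 for n in graph[node] if n in alive))
    }


def life_rule(is_alive, live_neighbours):
    """Conway's rule: survive with 2 or 3 live neighbours, be born with exactly 3."""
    return live_neighbours == 3 or (is_alive and live_neighbours == 2)


def grid_graph(width, height):
    """An 8-neighbour grid expressed as an ordinary adjacency map."""
    graph = {}
    for x in range(width):
        for y in range(height):
            graph[(x, y)] = [
                (x + dx, y + dy)
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0)
                and 0 <= x + dx < width
                and 0 <= y + dy < height
            ]
    return graph


grid = grid_graph(5, 5)
blinker = {(1, 2), (2, 2), (3, 2)}  # three cells in a row
after_one = step(grid, blinker, life_rule)
after_two = step(grid, after_one, life_rule)
print(sorted(after_one))
print(after_two == blinker)
```

Because `step` never looks at coordinates, the same function can be run on a social or semantic network; only `rule` would need to change, for example to pass on markers instead of counting live neighbours.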
72.
Conway's Game of Life
73.
Conway's Game of Life
74.
Conway's Game of Life
75.
Logic-inspired VSA
Finite difference approximations to differential equations were one of the precursors of cellular automata (Stephen Wolfram, "A New Kind of Science") and of the method of spreading activation (Troussov et al. 2009)
Iterative computational procedures in cellular automata are the same as in SA.
The identity of the computational procedures makes it possible to develop VSA algorithms with hybrid operations over the components of the activation vector.
– For instance, “physical” operations could be responsible for the propagation of the activation around the initial seeds; the level of the activation indicates the relevancy of the nodes to the initial seeds.
– “Logical” operations could propagate markers, which indicate the potential belonging of nodes to clusters.
Such hybrid operations will combine ranking with clustering, and are computationally efficient on massive networks, since the major time-consuming operation, the retrieval of nodes, serves both “physical” and “logical” operations. The clustering does not involve partitioning of the whole network.
76.
VSA & Marker propagation: combining ranking with clustering
[Diagram labels: My University, An Expert, A topic I’m interested in]
77.
VSA & Clustering (Cont.)
78.
VSA & Clustering (Cont.)
79.
VSA & Clustering (Cont.)
80.
[Diagram labels: My University, An Expert, A topic I’m interested in]
81.
Tasks / Methods
Various terminology in various domains (for instance, from the point of view of IM many tasks fall into the category of hidden knowledge discovery)
Multidimensional network point of view (A.T.): tasks | Techno-Social Systems | Networks Theory and Graph Theory terminology
Centralisation | Recommender Systems | Random walks, PageRank etc., Eigenvector centrality
Local topology | Expertise location, Recommender systems | Motifs, Link prediction
Ad hoc generalisation across dimensions | Expertise location, Recommender Systems | Clustering
82.
Tasks
Avenues to deep socio-semantic analytics and the possibility of high-quality functionalities for techno-social systems (like recommending people to invite into your social network) hinge on the availability of engines which are able
– to provide hidden knowledge discovery, like
• the structural importance of nodes
• discovering a new relation in a network (for instance, based on the strength of multiple connectivity between the nodes of a social network, one can conclude that Dr. Jekyll is related to Mr. Hyde)
– to provide ad hoc generalisation across dimensions
• For instance, the ability to detect that a particular person might serve as a representative of a community or as an expert on a particular topic (an example of such generalisation is the expression frequently attributed to Louis XIV, "L'État, c'est moi" (I am the State))
83.
“Three steps away”?
[Diagram: John B., Axel P., Dan B., Tim B.]
Why did the recommender decide that this three-steps-away connection is a strong connection?
84.
John and Tim
– The recommender computes that this is a strong connection because there are multiple ways of connecting them
Shortest path vs. volume of traffic
[Diagram labels: Friends-of-Friends, Interest, Workplace]
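The difference between shortest path and volume can be shown on a toy network. In the sketch below the edges are hypothetical (only the person names John, Tim, Axel, Dan and Derek come from the slides), and counting simple paths is a crude stand-in for "volume of traffic", not the recommender's actual measure.

```python
from collections import defaultdict, deque

# Hypothetical edges: John reaches Tim by three independent three-step routes
# (friends of friends, a shared interest, a shared workplace), but reaches
# Derek by a single three-step route.
EDGES = [
    ("John", "Axel"), ("Axel", "Dan"), ("Dan", "Tim"),
    ("John", "interest:graphs"), ("interest:graphs", "forum:networks"),
    ("forum:networks", "Tim"),
    ("John", "workplace:lab"), ("workplace:lab", "project:x"),
    ("project:x", "Tim"),
    ("John", "p1"), ("p1", "p2"), ("p2", "Derek"),
]

graph = defaultdict(set)
for a, b in EDGES:
    graph[a].add(b)
    graph[b].add(a)


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the number of steps, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def count_simple_paths(graph, source, target, max_len):
    """Count paths without repeated nodes that use at most max_len steps."""
    def walk(node, visited, remaining):
        if node == target:
            return 1
        if remaining == 0:
            return 0
        return sum(
            walk(n, visited | {n}, remaining - 1)
            for n in graph[node]
            if n not in visited
        )
    return walk(source, {source}, max_len)


for person in ("Tim", "Derek"):
    print(
        person,
        shortest_path_length(graph, "John", person),
        count_simple_paths(graph, "John", person, 3),
    )
```

Both Tim and Derek are three steps away from John, so the shortest path cannot tell them apart; the number of independent routes can.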
85.
John and Derek
The recommender computes that this type of connectivity is a weak connection
86.
Tasks: Generalisation Across Domains
Whom is Claudia connected with? All of these people
[Diagram: Dirk, Martin, Claudia, Elaine, Researcher, John, Hanna]
87.
Ranking
[Diagram labels: 2, 1, 3]
88.
Ranking
[Diagram labels: 3, 1, 2]
89.
Ranking
[Diagram labels: 1, 2, …, …]
90.
Nepomuk Recommender
NEPOMUK (Networked Environment for Personalized, Ontology-based Management of Unified Knowledge) is an open-source software specification that is concerned with the development of a social semantic desktop that enriches and interconnects data from different desktop applications using semantic metadata stored as RDF.
Initially, it was developed in the EU 6th Framework integrated project Nepomuk (2006-2008): 17 million euros, of which 11.5 million was funded by the European Union
91.
Nepomuk Recommender (Cont.)
Troussov et al., “Social Context as Machine Processable Knowledge”, presented the architecture of the hybrid recommender system in the activity-centric environment Nepomuk-Simple (EU 6th Framework Project NEPOMUK).
“Real” desktops usually have piles of things on them, where the users (consciously or unconsciously) have grouped together items which are related to each other or to a task. The so-called “Pile” UI used in Nepomuk-Simple imitates this type of data and metadata organisation, which helps to avoid premature categorisation and reduces the retention of useless documents.
Metadata describing the user data are stored in the Nepomuk personal information management ontology (PIMO). Proper recommendations, such as the recommendation of additional items to add to the pile, apparently should be based on the PIMO and on the textual content of the items in the pile.
Although methods of natural language processing for information retrieval could be useful, the most important type of textual processing is the one which allows concepts in PIMO to be related to the processed texts. Since PIMO changes over time, this type of natural language processing can’t be performed as preprocessing of all textual content related to the user. Hybrid recommendation needs on-the-fly textual processing with the ability to aggregate the current instantiation of PIMO with the results of textual processing.
92.
Nepomuk
Representing and modeling this ontology as a multidimensional network makes it possible to augment the ontology on the fly with new information, such as the “semantic” content of the textual information in user documents.
Recommendations in Nepomuk-Simple are computed on the fly by graph-based methods operating on the unified multidimensional network of concepts from the personal information management ontology, augmented with concepts extracted from the documents pertaining to the activity in question.
Troussov et al. 2008 classify Nepomuk-Simple recommendations into two major types.
– The first type is the recommendation of additional items for the pile, when the user is working on an activity.
– The second type arises, for instance, when the user is browsing the Web; Nepomuk-Simple can recommend that the current resource might be relevant to one or more activities performed by the user.
In both cases there is a need to operate with Clouds (fuzzy sets of PIMO nodes): Clouds describe the topicality of documents in terms of PIMO, and the pile itself is a Cloud.
93.
Pile UI
94.
Nepomuk use case: activity management
A user started to work on a new project, CID. Using the Nepomuk SSD, she collects a “pile” of the resources she needs while working on the project (MS Word documents, contacts, etc.) by dragging and dropping resources from her desktop and by linking resources from e-mail (Mozilla Thunderbird) and web browser (Firefox) applications.
95.
Nepomuk use case: activity management using the IBM recommender codenamed “Galaxy”
Galaxy (IBM hybrid recommender) analyses the pile content and linkage structure as a multidimensional network of concepts extracted from documents and of links between concepts, projects, project participants, meetings, document authors, …, and provides handy recommendations of resources she might possibly need
96.
Nepomuk use case: activity management
Galaxy can spot what the user might miss: “This web page might be relevant to your CID activity”
97.
Thank you!