A Perspective on Graph Theory and Network Science

1. A Perspective on Graph Theory and Network Science Marko A. Rodriguez http://markorodriguez.com http://twitter.com/twarko http://www.slideshare.net/slidarko Santa Fe Public School District – Santa Fe, New Mexico – July 6, 2010 July 5, 2010
2. Abstract The graph/network domain has been driven by the creativity of numerous individuals from disparate areas of the academic and commercial sectors. Examples of contributing academic disciplines include mathematics, physics, sociology, and computer science. Given the interdisciplinary nature of the domain, it is difficult for any single individual to objectively realize and speak about the space as a whole. Any presentation of the ideas is ultimately biased by the formal training and expertise of the individual. For this reason, I will simply present on the domain from my perspective, that is, from my personal experiences and, more specifically, from a perspective biased by cognitive and computer science. This is an autobiographical lecture on my life (so far) with graphs/networks.
3. The Graph/Network The term graph is used primarily in mathematics and the term network is used primarily in physics. Both refer to a type of structure in which there exist vertices (i.e. nodes, dots) and edges (i.e. links, lines). There are numerous types of graphs/networks, which yield more or less expressivity (i.e. more or less structure).
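As a minimal sketch of the vertex/edge structure described on this slide, the following builds an undirected graph as an adjacency map and counts each vertex's edges. The vertex names and the helper name build_graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an adjacency map {vertex: set of neighbors} for an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)  # edge u-v is stored in both directions
        adjacency[v].add(u)
    return dict(adjacency)


# Hypothetical vertices (dots) and edges (lines).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
graph = build_graph(edges)

# Degree = number of edges touching a vertex.
degrees = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
print(degrees)  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```

Richer graph types (directed, labeled, weighted) add structure to this same idea, for example by storing a direction or a label with each edge.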
4. The Purpose of a Graph for Mathematicians • Mathematicians are concerned with the abstract structure of a graph. • Mathematicians define operations to analyze and manipulate graphs. Moreover, they develop theorems based upon structural axioms.
5. The Purpose of a Network for Physicists • Physicists are concerned with modeling real-world structures with networks. • Physicists define algorithms that compress the information in a network into simpler values (e.g. statistical analysis).
6. Much of the World has a Graphical/Network Structure • Social networks: define how persons interact (collaborators, friends, kin). • Biological networks: define how biological components interact (proteins, food chains, gene regulation). • Transportation networks: define how cities are joined by air and road routes. • Dependency networks: define how software modules use each other. • Communication networks: define the relationships between Internet routers. • Language networks: define the relationships between words.
7. The Tour • University of California at San Diego (1997-2001) • University of California at Santa Cruz (2001-2007) • Vrije Universiteit Brussel (2004-2005) • Los Alamos National Laboratory (2005-2010) • AT&T Interactive (2010-Present)
8. Undergrad at the University of California at San Diego • Studied Cognitive Science (B.S.) and Computer Music (Minor) at the University of California at San Diego. (1997-2001)
9. Cognitive Science at UCSD • Neural networks: simplified models of how the brain encodes and processes information. [1] Neural networks exclude seemingly non-relevant aspects of the biological counterpart (e.g. neurotransmitters, axon/soma/dendrite distinctions). No two signals reaching the brain are ever the same, yet we perceive a consistent (object-oriented) world. Neural networks can be applied generally to classification, irrespective of whether the signal is “human oriented” (e.g. non-sensory information). Neural networks are usually trained through experience. [1] Please see: http://arxiv.org/abs/0811.3584
10. Cognitive Science at UCSD [Figure labels: Signal from the World; Neural Network; Classification of the Signal] Mouse cortical networks are grown on multi-electrode arrays in order to study the information properties of the structure through its development (left; done at LANL during my PostDoc). Artificial neural networks are simplified models of the sufficient components needed to process and classify information (right).
11. Computer Music at UCSD • Spatial compositions: focused on the composition of music which accounted for/represented sound in 3D space. Amplitude (loud/quiet), pitch (high/low), and timbre (guitar/drum) are familiar dimensions, but what about music beyond stereo (left/right)? Developed algorithms to “trick the ear” into hearing sounds at particular points in space. Made use of a data flow sound processing language called Max/MSP (see http://cycling74.com/). ∗ Data flow languages allow one to define “process graphs” (dependencies between functions represented as a graph).
12. Computer Music at UCSD My data flow programs (left) take/generate sound, process it algorithmically, and emit it through a 6-channel circular surround sound system (right). My senior thesis was a live concert using a computer music system I developed called Monkey Space Colony 6.
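The idea of a “process graph” from the previous slide can be sketched in a few lines. This is not Max/MSP and not the author's system; it is an illustrative sketch in which each node is a function plus the names of the nodes it depends on, and the graph is evaluated in dependency order. The node names (source, gain, invert, mix) and the function run_process_graph are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_process_graph(nodes, inputs):
    """Evaluate a process graph.

    nodes:  {name: (function, [names of upstream nodes])}
    inputs: {name: value} for nodes that have no function (signal sources)
    """
    dependencies = {name: deps for name, (_, deps) in nodes.items()}
    # static_order() yields every node after all of the nodes it depends on.
    order = TopologicalSorter(dependencies).static_order()
    values = dict(inputs)
    for name in order:
        if name in values:  # source value supplied by the caller
            continue
        function, deps = nodes[name]
        values[name] = function(*(values[d] for d in deps))
    return values


# Hypothetical graph: one sample is amplified, inverted, and the two are mixed.
nodes = {
    "gain": (lambda x: x * 2.0, ["source"]),
    "invert": (lambda x: -x, ["source"]),
    "mix": (lambda a, b: a + b, ["gain", "invert"]),
}
result = run_process_graph(nodes, {"source": 0.5})
print(result["mix"])  # 0.5
```

A real audio data flow system processes streams of samples continuously rather than a single value, but the dependency structure is the same kind of graph.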
13. Graduate at the University of California at Santa Cruz • Studied Computer Science (M.S. and Ph.D.) at the University of California at Santa Cruz. (2001-2007)
14. Collective Intelligence at UCSC • Collective decision making: applications of collective intelligence to the design of techno-government architectures. [2] (2001-2004) We do not have the same restrictions as our founding fathers (e.g. communication limited by space). Is it possible to remove the representative layer of government by leveraging expertise/representation in social networks? What does a modern day direct democracy look like? Can any actively participating subset of the population yield an accurate model of the population as a whole? Maintaining fidelity in that subset model is the point of dynamically distributed democracy. [2] Please see: 1.) http://arxiv.org/abs/cs/0412047 2.) http://arxiv.org/abs/cs/0609034 3.) http://arxiv.org/abs/0901.3929 4.) http://escholarship.org/uc/item/04h3h1cr
15. Collective Intelligence at UCSC People do not vote for a representative. Instead, they maintain an ego-network of those whose ideas they respect in certain domains (e.g. health care, military, etc.). People in one’s network can be friends, family members, scientists, public figures, etc. Anyone, through the Internet, can vote on any decision. However, the moment they abstain from voting, their vote power is transferred through their network (according to the domain of decision). Power aggregates at those that participate in the current decision.
[Figure: two plots, proportion of error and proportion of correct decisions, each against the percentage of active citizens (n), comparing direct democracy (gray line) with dynamically distributed democracy (black line). Source caption, Fig. 5: “The relationship between k and evote for direct democracy (gray line) and dynamically distributed democracy (black line).” The plot provides the proportion of identical, correct decisions over a simulation that was run with 1000 artificially generated networks composed of 100 citizens each.]
[Excerpt from the source paper shown on the slide, partially truncated: “As previously stated, let x ∈ [0, 1]^n denote the political tendency of each citizen in this population, where x_i is the tendency of citizen i and, for the purpose of simulation, is determined from a uniform distribution. Assume that every citizen in a population of n citizens uses some social network-based system to create links to those individuals that they believe reflect their tendency the best. In practice, these links may point to a close friend, a relative, or some public figure whose political tendencies resonate with the individual.”]
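The transfer of vote power described on this slide can be sketched as a simple propagation over a trust network. This is an illustrative sketch, not the algorithm from the cited papers: it assumes every citizen starts with one unit of vote power, that an abstaining citizen splits their power equally among the people they trust, that propagation stops after a fixed number of hops (max_hops), and that power held by an abstainer with no links is lost. It also ignores the decision domain. All names are hypothetical.

```python
def distribute_vote_power(trust, active, max_hops=10):
    """Return {active citizen: accumulated vote power}.

    trust:  {citizen: [citizens they trust]}; every trusted citizen must be a key
    active: set of citizens who vote on the current decision
    """
    power = {citizen: 1.0 for citizen in trust}  # assumption: one vote each
    settled = {citizen: 0.0 for citizen in active}
    for _ in range(max_hops + 1):
        next_power = {citizen: 0.0 for citizen in trust}
        for citizen, amount in power.items():
            if amount == 0.0:
                continue
            if citizen in active:
                settled[citizen] += amount  # power aggregates at participants
            elif trust[citizen]:
                share = amount / len(trust[citizen])  # assumption: equal split
                for trusted in trust[citizen]:
                    next_power[trusted] += share
            # An abstainer with no links passes nothing on (power is lost).
        power = next_power
    return settled


# Hypothetical population of five citizens; only cat and dan vote.
trust = {
    "ann": ["bob"],
    "bob": ["cat", "dan"],
    "cat": [],
    "dan": [],
    "eve": ["dan"],
}
result = distribute_vote_power(trust, active={"cat", "dan"})
print(result)  # cat ends with 2.0 units of power, dan with 3.0
```

Here ann's vote flows through bob, who also abstains, and is split between cat and dan, while eve's vote goes directly to dan, so all five units of power end up with the two participants.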
16. Visiting Researcher at the Vrije Universiteit Brussel • Studied collective intelligence as a Visiting Researcher at the Center for Evolution, Complexity, and Cognition of the Vrije Universiteit Brussel. (2004-2005)
17. Collective Intelligence at the Vrije Universiteit Brussel • Automating the scholarly process: Designed algorithms that exploit bibliographic networks in order to support the scholarly communication process. (2004-2005) [3] Can the network of scholars, articles, journals, universities, conferences, funding sources, etc. be leveraged to algorithmically support the scholarly process? ∗ Can you find me articles related to my interests? ∗ Can you find me collaborators to work with me on my ideas? ∗ Can you find me a venue to publish my work in? ∗ Can you find me experts to peer-review a submitted article? ∗ Can you find me people to talk to (and concepts to talk about) at the conference I’m going to? [3] Please see: 1.) http://arxiv.org/abs/cs/0601121 2.) http://arxiv.org/abs/cs/0605112 3.) http://arxiv.org/abs/0905.1594
18. Collective Intelligence at the Vrije Universiteit Brussel Example: Determining experts to peer-review an article can be done automatically and with a sensitivity to conflict of interest situations. The spreading activation algorithm used is analogous, in many ways, to neural networks. Can we think of the networks we (as a society) implicitly create as some sort of “collective neural substrate”? Can we then apply algorithms similar to those found in biological systems? Can our implicitly generated networks serve as a substrate for problem-solving?
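A rough sketch of how spreading activation might rank reviewers is given below. It is not the algorithm from the cited papers. It assumes a co-authorship network, injects one unit of energy at each scholar cited by the submitted article, spreads a decayed share of that energy to neighbors for a fixed number of steps, and treats the authors and their direct co-authors as conflicts of interest. The scholar names, the decay value, and both function names are hypothetical.

```python
def spread_activation(graph, sources, steps=2, decay=0.5):
    """Spread energy from the source vertices; return {vertex: total activation}."""
    activation = {vertex: 0.0 for vertex in graph}
    frontier = {source: 1.0 for source in sources}
    for vertex, energy in frontier.items():
        activation[vertex] += energy
    for _ in range(steps):
        next_frontier = {}
        for vertex, energy in frontier.items():
            neighbors = graph[vertex]
            if not neighbors:
                continue
            share = decay * energy / len(neighbors)  # energy decays at each hop
            for neighbor in neighbors:
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + share
        for vertex, energy in next_frontier.items():
            activation[vertex] += energy
        frontier = next_frontier
    return activation


def rank_reviewers(graph, cited, authors):
    """Rank non-conflicted scholars by activation, highest first."""
    conflicted = set(authors)
    for author in authors:
        conflicted.update(graph[author])  # assumption: co-authors are conflicted
    scores = spread_activation(graph, cited)
    candidates = {v: s for v, s in scores.items() if v not in conflicted and s > 0}
    return sorted(candidates, key=lambda v: (-candidates[v], v))


# Hypothetical co-authorship network (symmetric adjacency lists).
graph = {
    "au": ["co"],
    "co": ["au", "x"],
    "x": ["co", "y"],
    "y": ["x", "z"],
    "z": ["y"],
}
reviewers = rank_reviewers(graph, cited=["x"], authors=["au"])
print(reviewers)  # ['x', 'y', 'z']
```

The author "au" and co-author "co" receive activation but are filtered out as conflicted, which is the sense in which the ranking is sensitive to conflicts of interest.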
19. Graduate Researcher at Los Alamos National Laboratory • Studied bibliometrics as a graduate student on the Digital Library Research and Prototyping Team of the Los Alamos National Laboratory. (2005-2007)
20. Bibliometrics at Los Alamos National Laboratory • Bibliometrics: the study of the scholarly process through the digital footprint left by scholars (“the science of science”). (2005-2007) [4] Wrote my dissertation while with the Digital Library Research and Prototyping Team (Johan Bollen, Herbert Van de Sompel, and Alberto Pepe). A very fruitful time in my academic career. Continued my work with problem-solving in scholarly networks. Studied how scholars use information by studying how they download articles (see http://mesur.org). [4] Please see: 1.) http://arxiv.org/abs/cs/0601030 2.) http://arxiv.org/abs/0708.1150 3.) http://arxiv.org/abs/0804.3791 4.) http://arxiv.org/abs/0801.2345 5.) http://arxiv.org/abs/0807.0023 6.) http://dx.doi.org/10.1371/journal.pone.0004803 7.) http://arxiv.org/abs/0911.4223 8.) http://arxiv.org/abs/cs/0605110
21. Bibliometrics at Los Alamos National Laboratory Each vertex (node) is a particular journal. Colors denote the journal domain. A directed edge (link) denotes that a scholar read an article in journal A and then one in journal B. This map provides us with a collectively generated representation of the knowledge transfer between domains (i.e. a “folksonomy” of domains).
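The edge definition on this slide (a scholar reads an article in journal A, then one in journal B) can be sketched as a count of consecutive pairs in reading sessions. The sessions below are made-up examples, not MESUR data, and skipping consecutive reads within the same journal is an assumption of this sketch.

```python
from collections import Counter


def journal_transitions(sessions):
    """Count directed journal-to-journal edges.

    sessions: list of lists, each holding journal names in reading order.
    Returns a Counter mapping (journal_a, journal_b) to the number of times
    an article in journal_a was read immediately before one in journal_b.
    """
    edges = Counter()
    for session in sessions:
        for journal_a, journal_b in zip(session, session[1:]):
            if journal_a != journal_b:  # assumption: ignore same-journal steps
                edges[(journal_a, journal_b)] += 1
    return edges


# Hypothetical reading sessions.
sessions = [
    ["Physics Letters", "Nature", "Cell"],
    ["Nature", "Cell"],
    ["Cell", "Cell", "Nature"],
]
edges = journal_transitions(sessions)
print(edges[("Nature", "Cell")])  # 2
```

The resulting counts can serve as edge weights in a directed journal graph like the one described on the slide.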
22. Web of Data at Los Alamos National Laboratory • Web of Data: the representation of the world’s data within the global URI (super class of URL) address space. [5] For the most part, data is local to a computer with no easy way for data on one computer to reference data on another. ∗ The World Wide Web provided a way to link documents across computers, but what about data? By placing data “on the Web” in a similar manner to how we place documents on the Web, we can turn the Web into a distributed database. ∗ This heterogeneous network/graph of data opens the door to new types of problem-solving. [5] Please see: 1.) http://arxiv.org/abs/0904.0027 2.) http://arxiv.org/abs/0908.0373 3.) http://arxiv.org/abs/1006.1080 4.) http://arxiv.org/abs/0905.3378 5.) http://arxiv.org/abs/0704.3395 6.) http://arxiv.org/abs/0802.3492 7.) http://arxiv.org/abs/0903.0194
23. Web of Data at Los Alamos National Laboratory
data set            domain      data set            domain      data set            domain
audioscrobbler      music       govtrack            government  pubguide            books
bbclatertotp        music       homologene          biology     qdos                social
bbcplaycountdata    music       ibm                 computer    rae2001             computer
bbcprogrammes       media       ieee                computer    rdfbookmashup       books
budapestbme         computer    interpro            biology     rdfohloh            social
chebi               biology     jamendo             music       resex               computer
crunchbase          business    laascnrs            computer    riese               government
dailymed            medical     libris              books       semanticweborg      computer
dblpberlin          computer    lingvoj             reference   semwebcentral       social
dblphannover        computer    linkedct            medical     siocsites           social
dblprkbexplorer     computer    linkedmdb           movie       surgeradio          music
dbpedia             general     magnatune           music       swconferencecorpus  computer
doapspace           social      musicbrainz         music       taxonomy            reference
drugbank            medical     myspacewrapper      social      umbel               general
eurecom             computer    opencalais          reference   uniref              biology
eurostat            government  opencyc             general     unists              biology
flickrexporter      images      openguides          reference   uscensusdata        government
flickrwrappr        images      pdb                 biology     virtuososponger     reference
foafprofiles        social      pfam                biology     w3cwordnet          reference
freebase            general     pisa                computer    wikicompany         business
geneid              biology     prodom              biology     worldfactbook       government
geneontology        biology     projectgutenberg    books       yago                general
geonames            geographic  prosite             biology     ...
24. Web of Data at Los Alamos National Laboratory [Figure: network diagram of linked data sets; vertex labels include dbpedia, geonames, musicbrainz, freebase, uniprot, pubmed, drugbank, and dblpberlin, among many others.] Each vertex (node) represents a data set. A directed edge (link) denotes that data set A makes reference to data in data set B.
25. Web of Data at Los Alamos National Laboratory [Figure: two panels, each showing Application 1, Application 2, and Application 3 (processes) above data structures at 127.0.0.1, 127.0.0.2, and 127.0.0.3; one panel is labeled “Web of Data”.] Data is currently in silos (left). For example, Amazon.com can only recommend other Amazon.com products. What about recommending a job to take based upon the books you read, the people you know, etc. (right)? Can a collectively generated model of the world help people to find their place in life? (http://bit.ly/cLWL3F)
26. Web of Data at Los Alamos National Laboratory [Figure: two RDF diagrams. Vertices are urn:uuid resources and typed literals (e.g. "marko"^^xsd:string, "2"^^xsd:int); edges are properties such as rdf:type, hasMethod, hasMethodName, hasBlock, nextInst, hasValue, rdf:first, and rdf:rest; class labels include Fhat, RVM, Method, Block, Instruction, Frame, Branch, PushValue, LocalDirect, and Return.] A more esoteric body of work was developed at this time that dealt with the encoding of not only data into the Web of Data, but also process. This included the distributed representation of computing instructions (left) and virtual machines (right).
27. PostDoc Researcher at Los Alamos National Laboratory • Studied graph theory and ethics as a Director’s Fellow PostDoc at the Center for Nonlinear Studies of the Los Alamos National Laboratory. (2007-2010)
28. Path Algebra at Los Alamos National Laboratory • Path Algebra: concerned with how to move through a graph in an intelligent, directed manner in order to solve problems using graphs. [6] The algebra contains a set of elements: vertices and edges. The algebra contains a set of operations: traverse, filter, clip, merge, split, not, etc. The algebra provides a theory for how to develop graph traversal engines (i.e. graph processors). [6] Please see: 1.) http://arxiv.org/abs/0806.2274 2.) http://arxiv.org/abs/0803.4355 3.) http://gremlin.tinkerpop.com 4.) http://pipes.tinkerpop.com
29. Path Algebra at Los Alamos National Laboratory The general theme of controlling how a walker moves through a graph has numerous applications including searching, ranking, scoring, recommendation, etc. within a graph.
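To make the idea of steering a walker concrete, here is a small sketch of two of the operations named on the previous slide, traverse and filter, applied to a labeled graph. It is neither the formal algebra from the cited papers nor the Gremlin or Pipes API. The edge list, the vertex names, and the function names (traverse, keep) are hypothetical. Counting how many paths arrive at each vertex gives a simple ranking.

```python
from collections import Counter

# Hypothetical labeled, directed edges: (tail vertex, label, head vertex).
EDGES = [
    ("marko", "knows", "peter"),
    ("marko", "knows", "josh"),
    ("peter", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
]


def traverse(vertices, label, edges=EDGES):
    """Step from each vertex along outgoing edges with the given label.

    Duplicates are kept on purpose: a vertex reached by two paths appears twice.
    """
    return [head for v in vertices for (tail, lbl, head) in edges
            if tail == v and lbl == label]


def keep(vertices, predicate):
    """Filter step: keep only the vertices that satisfy the predicate."""
    return [v for v in vertices if predicate(v)]


# What did the people marko knows create? Rank results by number of paths.
created = traverse(traverse(["marko"], "knows"), "created")
ranking = Counter(created).most_common()
print(ranking)  # [('lop', 2), ('ripple', 1)]

# The same walk, filtered to exclude one vertex.
not_lop = keep(created, lambda v: v != "lop")
print(not_lop)  # ['ripple']
```

Composing such steps in different orders and with different labels is one way to express searching, ranking, and recommendation as walks over the same graph.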
30. Eudaemonics at Los Alamos National Laboratory • Eudaemonics: an ethical theory stating that it is everyone’s moral responsibility to be “happy” (i.e. to live engaged in the world). See the work of Aristotle and David L. Norton. [7] Are recommender systems evolving to become eudaemonic engines? ∗ Movies (e.g. Netflix), books (e.g. GoodReads), life partners (e.g. Match.com), careers (e.g. Monster), etc. ∗ Can we interrelate all this data and traverse it for problem-solving? [7] Please see: 1.) http://arxiv.org/abs/0903.0200 2.) http://arxiv.org/abs/0904.0027
31. Graph Systems Architect at AT&T Interactive • Work in theoretical and applied models of problem-solving with graph traversals and graph databases. (2010-present)
32. Graphs at AT&T Interactive • Graph Traversal: the development of theories and applications of graph traversals in real-world problem-solving situations. [8] Continue to work on path algebra (extensions to include a non-matrix based, ring theoretic model and a diffusion model). Continue to work on open source graph-related technologies to support graph-related efforts at AT&Ti (see http://www.tinkerpop.com). • Recommender Systems: the development of applications for real-time, “themed” recommendations (i.e. a problem-solving graph engine). AT&Ti maintains a collection of interesting data sets. Make use of such data for numerous types of recommendation. [8] Please see: 1.) http://arxiv.org/abs/1004.1001 2.) http://arxiv.org/abs/1006.2361
33. Conclusion • Graphs/networks touch numerous disciplines. • Many aspects of the world can be modeled as a graph/network. • Graph traversal algorithms show promise as a general-purpose style/pattern for computing.