An overview of how software developers need to manage metadata and data dictionaries to make software integration faster and more cost-effective. This presentation is a general introduction to the concepts of data semantics for college-level students. It was originally created for a seminar at Carleton College.
Semantic Integration Patterns
1. Patterns of Semantic Integration. Dan McCreary, President, Dan McCreary & Associates (Metadata Solutions). dan@danmccreary.com, (952) 931-9198
2. Licensed Under Creative Commons 3.0. Creative Commons 3.0 Attribution: You must attribute the work in the manner specified by the author or licensor. Noncommercial: You may not use this work for commercial purposes. Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
3. Patterns of Semantic Integration. Our ever-increasing understanding of solid-state physics has allowed Moore’s Law to proceed unabated for the last 40 years. Exciting developments in quantum physics, nanotechnology and molecular self-assembly will continue this trend for the foreseeable future. So why can’t an instructor quickly import a database of 10,000 subject-appropriate lesson plans and quiz items into their learning-management system and dynamically adjust classroom content and assessments to individual student learning styles and interests? The key to this and other computer-to-computer interoperability challenges lies in the difficulty computer systems have in finding and precisely exchanging data. Enter the Semantic Web. The designers of the current World Wide Web realized that the key does not require faster computers and networks. Instead, it lies in carefully publishing and exchanging data semantics (meaning) and in precisely publishing data that describes data (metadata) in a machine-readable structure. This presentation reviews patterns that researchers around the world are using to make computer integration easier, giving even ultimate frisbee™ coaches access to vast amounts of structured information.
4. Background for Dan McCreary. Carleton Class of ’82, physics major. Part of the first year that “Computer Science Concentrations” were ever granted to Carleton graduates. Worked in the computer center and Carleton Library with Les Lacroix, doing VMS/RMS programming to create the first online card catalog for the science library. Helped blow up lab equipment for Bruce Thomas. Semantic solutions consultant in Minneapolis.
6. Physics 123: “…intended to give students some perspective on the kinds of work done by people with a physics background… discuss their work and work-related experiences.” Physics taught me how to create and use precise models of the world and to discover underlying patterns. Computer-to-computer communication also requires precise models and the discovery of underlying patterns.
8. Bruce’s Integration Challenge. (Diagram of the systems to be connected: a PDP-8 with a 1024-channel accumulator, a gamma ray spectrometer, uranium samples from Columbia mines, an Ohio Scientific 6502, the Carleton VAX running an FFT in Fortran, a Tektronix 4014 terminal, an 8-bit teletype port and an RS-232 port.)
9. 1970 Sci-Fi Classic: “The Forbin Project”. A new intersystem language! Lesson: before you take over the world, you must exchange semantically precise metadata!
10. Moore’s Law. (Chart; note the log scale.) Creative Commons 1.0, courtesy of Ray Kurzweil and Kurzweil Technologies, Inc.
11. Thesis: We Need Semantics. For the next revolution in computing: We don’t need faster CPUs. We don’t need larger hard drives. We don’t need faster networks. We don’t need more HTML linking. We need to link our concepts using semantic technologies. There are standard patterns that are used to solve these problems.
12. Patterns. “Design patterns” were developed by Christopher Alexander in 1979 in the building-architecture domain and applied by the “Gang of Four” to object-oriented software in 1994. Each pattern has a name and icon, a problem description, a solution description, diagrams, examples and related patterns.
13. The Agent Vision. “The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.” From “The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities,” by Tim Berners-Lee, James Hendler and Ora Lassila.
14. Overlapping Terminology. (Diagram of overlapping fields: data mining, statistical analysis, HTML web, pattern discovery, business semantics, data dictionary, data warehouse, Enterprise Application Integration (EAI), Semantic Web, relational database, metadata, metadata discovery.)
15. Computer Science Is About Abstraction. (Chart: level of abstraction rising over time, from machine language (10100101) to assembly language (MOV R0, A1 / BNE F32C), FORTRAN (DO I=1, 100 / I=I+1), structured programming (Proc(i1, i2, o1)), object-oriented programming, GUIs and XML.)
16. Person-to-Person Dialog. (Diagram of layers, from lowest to higher abstraction: sound, words, concepts, sentences, conversation, problem solving.)
17. Computer-to-Computer Dialog. (Diagram of layers, from the Internet up: XML tags; documents and XML Schema; graphs, ontologies, RDF and OWL; semantic integration; agents. A “You Are Here” marker appears on the diagram.)
18. Semantic Triangle. A symbol (“cat”, or “gato” in Spanish, or “Katze” in German) symbolizes a concept (a pattern of neural activity in our brain), which refers to a referent (physical objects). The symbol stands for the referent. Ogden, C. K., & Richards, I. A. (1923). The Meaning of Meaning.
19. Symbols Can Only Directly Link to Concepts. The link between a symbol and its referent is an INDIRECT link: it MUST pass through the concept. Only symbols can be transmitted between computers. Ogden, C. K., & Richards, I. A. (1923). The Meaning of Meaning.
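The triangle can be sketched as two lookup tables, assuming a hypothetical lexicon with made-up concept IDs: symbols in several languages resolve to one concept, and only the concept points at the referent. Because a computer cannot store a physical cat, the referent here is just another string describing it, which is why only symbols travel between machines.

```python
# Hypothetical lexicon: each (language, symbol) pair points to a concept ID.
# The concept IDs and referent descriptions are illustrative, not a standard.
SYMBOL_TO_CONCEPT = {
    ("en", "cat"): "concept:domestic-cat",
    ("es", "gato"): "concept:domestic-cat",
    ("de", "Katze"): "concept:domestic-cat",
}

# A computer cannot hold the physical referent, only another symbol describing it.
CONCEPT_TO_REFERENT = {
    "concept:domestic-cat": "the physical animals we call cats",
}


def referent_of(language: str, symbol: str) -> str:
    """Resolve a symbol to its referent, always passing through the concept."""
    concept = SYMBOL_TO_CONCEPT[(language, symbol)]
    return CONCEPT_TO_REFERENT[concept]


def same_concept(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """Two symbols are interchangeable when they link to the same concept."""
    return SYMBOL_TO_CONCEPT[a] == SYMBOL_TO_CONCEPT[b]


print(same_concept(("en", "cat"), ("de", "Katze")))  # True
print(referent_of("es", "gato"))  # the physical animals we call cats
```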
20. The Problem of Semantic Ambiguity. “Did you say you were looking for mixed nuts?” The answer depends on context: in a hardware context, nuts are fasteners; in a food context, they are something to eat. People use context to derive the correct meaning.
21. 59 Meanings of “run”. Context determines the sense. Noun (18 senses): tally (“the Yankees scored a run in the bottom of the 9th”), test (“the experiment ran for over an hour”), footrace (“she broke the mile run record”), streak (“her run of luck was just starting”), play (“the football 3rd down play was a run”), and 13 other noun meanings. Verb (41 senses): move fast (“the kids ran to the store”), scat (“I would run from a ticking bomb”), go (“the path runs up the hill”), operate (“you need training to run this machine”), has form (“the movie plot runs like this”), and 36 other verb meanings. Source: WordNet at http://wordnet.princeton.edu/
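One way a program can imitate the way people use context is a heuristic in the spirit of the simplified Lesk algorithm: pick the sense whose cue words overlap most with the surrounding sentence. The sketch below assumes a tiny, hand-written sense inventory for four of the senses of “run”. The cue words are illustrative and are not taken from WordNet.

```python
import re

# Hypothetical sense inventory: a few senses of "run", each with cue words
# that tend to appear near that sense.
SENSES = {
    "tally": {"score", "scored", "inning", "baseball", "yankees", "bottom"},
    "footrace": {"mile", "record", "race", "marathon"},
    "operate": {"machine", "training", "engine", "software"},
    "move fast": {"kids", "store", "ran", "hurried"},
}


def disambiguate(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence.

    A simplified Lesk-style heuristic: context words decide the sense.
    Ties go to the first sense listed, because max() keeps the first maximum.
    """
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


print(disambiguate("the Yankees scored a run in the bottom of the 9th"))  # tally
print(disambiguate("you need training to run this machine"))  # operate
```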
22. Analogy: An English Dictionary. A dictionary is metadata (data about data): it pairs each term with its definitions. Note: people use context to find the correct meaning. Source: www.m-w.com
23. Word Senses. A single word maps to many concepts: “run” can mean footrace, streak, duration, play, test, go, operate, tally, move fast, has form or scat.
24. Synonym Ring. Many symbols for the same object: <Person>Joe Smith</Person>, <Individual>Joe Smith</Individual> and <Human>Joe Smith</Human> all symbolize the same concept, which refers to the same person, Joe Smith.
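A synonym ring can be sketched as a set of tag names that are declared equivalent, plus a canonical name to normalize them to. This minimal example assumes the ring itself and the choice of “Person” as the canonical name. It parses the three XML variants from the slide and shows that they collapse into one record.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym ring: tag names declared equivalent, plus a canonical name.
PERSON_RING = {"Person", "Individual", "Human"}
CANONICAL = "Person"


def normalize(xml_text: str) -> tuple[str, str]:
    """Parse one element and replace any tag in the ring with the canonical name."""
    element = ET.fromstring(xml_text)
    tag = CANONICAL if element.tag in PERSON_RING else element.tag
    return tag, element.text


records = [
    "<Person>Joe Smith</Person>",
    "<Individual>Joe Smith</Individual>",
    "<Human>Joe Smith</Human>",
]
unique = {normalize(record) for record in records}
print(unique)  # {('Person', 'Joe Smith')}
```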
25. I’m Thinking of an Animal… It has four legs. It has fur. It has whiskers. It chases mice. It goes “meow”. If you describe enough of the properties of a concept, you can have reasonable assurance that two descriptions refer to the same concept. Note: since concepts are neural patterns in the brain, an “exact” match between concepts is difficult to measure.
26. Concept Linking. Question: How can you tell whether two concepts are the same if two systems don’t share the same symbol? Answer: If they have the same properties (and relationships), you can assume with reasonable probability that they are the same concept.
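Property-based matching can be sketched with a set-similarity measure. The example below uses the Jaccard index (shared properties divided by all properties) as one possible measure, with an assumed 0.8 threshold standing in for “reasonable probability”. Both the measure and the threshold are illustrative choices, not part of the original slides.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of properties two concepts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(properties: set[str], candidates: dict[str, set[str]], threshold: float = 0.8):
    """Return (symbol, score) for the most similar candidate, or (None, score) below threshold."""
    scores = {symbol: jaccard(properties, props) for symbol, props in candidates.items()}
    symbol = max(scores, key=scores.get)
    return (symbol if scores[symbol] >= threshold else None), scores[symbol]


# System A calls its concept "cat"; system B uses different (German) symbols.
cat = {"four legs", "fur", "whiskers", "chases mice", "goes meow"}
system_b = {
    "Katze": {"four legs", "fur", "whiskers", "chases mice", "goes meow"},
    "Hund": {"four legs", "fur", "barks", "fetches sticks"},
}

print(best_match(cat, system_b))  # ('Katze', 1.0)
```

Sharing only “four legs” and “fur”, the dog scores 2/7, well below the threshold, so the sketch would not link it to “cat”.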
28. Semantics Is About Concept Linking. Wouldn’t it be nice if computers could name things internally or on a web site however they liked (and keep using the current web), but we could always link those names back to a centralized database of concepts? Computers could do this automatically, just as they translate domain names (www.google.com) into IP addresses (64.233.187.99). Then we could communicate precisely without dictating the names used inside a computer system or on a web page.
29. HTML Sample. <title>The Problem of Semantics</title> <p>This is a standard document that is sent between two computers using the <a href="http://w3c.org/Protocols">HTTP</a> protocol. Note that other than the markup tags like <b>bold</b> there is very little that a computer can do to understand the meaning of the text.</p> Unless computers “understand” the words in the English language, it will be very difficult for them to understand the meaning, or semantics, of the web.
32. Data that external computers may not understand: <PersonGivenName>Dan</PersonGivenName> <PersonFamilyName>McCreary</PersonFamilyName> <Address>123 Main Street</Address> <City>Minneapolis</City> <Phone>(651) 555-1234</Phone>. Without a “data dictionary”, it is difficult to know what the data elements mean. The tags appear in patterns, but what they mean is still a mystery to a computer.
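A data dictionary can be as simple as a lookup from element name to definition. This sketch assumes hypothetical definitions and wraps the slide’s elements in a hypothetical <Person> root so the XML is well-formed. Any element missing from the dictionary is flagged rather than guessed.

```python
import xml.etree.ElementTree as ET

# Hypothetical data dictionary: element name -> definition a program can look up.
DATA_DICTIONARY = {
    "PersonGivenName": "A person's first name.",
    "PersonFamilyName": "A person's last name, shared with their family.",
    "Address": "Street address where the person can be reached.",
    "City": "City of the street address.",
    "Phone": "Telephone number, including area code.",
}

# The slide's elements, wrapped in a hypothetical <Person> root element.
RECORD = """<Person>
  <PersonGivenName>Dan</PersonGivenName>
  <PersonFamilyName>McCreary</PersonFamilyName>
  <Address>123 Main Street</Address>
  <City>Minneapolis</City>
  <Phone>(651) 555-1234</Phone>
</Person>"""


def describe(xml_text: str) -> list[tuple[str, str, str]]:
    """Pair each child element's value with its data dictionary definition."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text, DATA_DICTIONARY.get(child.tag, "UNDEFINED"))
            for child in root]


for tag, value, definition in describe(RECORD):
    print(f"{tag} = {value!r}: {definition}")
```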
33. Metadata & Ontologies. Metadata is any data that describes other data. Metadata is itself data and is stored in specialized structures (directed graphs) to aid comparison with other metadata. A controlled store of metadata is called a “registry”. Complex directed graphs can evolve into “ontologies”. Metadata describes data such as source code, RDBMS tables and columns, web navigation, org charts, document keywords and product specs.
34. Hypertext Links and Data Element Links. The current HTML web (the hypertext web) is focused on linking published documents. The Semantic Web is about linking conceptual data elements in published metadata registries. (Diagram: links between Metadata Registry A and Metadata Registry B.)
35. Enter the URI… Today’s web allows documents to be accessed by people if people put links between documents: the hypertext web. But it is very difficult for machines to “understand” what we are saying, what we mean and what to do with the data. Machines CAN, however, determine whether two URIs match. (Diagram: <SurName>Smith</SurName> and <LastName>Smith</LastName> linked to a URI in a metadata registry (MDR), e.g. http://www.shared_dictionary.com/PersonGivenName: “Hey, you both ‘mean’ the same thing!”)
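The URI-matching idea can be sketched in a few lines, assuming each system publishes a mapping from its local tag names to entries in a shared registry, much as DNS maps host names to addresses. The registry URIs below are hypothetical example.org placeholders, as are the mapping tables.

```python
# Hypothetical registry entries (example.org placeholders).
FAMILY_NAME_URI = "http://example.org/registry/PersonFamilyName"
GIVEN_NAME_URI = "http://example.org/registry/PersonGivenName"

# Each system keeps its own local tag names but publishes where they point.
system_a = {"SurName": FAMILY_NAME_URI}
system_b = {"LastName": FAMILY_NAME_URI, "FirstName": GIVEN_NAME_URI}


def same_meaning(tag_a: str, map_a: dict[str, str], tag_b: str, map_b: dict[str, str]) -> bool:
    """Local tags 'mean' the same thing when both resolve to the same registry URI."""
    uri_a = map_a.get(tag_a)
    return uri_a is not None and uri_a == map_b.get(tag_b)


print(same_meaning("SurName", system_a, "LastName", system_b))   # True
print(same_meaning("SurName", system_a, "FirstName", system_b))  # False
```

Neither system has to rename anything; the machine only compares strings, which is exactly the kind of matching computers can do reliably.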
36. Subject-Verb-Object Triple. Subject: Person. Verb: Has-a-Given-Name. Object: “Joe”. In words, the person is named “Joe”; in XML, <PersonGivenName>Joe</PersonGivenName>.
37. Triples Are Almost All URIs. Subject: http://MyDictionay/DataElement/Person. Predicate (the “type” of link): http://MyDictionay/DataElement/PersonGivenName. Object: “Dan”. URIs can point to a standard location in a metadata registry.
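A sketch of turning the slide’s XML element into RDF triples, written out in N-Triples syntax. It assumes a hypothetical instance URI for the person (http://example.org/person/1) and types that instance with the slide’s Person URI via rdf:type. The literal escaping is minimal; a real project would more likely use an RDF library.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DICTIONARY = "http://MyDictionay/DataElement/"  # URI prefix as it appears on the slide


def element_to_triples(subject: str, class_name: str, xml_text: str) -> list[tuple]:
    """Turn one XML data element into (subject, predicate, object) triples."""
    element = ET.fromstring(xml_text)
    return [
        (subject, RDF_TYPE, ("uri", DICTIONARY + class_name)),
        (subject, DICTIONARY + element.tag, ("literal", element.text)),
    ]


def to_ntriples(triple: tuple) -> str:
    """Serialize one triple as an N-Triples line (minimal literal escaping)."""
    subject, predicate, (kind, value) = triple
    if kind == "uri":
        obj = f"<{value}>"
    else:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj} ."


triples = element_to_triples("http://example.org/person/1", "Person",
                             "<PersonGivenName>Dan</PersonGivenName>")
for triple in triples:
    print(to_ntriples(triple))
```

Only the object “Dan” is a plain literal; subject and predicate are URIs, which is what lets another system look them up in a registry.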
42. Semantic Web Standards Stack. (Diagram of layers: Trusted Semantic Web, Proof, Logic, Rules/Query, Signature, Encryption, Ontology (OWL), RDF Model & Syntax, XML Query, XML Schema, XML Namespaces, URI/IRI, Unicode.) Source: Tim Berners-Lee, www.w3c.org, http://www.w3.org/Consortium/Offices/Presentations/SemanticWeb/34.html
44. Hub and Spokes. Goal: create semantic maps to a few metadata standards, not many. (Diagram: registries R1 through RN, first linked point to point, then linked through a central ESB.) Mapping each metadata registry to N other metadata registries is the O(N²) problem. Mapping each registry to one central metadata registry is the O(N) problem (ESB: Enterprise Service Bus).
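The difference between the two architectures is simple arithmetic. This sketch assumes one bidirectional mapping per pair of registries. Counting each direction separately would double the point-to-point figure, but the growth rate stays quadratic.

```python
def point_to_point_mappings(n: int) -> int:
    """Each pair of registries needs its own mapping: n(n-1)/2, which grows as O(n^2)."""
    return n * (n - 1) // 2


def hub_and_spoke_mappings(n: int) -> int:
    """Each registry maps once to the central hub (the ESB): n, which grows as O(n)."""
    return n


for n in (7, 20, 100):
    print(f"{n:>3} registries: {point_to_point_mappings(n):>5} point-to-point vs "
          f"{hub_and_spoke_mappings(n):>3} hub-and-spoke")
```

For the seven registries drawn on the slide, point-to-point needs 21 mappings versus 7 through a hub. At 100 registries the gap is 4,950 versus 100.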
45. Metaphor: The Translator Agent. A customer who speaks only Spanish says “Me gustaría una cerveza” (“I would like a beer”). A translation service that speaks Spanish and English passes on “May I have a beer?” to an internal server that speaks only English, which replies “Coming right up!”
46. Semantic Mappers and Semantic Brokers. (Diagram: a report request in Model A goes to a metadata translation service, which runs RDF queries against a metadata registry to get the metadata mappings between Model A and Model B. The service sends SQL or XMLA queries in Model B to a data warehouse (RDBMS), receives the results in Model B (XML results or TDS), and returns an XML response in Model A.) XMLA: XML for Analysis. Gartner: vocabulary-based transformation.
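The broker’s round trip can be sketched with an in-memory SQLite database standing in for the data warehouse. The Model A and Model B field names, the ORDERS table and the mapping dictionary are all hypothetical. In the architecture described above, the mappings would come from RDF queries against the metadata registry rather than from a hard-coded dictionary.

```python
import sqlite3

# Hypothetical metadata mappings between Model A (the report's vocabulary)
# and Model B (the warehouse's column names).
A_TO_B = {"CustomerName": "CUST_NM", "OrderTotal": "ORD_TOT_AMT"}
B_TO_A = {b: a for a, b in A_TO_B.items()}


def translate_request(fields_in_a: list[str], table: str) -> str:
    """Rewrite a Model A field list as a Model B SQL query."""
    # Column names come from the trusted mapping, never from user input.
    columns = ", ".join(A_TO_B[field] for field in fields_in_a)
    return f"SELECT {columns} FROM {table}"


def translate_results(cursor: sqlite3.Cursor) -> list[dict]:
    """Rename Model B result columns back into Model A's vocabulary."""
    names_b = [d[0] for d in cursor.description]
    return [{B_TO_A[n]: value for n, value in zip(names_b, row)} for row in cursor.fetchall()]


# A stand-in data warehouse that only knows Model B names.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE ORDERS (CUST_NM TEXT, ORD_TOT_AMT REAL)")
warehouse.execute("INSERT INTO ORDERS VALUES ('Acme', 120.0)")

sql = translate_request(["CustomerName", "OrderTotal"], "ORDERS")
print(sql)  # SELECT CUST_NM, ORD_TOT_AMT FROM ORDERS
print(translate_results(warehouse.execute(sql)))  # [{'CustomerName': 'Acme', 'OrderTotal': 120.0}]
```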
47. Wikipedia Rocks! Knowledge is growing at an exponential rate. The more knowledge there is out there, the greater the need to reuse rather than reinvent it. Tools can extract 50M RDF triples from Wikipedia. How many instructors share their database of exam questions and the effectiveness of each question? See Wikipedia: “Semantic Wiki”.
49. Retrieving Data: An Evolution. Increasing responsiveness, from monthly “green bar” reports to browsable graphical interfaces (PivotTables, Cognos). Goals: shorten the time-to-report interval, allow users to “browse” data sets interactively, and remove the programmers with “backlogs” of reports. Users frequently waited days, weeks or months to get a custom report created.
50. Metadata Discovery. Tools that “scan” data sources and create new ontologies or mappings to existing ontologies. (Diagram: a relational database data source is scanned to produce mappings stored in a metadata registry.)
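A minimal metadata-discovery scan can be sketched against SQLite, whose sqlite_master table and PRAGMA table_info expose table and column names. The person table and the KNOWN_NAMES lookup that proposes registry mappings are hypothetical. Real discovery tools use far richer matching than exact column names.

```python
import sqlite3

# Hypothetical lookup from recognizable column names to registry data elements.
KNOWN_NAMES = {"given_name": "PersonGivenName", "family_name": "PersonFamilyName"}


def scan_schema(conn: sqlite3.Connection) -> dict[str, list[tuple[str, str]]]:
    """Return {table: [(column, declared_type), ...]} for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(row[1], row[2]) for row in rows]
    return schema


def propose_mappings(schema: dict[str, list[tuple[str, str]]]) -> dict[str, str]:
    """Propose table.column -> registry element for recognizable column names."""
    return {f"{table}.{column}": KNOWN_NAMES[column]
            for table, columns in schema.items()
            for column, _type in columns
            if column in KNOWN_NAMES}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (given_name TEXT, family_name TEXT, phone TEXT)")
schema = scan_schema(conn)
print(schema)
print(propose_mappings(schema))
```

Columns with no proposed mapping, such as phone here, are exactly the ones a person would then review and register by hand.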
51. Classification and Categorization. Whenever we break the continuous observable world into a predefined list of categories, each with a label, we call the result a categorical value. These categorical values then become the “dimensions” of our cube. Discrete breaks in continuous values become “rules”: colors fall into “green”, “red” or “blue”, and a $500 break on a scale starting at $0 separates a “normal expense” from a “large expense” (which requires supervisor approval). Note: NO OVERLAP! George Lakoff: Women, Fire, and Dangerous Things: What Categories Reveal about the Mind.
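The expense rule can be sketched with half-open intervals, which guarantee that every amount lands in exactly one category. Putting exactly $500 into the “large expense” bin is an assumption; the slide only shows the $500 break.

```python
from bisect import bisect_right

# Category boundaries and labels: [0, 500) -> normal, [500, inf) -> large.
BREAKS = [500]
LABELS = ["normal expense", "large expense"]


def categorize(amount: float) -> str:
    """Map a continuous dollar amount onto exactly one categorical value."""
    if amount < 0:
        raise ValueError("expense amounts start at $0")
    # bisect_right counts how many breaks are <= amount, which is the bin index.
    return LABELS[bisect_right(BREAKS, amount)]


def needs_supervisor_approval(amount: float) -> bool:
    """The business rule attached to the 'large expense' category."""
    return categorize(amount) == "large expense"


for amount in (0, 499.99, 500, 1250):
    print(amount, categorize(amount), needs_supervisor_approval(amount))
```

Adding a category later only means appending one break and one label, and the half-open intervals still cannot overlap.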
54. Cost of Poor Semantics. Information technology departments can spend 40-60% of their costs on integration. 90% of integration costs are due to poor semantics. If every application used and “published” a machine-readable ontology with mappings to published ontologies, integration could be almost “automatic”.
55. Gartner: “Metadata cast into formal logics will drive interoperability, automation, cost cutting, better search capabilities and new business opportunities.” Semantic Web Drives Data Management, Automation and Knowledge and Discovery, Alexander Linden, March 2005, G00125145.
57. Structures for Increased Semantics. (Chart of structures along an axis of increasing semantic precision: HTML, PDF, Word, PowerPoint, Excel, Access, Server, XML, RDBMS, RDF, taxonomies, ontologies, SOA, WSDL.) Source: Network Inference
63. Ontology Architectures. One “big” ontology (see Cycorp, cyc.com): using a single “uber-ontology” is akin to “boiling the ocean”. Cyc contains over 3 million “assertions”. Compare this with many smaller ontologies and micro-formats (RDFa). How do we combine them? Source: cyc.com
64. If You Give a Kid a Hammer… …the whole world becomes a nail. People solve problems with the tools they know. Semantics are new tools for solving computer-to-computer communication problems. Intelligent agents will become prevalent when we teach organizations to publish their metadata. Example: procedural vs. declarative programming.
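A small illustration of the contrast, assuming a hypothetical list of expenses and reusing the $500 threshold from the categorization slide. The procedural version spells out how to find large expenses step by step, while the declarative SQL version states what is wanted and leaves the how to the database engine.

```python
import sqlite3

# Hypothetical expense records: (item, amount in dollars).
expenses = [("travel", 650.0), ("supplies", 40.0), ("software", 900.0)]

# Procedural: spell out *how* to find large expenses, one step at a time.
large_procedural = []
for item, amount in expenses:
    if amount >= 500:
        large_procedural.append(item)

# Declarative: state *what* is wanted and let the SQL engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expense (item TEXT, amount REAL)")
conn.executemany("INSERT INTO expense VALUES (?, ?)", expenses)
large_declarative = [row[0] for row in conn.execute(
    "SELECT item FROM expense WHERE amount >= 500 ORDER BY rowid")]

print(large_procedural)   # ['travel', 'software']
print(large_declarative)  # ['travel', 'software']
```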
65. Cognitive Styles. The way we solve problems depends on the tools we know how to use. Shoshana Zuboff (1988), In the Age of the Smart Machine: technology creates new ways of thinking, new ways of approaching and solving problems, and new sets of “cognitive styles”. Only if we share these cognitive styles will we be able to create a coherent technology strategy that everyone understands.
66. Metadata Publishing Opens the Door to the Semantic Web! Metadata publishing is hard. It is a foundation upon which the Semantic Web will be built. The benefits are indirect and need strong executive sponsorship. Metadata publishing is no “silver bullet”, but I believe it is the most direct way to get to the Semantic Web, and it will be the most practical way to build intelligent agents.
67. Top AI Researchers Agree… “If software is ever going to be able to effectively inter-operate (in ways that were not explicitly preconceived and engineered), it will be because applications share enough of the semantics of their data elements.” Doug Lenat, Cycorp, Semantic Technology Conference 2005