Building core vocabularies is becoming important for enabling seamless digital communication and the use of open data. Drawing on my experience of building a core vocabulary, I will discuss what is easy and what is difficult about building one, and about bridging different core vocabularies across languages and domains.
Presented in Glocal KO Workshop, Thursday August 13, 2015, Copenhagen
Bridging gaps across languages and domains through core vocabularies
1. Some thoughts about the gaps across languages and domains through the experience of building core common vocabularies
Hideaki Takeda
National Institute of Informatics
takeda@nii.ac.jp
Glocal KO Workshop, Thursday August 13, 2015, Copenhagen
2. Who am I?
Hideaki Takeda, Dr., Eng.
• Professor, National Institute of Informatics
– Research Institute mainly for Computer Science
• Background: Computer Science, in particular, Artificial
Intelligence
• Current interest: Semantic Web, Ontology, Linked Open
Data (LOD), Social Media Analysis
• Social activities
– President, Linked Open Data Initiative (NPO)
– Founder, DBpedia Japanese Chapter
– Specialist, Information-technology Promotion Agency,
Japan (IPA)
– Chair, Japan Link Center (Registration Agency of
International DOI Foundation)
– Board, ORCID
3. Core Vocabularies
• Background
– Everything is on the infosphere, i.e., the web
– Lots of information, lots of data, lots of systems
• Problems
– Misunderstanding/mismatching/"missing links" across different domains
– Gap between humans and machines (computers)
4. Core Vocabularies
• Aim
– Increase interoperability of information/data
– Bridge human and machine understanding
• Target
– Governmental documents/data
• Method
– Define a set of concepts which bridge (human-readable) terms and (computer-processable) symbols (URIs)
– Starting from the most common concepts
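The method above can be sketched as a small lookup structure: each concept carries one URI as its computer-processable symbol plus human-readable terms per language. This is only a minimal sketch. The `Concept` class, the `find_by_term` helper, and the example.org URIs are assumptions made for illustration and are not part of any published vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A concept identified by a URI, with human-readable terms per language."""
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)  # language tag -> term


# Hypothetical URIs under example.org; a real vocabulary publishes its own.
VOCABULARY: List[Concept] = [
    Concept("http://example.org/core#Person", {"en": "person", "ja": "人"}),
    Concept("http://example.org/core#Address", {"en": "address", "ja": "住所"}),
]


def find_by_term(term: str, vocabulary: List[Concept]) -> Optional[Concept]:
    """Return the concept that some language labels with `term`, if one exists."""
    for concept in vocabulary:
        if term in concept.labels.values():
            return concept
    return None


# Terms in different languages resolve to the same concept and the same URI.
person_en = find_by_term("person", VOCABULARY)
person_ja = find_by_term("人", VOCABULARY)
```

The entry is the concept, not the term: both lookups return the same object, so systems can exchange the URI while people read the term in their own language.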
5. Core Vocabularies
• Activities worldwide
– USA: NIEM Core
• NIEM (National Information Exchange Model)
– Europe: ISA Core Vocabularies
– UN: United Nations Centre for Trade Facilitation
and Electronic Business (UN/CEFACT)
• Core Components Library (UN/CCL)
– Japan: IMI Core Vocabulary
10. IMI Project
• Supported by
– Ministry of Economy, Trade,
and Industry, Japan
• Technical Framework
– Data Model
– Core Vocabulary
– Design Rules
• Support Framework
– Tools
• for data developer
• for schema developer
– Database
• schema / tools / templates/ …
[Figure: example IMI data model. Person Type (Name, Gender, Gender Code, Birth Date, Address, …); Name Type (Type, Name, Family Name, Given Name, …); Address Type (Type, Notation, Zip Code, Prefecture, City, …); Code Type (Type, Value). Other labels: Codelist Type, String, Thing Type.]
11. IMI as a template for schema
[Figure: a registration form ("Registration form for Confere…", with fields Name, Address, Gender, Affiliation, Affiliation Address, Attending date) shown next to the IMI data model (Person Type, Name Type, Address Type, Code Type).]
Design Schema
• Remove unnecessary items
• Add necessary items
[Figure: resulting IMI Individual Form. Person Type (Name, Gender, Address, Affiliation); Name Type (Name); Address Type (Notation, Zip-code); Event Participation Type (Participant, Date).]
12. Roles of IMI
• Structured concept dictionary
– Concept dictionary
• Terms as notation of concepts
– The entry is concept, not term
• Class concept and relation concept
• General-specific relation
– Structured dictionary
• Concepts form a network of concepts, which in turn represents
the meaning of individual concepts
• A class concept consists of relation concepts representing
attributes and general/specific relations
• A relation concept consists of class concepts connected as
domains and ranges and general/specific relations
• Template for schemata
– Add or remove items for the specific needs
13. Use of IMI
• Define the concept model
• “Serialize” it into specific “physical” forms
• Use a suitable physical form
[Figure: the IMI Concept Model is serialized into three physical forms]
• RDF: for open data
– Relaxed definition
– Interoperability with other open data schemata
• XML: for data exchange
– Strict definition
– Interoperability with DB schemata
• Natural language form: for spreadsheets and documents
– Relaxed definition with simple structure
– Readability by humans
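The idea of one concept model with several physical forms can be sketched as follows. This is an illustrative sketch, not the IMI tooling: the `Property` and `ClassConcept` classes, the three serializer functions, the `ex:` prefix, and the lower-case property names are all assumptions, and prefix or namespace declarations are omitted from the generated text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Property:
    name: str
    datatype: str     # e.g. "xsd:string"
    cardinality: str  # e.g. "0..1" or "0..n"


@dataclass
class ClassConcept:
    name: str
    properties: List[Property]


def to_turtle(c: ClassConcept) -> str:
    """Relaxed, open-data style: declare the class and its properties."""
    lines = [f"ex:{c.name} a rdfs:Class ."]
    for p in c.properties:
        lines.append(
            f"ex:{p.name} a rdf:Property ; "
            f"rdfs:domain ex:{c.name} ; rdfs:range {p.datatype} ."
        )
    return "\n".join(lines)


def to_xsd(c: ClassConcept) -> str:
    """Strict, exchange style: cardinality becomes minOccurs/maxOccurs."""
    rows = []
    for p in c.properties:
        low, high = p.cardinality.split("..")
        max_occurs = "unbounded" if high == "n" else high
        rows.append(
            f'    <xsd:element name="{p.name}" type="{p.datatype}" '
            f'minOccurs="{low}" maxOccurs="{max_occurs}"/>'
        )
    body = "\n".join(rows)
    return (
        f'<xsd:complexType name="{c.name}">\n  <xsd:sequence>\n'
        f"{body}\n  </xsd:sequence>\n</xsd:complexType>"
    )


def to_text(c: ClassConcept) -> str:
    """Human-readable style: a one-line summary for documents."""
    items = ", ".join(f"{p.name} ({p.cardinality})" for p in c.properties)
    return f"{c.name}: {items}"


person = ClassConcept(
    "Person",
    [Property("gender", "xsd:string", "0..1"),
     Property("nationality", "xsd:string", "0..n")],
)
```

All three outputs are derived from the same `person` object, so the definition is maintained once and only the serialization differs per use.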
14. IMI Core vocabulary v2.2
• Published on Feb. 3, 2015
• 48 core class terms
– person, address, facility, location, date, …
• 206 core property terms
– name of person, birth date, birth country, …
• Multi format
– rdf schema, xml schema
and documents for human
http://imi.ipa.go.jp/ns/core/2/
15.
16. Class definition (person class)
person 人
Description: Data type to describe a person (説明: 人の情報を表現するためのデータ型)
Inherits from (継承): ic:実体型
Property (en) | Property (ja) | Data type | Cardinality | Description (ja) | Description (en)
ID | ID | ic:ID型 | 0..n | ID | Identification of a Person
Name of person | 氏名 | ic:氏名型 | 0..n | 氏名 | Name of a Person
Gender | 性別 | xsd:string | 0..1 | 性別の表記 | Gender of a Person
Gender code | 性別コード | ic:コード型 | 0..1 | 性別コード | Gender of a Person
Birth date | 生年月日 | ic:日付型 | 0..1 | 生年月日 | Date of Birth of a Person
Death date | 死亡年月日 | ic:日付型 | 0..1 | 死亡年月日 | Date of Death of a Person
Residence address | 住所 | ic:住所型 | 0..n | 現住所 | Present address of a Person
Domicile of origin | 本籍 | ic:住所型 | 0..1 | 本籍 | Legal residence address of a Person
Contact information | 連絡先 | ic:連絡先型 | 0..n | 連絡先 | Contact information of a Person
Nationality | 国籍 | xsd:string | 0..n | 国籍の表記 | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Nationality code | 国籍コード | ic:コード型 | 0..n | 住民基本台帳で利用されている国籍コード | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Birth country | 出生国 | xsd:string | 0..1 | 生まれた国名 | A location where a person was born.
Birth country code | 出生国コード | ic:コード型 | 0..1 | 生まれた国のコード | A location where a person was born.
Birth place | 出生地 | ic:住所型 | 0..1 | 生まれた場所 | A location where a person was born.
17. Class Structure
A class term has property terms as sub-elements, and a property term can refer to a class term, which in turn has its own list of property terms. This yields a layered structure of terms, as in the following figure.
person 人
  name: ic:氏名型
  contact: ic:連絡先型
  :
name 氏名
  Family name: xsd:string
  Romanized family name: xsd:string
  :
contact 連絡先
  Phone number: ic:電話番号型
  Address: ic:住所型
  :
phone number 電話番号
  :
address 住所
  Country: xsd:string
  Prefecture: xsd:string
  :
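The layered structure described on this slide can be sketched as a recursive walk: a property whose data type is itself a class term is expanded into that class's properties. This is a minimal sketch under stated assumptions. The `CLASSES` table holds only the terms visible on the slide, the English property names are the slide's glosses rather than official identifiers, and the `leaf_paths` helper is hypothetical and assumes the class graph has no cycles.

```python
from typing import Dict, List

# class term -> {property term: data type}. A data type is either another
# class term (a key of this dict) or a primitive such as "xsd:string".
CLASSES: Dict[str, Dict[str, str]] = {
    "ic:人型": {"name": "ic:氏名型", "contact": "ic:連絡先型"},
    "ic:氏名型": {
        "family name": "xsd:string",
        "romanized family name": "xsd:string",
    },
    "ic:連絡先型": {"phone number": "ic:電話番号型", "address": "ic:住所型"},
    "ic:電話番号型": {},  # properties are elided on the slide
    "ic:住所型": {"country": "xsd:string", "prefecture": "xsd:string"},
}


def leaf_paths(class_term: str, prefix: str = "") -> List[str]:
    """List every path from `class_term` down to a primitive-typed property."""
    paths: List[str] = []
    for prop, datatype in CLASSES[class_term].items():
        path = f"{prefix}/{prop}" if prefix else prop
        if datatype in CLASSES:
            # The property refers to a class term: descend one layer.
            paths.extend(leaf_paths(datatype, path))
        else:
            paths.append(f"{path}: {datatype}")
    return paths


person_paths = leaf_paths("ic:人型")
```

Each returned path, such as `contact/address/country: xsd:string`, corresponds to one chain of class term, property term, class term in the figure.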
18. Concept of the IMI framework
International interoperability was a key consideration in preparing IMI.
[Figure: IMI vocabulary layers. Core Vocabulary; Cross Domain Vocabulary (Shelter, Location, Hospital, Station); Domain-specific Vocabularies (Geographical Space/Facilities, Transportation, Disaster Prevention, Finance; example term: Disaster Restoration Cost). Related standards shown around IMI: Japanese local government standard (APPLIC), de facto standards (DC, foaf, etc.), NIEM (US), ISA (EU), Schema.org.]
19. Mapping between concepts in different core vocabularies
• Difficulty of concept-concept mapping
– Matching meanings tends to become a very abstract discussion
[Figure: two concepts in the Ontology layer, each with a reference in the Real world layer; the mapping between them is marked "?"]
20. Mapping between concepts in different core vocabularies
• Difficulty of concept-concept mapping
– Matching meanings tends to become a very abstract discussion
– Matching references is easier
[Figure: two concepts in the Ontology layer, each with a reference in the Real world layer; the mapping between them is marked "?"]
21. Mapping between concepts in different core vocabularies
• Difficulty of concept-concept mapping
– Syntactical mapping vs. semantic mapping
• Just consider what a concept refers to in the real world, not how it is represented in systems.
[Figure: two concepts with their references and an ontology, drawn across the Systems World and the Cognitive World; the mapping between them is marked "?"]
22. Person
[Figure: the person class definition table from slide 16 (property, data type, cardinality, descriptions in Japanese and English), placed between the Systems World and the Cognitive World; the correspondences are marked "?"]
24. Semantic Mapping
• Semantic Mapping
– Mapping on the cognitive layer
– Two ways of judging mapping
• Extensional Mapping
– Check whether ‘things’ are shared
– e.g., person
– Mostly for Class Mapping
• Intensional Mapping
– Check whether ‘values’ are shared
– e.g., postal-code
– Mostly for Property Mapping
• Syntactical Mapping
– Mapping on the systems layer
25. Types of matching: SKOS
• Exact Match
• Close Match
• Broad/Narrow Match
• Related Match
26. Close match
• Close match: nearly, but not exactly, matched.
• Extensional mapping
– The coverage of ‘things’ largely overlaps
• e.g., the coverage of ‘Country’ is slightly different
– The ‘things’ referred to are close
• e.g., the reference of ‘Person’ is slightly different (person vs. legal person)
• Intensional mapping
– The coverage of ‘values’ largely overlaps
27. Broad match/narrow match
• Broad/narrow match
– One subsumes the other
• Extensional mapping
– The coverage of ‘things’ of one is subsumed by the other, i.e., the match is exact within the subset
• Intensional mapping
– The coverage of ‘values’ of one is subsumed by the other, i.e., the match is exact within the subset
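The distinction between exact, close, and broad/narrow matches can be sketched with plain set comparison over what two concepts cover ('things' for extensional mapping, 'values' for intensional mapping). This is only a sketch: the `classify_match` function, the use of Jaccard overlap, and the 0.8 threshold for "close" are assumptions for illustration, since the slides give no numeric criterion. The direction follows SKOS, where `A skos:broadMatch B` says that B is the broader concept.

```python
from typing import Set


def classify_match(a: Set[str], b: Set[str], close_threshold: float = 0.8) -> str:
    """Classify how concept A relates to concept B from what each one covers.

    `a` and `b` are the non-empty sets of things (or values) covered by A and B.
    """
    if a == b:
        return "exact"
    if a < b:
        return "broad"   # B covers everything A does and more
    if a > b:
        return "narrow"  # A covers everything B does and more
    # Neither subsumes the other: measure how much the coverage overlaps.
    jaccard = len(a & b) / len(a | b)
    return "close" if jaccard >= close_threshold else "unclear"


# Hypothetical coverage of two 'Country' concepts (ISO-style codes).
countries_a = {"JP", "US", "FR"}
countries_b = {"JP", "US", "FR", "DE"}
subsumed = classify_match(countries_a, countries_b)

# Two coverages that differ in one member each: 9 shared out of 11 in total.
nearly_same = classify_match(set("abcdefghij"), set("abcdefghik"))
```

In practice the full coverage of a concept is rarely enumerable, so such a check would run on samples, and borderline results would still need human judgment.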
28. More different matching
• Complicated match
– An element of a system matches a combination of
two or more elements.
– “Pathway” match
• A single property matches the combination of two or
more properties
– “Conditional” match
• An element matches the other element if some
condition holds
IdentifierIssuingAuthority Link Has related match IMI ic:ID型.ic:ID体系.ic:発行者
LegalEntityRegisteredAddress Link Has broad match IMI ic:法人型.ic:住所 It is an exact match if the value of ic:住所.種別 is "登記住所" (registered address).
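The two examples above suggest that a mapping record needs more than a source, a relation, and a target. A minimal sketch of such a record follows, with a multi-step target path for "pathway" matches and an optional condition for "conditional" matches. The `Mapping` class, its field names, and the flat-dictionary record format are assumptions for illustration; only the identifiers and relations are taken from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mapping:
    source: str             # identifier in the source vocabulary
    relation: str           # default relation, e.g. "broad" or "related"
    target_path: List[str]  # one step: simple match; several steps: pathway match
    condition: Optional[Callable[[Dict[str, str]], bool]] = None
    relation_if_condition: Optional[str] = None

    def relation_for(self, record: Dict[str, str]) -> str:
        """Return the relation that applies to one concrete data record."""
        if (
            self.condition is not None
            and self.relation_if_condition is not None
            and self.condition(record)
        ):
            return self.relation_if_condition
        return self.relation


# Pathway match: one property corresponds to a chain of three terms.
issuing_authority = Mapping(
    source="IdentifierIssuingAuthority",
    relation="related",
    target_path=["ic:ID型", "ic:ID体系", "ic:発行者"],
)

# Conditional match: broad in general, exact when the address kind is
# "登記住所" (registered address).
registered_address = Mapping(
    source="LegalEntityRegisteredAddress",
    relation="broad",
    target_path=["ic:法人型", "ic:住所"],
    condition=lambda record: record.get("ic:住所.種別") == "登記住所",
    relation_if_condition="exact",
)
```

Calling `registered_address.relation_for({"ic:住所.種別": "登記住所"})` yields the stronger relation, while a record without that value falls back to the default one.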
29. Results
Core Vocabulary Identifier Link Mapping relation Data model Identifier
Address Link Has exact match IMI ic:住所型
AddressAddressArea Link Has narrow match IMI ic:住所型.ic:町名
AddressAddressArea Link Has narrow match IMI ic:住所型.ic:丁目
AddressAddressArea Link Has narrow match IMI ic:住所型.ic:番地補足
AddressAddressArea Link Has narrow match IMI ic:住所型.ic:番地
AddressAddressArea Link Has narrow match IMI ic:住所型.ic:号
AddressAddressID Link Has exact match IMI ic:住所型.ic:ID
AddressAdminUnitL1 Link Has exact match IMI ic:住所型.ic:国
AddressAdminUnitL2 Link Has narrow match IMI ic:住所型.ic:都道府県
AddressFullAddress Link Has exact match IMI ic:住所型.ic:表記
AddressLocatorDesignator Link Has narrow match IMI ic:住所型.ic:ビル番号
AddressLocatorDesignator Link Has narrow match IMI ic:住所型.ic:部屋番号
AddressLocatorName Link Has narrow match IMI ic:住所型.ic:ビル名
AddressPOBox Link Has related match IMI ic:住所型.ic:方書
AddressPostCode Link Has exact match IMI ic:住所型.ic:郵便番号
AddressPostName Link Has narrow match IMI ic:住所型.ic:市区町村
AddressPostName Link Has narrow match IMI ic:住所型.ic:区
AddressThoroughfare Link Has no match IMI
Agent Link Has exact match IMI ic:実体型
30. Results
Identifier Link Has exact match IMI ic:ID型
IdentifierIdentifier Link Has exact match IMI ic:ID型.ic:識別値
IdentifierIssueDate Link Has no match IMI
IdentifierIssuingAuthority Link Has related match IMI ic:ID型.ic:ID体系.ic:発行者
IdentifierIssuingAuthorityURI Link Has exact match IMI ic:ID型.ic:ID体系.ic:URI
IdentifierType Link Has no match IMI
JurisdictionIdentifier Link Has related match IMI ic:国籍コード
JurisdictionName Link Has related match IMI ic:国籍
LegalEntity Link Has exact match IMI ic:法人型
LegalEntityAddress Link Has broad match IMI ic:法人型.ic:住所
LegalEntityAlternativeName Link Has no match IMI
LegalEntityCompanyActivity Link Has close match IMI ic:法人型.ic:事業種目
LegalEntityCompanyStatus Link Has related match IMI ic:法人型.ic:活動状況
LegalEntityCompanyType Link Has exact match IMI ic:法人型.ic:組織種別
LegalEntityIdentifier Link Has exact match IMI ic:法人型.ic:ID
LegalEntityLegalIdentifier Link Has no match IMI
LegalEntityLegalName Link Has broad match IMI ic:法人型.ic:名称.表記
LegalEntityLocation Link Has related match IMI ic:法人型.ic:地物.説明
LegalEntityRegisteredAddress Link Has broad match IMI ic:法人型.ic:住所
Location Link Has exact match IMI ic:場所型
LocationAddress Link Has exact match IMI ic:場所型.ic:住所
LocationGeographicIdentifier Link Has broad match IMI ic:場所型.ic:地理識別子
LocationGeographicName Link Has exact match IMI ic:場所型.ic:名称.ic:表記
LocationGeometry Link Has exact match IMI ic:場所型.ic:地理座標
31. Results
Person Link Has exact match IMI ic:人型
PersonAddress Link Has exact match IMI ic:人型.ic:住所
PersonAlternativeName Link Has broad match IMI ic:人型.ic:氏名.ic:姓名
PersonBirthName Link Has broad match IMI ic:人型.ic:氏名.ic:姓名
PersonCitizenship Link Has no match IMI
PersonCountryOfBirth Link Has exact match IMI ic:人型.ic:出生国
PersonCountryOfDeath Link Has no match IMI
PersonDateOfBirth Link Has exact match IMI ic:人型.ic:生年月日
PersonDateOfDeath Link Has exact match IMI ic:人型.ic:死亡年月日
PersonFamilyName Link Has exact match IMI ic:人型.ic:氏名.ic:姓
PersonFullName Link Has exact match IMI ic:人型.ic:氏名.ic:姓名
PersonGender Link Has exact match IMI ic:人型.ic:性別コード
PersonGivenName Link Has exact match IMI ic:人型.ic:氏名.ic:名
PersonIdentifier Link Has broad match IMI ic:人型.ic:ID
PersonPatronymicName Link Has no match IMI ic:人型.ic:氏名.ic:姓名
PersonPlaceOfBirth Link Has narrow match IMI ic:人型.ic:出生地
32. Bridging core and domain vocabularies
(work in progress)
• Aim: extend the core vocabulary to
domain vocabularies
– Agriculture
– Finance
– Traffic
– …
• Task:
– Can concepts really be shared between the core and the domains?
33. Agricultural Activity Ontology (AAO)
Agricultural activity
crop production activity
activity for propagation
activity in the vegetative growth stage
activity in the reproductive growth stage
activity for environment control
activity for soil control
activity for climate control
activity for water control
activity for biotic control
activity for chemical control
post production activity
activity for harvesting
activity for processing
activity for extending shelf-life
activity for wrapping
indirect activity
activity for preparing materials
activity for cleaning
activity for transport
activity for monitoring
activity for maintaining farm equipment
administrative activity
activity for business administration
http://cavoc.org/aao/
34. An example: “activity” (and “event”)
• S: (n) activity (any specific behavior) "they avoided all recreational activity"
– direct hyponym / full hyponym
– direct hypernym / inherited hypernym / sister term
• S: (n) act, deed, human action, human activity (something that people do or cause to happen)
– S: (n) event (something that happens at a given place and time)
– [WordNet]
• Each activity is a Happening which involves volition and participants. It has
temporal dimension. It is distinguished from Events by the fact that the activity
does not trigger change of state and does not have a conceptual end point.
– [PROTON Extent module (a lightweight upper-level ontology)]
• Activity: This class represents the abstract content of an event, which may be
repeated many times, once or never. For example a training course, or a play.
– [The Event Programme Vocabulary (prog)]
• E5 Event
– Subclass of: E4 Period
– Superclass of: E7 Activity, E63 Beginning of Existence, E64 End of Existence
• E7 Activity
– Subclass of: E5 Event
– Superclass of: E8 Acquisition, E9 Move, E10 Transfer of Custody, E11 Modification,
E13 Attribute Assignment, E65 Creation …
– [CIDOC Conceptual Reference Model]
35. Summary
• Sharing concepts is a long road
• No ground truth
– Step-by-step understanding of the world
– Careful consensus making
• A more flexible framework is needed
– Simple mapping is not sufficient