8. • “A scholar is just a library’s way of making another library.”
– Daniel Dennett, “Memes and the Exploitation of Imagination”, 1990
• A scholar is just data’s way of making more data
11. Data Life Cycle
• Data is created, shared, published, and archived
• But being “published” is not enough; data should be “openly published” (open data)
[Figure: data life cycle stages (Create, Share, Publish, Archive) aligned with research phase (In Progress, Results)]
12. Open Data
• “A piece of data or content is open if anyone is free
to use, reuse, and redistribute it — subject only, at
most, to the requirement to attribute and/or share-alike.” http://opendefinition.org/
• Open data is data published under an open license
– An open license ensures the above condition
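The condition in the definition above can be sketched as a subset test: a license counts as open only if every requirement it imposes is attribution or share-alike. This is a minimal sketch, not a legal test; the condition labels ("attribution", "share-alike", "non-commercial") are hypothetical names chosen for illustration, and the sketch assumes the freedoms to use, reuse, and redistribute are already granted.

```python
# Requirements the definition tolerates on an open license.
# The string labels are hypothetical, not taken from any license registry.
ALLOWED_CONDITIONS = frozenset({"attribution", "share-alike"})


def is_open(license_conditions):
    """Return True if every imposed condition is one the definition allows."""
    return set(license_conditions) <= ALLOWED_CONDITIONS


print(is_open({"attribution"}))                    # attribution only
print(is_open({"attribution", "non-commercial"}))  # adds an extra restriction
```

A license with no conditions at all also passes, since the definition says "at most".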
13. Data Life Cycle
• Different tools for different stages of life cycle
– Data sharing: generating, federating, …
– Data publishing: searching, harvesting, …
– Data archiving: migration, …
• The architecture CAN be shared
[Figure: data life cycle stages (Create, Share, Publish, Preserve) aligned with research phase (In Progress, Results) and stakeholder (Researcher/Research Group, Research Institute)]
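14. Open Science Support: Research Data Infrastructure
• Works with institutional repositories as well as subject repositories and data repositories
• Builds a knowledge base linked to information about researchers, their institutions, and research projects
• Supports researchers' discovery process
• Research results can be published through simple operations on the data management platform
• Provides librarians and data curators with management functions for metadata, publication levels, statistics, and similar information
• Works with data acquisition instruments and analysis computers
• Research data from ongoing work can be shared and managed among collaborators or within a lab
• Can be used while connected to storage provided by the organization
[Figure: architecture linking the data management platform, the data publication platform, and the data discovery platform. Labels: research data management for institutions; publication and accumulation; management and preservation; search and use; private, shared, public; access control; direct connection; experimental data acquisition instruments; analysis computers; papers and research data; metadata aggregation and management; knowledge base construction; hot storage; cold storage; storage area for long-term preservation; subject repositories; overseas research data publication platforms; DOI; ORCID]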
16. FAIR Data Principles
• Findable
– F1. (meta)data are assigned a globally unique and eternally persistent identifier.
– F2. data are described with rich metadata.
– F3. (meta)data are registered or indexed in a searchable resource.
– F4. metadata specify the data identifier.
• Accessible
– A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
– A1.1. the protocol is open, free, and universally implementable.
– A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
– A2. metadata are accessible, even when the data are no longer available.
• Interoperable
– I1. (meta)data use a formal, accessible, shared, and broadly applicable language for
knowledge representation.
– I2. (meta)data use vocabularies that follow FAIR principles.
– I3. (meta)data include qualified references to other (meta)data.
• Re-usable
– R1. (meta)data have a plurality of accurate and relevant attributes.
– R1.1. (meta)data are released with a clear and accessible data usage license.
– R1.2. (meta)data are associated with their provenance.
– R1.3. (meta)data meet domain-relevant community standards.
https://www.force11.org/group/fairgroup/fairprinciples
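Some of the principles above can be spot-checked mechanically on a metadata record. The sketch below is only an illustration: the field names (identifier, description, license, provenance) and the pairing of each field with a principle are assumptions made here, not part of the FAIR text, and the presence of a field says nothing about its quality.

```python
# Hypothetical field names, each loosely paired with the principle it evidences.
CHECKS = {
    "F1": "identifier",     # globally unique, persistent identifier
    "F2": "description",    # rich metadata (presence only)
    "R1.1": "license",      # clear data usage license
    "R1.2": "provenance",   # provenance information
}


def missing_principles(record):
    """List the principle codes whose (hypothetical) field is absent or empty."""
    return [code for code, field in CHECKS.items() if not record.get(field)]


record = {
    "identifier": "doi:10.0000/placeholder",  # placeholder, not a real DOI
    "description": "Example dataset",
    "license": "CC-BY-4.0",
}
print(missing_principles(record))
```

Here the record lacks a provenance field, so only R1.2 is reported.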
17. Architecture of data sharing
[Figure: layered stack: Repository, Data, Format, Metadata, Metadata Schema, Access Control, Identifier. Captions: “Interoperability on each layer” and “Systematic Integration across the layers”]
18. Architecture of data sharing
[Figure: the layered stack with the concerns annotated on it]
– Repository: Continuous Development, Community of Practice
– Data
– Format
– Metadata
– Metadata Schema
– Access Control: Authentication/authorization/audit, ID federation, security
– Identifier: Organization, systems, ID federation
– Further annotations: Database, Search, Maintenance; Description Language, Schema design, Registry, Interoperability
– FAIR labels on the figure: Findable, Accessible, Interoperable, Re-usable
19. Architecture of data sharing
[Figure: the layered stack with annotated concerns and example organizations, schemas, systems, and technologies]
– Repository: Continuous Development, Community of Practice
– Data
– Format
– Metadata
– Metadata Schema
– Access Control: Authentication/authorization/audit, ID federation, security
– Identifier: Organization, systems, ID federation
– Further annotations: Database, Search, Maintenance; Description Language, Schema design, Registry, Interoperability
– Column headings: Organization, Schema, System, Technology
– Examples: DataCite, CrossRef, JaLC, Dublin Core, DCAT, CKAN, Linked Data, DSpace, Fedora, WEKO, DOI, ORCID, FundRef
– Caption: Coordination and Competition
21. Research Activities and Related Entities
[Figure: research activities (Survey, Article Writing, Acquiring Data, Publishing Data), digital objects (Data, Digital Articles), and related entities (Funding agencies, Projects, Research Institutions, Academic Societies, Topics). Relation labels: affiliated, Supported]
22. Research Activities and Related Entities
[Figure: the same diagram as slide 21, with an ID attached to each activity, digital object, and entity]
23. Research Activities and Related Entities
[Figure: the same entities and IDs as slide 22, in a different layout]
24. Global Infrastructure for Scholarly Communication
[Figure: network of linked IDs]
• ID for
– Article
– Data
– Researcher
– Institutions, affiliation
– Funding agency, funded project
– Academic society
– Topic
– …
25. Identifiers for research
• A research activity is represented with a structure of identifiers
– Planned and submitted
– Organized and executed
– Concluded and evaluated
[Figure: network of linked IDs]
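One way to picture a “structure of identifiers” is a small set of links between IDs. The sketch below is an assumption-laden illustration: the class name ResearchActivity, the relation names, and every identifier are made-up placeholders, not real DOIs, ORCID iDs, or funder IDs.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchActivity:
    """A research activity described only by identifiers of related entities."""
    project_id: str
    links: list = field(default_factory=list)  # (subject_id, relation, object_id)

    def link(self, subject_id, relation, object_id):
        """Record one relation between two identified entities."""
        self.links.append((subject_id, relation, object_id))

    def related_to(self, entity_id):
        """Return identifiers directly linked to entity_id, in either direction."""
        found = set()
        for subject_id, _, object_id in self.links:
            if subject_id == entity_id:
                found.add(object_id)
            elif object_id == entity_id:
                found.add(subject_id)
        return found


# All identifiers below are made-up placeholders.
activity = ResearchActivity(project_id="project:P-0001")
activity.link("researcher:R-0001", "affiliated", "institution:I-0001")
activity.link("funder:F-0001", "supported", "project:P-0001")
activity.link("researcher:R-0001", "published", "data:D-0001")
activity.link("researcher:R-0001", "wrote", "article:A-0001")

print(sorted(activity.related_to("researcher:R-0001")))
```

Because everything is keyed by ID, the same links can be followed from the researcher, the data, or the funder without duplicating any description.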
37. Examples from Library Science
• The library community is a pioneer
• Classification
– Universal Decimal Classification (UDC)
• Controlled Vocabulary
– Authority files for personal names, organizations, and places
• Library of Congress: 8 million, MADS & SKOS
• British Library: 2.6 million, foaf & BIO (A vocabulary for biographical information)
• National Diet Library (Japan): 1 million, foaf
• Deutsche Nationalbibliothek (DNB, Germany): 1.8 & 1.3 million (personal names & organizations)
• Virtual International Authority File (VIAF): 4 million
• Taxonomy
– Subject Headings: LC, NDL
• Library of Congress: MADS & SKOS
• British Library:
• National Diet Library (Japan): 0.1 million, SKOS
• Deutsche Nationalbibliothek (DNB, Germany): 0.16 million
40. UDC as Linked Data
UDC ELEMENT | DEFINITION | SKOS TERM | UDC SUBPROPERTY
UDC number (notation) | UDC notation is combination of symbols (numerals, signs and letters) that represent a class, its position in the hierarchy and its relation to other classes. Notation is a language-independent indexing term that enables mechanical sorting and filing of subjects. Also called 'UDC number' and 'UDC classmark' | skos:notation | ---
class identifier (URI) | A unique identifier assigned to each UDC class. It identifies the relationship between a class' meaning and its notational representation | skos:Concept | ---
broader class (URI) | Superordinate class: the class hierarchically above the class in question | skos:broader | ---
caption | Verbal description of the class content | skos:prefLabel | ---
including note | Extension of the caption containing verbal examples of the class content (usually a selection of important terms that do not appear in the subdivision) | skos:note | udc:includingNote
application note | Instructions for number building, further extension and specification of the class | skos:note | udc:applicationNote
scope note | Note explaining the extent and the meaning of a UDC class. Used to resolve disambiguation or to distinguish this class from other similar classes | skos:scopeNote | ---
examples | Examples of combination are used to illustrate UDC class building i.e. complex subject statements | skos:example | ---
see also reference | Indication of conceptual relationship between UDC classes from different hierarchies | skos:related | ---
<skos:Concept rdf:about="http://udcdata.info/025553">
<skos:inScheme rdf:resource="http://udcdata.info/udc-schema"/>
<skos:broader rdf:resource="http://udcdata.info/025461"/>
<skos:notation rdf:datatype="http://udcdata.info/UDCnotation">510
<skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
<skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
<skos:related rdf:resource="http://udcdata.info/000016"/>
http://udcdata.info/
69,000 records
40 Languages
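A record like the one above can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a cleaned-up copy of the record; the rdf:RDF wrapper and the closing tags are added here so the fragment is well-formed, and skos:notation is left out because its value is cut off on the slide.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Cleaned-up copy of the slide's record (wrapper element added for parsing).
record = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://udcdata.info/025553">
    <skos:inScheme rdf:resource="http://udcdata.info/udc-schema"/>
    <skos:broader rdf:resource="http://udcdata.info/025461"/>
    <skos:prefLabel xml:lang="en">Mathematical logic</skos:prefLabel>
    <skos:prefLabel xml:lang="ja">記号論理学</skos:prefLabel>
    <skos:related rdf:resource="http://udcdata.info/000016"/>
  </skos:Concept>
</rdf:RDF>
"""

root = ET.fromstring(record)
concept = root.find(f"{{{SKOS}}}Concept")

# Captions per language (skos:prefLabel) and the parent class (skos:broader).
labels = {el.get(XML_LANG): el.text for el in concept.findall(f"{{{SKOS}}}prefLabel")}
broader = concept.find(f"{{{SKOS}}}broader").get(f"{{{RDF}}}resource")

print(labels)
print(broader)
```

For larger data sets a dedicated RDF library would be the more usual choice; plain XML parsing is used here only to keep the example dependency-free.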
49. CIDOC CRM
E22 Man-Made Object
Subclass of: E19 Physical Object, E24 Physical Man-Made Thing
Superclass of: E84 Information Carrier
Properties
P102 has title (is title of): E35 Title
P1 is identified by (identifies): E41 Appellation
P137 exemplifies (is exemplified by): E55 Type
P2 has type (is type of): E55 Type
P56 bears feature (is found on): E26 Physical Feature
P59 has section (is located on or within): E53 Place
P65 shows visual item (is shown by): E36 Visual Item
P58 has section definition (defines section): E46 Section Definition
P46 is composed of (forms part of): E18 Physical Thing
P45 consists of (is incorporated in): E57 Material
P57 has number of parts: E60 Number
P128 carries (is carried by): E90 Symbolic Object
P49 has former or current keeper (is former or current keeper of): E39 Actor
P50 has current keeper (is current keeper of): E39 Actor
P51 has former or current owner (is former or current owner of): E39 Actor
P52 has current owner (is current owner of): E39 Actor
P105 right held by (has right on): E39 Actor
P104 is subject to (applies to): E30 Right
P44 has condition (is condition of): E3 Condition State
P53 has former or current location (is former or current location of): E53 Place
P55 has current location (currently holds): E53 Place
P54 has current permanent location (is current permanent locatio
P62 depicts (is depicted by): E1 CRM Entity
P130 shows features of (features are also found on): E70 Thing
P43 has dimension (is dimension of): E54 Dimension
P48 has preferred identifier (is preferred identifier of): E42 Identif
P3 has note: E62 String
P101 had as general use (was use of): E55 Type
P103 was intended for (was intention of): E55 Type
http://www.cidoc-crm.org/
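Each property in the listing names the class its value must belong to (its range). A minimal sketch of using that as a consistency check follows; it copies four properties from the listing, the function name check_statement is hypothetical, and it demands an exact class match, whereas a real validator would also accept subclasses of the range.

```python
# Property code -> (label, range class), copied from the listing above.
PROPERTY_RANGES = {
    "P102": ("has title", "E35 Title"),
    "P45": ("consists of", "E57 Material"),
    "P52": ("has current owner", "E39 Actor"),
    "P55": ("has current location", "E53 Place"),
}


def check_statement(prop, object_class):
    """Return True if object_class equals the range recorded for prop.

    Simplification: subclass relationships are ignored in this sketch.
    """
    _, expected = PROPERTY_RANGES[prop]
    return object_class == expected


# An E22 Man-Made Object whose current owner is an actor: consistent.
print(check_statement("P52", "E39 Actor"))
# The same object "located" at an actor instead of a place: inconsistent.
print(check_statement("P55", "E39 Actor"))
```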
52. SWEET Ontologies
• Semantic Web for Earth and Environmental Terminology (SWEET)
• Ontologies
– 6,000 concepts
– 200 separate ontologies
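74. Promoting the Common Vocabulary Infrastructure
• To exchange and use information correctly and efficiently, data such as personal names, addresses, and objects must be defined systematically and structurally.
[Figure labels: Search, Open data, System integration]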
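Example: 三鷹市立第四小学校 (Mitaka Municipal No. 4 Elementary School)
ic:建物_所在 (building: location)
  ic:場所_地名 (place name)
  ic:場所_地理識別子 (geographic identifier)
  ic:場所_住所 (address)
    ic:住所_住所 (address text): 東京都三鷹市下連雀1丁目25−1
    ic:住所_構造化住所 (structured address)
      ic:構造化住所_国 (country)
      ic:構造化住所_都道府県 (prefecture): 東京都 (Tokyo)
      ic:構造化住所_市区町村 (municipality): 三鷹市 (Mitaka City)
      ic:構造化住所_町名 (town name): 下連雀 (Shimorenjaku)
      ic:構造化住所_街区符号 (block number): 1
      ic:構造化住所_住居番号 (residence number): 25
      ic:構造化住所_地番 (lot number): 1
      ic:構造化住所_方書 (building and room details)
        ic:方書_方書
        ic:方書_ビル名 (building name)
        ic:方書_部屋番号 (room number)
      ic:構造化住所_郵便番号 (postal code): 181-0013
      ic:構造化住所_住所ID (address ID)
      ic:構造化住所_住所コード (address code)
  ic:場所_経緯度座標 (latitude/longitude coordinates)
    ic:経緯度座標系_測地系コード (geodetic datum code)
    ic:経緯度座標系_緯度 (latitude)
      ic:緯度_度, ic:緯度_分, ic:緯度_秒 (degrees, minutes, seconds)
    ic:経緯度座標系_経度 (longitude)
      ic:経度_度, ic:経度_分, ic:経度_秒 (degrees, minutes, seconds)
  ic:場所_UTM座標 (UTM coordinates)
    ic:UTM座標系_UTM座標
    ic:UTM座標系_UTM測地系ID (UTM datum ID)
    ic:UTM座標系_東距 (easting)
    ic:UTM座標系_グリッドゾーンID (grid zone ID)
    ic:UTM座標系_グリッドゾーン格子 ID (grid zone square ID)
    ic:UTM座標系_北距 (northing)
  ic:場所_MGRS座標 (MGRS coordinates)
    ic:MGRS座標系_MGRS座標
    ic:MGRS座標系_MGRS座標格子ID (MGRS grid square ID)
ic:建物_施設情報 (building: facility information)
  ic:施設_ID (facility ID)
    ic:証明_識別ID (identification ID)
    ic:証明_証明種類 (certificate type)
    ic:証明_発行日 (issue date)
    ic:証明_失効日 (expiry date)
    ic:証明_発行者 (issuer)
  ic:施設_名称 (name): 三鷹市立第四小学校
  ic:施設_種別 (type): 小学校 (elementary school)
  ic:施設_商用区分 (commercial classification)
  ic:施設_概要 (summary): 小・中一貫教育校「連雀学園」に属する小学校。 (An elementary school belonging to “Renjaku Gakuen”, an integrated elementary and junior high school.)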
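The address part of the example can be held as a nested structure keyed by the ic: terms. The sketch below is an illustration only: the nesting is inferred from the term prefixes on the slide, only the terms that carry a value there are included, and the helper name flatten is hypothetical.

```python
# Nested record using ic: terms and values from the slide's example.
# The parent/child nesting is an assumption based on the term prefixes.
place = {
    "ic:住所_住所": "東京都三鷹市下連雀1丁目25−1",
    "ic:住所_構造化住所": {
        "ic:構造化住所_都道府県": "東京都",
        "ic:構造化住所_市区町村": "三鷹市",
        "ic:構造化住所_町名": "下連雀",
        "ic:構造化住所_郵便番号": "181-0013",
    },
}


def flatten(node, path=()):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


for path, value in flatten(place):
    print(" / ".join(path), "=", value)
```

Two systems that agree on these terms can compare the structured parts field by field instead of matching free-text address strings.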
Item | Entry name | English name | Data type | Cardinality | Description | Sample value | Mapping to NIEM | Mapping to ISA Joinup
Person type (人型) | ic:人型 | PersonType | | | Data type for representing information about a person. | | nc:PersonType | Person
Name (氏名) | ic:人_氏名 | PersonName | ic:氏名型 (ic:PersonNameType) | 0..1 | Name of a person | - | nc:PersonName |
Sex (性別) | ic:人_性別 | PersonSex | abstract element, no type | 0..1 | Gender of a person | 1 | nc:PersonSex | gender
Substitutable elements:
+ Sex code (性別コード) | ic:人_性別コード | PersonSexCode | codes:性別コード型 (codes:GenderCodeType) | | Gender code | 1 | nc:PersonSexCode |
+ Sex name (性別名) | ic:人_性別名 | PersonSexText | ic:テキスト型 (ic:TextType) | | Name of the gender | 男 (male) | nc:PersonSexText |
Date of birth (生年月日) | ic:人_生年月日 | BirthDate | ic:日付型 (ic:DateType) | 0..1 | Date of birth of a person | - | nc:PersonBirthDate | dateOfBirth
Date of death (死亡年月日) | ic:人_死亡年月日 | DeathDate | ic:日付型 (ic:DateType) | 0..1 | Date of death of a person | - | nc:PersonDeathDate | dateOfDeath
Present address (現住所) | ic:人_現住所 | PresentAddress | ic:住所型 (ic:AddressType) | 0..1 | Present address | - | nc:PersonResidenceAssociationType | residency
Registered domicile (本籍) | ic:人_本籍 | LegalResidence | ic:住所型 (ic:AddressType) | 0..1 | Registered domicile | - | |
Nationality (国籍) | ic:人_国籍 | Citizenship | abstract element, no type | 0..n | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country. | - | nc:PersonCitizenship | citizenship
Substitutable elements:
+ Nationality name (国籍名) | ic:人_国籍名 | CitizenshipText | ic:テキスト型 (ic:TextType) | | Name of the nationality | 日本国 (Japan) | nc:PersonCitizenshipText |
+ Nationality code (国籍コード) | ic:人_国籍コード | CitizenshipCode | codes:国籍コード型 (codes:CitizenshipCodeType) | | Nationality code used in the Basic Resident Register | 392 | nc:PersonCitizenshipFIPS10-4Code |
+ ISO3166Alpha2 | ic:人_ISO3166Alpha2 | ISO3166Alpha2 | iso_3166:ISO3166Alpha2CodeType | | Country code, ISO 3166 alpha-2 (two-letter code) | | nc:PersonCitizenshipISO3166Alpha2Code |
+ ISO3166Alpha3 | ic:人_ISO3166Alpha3 | ISO3166Alpha3 | iso_3166:ISO3166Alpha3CodeType | | Country code, ISO 3166 alpha-3 (three-letter code) | | nc:PersonCitizenshipISO3166Alpha3Code |
+ ISO3166Numeric | ic:人_ISO3166Numeric | ISO3166Numeric | iso_3166:ISO3166NumericCodeType | | Country code, ISO 3166 numeric (three-digit code) | | nc:PersonCitizenshipISO3166NumericCode |
Country of birth (出生国) | ic:人_出生国 | BirthCountry | ic:場所型 (ic:LocationType) | 0..1 | The country where a person was born | | nc:PersonBirthLocation | countryOfBirth
Place of birth (出生地) | ic:人_出生地 | BirthPlace | ic:場所型 (ic:LocationType) | 0..1 | A location where a person was born | | nc:PersonBirthLocation | placeOfBirth
Person name type (氏名型) | ic:氏名型 | PersonNameType | | | Data type for representing a person's name. | | nc:PersonNameType |
Full name (姓名) | ic:氏名_姓名 | FullName | ic:テキスト型 (ic:TextType) | 0..1 | Full name of a person (family name, given name) | 経済 太郎 | nc:PersonFullName | fullName
Full name in kana (カナ姓名) | ic:氏名_カナ姓名 | KanaFullName | ic:カタカナテキスト型 (ic:TextType) | 0..1 | Full name in Katakana | ケイザイタロウ | |
Full name in Roman letters (ローマ字姓名) | ic:氏名_ローマ字姓名 | RomanFullName | ic:テキスト型 (ic:TextType) | 0..1 | Full name in Roman alphabet | Keizai Taro | |
Family name (姓) | ic:氏名_姓 | FamilyName | ic:テキスト型 (ic:TextType) | 0..1 | Family name of a person | 経済 | nc:PersonSurName | familyName
Family name in kana (カナ姓) | ic:氏名_カナ姓 | KanaFamilyName | ic:カタカナテキスト型 (ic:TextType) | 0..1 | Family name in Katakana | ケイザイ | |
Family name in Roman letters (ローマ字姓) | ic:氏名_ローマ字姓 | RomanFamilyName | ic:テキスト型 (ic:TextType) | 0..1 | Family name in Roman alphabet | | |
Given name (名) | ic:氏名_名 | GivenName | ic:テキスト型 (ic:TextType) | 0..1 | Given name of a person | 太郎 | nc:PersonGivenName | given name
Given name in kana (カナ名) | ic:氏名_カナ名 | KanaGivenName | ic:カタカナテキスト型 (ic:TextType) | 0..1 | Given name in Katakana | タロウ | |
Given name in Roman letters (ローマ字名) | ic:氏名_ローマ字名 | RomanGivenName | ic:テキスト型 (ic:TextType) | 0..1 | Given name in Roman alphabet | | |
Middle name (ミドルネーム) | ic:氏名_ミドルネーム | MiddleName | ic:テキスト型 (ic:TextType) | 0..1 | Middle name of a person | | nc:PersonMiddleName | alternativeName
Middle name in kana (カナミドルネーム) | ic:氏名_カナミドルネーム | KanaMiddleName | ic:カタカナテキスト型 (ic:TextType) | 0..1 | Middle name in Katakana | | |
Middle name in Roman letters (ローマ字ミドルネーム) | ic:氏名_ローマ字ミドルネーム | RomanMiddleName | ic:テキスト型 (ic:TextType) | 0..1 | Middle name in Roman alphabet | | |
Maiden name (旧姓) | ic:氏名_旧姓 | MaidenName | ic:テキスト型 (ic:TextType) | 0..1 | Maiden name | | nc:PersonMaidenName | birthName
Maiden name in kana (カナ旧姓) | ic:氏名_カナ旧姓 | KanaMaidenName | ic:カタカナテキスト型 (ic:TextType) | 0..1 | Maiden name in Katakana | | |
Maiden name in Roman letters (ローマ字旧姓) | ic:氏名_ローマ字旧姓 | RomanMaidenName | ic:テキスト型 (ic:TextType) | 0..1 | Maiden name in Roman alphabet | | |
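• Vocabulary and Information Exchange Package (IEP)
• Schema.org: a common specification for structured data markup, maintained by the major search engines
• Information exchange packages link systems together
– Fast information exchange
– More efficient design
• Meanings are checked against the vocabulary, and information is extracted from information exchange packages
– More efficient service design
– Stable information exchange
• Aligning vocabularies in advance makes search effective
– Better search convenience
– Effective public communication
• The common vocabulary infrastructure is a mechanism that maintains a reference dictionary of terms, which makes it easy to confirm that items in different data sets mean the same thing and, as a result, makes system integration and the use of open data easier.
http://goikiban.ipa.go.jp/
(IMI: Infrastructure for Multi-layer Interoperability)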
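90. Approach to Authorization
• Authorization method
– A working group of experts prepares a draft → public release → de facto standard
– Core terms and implementation models are developed in parallel
– A Reference Model is provided rather than a Standard
– Other countries proceed in much the same way
• United States (NIEM): Core vocabulary + Information Exchange model
• Europe (ISA): Core vocabulary + Application profile
• Japan (IMI): Core vocabulary + Data Model Description
• Reasons
– Change is fast
– Quality is assured through process and organization
– The scope of customization at adoption is large
[Figure labels: Strategy and policy, Model, Promotion]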
91. Situation Overseas
• Europe: SEMIC
• United States: NIEM
• Private sector: schema.org
[Screenshot text: “This particular release incorporates approximately 280 new elements and 350 new types, as well as updates to 2,300 existing elements and 230 existing types.”]
92. Schema.org
• Some highlighted vocabulary
– Creative works: CreativeWork, Book, Movie, MusicRecording, Recipe,
TVSeries ...
– Embedded non-text objects: AudioObject, ImageObject, VideoObject
– Event
– Health and medical types: notes on the health and medical types under
MedicalEntity.
– Organization
– Person
– Place, LocalBusiness, Restaurant ...
– Product, Offer, AggregateOffer
– Review, AggregateRating
– Action
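Types such as Event, Place, and Organization are typically used as structured data embedded in a web page. A minimal sketch follows, building a JSON-LD description in Python; the event itself (name, date, venue, organizer) is entirely made up for illustration, while the type and property names come from the schema.org vocabulary.

```python
import json

# A hypothetical event described with schema.org types and properties.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Open Data Workshop",  # made-up example values
    "startDate": "2030-01-15",
    "location": {"@type": "Place", "name": "Example Hall"},
    "organizer": {"@type": "Organization", "name": "Example Society"},
}

# JSON-LD is commonly embedded in HTML inside a script element of this type.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(event, indent=2)
    + "\n</script>"
)
print(markup)
```

Nesting a Place and an Organization inside the Event shows how the highlighted types combine into one description.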
93. Schema.org
• Founded by Yahoo!, Google, Yandex, and others to advance web search.
– Common metadata to embed in, or attach to, web pages
– Scope: things, matters, processes, transactions, and so on that appear on the web
• Development organization
– Steering Group
• Oversight, review, approval of new release
– Community Group
• propose, discuss, prepare and review changes to schema.org
• via W3C ML, github
• Work in progress is published at webschema.org
– Pending.webschema.org (for hosted extensions)
96. schema.org extensions
• Additional schemas that extend the schema.org core schema
• Two types of extensions
– Reviewed/Hosted extensions:
• managed and reviewed by schema.org
• http://e1.schema.org for documentation, http://schema.org for namespace
• List: auto.schema.org, bib.schema.org, health-lifesci.schema.org, iot.schema.org
– External extensions: managed and reviewed by other groups
• GS1 Web Vocabulary
• Features
– Add subclasses and/or properties
– Must be consistent with the core
– Overlaps among extensions may occur