12. The past is a foreign place
• Archaeological knowledge acquisition is a dynamic process
• Dynamic feedback allows theories/practice to be tested or revised
[Diagram labels: Interpretation, Synthesis]
37. Grey Literature
• Archaeology units conduct most
excavations in the UK
• Problem
– Predominantly paper based recording (still)
– Primary record is difficult to access
– Excavations are written up as site reports
(interpretative and data summary)
– These reports are not published: hence Grey
Literature
50. Ideally we want to increase the size of the accessible knowledge so that
policy can be formed from a position of ‘perfect’, or ‘near-perfect’, knowledge.
56. Why do we need to model?
• Problems
– An idealised view of the world
• Basic models do not support rich data
• Do not support change management
• Are not dynamic
– Semantically unclear
– Fundamental database issues
• Lack of atomicity
• Poor URIs
– INCONSISTENCY
57. Why do we need to model?
• Benefits
– Archaeoinformatics – to appropriately
represent the archaeological record
• Variations in
– Quality
– Certainty
• Provenance (who did what)
• Presence/absence
– Support and enhance process
• Inference
• Evidence
• Multivocality (handling multiple interpretations)
– Improving structure
58. Modelling : Ontologies
• An ontology is an explicit specification of a domain in
terms of entities and the relationships between these entities.
• The relationships between entities provide the semantics
or meaning of the data.
• This information is normally hidden in a dataset, e.g.,
databases, but it is made more explicit in an ontology.
• This explicit specification facilitates
– Querying
– Complex reasoning
– Extension of the knowledge by inference
62. Modelling: SKOS
• SKOS provides a standard way to represent
knowledge organization systems in RDF:
– thesauri,
– classification schemes,
– subject heading systems
– taxonomies
• SKOS is a data model for data that can be published
on the web.
66. Modelling: CiDOC CRM ontology
• provides definitions and a formal structure for describing
the implicit and explicit concepts and relationships used
in cultural heritage documentation.
• enables information exchange and interchange between
heterogeneous sources of cultural heritage information
by providing a common and extensible semantic
framework
• promotes a shared understanding of cultural heritage
information by transforming disparate, localised
information sources into a coherent global resource.
67. Modelling: CiDOC CRM ontology
• This is the dominant ontology in cultural heritage.
• It is intended to cover the full spectrum of cultural
heritage knowledge - from Archaeology to Art history,
literary and musical entities.
• An ontology of 86 classes and 137 properties for culture
and more.
• With the capacity to explain hundreds of (meta)data
formats.
• International standard since 2006 - ISO 21127:2006.
• The ontology has been encoded in OWL 2, OWL DL
and RDFS.
68. Conceptualization
[Diagram labels: Conceptualization; World Phenomena (?); Data structures & Presentation model; Data in various forms (Data bases, Legacy systems); arrows labelled "approximates", "explains, motivates" and "organize"]
69. [Diagram: CIDOC CRM example of temporal deduction]
• Events: Birth of Leo I, Birth of Attila, Attila meeting Leo I, Death of Leo I, Death of Attila
• Attila meeting Leo I: P14 carried out by (performed) Pope Leo I and Attila; P4 has time-span (is time-span of), P82 at some time within AD452
• Death of Attila: P82 at some time within AD453
• Death of Leo I: P82 at some time within AD461
• Other property labels shown: P11 had participant, P92 brought into existence, P93 took out of existence
• Events are linked by "before" relations; the diagram is labelled "Deduction: before"
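One way to read the deduction in this diagram is as a transitive closure over "before" relations. The Python sketch below is only an illustration under a stated assumption: that each birth precedes the meeting and the meeting precedes each death. The slide does not spell these edges out, and the event labels are shorthand chosen here.

```python
from itertools import product

# Assumed "before" edges (a reading of the diagram, not taken from a dataset).
BEFORE = {
    ("Birth of Leo I", "Meeting"),
    ("Birth of Attila", "Meeting"),
    ("Meeting", "Death of Leo I"),
    ("Meeting", "Death of Attila"),
}


def transitive_closure(edges):
    """Return every (a, b) such that a is before b, stated or inferred."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Chain a -> b and b -> d into a -> d until nothing new appears.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Relations that were not stated but follow from the stated ones.
inferred = transitive_closure(BEFORE) - BEFORE
for earlier, later in sorted(inferred):
    print(f"{earlier} before {later}")
```

Under these assumptions the inferred pairs include "Birth of Leo I before Death of Attila", a relation that no single record states directly.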
70. Modelling: CIDOC-CRM-EH
• Archaeology extension to CiDOC CRM
• Mostly extensions to classes not
properties
• Archaeological concepts expand the
scope of CIDOC concepts, e.g.
– EHE0003: AreaOfInvestigation (IsA E53: Place)
– EHE0007: Context (IsA E53: Place)
– EHE0005: Group (IsA E53: Place)
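The practical effect of these IsA links is that a query for a general CRM class can also return records typed with the archaeological extension classes. A minimal Python sketch, assuming a plain dictionary for the hierarchy and hypothetical record identifiers (neither comes from CRM-EH tooling):

```python
# Subclass links as listed on the slide; the dictionary layout is an assumption.
SUBCLASS_OF = {
    "EHE0003": "E53",  # AreaOfInvestigation IsA Place
    "EHE0007": "E53",  # Context IsA Place
    "EHE0005": "E53",  # Group IsA Place
}


def is_a(cls, ancestor, hierarchy=SUBCLASS_OF):
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy.get(cls)
    return False


def instances_of(ancestor, typed_records):
    """Return ids of records whose class is the ancestor or one of its subclasses."""
    return [rid for rid, cls in typed_records if is_a(cls, ancestor)]


# Hypothetical records: (identifier, class code).
records = [("ctx-1015", "EHE0007"), ("area-1", "EHE0003"), ("period-1", "E4")]
print(instances_of("E53", records))
```

Asking for E53 (Place) returns the Context and AreaOfInvestigation records but not the E4 (Period) record.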
72. What can we do with these
modelling tools
• The basics: effective data management
• Linked Open Data
– With limited semantic inconsistencies
• Deposition of data in CRM-EH RDF
• Leverage the archive
– Mapping to Ontologies and SKOS
– Dealing with documents (which in some
instances are the only archive)
• Open Access
74. Grey Literature
• Prof. Richard Bradley – Reading Uni
– http://www.nature.com/news/2010/100407/full/464826a.html
– Recognised the potential of Grey Literature
– Visited contract units
• Collated ‘grey literature’
– Analysed for
• Understanding of Bronze Age settlement patterns
and dynamics
– Transformed
• Theory and interpretative frameworks
• What about the backlog?
75. STAR project
• http://hypermedia.research.glam.ac.uk/kos/star/
• Open up the grey literature to scholarly
research.
• Develop new methods for enhancing
linkages between digital archive database
resources and the associated grey
literature, exploiting the potential of a
high-level core ontology.
76. NLP- Rule Based Information
Extraction
• Aims to enable ‘rich’, semantically aware indexing of
archaeology fieldwork reports (Grey Literature) with
respect to the CRM-EH Conceptual Reference Model
(Ontology)
• Grey Literature: source materials that cannot be found
through the conventional means of publication
• OASIS
– Online AccesS to the Index of archaeological investigationS
– Coordinated by ADS
– Online index to Archaeological Grey Literature
– Accessed via ADS ArchSearch online Service
(http://www.oasis.ac.uk)
77. NLP - General Architecture for Text Engineering
[Diagram labels: ADS – OASIS Grey Literature; EH Thesaurus; Gazetteer Lists; Java Pattern Engine; XML structures to represent semantic properties]
78. Named Entity Recognition (NER)
• The NER phase is targeted to extract the following
annotation types with respect to the CIDOC-CRM.
– E4.Period
– E19.Physical Object
– E53.Place
– E57.Material
• Supports ambiguous and context searches.
80. Examples – Landscape stratigraphy
The matrix reflects the relative position
and stratigraphic contacts of observable
stratigraphic units, or contexts.
81. Examples – Landscape stratigraphy
• Computerized stratigraphy has a long research history
– (e.g. ArchEd, Stratify)
– Satisfies many visualization and error checking issues
• Some problems remain:
– managing large/huge datasets
– integrating a digital matrix representation with other
software (e.g. GIS)
– handling multilinear stratigraphic sequences
– managing uncertain or insufficient knowledge
• This is a real problem for DYNAMIC data
82. M. Cattani et al – Prolog system
Alphabet
• Variables (‘Terms’): X, Y, Z, ...
• Constants (‘Terms’): us1, us2, us3, ...
• Unary Predicates (‘Atoms’): cutUnit(X), trench(X), wall(X), ...
• Binary Predicates (‘Atoms’): cover(X,Y), fill(X,Y), cut(X,Y), ...
Axiom
• Positive & Negative Literals (‘Atoms’ or ‘Classically Negated Atoms’): wall(X), cut(X,Y), ..., cover(X,Y), fill(X,Y), ...
• Ground Literals: cover(u6,u3), filledBy(u4,u8), ...
• naf-Literals (‘Atoms’ or ‘Atoms preceded by not’): ..., not cover(X,Y), not fill(X,Y), ...
• Rules: dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X).
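The rule dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X) reads: if unit X is equal to unit Y, and unit Z covers X, then Z is directly posterior to Y. As a rough illustration only, the same derivation can be sketched in Python as a join over two sets of facts. The unit names and facts below are hypothetical, and this is not the authors' Prolog system.

```python
def derive_dir_post_to(equal_to, cover):
    """Apply dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X) to sets of ground facts.

    equal_to holds (X, Y) pairs and cover holds (Z, X) pairs.
    """
    return {
        (z, y)
        for (x, y) in equal_to
        for (z, covered) in cover
        if covered == x  # the shared variable X joins the two body atoms
    }


# Hypothetical ground facts for three stratigraphic units.
equal_to = {("us2", "us3")}                # us2 is equal to us3
cover = {("us1", "us2"), ("us4", "us5")}   # us1 covers us2, us4 covers us5

print(derive_dir_post_to(equal_to, cover))
```

Only us1 is derived as directly posterior to us3, because us5 takes part in no equalTo fact.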
85. Examples – Landscape stratigraphy
• Provides a formal framework
– Scale stratigraphic calculations
– Manage and articulate uncertainty
– Web-based
• A foundation to develop new approaches
• Based on: M. Cattani, G. Mantegari, A.
Mosca, M. Palmonari
• http://goo.gl/T5yH7
86. Examples – Heterogeneous Pottery
Sequences
• Pottery is important for dating
sites and deposits
• Classification based on form
and fabric variations
• Dates derived from stratified
sequences (e.g. wells)
• Pottery sequences developed
locally and integrated –
– Regionally
– Nationally
87. Clumping and splitting
• Periodically sequences are
reviewed
– Clumping (owl:sameAs)
– Splitting
– Refining date ranges
• Date changes impact on:
– Interpretation
– Knowledge
– Policy
– Think “Grey literature” but bigger!
• Unfortunately the data are
decoupled and not linked. The
primary and synthetic data are
rarely, if ever, re-interpreted
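Clumping can be pictured as merging type identifiers that have been declared equivalent (the role owl:sameAs plays in linked data), so that every record typed with any of them resolves to one canonical type. A minimal union-find sketch in Python, with hypothetical type identifiers and an arbitrary rule (an assumption made here) that the alphabetically first identifier represents each clump:

```python
def canonical_types(same_as_pairs):
    """Map each type identifier to a single representative of its clump."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the alphabetically first identifier wins.
            keep, merge = sorted((root_a, root_b))
            parent[merge] = keep

    return {t: find(t) for t in list(parent)}


# Hypothetical equivalences between local, regional and national pottery types.
pairs = [
    ("local:fabricA", "regional:ware1"),
    ("regional:ware1", "national:type7"),
]
print(canonical_types(pairs))
```

Once the data are linked in this way, a revised date range attached to the canonical type would be visible from every local sequence that maps to it.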
88. Examples – Data integration
through CiDOC CRM
• Need to produce an information model that
reflects continuing best archaeological practice
• Not fossilize structures of existing systems
• Model new information requirements
• Map to specific old legacy data and new data
fields
• Integration of old & new project recording
systems (CSV, MDB, MySQL et al.)
• Use ontology to model conceptual relationships
between data
89. STAR & STELLAR
• Current situation is one of fragmented
datasets and applications, with different
terminology systems
• Need for integrative conceptual framework
– English Heritage extended CIDOC CRM
ontology for archaeology
• Need for terminology control
– English Heritage Thesauri
– Recording Manual glossaries augmented with
dataset glossaries
90. STAR/STELLAR architecture
[Layered diagram, top to bottom]
• Applications – Server Side, Rich Client, Browser
• Web Services, SQL, SPARQL
• RDF Based Common Ontology Data Layer (CRM / CRMEH / SKOS)
• Indexing; SKOS Conversion; Data Mapping / Normalisation
• Sources: Grey literature; EH thesauri, glossaries; STAN, RRAD, IADB, LEAP, RPRE
91. STELLAR Data Conversions
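[Diagram: STELLAR conversion commands linking delimited data, a database, user-defined templates and textual outputs]
• Nodes: Delimited Data; Database; User-defined template; XML, RDF, [other textual formats]
• Commands shown: SQL2CSV, SQL2TAB, SQLEXECUTE, CSV2DB, TAB2DB, CSV2STG, TAB2STG, DELIM2STG, SQL2STG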
92. Consistent URIs - Convention
• Namespace prefix
– E.g. http://stellar/silchester/
• Entity type
– E.g. “EHE0007” (i.e. Context)
• Identifier (data value)
– E.g. “1015”
• URI pattern: {prefix}{entity type}_{value}
– E.g. http://stellar/silchester/EHE0007_1015
• Consistent identifiers facilitate incremental
enrichment of data
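The URI pattern above is simple enough to express as a small helper. The following Python sketch is an illustration only: the function name `make_uri` is hypothetical, and percent-encoding the identifier is an added assumption that the slide does not state.

```python
from urllib.parse import quote


def make_uri(prefix, entity_type, value):
    """Build a URI following the pattern {prefix}{entity type}_{value}."""
    if not prefix.endswith("/"):
        prefix += "/"  # tolerate a namespace prefix given without its trailing slash
    # Assumption: percent-encode the value so spaces or slashes cannot break the URI.
    return f"{prefix}{entity_type}_{quote(str(value), safe='')}"


# The example from the slide: Context (EHE0007) number 1015 in the Silchester namespace.
print(make_uri("http://stellar/silchester/", "EHE0007", 1015))
```

Because the same inputs always yield the same URI, data about context 1015 converted at different times ends up attached to the same resource.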
93. Cross Searching
• The STAR demonstrator
– Making use of the decoupled RDF files
– Cross searching between grey literature and
datasets
– A SPARQL engine supports the semantic
search
• Semantic Search Examples
– Context of type X containing Find of type Y:
“hearth” containing “coin”
– Find of type X within Context of type Y:
“Animal Remains” within “pit”
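The two search patterns share one shape: a find type joined to a context type through a containment relation. The STAR demonstrator answers them with a SPARQL engine over RDF. The sketch below only illustrates the shape of the query in plain Python over hypothetical in-memory records, and does not reproduce the demonstrator's data or queries.

```python
# Hypothetical records: context id -> context type, and (find id, find type, context id).
CONTEXT_TYPE = {"ctx1": "hearth", "ctx2": "pit", "ctx3": "hearth"}
FINDS = [
    ("find1", "coin", "ctx1"),
    ("find2", "animal remains", "ctx2"),
    ("find3", "pottery", "ctx3"),
]


def contexts_containing(context_type, find_type):
    """Contexts of the given type that contain at least one find of the given type."""
    return sorted(
        {
            ctx
            for _, ftype, ctx in FINDS
            if ftype == find_type and CONTEXT_TYPE.get(ctx) == context_type
        }
    )


def finds_within(find_type, context_type):
    """Finds of the given type that lie within a context of the given type."""
    return sorted(
        fid
        for fid, ftype, ctx in FINDS
        if ftype == find_type and CONTEXT_TYPE.get(ctx) == context_type
    )


print(contexts_containing("hearth", "coin"))
print(finds_within("animal remains", "pit"))
```

In a semantic search the type terms would be controlled vocabulary concepts rather than literal strings, which is what lets a phrase such as "fire-place" match a context typed as "hearth".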
94. Cross Searching
“the pit produced a range of artefactual material
which included animal bone”
“all domestic fire-places that contain money”
97. Long term vision and the future
• Archaeological data sources are fragmentary by nature.
• Theoretical approaches used by practitioners are
diverse.
• “Data is sacred” - expressing one’s knowledge base in
terms of another’s ontology may not always be
“acceptable”.
• Adoption of the Semantic Web by the heritage sector
depends upon the syntactical and semantic mark-up of
content.
• The sector should coordinate its efforts to ensure that
the fundamental building blocks that can enable its
success on the Semantic Web are in place.
• Try not to “reinvent the wheel” in terms of metadata - use
existing annotation schemes.
101. Implications of silo-ed data
• No synergy
• Cripples the knowledge frameworks
• Less effective
– Research
– Policy
– Impact
[Diagram labels: Interpretation, Synthesis]
108. Credits and Thanks
• Ceri Binding
• Paul Cripps
• Glauco Mantegari
• Keith May
• Monika Solanki
• Doug Tudhope
• Andreas Vlachidis
Editor's Notes
Time team
Understand complex relationships in the fragmented archaeological record
Archaeology as human ecology
Sections, plans, sensors
Classification and identification of material (artefacts - finds)
Environmental Sampling – ecofacts
Environmental Processing
Environmental Analysis – identifying and counting grains
Data Rich
Data Archiving - Building the silo
INFORMATION OVERLOAD
Unstructured
Too Much
Formal structures inhibit collaboration and access
Informal networks established to make the data work effectively
The nature of knowledge: from a policy perspective there are different levels of knowledge awareness:
• know what we know (the data we have access to)
• know there are things we don't know (the relevant data which is not accessible)
• recognise there are things that we are unaware of which may be extremely important (the potential knowledge advances gained by integrating all data, collaborating with different domains and future research avenues)
Ideally we want to increase the size of the accessible knowledge so that policy can be formed from a position of ‘perfect’, or ‘near-perfect’, knowledge.
Templates take tabular data as input: directly from a delimited data file, or as the output of an SQL query on the internal database