Archaeology, Informatics and Knowledge Representation

Housekeeping

• Notes: http://goo.gl/l6xGa
• Mindmap: http://goo.gl/4O1QX
• Presentation:
– Slideshare:
– Prezi:

Work is not conducted in 3 days!

The past is a foreign place

• Archaeological knowledge acquisition is a dynamic process
• Dynamic feedback allows theories/practice to be tested or revised
[Figure: cycle linking Interpretation and Synthesis]

Theories structure this evidence

Archaeological evidence (practice)

• Primary data
– Excavation records
– Remote sensing transcriptions
– NMP
– Lab Analysis
– Specialist reports
• Decoupled synthetic data
– Site reports
– SMR
– NMR

Grey Literature

• Archaeology units conduct most excavations in the UK
• Problem
– Predominantly paper-based recording (still)
– Primary record is difficult to access
– Excavations are written up as site reports (interpretative and data summary)
– These reports are not published: hence “Grey Literature”

The Corpus helps Decision Making

Ideally we want to increase the amount of accessible knowledge so that
policy can be formed from a position of “perfect”, or “near-perfect”, knowledge.

Why do we need to model?

• Problems
– An idealised view of the world
• Basic models do not support rich data
• Do not support change management
• Are not dynamic
– Semantically unclear
– Fundamental database issues
• Lack of atomicity
• Poor URIs
– INCONSISTENCY

Why do we need to model?
• Benefits
– Archaeoinformatics – to appropriately
represent the archaeological record
• Variations in
– Quality
– Certainty
• Provenance (who did what)
• Presence/absence
– Support and enhance process
• Inference
• Evidence
• Multivocality (handling multiple interpretations)
– Improving structure

Modelling : Ontologies

• An ontology is an explicit specification of a domain in
terms of entities and the relationships between these entities.
• The relationships between entities provide the semantics,
or meaning, of the data.
• This information is normally implicit in a dataset (e.g. a
database), but it is made explicit in an ontology.
• This explicit specification facilitates
– Querying
– Complex reasoning
– Extension of the knowledge by inference
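To illustrate how explicit relationships support querying, here is a minimal sketch that stores statements as (subject, predicate, object) triples and answers a conjunctive pattern query with variables, in the spirit of SPARQL. The identifiers and the `foundIn`/`partOf` predicates are hypothetical; a real system would use an RDF store and a SPARQL engine.

```python
# Hypothetical triples: (subject, predicate, object)
TRIPLES = [
    ("find_7", "foundIn", "context_1016"),
    ("find_8", "foundIn", "context_1015"),
    ("context_1015", "partOf", "area_A"),
    ("context_1016", "partOf", "area_B"),
    ("find_7", "type", "coin"),
    ("find_8", "type", "animal bone"),
]

def match(pattern, triples):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

def query(patterns, triples):
    """Answer a conjunctive query by extending bindings one pattern at a time."""
    results = [{}]
    for pattern in patterns:
        new_results = []
        for binding in results:
            # Substitute variables that earlier patterns have already bound
            bound = tuple(binding.get(t, t) for t in pattern)
            for extra in match(bound, triples):
                new_results.append({**binding, **extra})
        results = new_results
    return results

# Which finds come from contexts that are part of area_A?
print(query([("?f", "foundIn", "?c"), ("?c", "partOf", "area_A")], TRIPLES))
```

Joining patterns through shared variables (`?c` here) is what lets the explicit relationships be followed across entities.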

Modelling: SKOS

• SKOS provides a standard way to represent
knowledge organization systems in RDF:
– thesauri,
– classification schemes,
– subject heading systems
– taxonomies
• SKOS is a common data model for sharing and linking
knowledge organization systems via the web.
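A minimal sketch of the SKOS idea in plain Python, without an RDF library: each concept has one preferred label, optional alternative labels and a `broader` link, so a search term can be resolved to a concept and expanded to everything beneath it. The concept identifiers and labels are hypothetical and not taken from the English Heritage thesauri.

```python
# Hypothetical concept scheme: id -> SKOS-like properties (labels stored in lower case)
CONCEPTS = {
    "c1": {"prefLabel": "hearth", "altLabel": ["fireplace", "fire-place"], "broader": "c3"},
    "c2": {"prefLabel": "oven", "altLabel": [], "broader": "c3"},
    "c3": {"prefLabel": "domestic feature", "altLabel": [], "broader": None},
}

def resolve(label):
    """Return the id of the concept whose prefLabel or altLabel matches, ignoring input case."""
    wanted = label.lower()
    for cid, c in CONCEPTS.items():
        if wanted == c["prefLabel"] or wanted in c["altLabel"]:
            return cid
    return None

def narrower_closure(cid):
    """Return cid plus every concept below it, following broader links in reverse
    and transitively (as skos:broaderTransitive would)."""
    found = {cid}
    frontier = [cid]
    while frontier:
        current = frontier.pop()
        for other, c in CONCEPTS.items():
            if c["broader"] == current and other not in found:
                found.add(other)
                frontier.append(other)
    return found

print(resolve("Fire-place"))
print(sorted(narrower_closure(resolve("domestic feature"))))
```

Keeping alternative labels on the concept, rather than as separate terms, is what lets different wordings in reports and datasets land on the same identifier.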

Modelling: CIDOC CRM ontology

• provides definitions and a formal structure for describing
the implicit and explicit concepts and relationships used
in cultural heritage documentation.
• enables information exchange and interchange between
heterogeneous sources of cultural heritage information
by providing a common and extensible semantic
framework
• promotes a shared understanding of cultural heritage
information by transforming disparate, localised
information sources into a coherent global resource.

Modelling: CIDOC CRM ontology

• This is the dominant ontology in cultural heritage.
• It is intended to cover the full spectrum of cultural
heritage knowledge, from archaeology to art history,
literary and musical entities.
• An ontology of 86 classes and 137 properties for culture
and more.
• Capable of explaining hundreds of (meta)data formats.
• International standard since 2006: ISO 21127:2006.
• The ontology has been encoded in OWL 2, OWL DL
and RDFS.

[Figure: a conceptualization approximates world phenomena; it explains and motivates the data structures and presentation models that organize data in various forms (legacy systems, legacy databases).]

[Figure: CIDOC CRM example. Pope Leo I and Attila are linked to their meeting by P14 carried out by (performed); the meeting has P4 has time-span (is time-span of), P82 at some time within AD452. The deaths of Leo I (at some time within AD461) and Attila (at some time within AD453) are linked to each man via P11 had participant and P93 took out of existence, and their births via P92 brought into existence. “Before” relations connect the births, the meeting and the deaths, from which a deduction about the order of events is drawn.]
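The deduction in the diagram can be sketched as reasoning over a `before` relation. Assuming the “before” links run from each birth to the meeting and from the meeting to each death, taking the transitive closure yields new statements such as “the birth of Leo I came before the death of Attila”. This is plain Python over event names chosen for illustration, not a CRM reasoner.

```python
from itertools import product

# "before" links as read from the diagram (assumed): births precede the meeting,
# and the meeting of AD452 precedes both deaths.
BEFORE = {
    ("birth_leo", "meeting_452"),
    ("birth_attila", "meeting_452"),
    ("meeting_452", "death_attila_453"),
    ("meeting_452", "death_leo_461"),
}

def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new

# Only the statements that were not asserted directly
inferred = transitive_closure(BEFORE) - BEFORE
for a, b in sorted(inferred):
    print(f"{a} before {b}")
```

None of the inferred facts is recorded anywhere in the data; they follow from the participation of both men in one dated event.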

Modelling: CIDOC-CRM-EH

• Archaeology extension to CIDOC CRM
• Mostly extends classes rather than
properties
• Archaeological concepts expand the
scope of CIDOC concepts, e.g.
– EHE0003 AreaOfInvestigation (IsA E53: Place)
– EHE0007 Context (IsA E53: Place)
– EHE0005 Group (IsA E53: Place)
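Because CRM-EH classes are declared as specialisations of CIDOC CRM classes, a reasoner can infer that anything typed as an EH Context is also an E53 Place. A minimal sketch of that subclass-style inference in plain Python: only the three IsA links listed above are encoded, and the instance identifier `context_1015` is hypothetical.

```python
# IsA (subclass) links from the slide
SUBCLASS_OF = {
    "EHE0003_AreaOfInvestigation": "E53_Place",
    "EHE0007_Context": "E53_Place",
    "EHE0005_Group": "E53_Place",
}

def superclasses(cls):
    """Walk the subclass chain upwards (each class has at most one parent here)."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain

def inferred_types(instance_types):
    """Add every superclass of each asserted type (the RDFS rdfs9 entailment pattern)."""
    return {inst: set().union(*(superclasses(t) for t in types))
            for inst, types in instance_types.items()}

asserted = {"context_1015": {"EHE0007_Context"}}  # hypothetical instance
print(inferred_types(asserted))
```

A query for all E53 Place instances would therefore also return EH contexts, groups and areas of investigation without any extra mapping.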

What can we do with these modelling tools?
• The basics: effective data management
• Linked Open Data
– With limited semantic inconsistencies
• Deposition of data in CRM-EH RDF
• Leverage the archive
– Mapping to Ontologies and SKOS
– Dealing with documents (which in some
instances are the only archive)
• Open Access

Examples – Populating the record from grey literature

Grey Literature

• Prof. Richard Bradley – Reading Uni
– http://www.nature.com/news/2010/100407/full/464826a.html
– Recognised the potential of Grey Literature
– Visited contract units
• Collated “grey literature”
– Analysed for
• Understanding of Bronze Age settlement patterns
and dynamics
– Transformed
• Theory and interpretative frameworks
• What about the backlog?

STAR project

• http://hypermedia.research.glam.ac.uk/kos/star/
• Open up the grey literature to scholarly
research.
• Develop new methods for enhancing
linkages between digital archive database
resources and associated grey
literature, exploiting the potential of a high-level
core ontology.

NLP – Rule-Based Information Extraction
• Aims to enable “rich”, semantic-aware indexing of
archaeology fieldwork reports (grey literature) with
respect to the CRM-EH Conceptual Reference Model
(ontology)
• Grey literature: source materials that cannot be found
through conventional publication channels
• OASIS
– Online AccesS to the Index of archaeological investigationS
– Coordinated by ADS
– Online index to archaeological grey literature
– Accessed via the ADS ArchSearch online service
(http://www.oasis.ac.uk)

NLP – General Architecture for Text Engineering

[Figure: processing pipeline. ADS OASIS grey literature is processed by a Java pattern engine using EH Thesaurus gazetteer lists; the results are output as XML structures that represent semantic properties.]

Named Entity Recognition (NER)

• The NER phase targets the following annotation types,
defined with respect to the CIDOC CRM:
– E4.Period
– E19.Physical Object
– E53.Place
– E57.Material
• Supports ambiguity-aware and context-based searches.
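The STAR pipeline used GATE with thesaurus-derived gazetteer lists. The sketch below shows only the core idea of gazetteer lookup: scan the tokens and tag the longest matching phrase with a CRM entity type. It is a hypothetical, much simplified stand-in (no part-of-speech or contextual rules), and the small gazetteer is invented for illustration.

```python
import re

# Hypothetical gazetteer: lower-case phrase -> CIDOC CRM entity type
GAZETTEER = {
    "roman": "E4.Period",
    "bronze age": "E4.Period",
    "animal bone": "E19.Physical Object",
    "coin": "E19.Physical Object",
    "pit": "E53.Place",
    "bronze": "E57.Material",
}
MAX_WORDS = max(len(k.split()) for k in GAZETTEER)

def annotate(text):
    """Return (phrase, type) pairs, preferring the longest gazetteer match at each token."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase first, then shorter ones
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                found.append((phrase, GAZETTEER[phrase]))
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return found

print(annotate("The pit produced a Roman coin, a bronze pin and animal bone."))
```

Longest-match ordering is what keeps “Bronze Age” from being tagged as the material “bronze”; resolving harder ambiguities needs the contextual rules that the real pipeline adds.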

Example of Grey Literature Annotations

Examples – Landscape stratigraphy
The matrix reflects the relative position
and stratigraphic contacts of observable
stratigraphic units, or contexts.


• Computerized stratigraphy has a long research history
– (e.g. ArchEd, Stratify)
– Addresses many visualization and error-checking issues
• Some problems remain:
– the difficulty of managing large or huge datasets
– the difficulty of integrating a digital matrix representation with other
software (e.g. GIS)
– the difficulty of handling multilinear stratigraphic sequences
– the difficulty of managing uncertain or insufficient knowledge

• This is a real problem for DYNAMIC data
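A stratigraphic matrix can be treated as a directed acyclic graph in which each edge records that one unit lies above (is later than) another. A minimal sketch using Python's standard `graphlib` module (Python 3.9+): a topological sort gives one admissible sequence, and a cycle (a recording error such as A above B above A) is reported instead of silently accepted. The unit names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

def sequence(above):
    """above maps each unit to the units it directly overlies (i.e. is later than).
    Returns the units from earliest to latest, or raises ValueError on a contradiction."""
    # TopologicalSorter expects node -> predecessors; a unit's predecessors are the
    # earlier units beneath it, so static_order() yields the earliest units first.
    sorter = TopologicalSorter(above)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError(f"stratigraphic cycle: {err.args[1]}") from err

matrix = {"us1": {"us2", "us3"}, "us2": {"us4"}, "us3": {"us4"}}
print(sequence(matrix))  # us4 first and us1 last; us2/us3 order is not fixed by the data
```

The sort returns only one of the admissible orders; when two units are unrelated (us2 and us3 here) the data do not decide between them, which is exactly where handling uncertainty becomes necessary.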

M. Cattani et al. – Prolog system
Alphabet
• Variables (“Terms”): X, Y, Z, ...
• Constants (“Terms”): us1, us2, us3, ...
• Unary predicates (“Atoms”): cutUnit(X), trench(X), wall(X), ...
• Binary predicates (“Atoms”): cover(X,Y), fill(X,Y), cut(X,Y), ...
Axiom
• Positive & negative literals (“Atoms” or “Classically Negated Atoms”):
wall(X), cut(X,Y), ..., cover(X,Y), fill(X,Y), ...
• Ground literals:
cover(u6,u3), filledBy(u4,u8), ...
• naf-literals (“Atoms” or “Atoms preceded by not”):
..., not cover(X,Y), not fill(X,Y), ...
• Rules:
dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X).
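To make the rule concrete: `dirPostTo(Z,Y) :- equalTo(X,Y), cover(Z,X).` says that if X and Y are equal (the same or equivalent unit) and Z covers X, then Z is directly posterior to Y. A naive forward-chaining sketch in Python that applies just this one rule to a set of ground facts; the facts are hypothetical, and the system described here uses an answer-set solver (DLV) rather than code like this.

```python
def apply_dirpostto(facts):
    """Derive dirPostTo(Z,Y) from equalTo(X,Y) and cover(Z,X).

    facts is a set of tuples such as ("cover", "us1", "us2").
    Returns only the newly derived facts.
    """
    equal = [(x, y) for (p, x, y) in facts if p == "equalTo"]
    cover = [(z, x) for (p, z, x) in facts if p == "cover"]
    # Join the two body atoms on the shared variable X
    derived = {("dirPostTo", z, y)
               for (x, y) in equal
               for (z, x2) in cover
               if x2 == x}
    return derived - facts

facts = {("cover", "us5", "us6"), ("equalTo", "us6", "us7")}  # hypothetical facts
print(apply_dirpostto(facts))  # {('dirPostTo', 'us5', 'us7')}
```

Because the rule body does not mention dirPostTo, one pass already reaches a fixpoint; a full rule set with recursive rules would repeat passes until nothing new is derived.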

Archeometrical SpecS
DATA:
cover(us1,us2). cover(us1,us3). fill(us2,us4). fill(us3,us4).
DATA UPDATE:
cover(us1,us2). cover(us1,us3). fill(us2,us4). fill(us3,us4). equalTo(us2,us4).

DLV [build BEN/Oct 11 2007 gcc 3.4.5 (mingw special)] answer sets:

{posteriorTo(us1,us2), posteriorTo(us1,us3),
posteriorTo(us1,us4), posteriorTo(us2,us4),
posteriorTo(us3,us4), posteriorTo(us2,us3)}

{posteriorTo(us1,us2), posteriorTo(us1,us3),
posteriorTo(us1,us4), posteriorTo(us2,us4),
posteriorTo(us3,us4), posteriorTo(us3,us2)}

{posteriorTo(us1,us2), posteriorTo(us1,us3),
posteriorTo(us1,us4), posteriorTo(us2,us4),
posteriorTo(us3,us4), contemporary(us3,us2),
contemporary(us2,us3)}

(The slide shows two DLV runs side by side; the last answer set appears in both.)

The pruned Harris's example – using Prolog and inferencing
[Figure: DLV output for a pruned version of Harris's example matrix (units u1 to u12). The initial answer set is {linPostTo(u6,u3), linPostTo(u1,u7), linPostTo(u4,u10), linPostTo(u2,u12), linPostTo(u3,u2), linPostTo(u12,u5), linPostTo(u7,u8), linPostTo(u10,u11), linPostTo(u5,u1), linPostTo(u8,u9)}. Further answer sets follow the updates equalTo(u1,u4) and equalTo(u9,u10); the final one includes contemporary(u1,u4) and contemporary(u9,u10).]
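The pruned matrix lists only linPostTo pairs, i.e. the edges left once relations implied by longer paths are dropped. Our reading (the slide does not name it) is that this corresponds to a transitive reduction of the “later than” relation. A minimal Python sketch: keep an edge only if no intermediate unit connects its ends. As input we take the transitive closure of the ten linPostTo pairs from the first answer set of the pruned Harris example, so the reduction should return exactly those pairs.

```python
from itertools import product

# linPostTo pairs from the first answer set of the pruned Harris example
LIN_POST = {("u6", "u3"), ("u1", "u7"), ("u4", "u10"), ("u2", "u12"), ("u3", "u2"),
            ("u12", "u5"), ("u7", "u8"), ("u10", "u11"), ("u5", "u1"), ("u8", "u9")}

def closure(pairs):
    """Transitive closure: add (a, d) whenever (a, b) and (b, d) hold."""
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(rel, rel) if b == c} - rel
        if not new:
            return rel
        rel |= new

def reduction(rel):
    """Keep (a, c) only if no b gives (a, b) and (b, c); valid for acyclic relations."""
    nodes = {n for pair in rel for n in pair}
    return {(a, c) for (a, c) in rel
            if not any((a, b) in rel and (b, c) in rel for b in nodes)}

full = closure(LIN_POST)
print(len(full), reduction(full) == LIN_POST)  # 39 True
```

Storing the full relation makes queries simple, while the reduced form is what a matrix diagram draws; converting between them is cheap for matrices of this size.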



• Provides a formal framework
– Scale stratigraphic calculations
– Manage and articulate uncertainty
– Web-based
• A foundation to develop new approaches
• Based on: M. Cattani, G. Mantegari, A.
Mosca, M. Palmonari
• http://goo.gl/T5yH7

Examples – Heterogeneous Pottery Sequences
• Pottery is important for dating
sites and deposits
• Classification based on form
and fabric variations
• Dates derived from stratified
sequences (e.g. wells)
• Pottery sequences are developed
locally and integrated:
– Regionally
– Nationally

Clumping and splitting
• Sequences are periodically
reviewed
– Clumping (owl:sameAs)
– Splitting
– Refining date ranges
• Date changes affect:
– Interpretation
– Knowledge
– Policy
– Think “Grey literature” but bigger!
• Unfortunately the data are
decoupled and not linked: the
primary and synthetic data are
rarely, if ever, re-interpreted
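Clumping can be recorded as owl:sameAs links between fabric codes, after which any dating attached to one code must be reviewed together with the codes it has been clumped with. A minimal union-find sketch of that bookkeeping: the fabric codes and date ranges are hypothetical, and the sketch only groups codes and collects their date ranges; it does not decide how conflicting ranges should be reconciled.

```python
class SameAs:
    """Union-find over identifiers linked by owl:sameAs-style assertions."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def groups(links, dates):
    """Group codes clumped via sameAs and gather every date range attached to each group."""
    uf = SameAs()
    for a, b in links:
        uf.same_as(a, b)
    out = {}
    for code, rng in dates.items():
        out.setdefault(uf.find(code), []).append((code, rng))
    return out

# Hypothetical fabric codes and date ranges (AD)
links = [("FAB-A", "FAB-B"), ("FAB-B", "FAB-C")]
dates = {"FAB-A": (50, 120), "FAB-B": (70, 150), "FAB-C": (100, 200), "FAB-D": (250, 400)}
for members in groups(links, dates).values():
    print(members)
```

Union-find handles clumping incrementally, but it cannot undo a merge, so splitting a fabric would mean rebuilding the groups from the revised links.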

Examples – Data integration through CIDOC CRM
• Need to produce an information model that
reflects ongoing best archaeological practice
• Avoid fossilizing the structures of existing systems
• Model new information requirements
• Map specific legacy data fields and new data
fields
• Integrate old and new project recording
systems (CSV, MDB, MySQL et al.)
• Use an ontology to model conceptual relationships
between data

STAR & STELLAR

• Current situation is one of fragmented
datasets and applications, with different
terminology systems
• Need for an integrative conceptual framework
– English Heritage extended CIDOC CRM
ontology for archaeology
• Need for terminology control
– English Heritage Thesauri
– Recording Manual glossaries augmented with
dataset glossaries

STAR/STELLAR architecture
• Applications – Server Side, Rich Client, Browser
• Web Services, SQL, SPARQL
• RDF-Based Common Ontology Data Layer (CRM / CRM-EH / SKOS)
• Indexing | SKOS Conversion | Data Mapping / Normalisation
• Sources: Grey literature | EH thesauri, glossaries | STAN, RRAD, IADB, LEAP, RPRE

STELLAR Data Conversions
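[Figure: conversion tools between delimited data, databases and template-based output: SQL2CSV, SQL2TAB, SQLEXECUTE, CSV2DB, TAB2DB, CSV2STG, TAB2STG, SQL2STG and DELIM2STG; the *2STG tools apply a user-defined template to produce XML, RDF or other textual formats.]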

Consistent URIs - Convention
• Namespace prefix
– E.g. http://stellar/silchester/
• Entity type
– E.g. “EHE0007” (i.e. Context)
• Identifier (data value)
– E.g. “1015”
• URI pattern: {prefix}{entity type}_{value}
– E.g. http://stellar/silchester/EHE0007_1015
• Consistent identifiers facilitate incremental
enrichment of data
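The URI pattern above is easy to implement as a small function. A sketch in Python; the function name, adding a missing trailing slash to the prefix, and percent-encoding the value are our own assumptions rather than part of the STELLAR convention.

```python
from urllib.parse import quote

def make_uri(prefix, entity_type, value):
    """Build {prefix}{entity type}_{value}.

    Assumptions: a missing trailing '/' is added to the prefix, and the value is
    percent-encoded so identifiers containing spaces or slashes still give valid URIs.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    return f"{prefix}{entity_type}_{quote(str(value), safe='')}"

print(make_uri("http://stellar/silchester/", "EHE0007", 1015))
# http://stellar/silchester/EHE0007_1015
```

Because the same inputs always produce the same URI, records for one context converted at different times, or from different sources, resolve to the same resource, which is what makes incremental enrichment possible.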

Cross Searching
• The STAR demonstrator
– Making use of the decoupled RDF files
– Cross searching between grey literature and
datasets
– A SPARQL engine supports the semantic
search
• Semantic Search Examples
– Context of type X containing Find of type Y:
“hearth” containing “coin”,
– Context Find of type X within Context of type
Y: “Animal Remains” within “pit”.
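A sketch of the “Context of type X containing Find of type Y” search run across both kinds of source at once. Assumptions: the dataset records and the grey-literature annotations have already been mapped to the same concept labels, the records below are hypothetical, and a simple list scan stands in for the SPARQL engine used by the demonstrator.

```python
# Hypothetical records already mapped to shared concepts:
# (source, context identifier, context type, find type)
RECORDS = [
    ("dataset", "EHE0007_1015", "pit", "animal remains"),
    ("dataset", "EHE0007_1016", "hearth", "coin"),
    ("grey literature", "report_12#s4", "pit", "animal remains"),
    ("grey literature", "report_31#s9", "ditch", "coin"),
]

def contexts_containing(context_type, find_type):
    """Return (source, context) for every context of context_type that contains find_type."""
    return [(src, ctx) for src, ctx, ctype, ftype in RECORDS
            if ctype == context_type and ftype == find_type]

print(contexts_containing("pit", "animal remains"))
```

The search itself is trivial once both sources share one vocabulary; the hard work lies in the mapping and annotation steps that produce those shared labels.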

Cross Searching

• “the pit produced a range of artefactual material which included animal bone”
• “all domestic fire-places that contain money”

Long term vision and the future

• Archaeological data sources are fragmentary by nature.
• Theoretical approaches used by practitioners are
diverse.
• “Data is sacred”: expressing one’s knowledge base in
terms of another’s ontology may not always be
“acceptable”.
• Adoption of the Semantic Web by the heritage sector
depends upon the syntactic and semantic mark-up of
content.
• The sector should coordinate its efforts to ensure that
the fundamental building blocks for its
success on the Semantic Web are in place.
• Try not to “reinvent the wheel” for metadata: use
existing annotation schemes.

Implications of siloed data

• No synergy
• Cripples the knowledge frameworks
• Less effective
– Research
– Policy
– Impact
[Figure: cycle linking Interpretation and Synthesis]

Credits and Thanks

• Ceri Binding
• Paul Cripps
• Glauco Mantegari
• Keith May
• Monika Solanki
• Doug Tudhope
• Andreas Vlachidis
