Dm2 e ontotext-nov2012

OWLIM

Mariana Damova, PhD

DM2E
Vienna, November 2012

Ontotext
– Top-5 provider of core Semantic Technology
– Established in year 2000; offices in Bulgaria, UK, USA
– Active both in research and commercial projects (FP7 funding for 10 years)

• 360° semantic technology – unique portfolio:
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management and Data Integration

Good recognition in the SemTech community
– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” on
GYM (Google, Yahoo, Microsoft), and #3 for “linked data management” on Google

Several joint ventures and subsidiaries
– Innovantage: leading online recruitment intelligence provider in UK

Ontotext Clients (selected)

British Broadcasting Corporation (BBC)
– Ran its World Cup 2010 sites on top of OWLIM
– Since Mar’12: BBC Sports
– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext
Press Association (UK)
– Analysis of Sports news
– Concept extraction
– Linked data generation
Top-3 USA media (not allowed to name)
The National Archives (UK): contracted Ontotext to implement a
semantic KB and semantic search for the Government Web Archive
British Museum (UK): Ontotext leads the development of Phase 3 of the
ResearchSpace project on collaborative research in cultural heritage;
the British Museum’s public SPARQL endpoint is powered by OWLIM
de Bibliothek (Holland): aggregation of data from 150 library databases

Semantic Technologies

• Semantic technologies (RDF, LOD) allow for an unprecedented ease of
integration of heterogeneous data sources
– Already adopted in pharmaceuticals and publishing industries
– Cultural heritage is next

BBC: when MySQL was replaced with OWLIM in their “Dynamic Semantic
Publishing” architecture, the BBC team observed a considerable reduction in the
complexity of database design, query specification and application
development, and in query evaluation time.
Source: Jem Rayfield (Senior Technical Architect, BBC News and Knowledge),
“BBC World Cup 2010 dynamic semantic publishing”,
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html

Semantic Repository for RDFS and OWL

• OWLIM is a family of scalable semantic repositories
• OWLIM-Lite: in-memory, fastest, scales to ~100 million statements
• OWLIM-SE: file-based, sameAs & query optimizations, scales to 20 billion
statements
• OWLIM-Enterprise: replication cluster deployment for resilience and
high-performance parallel query answering

• OWLIM provides
– Management, integration and analysis of heterogeneous data
– Combined with light-weight, high-performance reasoning
– The inference is based on logical rule-entailment
– Full RDFS, OWL Horst, restricted OWL-Lite, OWL 2 QL and OWL 2 RL
– Custom semantics can be defined via rules and axiomatic triples
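
Rule-entailment can be illustrated with a small forward-chaining loop. The sketch below is not OWLIM's rule engine; it only shows the general idea of applying rules until no new statements appear, using two RDFS-style rules. The `ex:` resources and the `materialize` function are made up for this illustration.

```python
# Illustrative forward-chaining sketch; not OWLIM's actual rule engine.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples):
    """Apply two RDFS-style rules until no new triples appear (fixpoint)."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # Both rules join on: object of first == subject of a subClassOf triple.
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # Rule 1: rdfs:subClassOf is transitive.
                    new.add((s, SUBCLASS, o2))
                elif p == RDF_TYPE:
                    # Rule 2: an instance of a class is an instance of its superclasses.
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


# Hypothetical explicit statements.
explicit = {
    ("ex:Painting", SUBCLASS, "ex:Artwork"),
    ("ex:Artwork", SUBCLASS, "ex:CulturalObject"),
    ("ex:MonaLisa", RDF_TYPE, "ex:Painting"),
}
inferred = materialize(explicit) - explicit
```

Here `inferred` holds the statements that were not asserted but follow from the rules, such as `ex:MonaLisa rdf:type ex:CulturalObject`. A production engine would index the triples rather than compare every pair.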

OWLIM in the Cultural Heritage Domain

Selected commercial projects
ResearchSpace project funded by the Andrew W. Mellon Foundation
Support for collaborative web-based research, information sharing and web publishing for
the cultural heritage scholarly community. An Ontotext-led international consortium.
The Polish Digital National Museum aggregates artifacts from over 70 contributing
cultural institutions in the Digital Libraries Federation PIONIER Network, using
Ontotext's OWLIM repository
LODAC (Linked Open Data in Academia), Japan's National Institute of Informatics:
aggregates information from multiple Japanese resources as LOD. The system
uses 8 OWLIM nodes and aggregates 19 collections with 700,000 entities and 15M triples.
SemTech for Cultural Heritage project funded by ITCC
Semantic publishing of Bulgarian cultural heritage to Europeana. Establishing a Bulgarian
technical aggregator for Europeana.
Selected research projects
MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge
representation infrastructure for querying RDF and presenting query results; includes close
to 9K museum objects from two collections of The Gothenburg City
Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded
integrating activity project with a consortium of 21 partners and metadata from 6 major
European cultural institutions; it has selected Ontotext's OWLIM repository

OWLIM PERFORMANCE

• OWLIM is a scalable, robust and efficient triple store
– Serving the two most important web-sites for the London Olympic Games
• Official Olympics website
• BBC Olympics website
– Performance highlights
• OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product
(17 min. for 100M)
• Best query performance among those repositories that can handle update and multi-client
query tasks (5,285 Query-mixes-per-hour, where a query mix contains 25 queries; e.g. about
100 queries/sec)
• OWLIM v5 is 43% faster than v4.3 on the BSBM Explore and Update scenario
• OWLIM v5 requires between 25% and 70% less storage space

• OWL 2 RL-type languages have proven to be the only feasible approach for
reasoning over billions of statements

owl:sameAs Optimization

A way to represent each group of equivalent (owl:sameAs) resources by a single master node.
This makes the handling of inferred statements efficient and compact, and results in
4-6 times more statements being available to query than were explicitly asserted.
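
The slides do not describe how OWLIM implements this internally. The general idea of a master node per equivalence class can be sketched with a union-find structure, as below. The function names, the rule for picking the master, and the sample `a:`/`b:`/`c:` identifiers are all assumptions made for illustration.

```python
# Illustrative only: OWLIM's internal owl:sameAs handling is not shown in the slides.
SAME_AS = "owl:sameAs"


def find(master, node):
    """Follow master links to the representative of node's equivalence class."""
    master.setdefault(node, node)
    while master[node] != node:
        master[node] = master[master[node]]  # path halving keeps chains short
        node = master[node]
    return node


def compact(triples):
    """Store every non-sameAs triple once, rewritten onto master nodes."""
    master = {}
    for s, p, o in triples:
        if p == SAME_AS:
            a, b = find(master, s), find(master, o)
            if a != b:
                # Smaller identifier becomes the master (arbitrary but deterministic).
                master[max(a, b)] = min(a, b)
    stored = {(find(master, s), p, find(master, o))
              for s, p, o in triples if p != SAME_AS}
    return stored, master


def expand(stored, master):
    """Enumerate the statements visible to queries by expanding each master."""
    members = {}
    for node in list(master):
        members.setdefault(find(master, node), set()).add(node)
    return {(s2, p, o2)
            for s, p, o in stored
            for s2 in members.get(s, {s})
            for o2 in members.get(o, {o})}


# Hypothetical data: three identifiers for the same city.
triples = [
    ("a:Vienna", SAME_AS, "b:Wien"),
    ("b:Wien", SAME_AS, "c:Vienna"),
    ("a:Vienna", "ex:country", "ex:Austria"),
    ("c:Vienna", "ex:population", "1.7M"),
]
stored, master = compact(triples)
visible = expand(stored, master)
```

In this toy case two stored statements expand to six queryable ones, which is the kind of multiplication the slide refers to. The actual ratio depends entirely on the data.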

OWLIM Replication Cluster

• Distribution through data replication is used to ensure:
– Better handling of concurrent user requests
– Failover support
• How does it work?
– Every user request is pushed into a transaction queue
– Each data write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load-balancing, each read request is sent to the instance
with the smallest execution queue at that point in time
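
The dispatching rules above can be sketched in a few lines. This is a toy model under stated assumptions, not the OWLIM-Enterprise implementation: `Instance`, `Cluster` and `submit` are hypothetical names, and failover, the front transaction queue and actual query execution are left out.

```python
from collections import deque


class Instance:
    """Stand-in for one repository instance with its own execution queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()


class Cluster:
    """Toy dispatcher: writes go everywhere, reads go to the least-loaded instance."""

    def __init__(self, instances):
        self.instances = list(instances)

    def submit(self, kind, payload):
        """Route a request and return the names of the instances that received it."""
        if kind == "write":
            # Writes are multiplexed so that every replica stays in sync.
            targets = self.instances
        else:
            # Reads go to the single instance with the shortest queue right now.
            targets = [min(self.instances, key=lambda inst: len(inst.queue))]
        for inst in targets:
            inst.queue.append((kind, payload))
        return [inst.name for inst in targets]


cluster = Cluster([Instance("a"), Instance("b"), Instance("c")])
write_targets = cluster.submit("write", "hypothetical update")
read_targets = [cluster.submit("read", f"query {i}") for i in range(3)]
```

After one write, all three queues have the same length, so three consecutive reads are spread over the three instances in turn.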

Geo-spatial index

• Geo-spatial information concerns the geometry of points, shapes and distances relative to the
surface of the Earth (or any spherical object).
• When using OWLIM-SE all angles are in decimal degrees with the latitude ranging from -90 to
+90 degrees and the longitude ranging from -180 to +180 degrees.

For example:
• Airports have a reference point given by latitude, longitude and altitude;
• Political boundaries can be specified by polygons where each vertex is a two-dimensional
latitude/longitude pair.
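
As a rough illustration of working with decimal-degree coordinates in the ranges given above, the sketch below validates a point and computes a great-circle distance with the haversine formula. It does not use the OWLIM geo-spatial index; the function names and the mean Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def check_point(lat, lon):
    """Validate decimal degrees: latitude in [-90, 90], longitude in [-180, 180]."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def great_circle_km(p, q):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, check_point(*p))
    lat2, lon2 = map(math.radians, check_point(*q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Two illustrative reference points (altitude is ignored here).
distance = great_circle_km((48.1, 16.6), (42.7, 23.4))
```

A polygon such as a political boundary would simply be a list of such validated `(lat, lon)` pairs.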

RDF Rank

• OWLIM-SE includes a plug-in that allows for efficient
calculation of a modification of PageRank over RDF graphs
• Computation of rank values is fast, e.g.
– 400M LOD statements take 310 sec (27 iterations)

• Results are available through a system predicate
• Example: get the 100 most important nodes in the RDF graph
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?n WHERE { ?n rank:hasRDFRank ?r }
ORDER BY DESC(?r) LIMIT 100
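
The slides say only that the plug-in computes "a modification of PageRank"; the exact modification is not given. For orientation, here is plain PageRank over the subject-to-object links of a tiny RDF graph. The function name, the damping factor of 0.85 and the sample triples are assumptions; the default of 27 iterations simply echoes the figure quoted above.

```python
def rdf_rank(triples, damping=0.85, iterations=27):
    """Plain PageRank over the subject -> object links of an RDF graph."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links = {n: [] for n in nodes}
    for s, _, o in triples:
        out_links[s].append(o)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        nxt = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                nxt[target] += damping * rank[n] / len(out_links[n])
        rank = nxt
    return rank


# Hypothetical graph: three resources point at a hub, which points back at one.
triples = [
    ("ex:a", "ex:links", "ex:hub"),
    ("ex:b", "ex:links", "ex:hub"),
    ("ex:c", "ex:links", "ex:hub"),
    ("ex:hub", "ex:links", "ex:a"),
]
ranks = rdf_rank(triples)
# Same shape as the SPARQL query: order by rank descending, keep the top 100.
top = sorted(ranks, key=ranks.get, reverse=True)[:100]
```

In this graph `ex:hub` comes out first because it collects rank from three incoming links.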

Define: nested repositories

“Nested repositories” represent a new data
management concept for RDF data:
• a mechanism for sharing data stored across
multiple repositories, where
• one of them contains a large body of
knowledge which gets embedded in other
repositories
• each containing more specific data, which are
being interlinked with the common body of
knowledge
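
One way to picture the concept is a repository that answers queries over its own statements plus those of a shared repository it embeds by reference. The sketch below is purely conceptual and is not how OWLIM stores or links repositories; the `Repository` class and all `ex:` data are hypothetical.

```python
class Repository:
    """A set of triples that can embed a shared repository by reference."""

    def __init__(self, triples=(), shared=None):
        self.local = set(triples)
        self.shared = shared  # common body of knowledge; referenced, not copied

    def statements(self):
        """Local statements plus everything visible through the shared repository."""
        inherited = self.shared.statements() if self.shared is not None else set()
        return self.local | inherited

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return {t for t in self.statements()
                if all(want is None or want == got
                       for want, got in zip((s, p, o), t))}


# One large common repository, embedded in two more specific ones.
common = Repository({("ex:Vienna", "ex:locatedIn", "ex:Austria")})
museum = Repository({("ex:object1", "ex:foundIn", "ex:Vienna")}, shared=common)
library = Repository({("ex:book7", "ex:publishedIn", "ex:Vienna")}, shared=common)

hits = museum.match(s="ex:Vienna")
```

Both `museum` and `library` can link their own data to `ex:Vienna` and see the shared facts about it, while neither sees the other's specific data.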

http://www.ontotext.com/owlim

mariana.damova@ontotext.com
