This document summarizes the BBC's Linked Data Platform (LDP), which uses semantic technologies to make BBC content more connected and discoverable. It discusses how the LDP was developed for the 2010 World Cup and 2012 Olympics, and how it aggregates metadata and stores creative works. Key challenges addressed are data management, reference data, ontologies, availability, and emerging concepts. The LDP powers several BBC properties and its open data API.
BBC Linked Data Platform (SemTechBiz San Fran 2013)
1. 5 June 2013
BBC Linked Data Platform
Using semantic technologies to make our content more connected and more discoverable
2. A (very) short history
✤ Dynamic Semantic Publishing
✤ BBC Sport - Transition from ‘static’ to ‘dynamic’
✤ Introduction of Semantic Technologies for World Cup 2010
✤ Raising the bar for Olympics 2012
✤ Linked Data Platform & The Creative Work
6. CreativeWorks
✤ Minimal metadata
✤ Enough non-semantic metadata to support ‘rich links’ in a wide
range of applications
✤ Enough semantic metadata (tags) to support discovery through
semantic queries
✤ Full metadata requires a content-type-specific metadata API
✤ Access to content requires a content API
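The minimal-metadata idea above can be sketched as a small record type. This is only an illustration: the class name, every field name, and the example URIs are assumptions, not the LDP's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class CreativeWork:
    """Minimal metadata record; all field names are illustrative assumptions."""
    uri: str             # identifier of the creative work itself
    title: str           # non-semantic: enough to render a 'rich link'
    locator: str         # non-semantic: where the full content lives
    published: datetime  # non-semantic: supports 'most recent' ordering
    tags: frozenset = field(default_factory=frozenset)  # semantic: Thing URIs

    def is_about(self, thing_uri: str) -> bool:
        """True if this work is tagged with the given Thing URI."""
        return thing_uri in self.tags


# Hypothetical example values, not real BBC identifiers.
work = CreativeWork(
    uri="urn:example:work:1",
    title="Example headline",
    locator="http://example.org/news/1",
    published=datetime(2013, 6, 5),
    tags=frozenset({"http://example.org/things/labour-party#id"}),
)
```

Anything beyond these fields (full metadata, the content itself) would come from the separate content-type-specific APIs the slide mentions.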
7. Some use-cases
✤ Automated index pages/feeds
✤ Semantic navigation
✤ Semantic search
✤ A typical query:
✤ Top 10, most recent, BBC News Items about Politicians who are
members of The Labour Party
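The typical query above joins reference data (who is a politician, who belongs to which party) with creative work tags. A minimal in-memory sketch of that join follows; the triples, identifiers, and function names are all hypothetical, and a real deployment would presumably express this as a SPARQL query against the triple store instead.

```python
from datetime import date

# Hypothetical reference data: (subject, predicate, object) triples.
TRIPLES = {
    ("thing:alice", "rdf:type", "Politician"),
    ("thing:alice", "memberOf", "thing:labour"),
    ("thing:bob", "rdf:type", "Politician"),
    ("thing:bob", "memberOf", "thing:other-party"),
    ("thing:carol", "rdf:type", "Athlete"),
    ("thing:carol", "memberOf", "thing:labour"),
}

# Hypothetical creative works: (title, type, published, tags).
WORKS = [
    ("Story A", "NewsItem", date(2013, 6, 1), {"thing:alice"}),
    ("Story B", "NewsItem", date(2013, 6, 3), {"thing:bob"}),
    ("Story C", "NewsItem", date(2013, 6, 4), {"thing:alice", "thing:carol"}),
    ("Blog D", "BlogPost", date(2013, 6, 5), {"thing:alice"}),
]


def subjects(predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def latest_news_about_party_politicians(party, limit=10):
    """Most recent news items tagged with a politician who is a party member."""
    politicians = subjects("rdf:type", "Politician") & subjects("memberOf", party)
    hits = [w for w in WORKS if w[1] == "NewsItem" and w[3] & politicians]
    hits.sort(key=lambda w: w[2], reverse=True)  # newest first
    return [title for (title, _, _, _) in hits[:limit]]


print(latest_news_about_party_politicians("thing:labour"))
# prints ['Story C', 'Story A']
```

Note that the works themselves carry only tags; the facts about party membership live in the reference data, which is what makes the query "semantic".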
8. Powered by LDP
BBC Sport
BBC Music
BBC Olympics 2012
BBC Knowledge & Learning Beta
BBC News Local Beta
BBC Sport Mobile App
15. Our own URIs
✤ Everything has a ‘Thing URI’:
✤ http://www.bbc.co.uk/things/{GUID}#ID
✤ Opaque ID, dereferenceable*
✤ BBC controls identity, therefore quality & consistency
✤ bbc:sameAs to DBpedia, Wikidata, Freebase etc
*coming soon
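The Thing URI pattern above can be sketched with the standard `uuid` module. The function names are hypothetical, and treating the `#ID` fragment as a literal string is an assumption; the slide may intend it as a placeholder.

```python
import re
import uuid

# Pattern taken from the slide; the literal '#ID' fragment is an assumption.
THING_URI_RE = re.compile(
    r"http://www\.bbc\.co\.uk/things/(?P<guid>[0-9a-fA-F-]{36})#(?P<fragment>\w+)"
)


def mint_thing_uri(guid=None, fragment="ID"):
    """Build a Thing URI from an opaque GUID (a random one if none is given)."""
    guid = guid or uuid.uuid4()
    return f"http://www.bbc.co.uk/things/{guid}#{fragment}"


def parse_thing_uri(uri):
    """Extract the GUID from a Thing URI, or raise ValueError if it is malformed."""
    match = THING_URI_RE.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a Thing URI: {uri!r}")
    return uuid.UUID(match.group("guid"))
```

Because the GUID is opaque, nothing about the concept (name, type, domain) is encoded in the URI, so a concept can be renamed or reclassified without its identity changing.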
16. Our own ontologies
✤ Core set of ontologies that are BBC owned
✤ Creative Work, BBC, (Organisational) Provenance, etc
✤ Ability to change regularly and unilaterally
✤ Provide ‘mappings’ to more widely used ontologies
(e.g. Schema.org)
✤ Domain ontologies can be shared or reused
✤ Sport, Politics, GeoLocation, etc
17. Open data
✤ Provided through Mashery
✤ ‘Connected Studio’ events will validate
our API
✤ Public beta to follow
✤ JSON-LD & Turtle
✤ Future
✤ Self-provisioned, cloud-based
triple stores
✤ Data Dumps
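To illustrate the two serialisations named above, here is a minimal sketch that renders the same statements as JSON-LD and as Turtle using only the standard library. The vocabulary namespace, property names, and function names are hypothetical and do not reflect the actual API responses. Reusing `json.dumps` to quote the Turtle literal is a shortcut that holds for ordinary text, not a general-purpose Turtle writer.

```python
import json

# Hypothetical vocabulary namespace; the slides do not give the real one.
VOCAB = "http://example.org/ontologies/creativework/"


def to_jsonld(work_uri, title, about_uris):
    """Describe a creative work as a JSON-LD document (a plain dict)."""
    return {
        "@context": {
            "title": VOCAB + "title",
            # '@type': '@id' tells JSON-LD the values are IRIs, not strings.
            "about": {"@id": VOCAB + "about", "@type": "@id"},
        },
        "@id": work_uri,
        "title": title,
        "about": sorted(about_uris),
    }


def to_turtle(work_uri, title, about_uris):
    """Describe the same statements as Turtle, one triple per line."""
    literal = json.dumps(title, ensure_ascii=False)  # quoted, escaped string
    lines = [f"<{work_uri}> <{VOCAB}title> {literal} ."]
    lines += [f"<{work_uri}> <{VOCAB}about> <{u}> ." for u in sorted(about_uris)]
    return "\n".join(lines)
```

Both functions describe the same graph; only the surface syntax differs, which is why an API can offer either format from one underlying store.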
19. Managing concepts across BBC
✤ Which domain ‘owns’ Arnold Schwarzenegger?
✤ News? Entertainment? History? Politics?
✤ Can domains ‘own’ predicates?
✤ Layering information over shared concepts
✤ High quality sub-sets vs. lower quality ‘long-tail’
✤ Synchronisation with external datasets
✤ Tools for creating and managing concepts
✤ Emerging, splitting & combining concepts
✤ Linked Data gives us a language to solve these problems
20. Metadata
Often subjective, never complete
✤ What is this TV programme about?
✤ Manual tag curation
✤ Subjective
✤ Long-term expense
✤ Inconsistent
✤ Automated tag generation
✤ Short-term expense
✤ Value in data or algorithm?
✤ Complex
✤ Relies on assumptions
✤ Our approach? Invest in both. Validate learnings.
21. When to reason?
✤ Our options...
✤ Before writing to the triple store
✤ Materialised in the triple store (Forward-chaining inference)
✤ Inferred by the SPARQL engine (Backward-chaining inference)
✤ After SPARQL results have returned
✤ None/some/all of the above
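The difference between the forward-chaining and backward-chaining options above can be sketched with a single rule: an instance of a class is also an instance of its superclasses. The class hierarchy, identifiers, and function names below are hypothetical, and a real triple store would apply a full rule set rather than this one rule.

```python
# Hypothetical ontology: each class maps to its direct superclass.
SUBCLASS_OF = {"Politician": "Person", "Person": "Thing"}

# What was explicitly written to the store.
ASSERTED = {("thing:alice", "rdf:type", "Politician")}


def superclasses(cls):
    """Yield cls followed by every ancestor of cls."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def materialise(triples):
    """Forward chaining: compute inferred types once and store them."""
    inferred = set(triples)
    for s, p, o in triples:
        if p == "rdf:type":
            inferred.update((s, p, c) for c in superclasses(o))
    return inferred


def instances_of_backward(triples, cls):
    """Backward chaining: leave the store untouched, infer while answering."""
    return {s for s, p, o in triples if p == "rdf:type" and cls in superclasses(o)}


store = materialise(ASSERTED)  # larger store, trivial lookups
forward = {s for s, p, o in store if p == "rdf:type" and o == "Person"}
backward = instances_of_backward(ASSERTED, "Person")  # small store, work per query
assert forward == backward == {"thing:alice"}
```

Both routes give the same answer; the trade-off is storage and write cost (forward) against query-time cost (backward), which is why the slide leaves "none/some/all of the above" open.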
22. Maturity of SemanticTech
✤ From a Software Industry perspective, Semantic (RDF) Technology is
not mainstream and is therefore hard to sell
✤ Library/application immaturity can be a hindrance to innovation
✤ I believe the Sem Tech industry needs to focus on
simplicity and abstraction
✤ Semantic Technology is complex, but using it need not be
23. Find out more
✤ Video from QCon London 2013:
✤ http://www.infoq.com/presentations/bbc-data-platform-api
✤ BBC Internet Blog:
✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
✤ david.rogers@bbc.co.uk
✤ @daverog