1. Web Scale Reasoning and the LarKC Project (Introduction)
Luka Bradeško
Cycorp Europe
2. Goals of LarKC
LarKC = Large Knowledge Collider
• Build an integrated pluggable platform for large scale reasoning
• Support for parallelization, distribution, remote execution, data storage
• Use existing plug-ins, develop new ones
• Easy integration of components
• Enables low cost experimentation
“Significant progress is sometimes made not by making something possible that was impossible before, but by substantially lowering the costs of something that was only possible before at high cost”
3. Overall approach of LarKC
• Very lightweight platform
– communication, synchronisation, registration
– LarKC = “SPARQL endpoint on steroids”
• The real work happens in the plugins
• LarKC gives you:
– very scalable data layer
– standardised interfaces for combining components
– utilities & infrastructure to abstract from remote deployment
• Three types of LarKC users:
– people building plugins
– people configuring workflows
– people using workflows
4. LarKC Architecture
[Figure: LarKC architecture diagram]
• SPARQL Endpoint, Application
• Decider, Plug-in Registry, Workflow Support System
• Plug-in Managers and Plug-in API for the plug-ins: Query Transformer, Identifier, Info. Set Transformer, Selecter, Reasoner
• Data Layer API over the Data Layer (RDF Stores, RDF Docs)
• External systems, external data sources
• Legend: Platform Utility Functionality, APIs, Plug-ins
5. LarKC Plug-in API
Identifier
+ Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)
QueryTransformer
+ Set<Query> transform(Query theQuery, Contract theContract, Context theContext)
InformationSetTransformer
+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
Selecter
+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
Reasoner
+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
Decider
+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
• 5 types of plug-ins
• Plug-in API enables interoperability (between plug-in and platform and between plug-ins)
• Plug-in inputs and outputs are abstract data structures of RDF triples => flexibility for assembling plug-ins and for plug-in writers
• Compatibility ensured by DECIDER and workflow configurators, based on plug-in description
6. What does a workflow look like?
[Figure: a Decider, with Plug-in Registry and Workflow Support System, driving a workflow built from Query Transformer, Identifier, Info Set Transformer, Selecter and Reasoner plug-ins, each behind a Plug-in Manager and the Plug-in API, over an RDF Store]
7. What does a workflow look like?
[Figure: the same platform diagram with a workflow that uses several Identifiers and an Info Set Transformer alongside the Query Transformer, Selecter and Reasoner; the plug-ins sit on top of the Data Layer]
8. Decider Using Plug-in Registry to Create Workflow
[Figure: two candidate plug-in chains labelled Q, T, I, S, R with result labels VB, B and A]
D 1.3.1
Represent Properties
• Functional
• Non-functional (e.g. QoS)
• WSMO-Lite Syntax
Logical Representation
• Describes role
• Describes Inputs/Outputs
• Automatically extracted using API
• Decider can use for dynamic configuration
• Rule-based
• Fast
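The rule-based use of input/output descriptions can be illustrated with a small Java sketch. Everything in it is an assumption made for illustration: `PluginDescription`, `RegistryDecider`, the plug-in names and the simplified type labels are hypothetical and not taken from the LarKC code base or from WSMO-Lite. The rule is deliberately greedy: pick the first unused plug-in whose input type matches the current output type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in description: a name plus simplified input/output type labels.
final class PluginDescription {
    final String name;
    final String input;
    final String output;

    PluginDescription(String name, String input, String output) {
        this.name = name;
        this.input = input;
        this.output = output;
    }
}

final class RegistryDecider {
    // Greedy rule: repeatedly pick the first unused plug-in whose input type
    // equals the current type. Returns an empty list if the chain breaks.
    static List<String> plan(List<PluginDescription> registry, String from, String to) {
        List<String> workflow = new ArrayList<>();
        String current = from;
        while (!current.equals(to)) {
            PluginDescription next = null;
            for (PluginDescription d : registry) {
                if (d.input.equals(current) && !workflow.contains(d.name)) {
                    next = d;
                    break;
                }
            }
            if (next == null) {
                return new ArrayList<>(); // no compatible plug-in registered
            }
            workflow.add(next.name);
            current = next.output;
        }
        return workflow;
    }

    // Illustrative registry content; registration order does not matter here.
    static List<PluginDescription> sampleRegistry() {
        List<PluginDescription> registry = new ArrayList<>();
        registry.add(new PluginDescription("reasoner-1", "SetOfStatements", "VariableBinding"));
        registry.add(new PluginDescription("identifier-1", "Query", "InformationSet"));
        registry.add(new PluginDescription("transformer-1", "InformationSet", "SetOfStatements"));
        return registry;
    }

    public static void main(String[] args) {
        System.out.println(plan(sampleRegistry(), "Query", "VariableBinding"));
    }
}
```

A real decider would also weigh non-functional properties such as QoS and would need backtracking when several plug-ins accept the same input type; the sketch leaves both out.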
9. LarKC Plug-in Managers
[Figure: Plug-in Managers wrapping Query Transformer, Identifier, Transformer and Selecter plug-ins through the Plug-in API, with several managers of the same type side by side]
• Run in separate threads
• Automatically add meta-data to registry when loaded
• Communicate RDF data by value or by reference
• Parallelisation
• Split/Join connectors in progress
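One way to picture "run in separate threads" is a manager that owns a worker thread and two queues. The Java sketch below is only an illustration under that assumption: `PluginManager`, its queues and the demo plug-in are hypothetical names, not the LarKC implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical manager: runs one plug-in on its own thread and moves
// messages from an input queue to an output queue.
final class PluginManager<I, O> {
    private final Function<I, O> plugin; // the wrapped plug-in
    final BlockingQueue<I> input = new LinkedBlockingQueue<>();
    final BlockingQueue<O> output = new LinkedBlockingQueue<>();

    PluginManager(Function<I, O> plugin) {
        this.plugin = plugin;
    }

    // Start a daemon thread that applies the plug-in to every incoming message.
    Thread start() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    output.put(plugin.apply(input.take()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop when interrupted
            }
        });
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}

final class PluginManagerDemo {
    // Push one query through a single manager and wait for the result.
    static String runOnce(String query) {
        PluginManager<String, String> manager =
                new PluginManager<>(q -> q.trim().toLowerCase());
        Thread worker = manager.start();
        try {
            manager.input.put(query);
            return manager.output.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            worker.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce("  SELECT ?company  "));
    }
}
```

Chaining several managers by feeding one manager's output queue into the next manager's input queue gives a pipeline in which every plug-in works concurrently.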
10. Example workflow
[Figure: example workflow with the stages Query, Identify, Transform, Select, Reason and Result, drawing on the Internet, GATE and Research Cyc]
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si/article.txt" .
  ?company cyc:isa cyc:PubliclyHeldCorporation }
11. LarKC Data Layer
[Figure: the LarKC architecture diagram again, here with a Pipeline Support System, highlighting the Data Layer API and the Data Layer (RDF Stores, RDF Docs) below the plug-ins]
12. LarKC Data Layer
[Figure: a Dataset made of a Default Graph and RDF Graphs, next to a Labeled Set of RDF Graphs]
Main goal:
• The Data Layer supports LarKC plug-ins:
– storage, retrieval and light-weight inference on top of large volumes of data
– automates the exchange of RDF data by reference and by value
– offers other utility tools to manage data (e.g. merger)
13. Used Concepts in the Data Model
[Figure: named graphs NG1, NG2, NG3, NG4 and NG5, with labelled groups of statements spanning them]
14. Supported Sets of Statements
RDF data type | Description | Example
Set of statements | RDF statements | s1, p1, o1, ng1; s2, p2, o2; s3, p3, o3, ng3, {group1}
RDF graph | Named graph | s1, p1, o1, ng1, {group1}; s2, p2, o2, ng1, {group2}; s3, p3, o3, ng1
Dataset | SPARQL dataset, represents a collection of graphs | s1, p1, o1, ng1; s2, p2, o2, ng2; s3, p3, o3, ng3
Labelled group of statements | RDF group of statements | s1, p1, o1, ng1, {group1}; s2, p2, o2, ng2, {group1}; s3, p3, o3, ng3, {group1}
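The difference between an RDF graph (all statements in one named graph) and a labelled group (statements sharing a label, possibly across graphs) can be shown with a short Java sketch. The `Statement` and `StatementSets` classes are hypothetical stand-ins written for this illustration, not the Data Layer types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical statement: subject, predicate, object, an optional named
// graph (null if absent) and zero or more group labels.
final class Statement {
    final String s;
    final String p;
    final String o;
    final String namedGraph;
    final Set<String> groups;

    Statement(String s, String p, String o, String namedGraph, Set<String> groups) {
        this.s = s;
        this.p = p;
        this.o = o;
        this.namedGraph = namedGraph;
        this.groups = groups;
    }
}

final class StatementSets {
    // RDF graph: every statement that belongs to the given named graph.
    static List<Statement> graph(List<Statement> all, String namedGraph) {
        List<Statement> result = new ArrayList<>();
        for (Statement st : all) {
            if (namedGraph.equals(st.namedGraph)) {
                result.add(st);
            }
        }
        return result;
    }

    // Labelled group: every statement carrying the label, across all graphs.
    static List<Statement> labelledGroup(List<Statement> all, String label) {
        List<Statement> result = new ArrayList<>();
        for (Statement st : all) {
            if (st.groups.contains(label)) {
                result.add(st);
            }
        }
        return result;
    }

    // Illustrative data in the style of the table rows.
    static List<Statement> sample() {
        return Arrays.asList(
                new Statement("s1", "p1", "o1", "ng1", Collections.singleton("group1")),
                new Statement("s2", "p2", "o2", "ng2", Collections.singleton("group1")),
                new Statement("s3", "p3", "o3", "ng1", Collections.<String>emptySet()),
                new Statement("s4", "p4", "o4", null, Collections.<String>emptySet()));
    }

    public static void main(String[] args) {
        System.out.println(graph(sample(), "ng1").size());
        System.out.println(labelledGroup(sample(), "group1").size());
    }
}
```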
16. Released System v1.1: larkc.sourceforge.net
Identify
• SINDICE
• SWOOGLE
Select
• Spreading Activation
• Geolocation
Transform
• Annotate GATE
• Annotate Cyc
• SPARQL-CycL
Reason
• Jena, IRIS, Pellet
• Cyc, PION
• Siemens
Decide
• Scripted: Real-Time City
• Dynamic Cyc
• Open Apache 2.0 license
• Previous early adopters workshops @ ESWC ’09, ’10 and ISWC ’09
– participants modified plug-ins, modified workflows
Standard Open Environment: Java, subversion, packaged release, command line build, or eclipse
[Figure: platform diagram with Decider, Plug-in Registry, Pipeline Support System and Plug-in Managers for Query Transformer, Identifier, Info. Set Transformer, Selecter and Reasoner]
17. Next Steps
• Distributed Data Layer
• Caching, data warming/cooling
• Data Streaming between remote components
• Experimental instrumentation and monitoring
• Requirements traceability and update
• Architecture refinement
Platform validation
Early Adopters
19. Rapid Progress, but We’re Not Finished…
[Figure: LarKC architecture diagram with Pipeline Support System, Decider, Plug-in Registry, Plug-in Managers, Plug-in API, Data Layer API and Data Layer]
Requirements (WP 5)
• Sources
– Initial Project Objectives (DoW)
– LarKC Collider Platform (WP5 discussions)
– LarKC Rapid Prototyping
– LarKC Use Cases (WP6, WP7a, WP7b)
– LarKC Plug-ins (WP2, WP3, WP4)
• Classified according to:
– Resources
– Heterogeneity
– Usage
– Interoperability
– Parallelization “within plug-ins”
– Distributed/remote execution
– Data Layer
– Data Caching
– Anytime Behaviour
– Plug-in Registration and Discovery
– Plug-in Monitoring and Measurement
– Support for Developers
– Plug-ins
Detailed information in D5.3.1 Requirements Analysis and report on lessons learned during prototyping
• Concentrating on parallel and distributed execution.
• Optimisation of complex workflows.
• Extend meta-data representation for QoS, parallelism and use it.
• Concentrating on parallel and distributed data layer; caching and data migration.
• Support more plug-in needs while maintaining platform integrity
• Integrate distributed plug-ins
• Support workflows inspired by human cognition (e.g. workflow interruption for optimal stopping)
• Support anytime/streaming
• Experimental instrumentation and monitoring
20. LarKC Plug-in API: General Plug-in Model
Plug-in
+ URI getIdentifier()
+ QoSInformation getQoSInformation()
Plug-in description: functional properties, non-functional properties, WSDL description
• Plug-ins are assembled into Workflows, to realise a LarKC Experiment or Application
• Plug-ins are identified by a URI (Uniform Resource Identifier)
• Plug-ins provide metadata about what they do (Functional properties): e.g. type = Selecter
• Plug-ins provide information about their behaviour and needs, including Quality of Service information (Non-functional properties): e.g. Throughput, MinMemory, Cost,…
• Plug-ins can be provided with a Contract that tells them how to behave (e.g. Contract: “give me the next 10 results”) and Context information used to store state between invocations
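The interplay of Contract and Context can be sketched in Java as follows. This is a minimal illustration, assuming a contract that only carries a result limit and a context that only stores an offset; the `Contract`, `Context` and `PagingPlugin` classes here are hypothetical and much simpler than whatever the LarKC API defines under those names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract: how many results the caller wants per invocation.
final class Contract {
    final int maxResults;

    Contract(int maxResults) {
        this.maxResults = maxResults;
    }
}

// Hypothetical context: mutable state kept between invocations.
final class Context {
    private final Map<String, Integer> state = new HashMap<>();

    int get(String key) {
        return state.getOrDefault(key, 0);
    }

    void put(String key, int value) {
        state.put(key, value);
    }
}

// Hypothetical plug-in that pages through a fixed list of results.
final class PagingPlugin {
    private final List<String> results;

    PagingPlugin(List<String> results) {
        this.results = results;
    }

    // Return the next page and remember in the context where to resume.
    List<String> next(Contract contract, Context context) {
        int from = Math.min(context.get("offset"), results.size());
        int to = Math.min(from + contract.maxResults, results.size());
        context.put("offset", to);
        return new ArrayList<>(results.subList(from, to));
    }
}

final class ContractDemo {
    // Build the illustrative results r1 .. rn.
    static List<String> numbered(int n) {
        List<String> list = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            list.add("r" + i);
        }
        return list;
    }

    public static void main(String[] args) {
        PagingPlugin plugin = new PagingPlugin(numbered(25));
        Context context = new Context();
        Contract nextTen = new Contract(10);
        System.out.println(plugin.next(nextTen, context)); // first page
        System.out.println(plugin.next(nextTen, context)); // resumes after it
    }
}
```

Keeping the offset in the Context rather than in the plug-in lets the same plug-in instance serve several independent callers, each with its own Context.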
21. LarKC Plug-in API: IDENTIFY
Identifier
+ Collection<InformationSet> identify(Query theQuery, Contract contract, Context context)
• IDENTIFY: Given a query, identify resources that could be used to answer it
• Sindice – Triple Pattern Query → RDF Graphs
• Google – Keyword Query → Natural Language Document
• Triple Store – SPARQL Query → RDF Graphs
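As a rough illustration of the keyword-query case, the Java sketch below identifies documents that mention every keyword of a query. `KeywordIdentifier` and its in-memory index are assumptions made for this example; a real Identifier would call out to a service such as Sindice or a triple store.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical identifier over an in-memory index: URI -> lower-cased text.
final class KeywordIdentifier {
    private final Map<String, String> documents = new LinkedHashMap<>();

    void add(String uri, String text) {
        documents.put(uri, text.toLowerCase());
    }

    // Return the URIs of all documents that contain every keyword.
    List<String> identify(String keywordQuery) {
        String[] keywords = keywordQuery.toLowerCase().trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            boolean containsAll = true;
            for (String keyword : keywords) {
                if (!doc.getValue().contains(keyword)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordIdentifier identifier = new KeywordIdentifier();
        identifier.add("http://example.org/a", "Cyc is a knowledge base");
        identifier.add("http://example.org/b", "GATE annotates text");
        System.out.println(identifier.identify("knowledge cyc"));
    }
}
```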
23. LarKC Plug-in API: TRANSFORM (2/2)
InformationSetTransformer
+ InformationSet transform(InformationSet theInformationSet, Contract theContract, Context theContext)
• Information Set TRANSFORM: Transforms data from one representation to another
• Natural Language Document → RDF Graph
• Structured Data Sources → RDF Graph
• RDF Graph → RDF Graph (e.g. foaf vocabulary to facebook vocabulary)
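The RDF Graph to RDF Graph case amounts to rewriting terms from one vocabulary into another. The Java sketch below does this for predicates only; `VocabularyTransformer` is a hypothetical class, triples are plain string arrays, and the target term `ex:friendOf` is invented for the example rather than taken from any real vocabulary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical transformer: rewrites predicates according to a mapping table
// and leaves unmapped predicates untouched.
final class VocabularyTransformer {
    private final Map<String, String> predicateMap;

    VocabularyTransformer(Map<String, String> predicateMap) {
        this.predicateMap = predicateMap;
    }

    // Each triple is a String[3] of subject, predicate, object.
    List<String[]> transform(List<String[]> graph) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : graph) {
            out.add(new String[] {t[0], predicateMap.getOrDefault(t[1], t[1]), t[2]});
        }
        return out;
    }

    public static void main(String[] args) {
        VocabularyTransformer transformer = new VocabularyTransformer(
                Collections.singletonMap("foaf:knows", "ex:friendOf"));
        List<String[]> graph = new ArrayList<>();
        graph.add(new String[] {"ex:a", "foaf:knows", "ex:b"});
        graph.add(new String[] {"ex:a", "foaf:name", "Ann"});
        for (String[] t : transformer.transform(graph)) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```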
24. LarKC Plug-in API: SELECT
Selecter
+ SetOfStatements select(SetOfStatements theSetOfStatements, Contract contract, Context context)
• SELECT: Given a set of statements (e.g. a number of RDF Graphs) will choose a selection/sample from this set
– Collection of RDF Graphs → Triple Set (Merged)
– Collection of RDF Graphs → Triple Set (10% of each)
– Collection of RDF Graphs → Triple Set (N Triples)
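The three selection strategies listed above can be sketched in a few lines of Java. `Selecters` is a hypothetical helper class; graphs are modelled as lists of opaque triple strings, and "10% of each" is read here as the first tenth of each graph, rounded up, which is only one possible interpretation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Selecters {
    // Merged: union of all graphs in their original order.
    static List<String> merged(List<List<String>> graphs) {
        List<String> all = new ArrayList<>();
        for (List<String> g : graphs) {
            all.addAll(g);
        }
        return all;
    }

    // Percent of each: the first `percent` percent of every graph, rounded up.
    static List<String> percentOfEach(List<List<String>> graphs, int percent) {
        List<String> sample = new ArrayList<>();
        for (List<String> g : graphs) {
            int count = Math.min(g.size(), (g.size() * percent + 99) / 100);
            sample.addAll(g.subList(0, count));
        }
        return sample;
    }

    // N triples: the first n triples of the merged set.
    static List<String> firstN(List<List<String>> graphs, int n) {
        List<String> all = merged(graphs);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    // Illustrative graph with `size` placeholder triples.
    static List<String> graph(String name, int size) {
        List<String> triples = new ArrayList<>();
        for (int i = 1; i <= size; i++) {
            triples.add(name + "#t" + i);
        }
        return triples;
    }

    public static void main(String[] args) {
        List<List<String>> graphs = Arrays.asList(graph("g1", 20), graph("g2", 5));
        System.out.println(percentOfEach(graphs, 10));
        System.out.println(firstN(graphs, 7));
    }
}
```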
25. LarKC Plug-in API: REASON
Reasoner
+ VariableBinding sparqlSelect(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, SetOfStatements theSetOfStatements, Contract contract, Context context)
• REASON: Executes a query against the supplied set of statements
– SPARQL Query → Variable Binding (Select)
– SPARQL Query → Set of statements (Construct)
– SPARQL Query → Set of statements (Describe)
– SPARQL Query → Boolean (Ask)
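To make the Select and Ask result types concrete, here is a deliberately tiny Java sketch that evaluates a single triple pattern against a set of statements. It is not a SPARQL engine: `PatternReasoner` is hypothetical, terms starting with `?` are treated as variables, repeated variables are not checked for consistency, and the `ex:` data is invented.

```java
import java.util.ArrayList;
import java.util.List;

final class PatternReasoner {
    // A pattern term matches if it is a variable or equals the statement term.
    private static boolean matches(String[] pattern, String[] statement) {
        for (int i = 0; i < 3; i++) {
            if (!pattern[i].startsWith("?") && !pattern[i].equals(statement[i])) {
                return false;
            }
        }
        return true;
    }

    // Ask: true if at least one statement matches the pattern.
    static boolean ask(String[] pattern, List<String[]> statements) {
        for (String[] st : statements) {
            if (matches(pattern, st)) {
                return true;
            }
        }
        return false;
    }

    // Select: the values bound to `variable` in every matching statement.
    static List<String> select(String variable, String[] pattern, List<String[]> statements) {
        List<String> bindings = new ArrayList<>();
        for (String[] st : statements) {
            if (!matches(pattern, st)) {
                continue;
            }
            for (int i = 0; i < 3; i++) {
                if (pattern[i].equals(variable)) {
                    bindings.add(st[i]);
                    break;
                }
            }
        }
        return bindings;
    }

    // Illustrative statements; the ex: terms are made up.
    static List<String[]> sample() {
        List<String[]> statements = new ArrayList<>();
        statements.add(new String[] {"ex:AcmeCorp", "cyc:isa", "cyc:PubliclyHeldCorporation"});
        statements.add(new String[] {"ex:Bob", "cyc:isa", "ex:Person"});
        return statements;
    }

    public static void main(String[] args) {
        String[] pattern = {"?company", "cyc:isa", "cyc:PubliclyHeldCorporation"};
        System.out.println(select("?company", pattern, sample()));
        System.out.println(ask(pattern, sample()));
    }
}
```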
26. LarKC Plug-in API: DECIDE
Decider
+ VariableBinding sparqlSelect(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ SetOfStatements sparqlConstruct(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ SetOfStatements sparqlDescribe(SPARQLQuery theQuery, QoSParameters theQoSParameters)
+ BooleanInformationSet sparqlAsk(SPARQLQuery theQuery, QoSParameters theQoSParameters)
• DECIDE: Builds the workflow and manages the control flow
– Scripted Decider: Predefined workflow is built and executed
– Self-configuring Decider: Uses plug-in descriptions (functional and non-functional properties) to build the workflow
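A scripted decider can be pictured as a fixed list of steps executed in order. The Java sketch below makes the strong simplifying assumption that every step maps a list of statement strings to another such list, whereas the real plug-in types have different inputs and outputs; `ScriptedDecider` and the two demo steps are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical scripted decider: the workflow is a predefined list of steps.
final class ScriptedDecider {
    private final List<UnaryOperator<List<String>>> steps;

    ScriptedDecider(List<UnaryOperator<List<String>>> steps) {
        this.steps = steps;
    }

    // Execute the steps in order, feeding each result into the next step.
    List<String> run(List<String> input) {
        List<String> data = input;
        for (UnaryOperator<List<String>> step : steps) {
            data = step.apply(data);
        }
        return data;
    }
}

final class ScriptedDeciderDemo {
    static List<String> demo() {
        // "Select" step: keep at most the first two statements.
        UnaryOperator<List<String>> select =
                data -> data.subList(0, Math.min(2, data.size()));
        // "Reason" step: keep only statements that mention "isa".
        UnaryOperator<List<String>> reason =
                data -> data.stream().filter(s -> s.contains("isa")).collect(Collectors.toList());
        ScriptedDecider decider = new ScriptedDecider(Arrays.asList(select, reason));
        return decider.run(Arrays.asList("a isa B", "c knows d", "e isa F"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because selection runs before reasoning, the third statement never reaches the reasoning step; reordering the script would change the answer, which is exactly the kind of trade-off a decider controls.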