1. Requirements Engineering for Semantic CMS
Semantic CMS Community
Lecturer
Organization
Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
2. Page: 2
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu Copyright IKS Consortium
3. Page: 3
What is this Lecture about?
We have seen ...
... existing technologies of the Semantic Web
... how these technologies can be used for semantic content management
What is missing?
Methodologies for the development of semantic CMS
First, requirements for semantic CMS need to be specified
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
4. Page: 4
Outline
What the course is about?
Methodology
Understand industry needs/expectations
Analysis of Traditional CMSs
Identify business scenarios
Identification of High Level Requirements (HLRs)
High Level Requirements
Use cases
Resulting requirements
Summary
5. Page: 5
What the course is about?
This course aims to
Present the domain-independent requirements elicitation
process for the semantic enhancement of any
Content Management System
6. Page: 6
Methodology
Bilateral meetings with CMS vendors
Workshops
Interviews
Brainstorming sessions
Gathered requirements
Categorization under major topics
High Level Requirements
Use cases
Validate the resulting use cases against the
requirements of different CMS vendors
7. Page: 7
Results
Requirements Engineering Process
Refine HLRs into specific software requirements using scenario
and use case descriptors
Actors model
All requirements are based on use cases which use a common
actor’s model for CMS.
Integration of semantic services to existing CMSs
Easy to use and technology independent mechanisms
RESTful HTTP services
All features are expressed in terms of services
Applicable to and can be accessed by “any” CMS
Mash-up to create new high-order services
8. Page: 8
Analysis of Traditional CMSs
GOAL:
Identify common parts that all CMSs have
INPUT:
Product descriptions
Expectations from industry
Product web-sites
Running CMS itself
9. Page: 9
Analysis of Traditional CMSs
Analysis of
Content Types
Content Workflow
Content Services
Architectural Styles
11. Page: 11
CMS – Content Workflow
Main innovations take place in the phases
Enrichment
Storage
Publishing
19. Page: 19
Merge All Inputs
Workshops
Brainstorming sessions
Collected list of statements from CMS vendors
Representing their view on a semantic CMS
e.g. legacy data, how to semantify them?
e.g. tagging, different for each person, rules for personalized
tagging
Examination of existing systems
Focus on industrial needs rather than theoretical thinking
Merge all input and come up with High Level Requirements
20. Page: 20
High Level Requirements
HLR-1: Common Vocabulary
HLR-2: Architecture and integration
HLR-3: Semantic lifting & tagging
HLR-4: Semantic search & semantic query
HLR-5: Reasoning on content items
HLR-6: Links/relations among content items
HLR-7: Workflows
HLR-8: Change management, versions and audit
HLR-9: Multilingualism
HLR-10: Security
21. Page: 21
The refinement process
Starts with HLRs and ends with testable software
requirements
23. Page: 23
HLR 1
Common Vocabulary
For a common understanding for users
Relating a content item with clear and precise vocabulary
items
Services and engineering of
External ontologies, taxonomies, thesauri
4 scenarios based on the collected information
e.g. statements from CMS vendors
“Agree on a set of categories and relations,
attributes as the default set”
http://lsdis.cs.uga.edu
“Help in finding good vocabularies”
24. Page: 24
HLR 1
Common Vocabulary
Use Cases
25. Page: 25
HLR 1
Common Vocabulary
Resulting Requirements
Functional requirements
The Vocabulary shall be navigable
…
Data requirements
Vocabulary shall be in one of the standard formats which
…
Integration requirements
Vocabulary shall be in an accepted standard format
…
Interface requirements: an interface shall be implemented for
Presenting list of Vocabularies
…
Non functional requirements
Vocabularies shall always be accessible
…
26. Page: 26
HLR 2
Architecture and integration
Easy integration of services to be developed into different heterogeneous system environments
RESTful service interfaces
The implementation should be as technology independent as possible
Should also provide technology specific access to the services for best performance results
http://xml.com
27. Page: 27
HLR 2
Architecture and integration
Everything should be accessible by a URI
Linked Data approach
The communication should be based on standardized
text-based data formats
e.g. XML
http://viralpatel.net/
28. Page: 28
HLR 3
Semantic lifting & tagging
Semantic tagging on
content items
Ontological classes
RDF properties
Microformats
http://microformats.org/
Extract semantics from structured and unstructured data
automatically or semi-automatically
Make suggestions about annotations
Navigate on the content items in a semantic fashion
29. Page: 29
HLR 4
Semantic search & semantic query
Faceted search mechanisms on top of semantic query
language support
Statements from the industry
Similarity search, similarity detection
User friendly RDF query
Support for disambiguation of search
30. Page: 30
HLR 5
Reasoning on content items
Extracting implicit
information from the
explicit information
residing in the content
repositories
“Semantic consistency
check in CMSs”
http://www.kent.ac.uk/
31. Page: 31
HLR 6
Links/relations among content items
Along with the semantic
annotations of the content
items, semantic relations
among them should also be
considered
“Instance linking, linked
data cloud, whenever we
create something link it with
something existing”
http://ctmlogistics.co.uk/
32. Page: 32
HLR 7
Workflows
Control flow/lifecycle of
the content
Workflows for semantic
actions similar to
workflows for content
“Intelligent content
workflows, configured
based on organization,
hierarchy”
http://coredotnet.blogspot.com
33. Page: 33
HLR 8
Change management, versions and audit
The system should also be
aware of changing content
and provide solutions to
invalidate semantic data
Prior extracted semantic
information might become
invalid as the content
changes
Content evolution
http://visiongss.com
Semantic data evolution
34. Page: 34
HLR 9
Multilingualism
Services to be provided
should be aware of content
in different languages
Enabling a variety of users
of different nationalities
Language support
independent of the CMS
application domain
http://ec.europa.eu/
35. Page: 35
HLR 10
Security
The system must consider
existing access control
restrictions in CMSs
New kinds of restrictions
which reflect the semantic
data access
e.g. for algorithms that
reason on existing data
Integration of permission,
role and group models
http://www.oplin.org
36. Page: 36
Summary
The requirements evolved from a systematic
requirements engineering approach
Started with the analysis of current CMS systems and their
similarities
Collection of needs of CMS vendors in the field of
semantic enhancements of their systems
Workshops
Brainstorming sessions
Interviews
From the High Level Requirements (HLRs)
Necessary Actors are defined
Scenarios are constructed
37. Page: 37
Summary
From the scenarios for each HLR
Use cases are extracted
From the use cases resulting requirements are refined
into the following types of requirements
Functional
Data
Integration
Interface
Non functional
Editor's Notes
This course presents the steps taken to elicit the requirements of a framework that aims to enhance traditional content management systems with semantic capabilities. It first describes the steps carried out jointly by the system designers (the group delivering the work) and the target group (CMS vendors in this case) to determine the actual needs. After going over the agreements reached between the two groups, the focus shifts to the elicited high level requirements. The course tries to convey the importance of the high level requirements from a semantic perspective.
In the first phase of the requirements elicitation process, bilateral meetings are held with the target group. These meetings can be workshops, interviews, brainstorming sessions, etc. Apart from bilateral meetings, questionnaires and surveys can also be used to obtain and understand the needs of target groups in the requirements analysis phase. Since the subject here is content management systems and the aim is to provide services on top of existing systems, it would not be realistic to skip an examination of those existing systems. The better the needs of CMS vendors are understood, the more smoothly the later design and implementation phases of the project proceed.
While the needs and requirements of the target groups are being collected, the collected material is categorized into topics. For example, the statement “Extract RDF statements from XML or HTML document” is put into the topic “Enrichment of Content”, and the statement “Possibility to create pages by queries” into the topic “Support for content creation”. The identified topics lead to the high level requirements and, through the refinement process, to use cases.
To ensure that the needs of different CMS providers are considered, the resulting use cases are cross-validated against the requirements of the CMS providers.
The elicited high level requirements are refined into use cases. All requirements are based on use cases that share a common actor model for CMSs. This model is the basis for communication within the consortium, which contains both producer and consumer groups, about the different use cases and the actors involved in them. Easy adoption and technology independence are major concerns of the CMS industry, because CMS providers want to spend minimum effort on integrating semantic services into their existing systems. Accessing semantic services as RESTful services over HTTP, independently of the underlying technology, is a widely accepted way to provide easy integration. Functionality provided by the system should therefore be accessible to adopters through RESTful services. These services should be applicable to any domain, e.g. the tourism domain or the health care domain. The services can be reused to create new higher-order services, the system can be extended with new services for semantic features, and services can be replaced. This setup allows modular development of the IKS and enough flexibility to experiment with different implementations of semantic services. Additionally, each service is required to define further extension points to allow fine-grained customization of all semantic features.
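The idea that services can be replaced and combined into higher-order services can be illustrated with a small in-process sketch. This is not how IKS implements it: the `ServiceRegistry` class, the service names, and the dictionary-based document format are all assumptions made for illustration, and real services would sit behind RESTful HTTP interfaces rather than Python callables.

```python
from typing import Callable, Dict, List

# A "service" here is just a function from a document to an enriched document.
# In the real setting each one would sit behind a RESTful HTTP interface.
Service = Callable[[dict], dict]


class ServiceRegistry:
    """Hypothetical registry; names and structure are illustrative only."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        """Add a service, or replace the one already registered under name."""
        self._services[name] = service

    def mash_up(self, name: str, parts: List[str]) -> None:
        """Register a higher-order service that chains existing services."""
        def combined(doc: dict) -> dict:
            # Parts are looked up at call time, so replaced services take effect.
            for part in parts:
                doc = self._services[part](doc)
            return doc
        self.register(name, combined)

    def call(self, name: str, doc: dict) -> dict:
        return self._services[name](doc)


def tokenize(doc: dict) -> dict:
    return {**doc, "tokens": doc["text"].split()}


def count_tokens(doc: dict) -> dict:
    return {**doc, "token_count": len(doc["tokens"])}


registry = ServiceRegistry()
registry.register("tokenize", tokenize)
registry.register("count", count_tokens)
registry.mash_up("tokenize-and-count", ["tokenize", "count"])

result = registry.call("tokenize-and-count", {"text": "semantic content management"})
print(result["token_count"])
```

Because the combined service resolves its parts by name on every call, swapping in a different implementation of one part changes the higher-order service without re-registering it.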
Since the aim is to provide semantic functionality that can be used by any kind of content management system, the common parts of existing content management systems should be identified. Different kinds of input are considered, namely product descriptions, expectations from industry, and product web sites. Furthermore, existing CMSs can be analyzed by running them directly and investigating their features.
These are the possible content types that are managed by content management systems. Not every system supports all of these types, but a system that aims to be generic across existing systems should be capable of supporting all of them.
The figure in the slide shows the generic content workflow within a content management system. The important steps of this workflow in terms of semantic enhancement are the last three steps. The following three slides list the topics that are identified for each step.
The figure and explanation are adapted from the study: Fabian Christ, Benjamin Nagel: A Reference Architecture for Semantic Content Management Systems.
Starting from the user interface layer, a CMS User Interface at the top layer in the figure presents the content and offers editorial features to create, modify, and manage content within its lifecycle. Access to the content itself is provided by a Content Access layer. This layer is used by the User Interface to get access to the content and the content management features of the CMS. Additionally, the Content Access layer can be used by third-party software that wants to integrate the CMS into other applications. The core management features are implemented in the Content Management layer. This layer provides functionality for defining the domain- or application-specific Content Data Model. The Content Data Model layer is conceptually placed below the Content Management layer, which has the necessary features to manipulate the model. The Content Data Model is the application-specific model based on the underlying Content Repository. The Content Repository defines the fundamental concepts and persistence mechanisms for any Content Data Model that is defined on top of it. The Content Management features are tightly related to the Content Administration layer, which is used to administer the CMS stack.
The question was how new functionality provided by the semantic services can be integrated into this architectural scenario. The idea is to offer a set of semantic services that can be used easily through a standardized communication protocol. This approach is agreed on and supported by the CMS vendors, who would like to see simple RESTful interfaces to these semantic services. The new situation is depicted in the next slide.
The figure and explanation are adapted from the study: Fabian Christ, Benjamin Nagel: A Reference Architecture for Semantic Content Management Systems.
The figure shows the architecture that enables traditional CMSs to enhance their systems with semantic capabilities without a major change to the existing system. The adaptation can be examined in the four layers defined by a Semantic CMS (SCMS): Presentation & Interaction, Semantic Lifting, Knowledge Representation and Reasoning, and Persistence.
In a traditional CMS, the user is able to edit and consume content through a user interface. When dealing with knowledge in an SCMS, an additional layer is needed at the user interface level that allows a user to interact with content, called Semantic User Interaction. For example, a user writes an article and the SCMS recognizes the name of a person in that article. The SCMS includes a reference to an object representing that person, not only the person's name. The user can interact with the person object and see, e.g., the person's birthday.
In the Semantic Lifting layer, the SCMS provides algorithms for extracting semantic metadata from the stored content, a capability that traditional content management systems lack.
After content has been lifted to a semantic level, the extracted information may be used as input for reasoning techniques in the Reasoning layer. To handle knowledge within the system, Knowledge (representation) Models are used that define the semantic metadata used to express knowledge. These metadata are often defined along some ontology that specifies so-called concepts and their semantic relations.
In the Persistence layer, triple stores are used to store knowledge that is represented by triples (subject, predicate, object), each indicating a relation between subject and object. To be able to give a semantic meaning to a triple, there should be Knowledge Models on top of the knowledge repository that specify the semantic meaning of a certain predicate.
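The (subject, predicate, object) representation used in the Persistence layer can be sketched with a tiny in-memory store. This is a minimal illustration, not a real triple store: the `TripleStore` class, the identifiers such as `article:1` and `person:ada`, and the predicate names are all hypothetical. The data mirrors the example above of an article that references a person object whose birthday can be looked up.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Minimal in-memory triple store sketch; not a production design."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# The article refers to a person object, not just to the person's name.
store.add("article:1", "mentions", "person:ada")
store.add("person:ada", "name", "Ada Lovelace")
store.add("person:ada", "birthday", "1815-12-10")

for _, _, person in store.match(subject="article:1", predicate="mentions"):
    print(store.match(subject=person, predicate="birthday"))
```

What a predicate such as `birthday` means is not stored in the triples themselves; that is the role of the Knowledge Models placed on top of the repository.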
After the high level requirements (HLRs) have been defined, each HLR is refined using the following refinement process. The process starts with the HLR, produces use cases (UC), and results in lists of testable software requirements (R) for the system to be developed. The figure in the slide depicts the refinement graph as a directed acyclic graph (DAG) that emerges from this process.
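The refinement graph described above can be sketched as a plain adjacency mapping. The identifiers `UC-1.1`, `R-1`, and so on are placeholders invented for this example and do not correspond to actual use cases or requirements from the project. The sketch checks that the graph is acyclic and collects the software requirements reachable from one HLR.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical refinement edges: HLR -> use cases -> requirements.
REFINES: Dict[str, List[str]] = {
    "HLR-1": ["UC-1.1", "UC-1.2"],
    "UC-1.1": ["R-1", "R-2"],
    "UC-1.2": ["R-2", "R-3"],  # R-2 is shared, so this is a DAG, not a tree
}


def is_acyclic(graph: Dict[str, List[str]]) -> bool:
    """A topological order exists exactly when the graph has no cycle."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def requirements_for(node: str, graph: Dict[str, List[str]] = REFINES) -> Set[str]:
    """Collect the leaf nodes (software requirements) reachable from node."""
    children = graph.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= requirements_for(child, graph)
    return found


print(is_acyclic(REFINES))
print(sorted(requirements_for("HLR-1")))
```

A requirement that is reachable from two use cases, like `R-2` here, is what makes the structure a graph rather than a tree.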
The requirements refinement process iterates over all HLRs. For each HLR, two refinement steps are performed. The process is depicted in the next figure.
The first refinement step is to specify scenarios and to extract and consolidate use cases from these scenarios. The result is a set of scenarios and use cases for each HLR. The use case consolidation is important to identify relationships between use cases and to keep them consistent with each other.
The second refinement step is to extract and consolidate the resulting requirements. The software requirements result from the use cases, so that each use case relates to one or more software requirements. A key characteristic of these requirements is their testability. For this purpose, the requirements are formulated as simple statements like "The system shall be able to...". This formulation is keyword-based according to [RFC2119] (see section 4).
The refinement process is implemented as an open participation process that supports constant input from the involved target groups. The coordination of the process and the consolidation of the input were done by the research partners, who also made proposals for the requirements based on the input of the industrial partners. To achieve this in a distributed setup of partners, the documented results were available online at all times, with the opportunity for the partners to add comments and make further suggestions.
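The keyword-based formulation of requirements lends itself to a simple automated check. The sketch below looks for the first RFC 2119 keyword in a requirement statement. Matching case-insensitively is an assumption made here, because the statements in this deck use lowercase "shall"; the helper name `requirement_level` is hypothetical, and a naive match on a word such as "may" can produce false positives.

```python
import re
from typing import Optional

# Keywords defined by RFC 2119; negated forms come first so that
# "MUST NOT" is preferred over "MUST" at the same position.
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def requirement_level(statement: str) -> Optional[str]:
    """Return the first RFC 2119 keyword found, upper-cased, or None."""
    match = PATTERN.search(statement)
    return match.group(1).upper() if match else None


statements = [
    "The Vocabulary shall be navigable",
    "The system must not expose restricted data",  # invented example statement
    "Extract RDF statements from XML or HTML document",
]
for statement in statements:
    print(requirement_level(statement), "|", statement)
```

A statement for which the check returns `None`, such as the raw vendor statement in the last line, is a candidate for rewording before it enters the requirements list.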
To support semantic services on top of the CMS, there needs to be support for common vocabularies, which establish a common understanding for users by relating a content item to clear and precise vocabulary items. These vocabularies can be external ontologies, taxonomies, or thesauri, and they can provide horizontal or domain knowledge. Therefore, services for engineering such vocabularies within the system are a key requirement. These vocabularies will be used by the system services to provide semantic capabilities.
The figure on this slide shows the refined use cases from a high level requirement. This is the first step in the refinement process.
In the second step of the refinement process, detailed requirements of different kinds are extracted from the use cases and scenarios that were consolidated in the first step.
To allow easy integration of system functionalities into different heterogeneous system environments all provided functions should be accessible through RESTful service interfaces. So the architecture should be based on a service approach. The implementation should be as technology independent as possible on the one hand and on the other hand provide technology specific access to the services to guarantee best performance results.
The mantra behind providing every functionality through RESTful services is that everything (data, functions, etc.) inside the system stack can be accessed by a URI. The system services need access to information that is inside the data repository of the CMS. Therefore, the system defines data access interfaces that must be supported by the CMS that integrates the system. The communication is based on standardized text-based data formats, e.g. XML.
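As a rough sketch of what "everything has a URI" plus a text-based exchange format could look like from the CMS side, the following builds (but does not send) an HTTP request carrying an XML payload. The host `cms.example.org`, the URI layout, the service name `lifting`, and the XML element names are all assumptions made for this example; the source does not specify any concrete endpoint or schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URI = "http://cms.example.org"  # hypothetical host, not from the source


def content_uri(item_id: str) -> str:
    """Every content item is addressable by its own URI (assumed layout)."""
    return f"{BASE_URI}/content/{item_id}"


def build_request(service: str, item_id: str, text: str) -> urllib.request.Request:
    """Build, without sending, a POST request to a semantic service."""
    root = ET.Element("contentItem", {"uri": content_uri(item_id)})
    ET.SubElement(root, "text").text = text
    body = ET.tostring(root, encoding="utf-8")  # bytes, with XML declaration
    return urllib.request.Request(
        url=f"{BASE_URI}/services/{service}",
        data=body,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


request = build_request("lifting", "42", "Paris is the capital of France.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(request)`; it is left out here because no real service exists at the assumed address.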
The system to be developed should provide services that enable semantic tagging of content items using semantic technologies such as ontological classes, RDF properties, and microformats. The system attaches importance to providing horizontal services that extract semantics from structured and unstructured data automatically or semi-automatically, make suggestions about annotations, and allow navigation over the content items in a semantic fashion.
One of the key outcomes of semantic enhancements to CMSs can be observed in the semantic query and search functionality of the system. Faceted search mechanisms on top of semantic query language support form the key requirements from this perspective. Semantic information about content should be used to improve the search capabilities. With semantic data, the system should extend the traditional search functionality to allow new ways of formulating search criteria and to provide "better" search results.
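Faceted search can be sketched in a few lines over content items that already carry semantic metadata. The items, the facet names (`type`, `topic`, `language`), and the helper functions are invented for illustration; a real system would evaluate such queries through a semantic query language rather than over in-memory dictionaries.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical content items with metadata produced by semantic lifting.
ITEMS: List[Dict[str, str]] = [
    {"id": "a1", "type": "Article", "topic": "Health", "language": "en"},
    {"id": "a2", "type": "Article", "topic": "Tourism", "language": "de"},
    {"id": "i1", "type": "Image", "topic": "Tourism", "language": "en"},
]


def filter_items(items: List[Dict[str, str]], **selected: str) -> List[Dict[str, str]]:
    """Keep only items whose metadata matches every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items: List[Dict[str, str]], facet: str) -> Counter:
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items)


tourism = filter_items(ITEMS, topic="Tourism")
print([item["id"] for item in tourism])
print(facet_counts(tourism, "type"))
```

The counts computed on the filtered result are what a user interface would show next to each remaining facet value to guide further narrowing.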
Extracting implicit information from the explicit information residing in the content repositories is a key requirement for the horizontal services of the system to be developed. Reasoning on content managed by CMSs may reveal implicit relations and similarities between different content items that can be of interest to users. Furthermore, reasoning can be used in processes like consistency checking, auto-categorization, etc.
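A very small example of deriving implicit information from explicit facts is inheritance along a class hierarchy. The sketch below applies two rules until nothing new appears: `subClassOf` is transitive, and an item typed with a class is also typed with its superclasses. The class names and the `item:1` identifier are hypothetical, and a real system would delegate this to a reasoner rather than a hand-written loop.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Explicit facts, as they might sit in a content repository (illustrative).
FACTS: Set[Triple] = {
    ("NewsArticle", "subClassOf", "Article"),
    ("Article", "subClassOf", "ContentItem"),
    ("item:1", "type", "NewsArticle"),
}


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Return only the implicit triples derivable from the explicit facts."""
    known = set(facts)
    while True:
        new: Set[Triple] = set()
        for subject, predicate, obj in known:
            if predicate not in ("type", "subClassOf"):
                continue
            for sub, pred2, sup in known:
                # Follow one subClassOf edge upward from the current object.
                if pred2 == "subClassOf" and sub == obj:
                    new.add((subject, predicate, sup))
        if new <= known:  # fixed point reached, nothing further to derive
            return known - set(facts)
        known |= new


for triple in sorted(infer(FACTS)):
    print(triple)
```

The same derived types could feed auto-categorization: an item explicitly tagged only as a `NewsArticle` would also be found when browsing all `ContentItem` instances.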
Besides tagging in combination with ontological means, content entities can be (statically) linked. This process can be automated by algorithms that reason on the provided tags and ontologies. Content items are linked to each other during their lifecycles with the help of relevant services inside a CMS. These links/relations need to be handled by the semantic services of the system. Along with the semantic annotations of the content items, the semantic relations among them should also be considered. As linking is already a standard technique in CMSs, the system to be developed should focus on automatic link creation by making use of semantic algorithms and data.
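One simple way to automate link creation from tags, given here only as an illustration, is to suggest a link between any two content items that share enough tags. The function name `suggest_links`, the threshold parameter, and the sample items are assumptions; the source does not prescribe a particular linking algorithm.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def suggest_links(
    tags_by_item: Dict[str, Set[str]], min_shared: int = 1
) -> List[Tuple[str, str, List[str]]]:
    """Suggest a link for each pair of items sharing at least min_shared tags."""
    links = []
    for first, second in combinations(sorted(tags_by_item), 2):
        shared = tags_by_item[first] & tags_by_item[second]
        if len(shared) >= min_shared:
            links.append((first, second, sorted(shared)))
    return links


# Hypothetical content items and their semantic tags.
TAGS = {
    "article:1": {"paris", "travel"},
    "article:2": {"paris"},
    "article:3": {"health"},
}

for link in suggest_links(TAGS):
    print(link)
```

Returning the shared tags alongside each suggested link keeps the suggestion explainable, so an editor can see why two items were connected.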
Most CMSs have their own workflow management system to control the flow and lifecycle of content. The system should offer services that can be used to implement or extend workflow management as part of the CMS. Additionally, the system should provide workflows for semantic actions, similar to workflows for content. In this way, the user can describe a workflow that defines the semantic reasoning algorithms and semantic extraction algorithms that will be applied to a new content entity.
Just as traditional CMSs provide functionality for content versioning and audit, the system must provide this concept for semantic information. All services provided by the system should log their actions in a way that is comprehensible to a user (transparency), and each service should provide the possibility to undo an action.
The system should also be aware of changing content and provide solutions to invalidate semantic data, e.g. previously extracted semantic information might become invalid as the content changes. The problem of content evolution thus becomes a problem of semantic data evolution.
The mentioned functionalities are not specific to the application domain of a CMS. Therefore, these services should be provided horizontally.
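One possible way to detect that previously extracted semantic data has become stale is to store a digest of the content alongside the annotations and compare it on every read. This is a sketch under that assumption; the `AnnotationCache` class and its methods are hypothetical, and the source does not state which invalidation strategy should be used.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def digest(content: str) -> str:
    """Fingerprint of the content the annotations were extracted from."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class AnnotationCache:
    """Keeps extracted annotations only while the content is unchanged."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, List[str]]] = {}

    def store(self, item_id: str, content: str, annotations: List[str]) -> None:
        self._entries[item_id] = (digest(content), annotations)

    def get(self, item_id: str, current_content: str) -> Optional[List[str]]:
        """Return annotations, or None if missing or invalidated by a change."""
        entry = self._entries.get(item_id)
        if entry is None:
            return None
        stored_digest, annotations = entry
        if stored_digest != digest(current_content):
            del self._entries[item_id]  # content evolved: drop stale semantics
            return None
        return annotations


cache = AnnotationCache()
cache.store("article:1", "Ada lived in London.", ["person:ada", "place:london"])
print(cache.get("article:1", "Ada lived in London."))
print(cache.get("article:1", "Ada moved to Paris."))
```

A `None` result signals that the extraction services need to run again on the new version of the content.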
The semantic services to be provided by the system should be aware of content in different languages and provide functions to reason about information even if it is in different languages. Furthermore, the services provided by the system need to support multilingualism so that users of different nationalities can use the system. Multilingualism is a requirement for the horizontal services of the system, because language support is independent of the CMS application domain, unless the CMS is designed for a specific language.
In a CMS, content access can be configured using more or less fine-grained access controls. When using semantic algorithms, the system must consider these existing access control restrictions. Additionally, the services may consider new kinds of restrictions that reflect semantic data access, e.g. for algorithms that reason on existing data. The system needs a concept for integrating the permission, role, and group models that normally exist as part of a CMS.
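The requirement that semantic algorithms respect existing access control can be sketched by filtering the input of a reasoning step according to the caller's roles. Everything here is illustrative: the `ContentItem` fields, the role names, and the stand-in "reasoner" that merely lists topics are assumptions, and a real CMS would supply its own permission, role, and group models.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class ContentItem:
    item_id: str
    topic: str
    allowed_roles: FrozenSet[str]  # roles permitted to read this item


def visible_items(items: List[ContentItem], roles: FrozenSet[str]) -> List[ContentItem]:
    """Keep only the items that at least one of the caller's roles may read."""
    return [item for item in items if item.allowed_roles & roles]


def reason_as(
    roles: FrozenSet[str],
    items: List[ContentItem],
    reasoner: Callable[[List[ContentItem]], List[str]],
) -> List[str]:
    """Run a reasoning step on the permitted subset of the content only."""
    return reasoner(visible_items(items, roles))


def topics_seen(items: List[ContentItem]) -> List[str]:
    """Stand-in for a real reasoning algorithm: list the distinct topics."""
    return sorted({item.topic for item in items})


ITEMS = [
    ContentItem("doc:1", "Tourism", frozenset({"guest", "editor"})),
    ContentItem("doc:2", "Salaries", frozenset({"editor"})),
]

print(reason_as(frozenset({"guest"}), ITEMS, topics_seen))
print(reason_as(frozenset({"editor"}), ITEMS, topics_seen))
```

Filtering before reasoning, rather than after, means that restricted content cannot leak into derived results that the caller is allowed to see.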