Pal gov.tutorial2.session12 2.architectural solutions for the integration issues
- 1. أكاديمية الحكومة الإلكترونية الفلسطينية
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 12.2
Architectural Solutions for the Integration Issues
Dr. Mustafa Jarrar
University of Birzeit
mjarrar@birzeit.edu
www.jarrar.info
PalGov © 2011 1
- 2. About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
Birzeit University, Palestine (Coordinator)
University of Trento, Italy
Palestine Polytechnic University, Palestine
Vrije Universiteit Brussel, Belgium
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Telecom and IT, Palestine
University of Namur, Belgium
Ministry of Interior, Palestine
TrueTrust, UK
Ministry of Local Government, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935, mjarrar@birzeit.edu
- 3. © Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website) and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means without prior written permission from the project, which holds
the full copyright on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work
non-commercially, as long as they credit you and license their new creations
under the identical terms.
- 4. Tutorial Map
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge about querying techniques for data models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked Data.
2a5: Demonstrate knowledge about integration and fusion of heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML & RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store and query RDF stores.
D: General and Transferable Skills
2d1: Working in a team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Topic (hours)
Session 1: XML Basics and Namespaces (3)
Session 2: XML DTDs (3)
Session 3: XML Schemas (3)
Session 4: Lab-XML Schemas (3)
Session 5: RDF and RDFS (3)
Session 6: Lab-RDF and RDFS (3)
Session 7: OWL (Web Ontology Language) (3)
Session 8: Lab-OWL (3)
Session 9: Lab-RDF Stores: Challenges and Solutions (3)
Session 10: Lab-SPARQL (3)
Session 11: Lab-Oracle Semantic Technology (3)
Session 12_1: The Problem of Data Integration (1.5)
Session 12_2: Architectural Solutions for the Integration Issues (1.5)
Session 13_1: Data Schema Integration (1)
Session 13_2: GAV and LAV Integration (1)
Session 13_3: Data Integration and Fusion using RDF (1)
Session 14: Lab-Data Integration and Fusion using RDF (3)
Session 15_1: Data Web and Linked Data (1.5)
Session 15_2: RDFa (1.5)
Session 16: Lab-RDFa (3)
- 5. Module ILOs
After completing this module, students will be able to:
- Explain different architectural solutions to the problem of data
integration.
- 6. Architectural Solutions for the Integration Issues
• Two families of solutions for the integration issue:
– Application-driven Integration
• Various types of middleware (e.g., Web Services, Remote
Procedure Call (RPC), Publish & Subscribe) that achieve
reconciliation through application-to-middleware communication
– Data-driven Integration
• Various types of data reconciliation and integration:
– Consolidation
– Data Warehouse
– Virtual Data Integration
- 7. Architectures of Application-driven Integration
e.g., Service Oriented Architecture
[Figure: Adapter Servers (AS) and Security Servers (SS) exchanging data messages MSG-1 ... MSG-N]
Legend
SS = Security Server
AS = Adapter Server
MSG = Data Message
- 8. Architectures of Application-driven Integration
Source: Carlo Batini
e.g., Publish-Subscribe Architecture
Typical application-driven integration architecture for integration of updates.
[Figure: numbered steps 1 to 7 showing an update of an object O passing through the middleware between Application 1 ... Application n and Source 1 ... Source n; arrows are labeled "Subscribes" and "Publishes"]
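The publish-subscribe idea on this slide can be illustrated with a minimal in-memory sketch. Everything named here (the `Middleware` class, the topic string "object-O", and the dictionaries standing in for sources) is hypothetical and not taken from the slides; a real deployment would use a message broker rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Middleware:
    """Hypothetical in-memory broker: routes published updates to subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # A subscriber registers interest in updates on one topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, update: dict) -> int:
        # Forward the update to every subscriber; return how many received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(update)
        return len(handlers)


middleware = Middleware()

# Assumption: two sources keep their own local copy of object O.
source_2: dict = {}
source_n: dict = {}
middleware.subscribe("object-O", source_2.update)
middleware.subscribe("object-O", source_n.update)

# An application publishes an update of object O exactly once.
delivered = middleware.publish("object-O", {"id": "O", "status": "updated"})
```

The publisher never addresses the sources directly, which is the point of the pattern: adding another subscriber does not require changing the publishing application.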
- 9. Information Integration Architectures
Source: Carlo Batini
Consolidation
[Figure: Source 1, Source 2, ..., Source n are merged into a unique DB]
New architecture, once and for all
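As a rough sketch of consolidation, the following loads several sources once into a single database. The table layout, the sample rows, and the rule that records with the same `id` denote the same client are assumptions made for illustration only; real consolidation usually needs far more careful record matching.

```python
import sqlite3
from typing import Iterable, List, Tuple

ClientRow = Tuple[int, str]


def consolidate(sources: Iterable[List[ClientRow]]) -> sqlite3.Connection:
    """Load every source once into one unique database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT)")
    for rows in sources:
        # Assumption: rows sharing an id are duplicates, so later copies are skipped.
        db.executemany(
            "INSERT OR IGNORE INTO client (id, name) VALUES (?, ?)", rows
        )
    db.commit()
    return db


# Illustrative data only.
source_1 = [(1, "Amal"), (2, "Omar")]
source_2 = [(2, "Omar"), (3, "Lina")]

unique_db = consolidate([source_1, source_2])
clients = unique_db.execute("SELECT id, name FROM client ORDER BY id").fetchall()
```

After this one-time load, applications would query `unique_db` only, and the original sources could be retired.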
- 10. Information Integration Architectures
Source: Carlo Batini
Data Warehouse
[Figure: Source 1, Source 2, ..., Source n feed a unique DB through data warehouse middleware]
New architecture: new database, periodically updated
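The periodic update of a warehouse can be sketched as a refresh job that copies the current state of every source into the warehouse and tags it with a load number. The `refresh` function, the row layout, and the full-snapshot strategy are assumptions for illustration; production warehouses often load only the changes since the last run.

```python
from typing import Dict, List, Tuple

SaleRow = Tuple[str, int]  # (product, amount); hypothetical source layout


def refresh(warehouse: List[dict], sources: Dict[str, List[SaleRow]], load_id: int) -> int:
    """Append a snapshot of every source to the warehouse; earlier loads are kept."""
    loaded = 0
    for source_name, rows in sources.items():
        for product, amount in rows:
            warehouse.append(
                {"load": load_id, "source": source_name,
                 "product": product, "amount": amount}
            )
            loaded += 1
    return loaded


# Illustrative data only: the sources stay operational and keep changing.
sources = {"retail": [("tea", 10)], "online": [("tea", 4)]}
warehouse: List[dict] = []

refresh(warehouse, sources, load_id=1)
sources["online"].append(("coffee", 7))  # a change made between two loads
refresh(warehouse, sources, load_id=2)

# Queries run against the warehouse copy, not against the sources.
latest = [row for row in warehouse if row["load"] == 2]
total_tea = sum(row["amount"] for row in latest if row["product"] == "tea")
```

Because old loads are never deleted, the warehouse keeps history, but between two refreshes its data can lag behind the sources.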
- 11. Information Integration Architectures
Source: Carlo Batini
Virtual Data Integration
[Figure: Source 1, Source 2, ..., Source n each expose a local schema; a mediator relates the local schemas to a global schema]
New architecture: no new database!
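A minimal sketch of the mediator idea: queries are posed against a global schema and translated to each source's local schema at query time, with no data copied. The attribute names, the `MAPPINGS` table, and `mediator_query` are all hypothetical; the mappings here are simple attribute renamings, which is much weaker than what real mediators support.

```python
from typing import Dict, List

# Illustrative sources, each with its own local schema.
source_1 = [{"cust_name": "Amal", "town": "Birzeit"}]
source_2 = [{"fullName": "Omar", "city": "Ramallah"}]
SOURCES: Dict[str, List[dict]] = {"source_1": source_1, "source_2": source_2}

# Assumed mappings: global attribute -> local attribute, per source.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "source_1": {"name": "cust_name", "city": "town"},
    "source_2": {"name": "fullName", "city": "city"},
}


def mediator_query(city: str) -> List[dict]:
    """Answer a query on the global schema by reading the sources directly."""
    results = []
    for source_name, rows in SOURCES.items():
        mapping = MAPPINGS[source_name]
        for row in rows:
            # Translate the local row into the global schema on the fly.
            global_row = {g_attr: row[l_attr] for g_attr, l_attr in mapping.items()}
            if global_row["city"] == city:
                results.append(global_row)
    return results


answer = mediator_query("Ramallah")

# No copy is kept, so a change in a source is visible to the next query.
source_1.append({"cust_name": "Lina", "town": "Ramallah"})
fresh_answer = mediator_query("Ramallah")
```

The trade-off relative to a warehouse is that answers are always current, while every query pays the cost of contacting and translating all sources.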
- 12. Additional Reading
The integration problem…
Source: Carlo Batini
[Figure: existing sources (Source 1: registry of clients 1; Source 2: registry of clients 2; Source 3: retail sales; Source 4: online sales; ...; Source n: other) to be brought into a new architecture]
Which kind of integration? How to decide?
- 13. Additional Reading
Criteria to be adopted
Source: Carlo Batini
• autonomy, the degree of independence between the different database
administrators in their design choices;
• relevance of historical data, and consequent need to periodically store
new data without deleting the old ones;
• query complexity, in terms of amount of data and tables visited and
number of operators on them, and consequent time complexity in
query execution;
• relevance of currency in queries, the need for queries to extract current
data;
• economic value of integration, the relevance of having integrated
information in input for business operational and decisional processes
in order to produce effective outputs;
- 14. Additional Reading
Criteria to be adopted
Source: Carlo Batini
• volatility of sources, frequency of adding or deleting sources, and
frequency of change of source schemas;
• relevance of queries w.r.t. transactions, relative importance and
frequency of queries with respect to changes in data;
• management complexity, the effort to be spent in management
activities related to databases and hardware and software infrastructures,
due to the corresponding complexity of the organizations using the databases;
• costs of heterogeneity, hidden and explicit costs related to business
processes that are due to making use of heterogeneous data.
- 15. References
• Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
• Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.