WWW.LEDS-PROJEKT.DE
ECCENCA CORPORATE MEMORY
SEMANTICALLY INTEGRATED ENTERPRISE DATA LAKES
ROBERT ISELE
September 26, 2016
MOTIVATION
Enterprise Data Management Objective:
“Ensure all data is aligned to a common meaning in order to achieve automation in performing complex analytics and generating trusted reports.”
Source: 2015 Data Management Industry Benchmark - EDM Council
In 2015, only 7% of respondents claimed to already be using shared and unambiguous definitions of data across the firm and to have them accessible as operational metadata.
ARCHITECTURE
(Architecture diagram: Corporate Memory sits between Inbound Data Sources and Outbound and Consumption)
Business areas shown: Management Accounting, Risk Management, Regulatory Reporting, Treasury, Marketing, Accounting
Components shown:
• Inbound Raw Data Store
• Knowledge Graph for Meta Data, KPI Definition and Data Models
• Frontend to Access Relationship and KPI Definition / Documentation
• Frontend to Access (ad hoc) Reports
• Outbound Data Delivery to Target Systems
• Big Data, DWH-Infrastructure
Data Ingestion
• Files in the data lake (CSV, XML, Excel)
• (relational) Databases
Data Lake
• Emerging approach to handle large amounts of data
• Cost-effective storage
• Data is held in its native format
Good: Does not force an up-front integration of the ingested data sets
Bad: Retaining an overview of disparate data silos in the lake without a coherent shared view is challenging
Data Warehouses
• Existing infrastructure
• Typically relational databases
Metadata Layer
• Dataset Metadata
• Ontologies
• Integration Rules
Graphical User Interface
Customer Applications
INTEGRATION PROCESS
Dataset Management
• Catalog Datasets
• Catalog Ontologies
• Manage Metadata
Dataset Discovery
• Data Profiling
• Dataset Exploration
Dataset Integration
• Dataset Lifting
• Dataset Linking
• Data Quality Validation
Data Access
• Domain Specific Consolidated Views
• Execution on Hadoop
DATASET MANAGEMENT
DATASET CATALOG
• Enables the user to explore and manage datasets in the data lake
• Files in the data lake (CSV, XML, Excel)
• Databases (Apache Hive or external databases)
MANAGING METADATA
• Exploring and editing dataset metadata
• Semantic content information, like textual descriptions, tags and related persons
• Technical information and parameters, like formats, data model and encoding
• Access information, like access path or URL, source system or API call
• Organizational provenance, like organizational units owning or maintaining the dataset
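The four kinds of metadata listed above could be held in a record like the following. This is only a minimal sketch: the class name, the field names and the sample values (including the path) are assumptions made for illustration, not the product's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Hypothetical catalog record grouping the four kinds of metadata."""
    # Semantic content information
    description: str
    tags: list[str] = field(default_factory=list)
    related_persons: list[str] = field(default_factory=list)
    # Technical information and parameters
    data_format: str = "csv"
    encoding: str = "utf-8"
    # Access information
    access_path: str = ""
    source_system: str = ""
    # Organizational provenance
    owner_unit: str = ""

bonds = DatasetMetadata(
    description="Corporate bonds master data",
    tags=["bonds", "securities"],
    access_path="/lake/raw/bonds.csv",  # hypothetical path
    owner_unit="Treasury",
)
print(bonds.tags)
```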
DATASET DISCOVERY
• Goal: Augment a dataset with data from related datasets
• Automatic discovery of datasets with overlapping information
• Explorative interface
• Discovery is based on two kinds of metadata
• Business metadata
• Profiling summary
DISCOVERY VIEW
• Datasets are matched based on their metadata (profiling + business data)
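One way to picture matching on both kinds of metadata is a weighted overlap score. The sketch below assumes Jaccard overlap of tags (business metadata) and of detected column types (profiling summary). The slides do not state the actual matching function, so the scoring scheme, the weight and the sample datasets are all illustrative assumptions.

```python
def jaccard(a, b):
    """Overlap of two sets: |intersection| / |union|, 0.0 if both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_score(left, right, weight_business=0.5):
    """Combine business-metadata overlap and profiling overlap into one score."""
    business = jaccard(left["tags"], right["tags"])
    profile = jaccard(left["column_types"], right["column_types"])
    return weight_business * business + (1 - weight_business) * profile

# Hypothetical catalog entries
bonds = {"tags": {"bonds", "securities"}, "column_types": {"isin", "country", "industry"}}
ratings = {"tags": {"ratings", "securities"}, "column_types": {"isin", "rating"}}
weather = {"tags": {"weather"}, "column_types": {"date", "number"}}

ranked = sorted([("ratings", ratings), ("weather", weather)],
                key=lambda item: match_score(bonds, item[1]), reverse=True)
print([name for name, _ in ranked])
```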
DATASET PROFILING
• Datasets often contain implicit and explicit schema information
• Column names, data formats, enumerated values etc.
• Example: column contains formatted dates
• Idea: Extract a dataset summary
• For each column / property the summary contains:
1. Data type (e.g., number, date, industry classification)
2. Data format (e.g., date format)
3. Data statistics (e.g., range, distribution, most frequent values)
• Materialized as RDF with UI view
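As a rough illustration of the statistics part of such a summary, the sketch below computes a value range, the most frequent values and a selectivity figure for one column. The function name and the definition of selectivity (distinct values divided by non-null values) are assumptions. Type and format detection, and the RDF materialization, are not shown here.

```python
from collections import Counter

def column_statistics(values):
    """Summarise one column: value range, most frequent values, selectivity."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "count": len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_frequent": counts.most_common(3),
        # Assumption: selectivity = distinct values / non-null values.
        "selectivity": len(counts) / len(non_null) if non_null else 0.0,
    }

stats = column_statistics(
    ["Banking", "Utilities", "Banking", None, "Electrical Equipment"]
)
print(stats)
```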
DETECTING DATA TYPES
• Detecting common datatypes as well as user-defined types
• Common datatypes
• Numbers
• Dates / Times
• Geographic locations (geo-coordinates, states, countries)
• User-defined data types can be integrated by adding an ontology / taxonomy
• Usually a SKOS taxonomy
• Managed as another dataset in the dataset management
• Example: Industry taxonomy
• Standard taxonomy (NACE, SIC, NAICS) or company specific
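Type detection along these lines can be sketched as a list of detectors tried in order, with a user-defined taxonomy plugged in as one more detector. The threshold, the detector order, the date format and the small in-memory taxonomy are assumptions for illustration. A real setup would load the taxonomy (for example a SKOS concept scheme) from the dataset catalog.

```python
from datetime import datetime

# Hypothetical user-defined taxonomy; stands in for a managed SKOS taxonomy.
INDUSTRY_TAXONOMY = {"banking", "utilities", "electrical equipment"}

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def is_date(text, fmt="%d-%m-%Y"):
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def detect_type(values, threshold=0.9):
    """Return the first type whose detector accepts at least `threshold` of the values."""
    detectors = [
        ("number", is_number),
        ("date", is_date),
        ("industry", lambda v: v.strip().lower() in INDUSTRY_TAXONOMY),
    ]
    for name, accepts in detectors:
        hits = sum(1 for v in values if accepts(v))
        if values and hits / len(values) >= threshold:
            return name
    return "string"

print(detect_type(["Banking", "Utilities", "Electrical Equipment"]))
```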
FORMATS AND STATISTICS
• For some types, the data format is detected
• Example: Dates are formatted in DD-MM-YYYY
• Two functions are generated:
1. Parser that is able to read the detected representation
2. Normalizer that converts the parsed values into a configurable, organization-wide target representation
• Statistics summarize the values:
• Value range and distribution
• Most frequent values
• Data selectivity
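The parser and normalizer pair can be pictured as two generated functions. The sketch below assumes a fixed list of candidate date formats and ISO dates as the organization-wide target. Both are illustrative choices, as are the function names.

```python
from datetime import datetime

CANDIDATE_FORMATS = ["%d-%m-%Y", "%Y-%m-%d", "%m/%d/%Y"]  # assumed candidates

def detect_date_format(values):
    """Return the first candidate format that parses every value, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for v in values:
                datetime.strptime(v, fmt)
            return fmt
        except ValueError:
            continue
    return None

def make_parser(fmt):
    """Generate a parser for the detected representation."""
    return lambda text: datetime.strptime(text, fmt)

def make_normalizer(target_fmt="%Y-%m-%d"):
    """Generate a normalizer into the configured target representation."""
    return lambda value: value.strftime(target_fmt)

column = ["26-09-2016", "01-02-2017"]
fmt = detect_date_format(column)
parse, normalize = make_parser(fmt), make_normalizer()
normalized = [normalize(parse(v)) for v in column]
print(fmt, normalized)
```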
DATA INTEGRATION
• The integration process is driven by a set of rules
• Lifting Rules map the source datasets to an ontology
• Linking Rules connect different datasets into a knowledge graph
• Rules are operator trees, consisting of four types of operators
• Data Access Operators
• Transformation Operators
• Similarity Operators
• Aggregation Operators
• Rules can be learned using genetic programming algorithms
• Rules are human understandable and can be edited
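A rule built as an operator tree from the four operator types might look like the sketch below. The class names, the use of `difflib.SequenceMatcher` as the similarity measure and the minimum as the aggregation are assumptions chosen to keep the example small. They are not taken from the product.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List

@dataclass
class Path:
    """Data access operator: reads one property of an entity."""
    prop: str
    def value(self, entity):
        return entity[self.prop]

@dataclass
class Transform:
    """Transformation operator: applies a function to its child's value."""
    fn: Callable[[str], str]
    child: object
    def value(self, entity):
        return self.fn(self.child.value(entity))

@dataclass
class Similarity:
    """Similarity operator: compares one value from each entity (0.0 to 1.0)."""
    left: object
    right: object
    def score(self, a, b):
        return SequenceMatcher(None, self.left.value(a), self.right.value(b)).ratio()

@dataclass
class Aggregate:
    """Aggregation operator: combines child scores (here: the minimum)."""
    children: List[object]
    def score(self, a, b):
        return min(child.score(a, b) for child in self.children)

def lowered(prop):
    return Transform(str.lower, Path(prop))

rule = Aggregate([
    Similarity(lowered("isin"), lowered("isin")),
    Similarity(lowered("country"), lowered("country")),
])

bond = {"isin": "CA639832AA25", "country": "Canada"}
rating = {"isin": "ca639832aa25", "country": "CANADA"}
other = {"isin": "DE000A1G85B4", "country": "Germany"}
print(rule.score(bond, rating), rule.score(bond, other))
```

Because the tree is plain data, it stays readable and editable by hand, which matches the last bullet above.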
DATASET LIFTING
• Objective: Map the datasets in the data lake to a consistent vocabulary.
• A lifting rule consists of a number of mappings
• Each mapping assigns a term in the original dataset (such as a column for tabular data) to a term in the target ontology (such as a property provided by an ontology).
• Multiple mappings can be managed for each dataset to allow different views on the same data.
• Initial mappings are generated automatically based on the profiling results; the user can then refine them.
LIFTING EXAMPLE
Bond | ISIN | Country | Industry
NEDWBK CAD 5,2%25 | CA639832AA25 | Canada | Banking
SIEMENSF1.50%03/20 | DE000A1G85B4 | Germany | Electrical Equipment
Electricite de France (EDF), 6,5% 26jan2019 | USF2893TAB29 | France | Utilities
(Diagram: NEDWBK CAD 5,2%25 with fibo:hasSecurityIdentifier “CA639832AA25”, fibo:legallyRecordedIn into the Country Ontology (nodes shown: France, Germany, EMEA) and fibo:industrySector into the Industry Ontology (nodes shown: Utilities, Banking).)
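The lifting example can be sketched as a mapping from columns to ontology properties that turns a row into triples. The three property names appear in the example. Pairing the Country and Industry columns with them, using the bond name as the subject, and emitting plain literals instead of ontology nodes are simplifying assumptions.

```python
# One lifting rule = a set of mappings from source columns to ontology properties.
LIFTING_RULE = {
    "ISIN": "fibo:hasSecurityIdentifier",
    "Country": "fibo:legallyRecordedIn",   # assumed pairing
    "Industry": "fibo:industrySector",     # assumed pairing
}

def lift(row, subject_column="Bond", rule=LIFTING_RULE):
    """Turn one tabular row into (subject, property, value) triples."""
    subject = row[subject_column]
    return [(subject, prop, row[col]) for col, prop in rule.items() if col in row]

row = {
    "Bond": "NEDWBK CAD 5,2%25",
    "ISIN": "CA639832AA25",
    "Country": "Canada",
    "Industry": "Banking",
}
triples = lift(row)
for triple in triples:
    print(triple)
```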
LINKING
• Goal: Connect individual datasets into a knowledge graph
• Identify related entities in different datasets and link them
• Links connect either entities that describe the same real-world object or entities that stand in another relation
(Diagram: NEDWBK CAD 5,2%25 hasRating Rating CAD 5,2%25; ratingScore “AAA”; fibo:legallyRecordedIn and fibo:industrySector edges, each shown twice, into the Country Ontology (EMEA) and the Industry Ontology.)
LINKAGE RULES
• Linking is based on domain-specific rules
• Specify the conditions that must hold true for two entities to be linked
LEARNING LINKAGE RULES
Problem: Manually writing rules is time-consuming and requires expertise
Approach: Interactive machine learning algorithm for generating rules
• Generates a rule based on a number of user-confirmed link candidates.
• The learning algorithm actively selects the link candidates that yield a high information gain.
• The user does not need any knowledge of the characteristics of the dataset or of any particular similarity computation techniques.
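The selection step can be illustrated with query-by-committee, a standard active learning technique that is swapped in here because the slides do not spell out the actual selection criterion. The sketch asks the user about the candidate pair on which a set of candidate rules disagrees most, measured by vote entropy. The rules and candidate pairs are hypothetical.

```python
import math

def vote_entropy(votes):
    """Binary entropy of the committee's link / no-link votes."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_candidate(candidates, rules):
    """Pick the candidate pair on which the current candidate rules disagree most."""
    return max(candidates,
               key=lambda pair: vote_entropy([rule(*pair) for rule in rules]))

# Hypothetical committee of candidate linkage rules
rules = [
    lambda a, b: a["isin"] == b["isin"],
    lambda a, b: a["country"] == b["country"],
    lambda a, b: a["isin"][:2] == b["isin"][:2],
]

bond = {"isin": "CA639832AA25", "country": "Canada"}
candidates = [
    (bond, {"isin": "CA639832AA25", "country": "Canada"}),   # all rules agree: link
    (bond, {"isin": "DE000A1G85B4", "country": "Germany"}),  # all rules agree: no link
    (bond, {"isin": "CA111111AA11", "country": "Canada"}),   # rules disagree
]
question = select_candidate(candidates, rules)
print(question[1]["isin"])
```

Confirming or rejecting the selected pair tells the learner more than a pair on which every candidate rule already agrees.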
VIEW GENERATION
• The user selects a set of lifted and linked datasets
(Diagram labels: Hadoop, Data Lake)
DATA ACCESS
• Generate data flows based on Apache Spark
• The data flows utilize Resilient Distributed Datasets (RDDs)
• RDDs derive new data sets from existing data sets by applying a chain of transformations
• A derived data set can either
• be recomputed on-the-fly, or
• be persisted on stable storage
• Data flows can be executed efficiently on Hadoop clusters.
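The difference between recomputing and persisting a derived data set can be shown with a toy class in plain Python. This is not Spark and does not use the Spark API. `LazyDataset` and its methods are hypothetical stand-ins that only mimic the idea of a recorded chain of transformations.

```python
class LazyDataset:
    """Toy stand-in for an RDD: a source plus a chain of transformations."""

    def __init__(self, source, transforms=()):
        self._source = source
        self._transforms = tuple(transforms)
        self._cache = None

    def map(self, fn):
        # Deriving a new dataset only records the transformation; nothing runs yet.
        return LazyDataset(self._source, self._transforms + (fn,))

    def persist(self):
        # This toy materialises immediately; Spark instead materialises
        # a persisted RDD lazily, on the first action.
        self._cache = self.collect()
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache
        rows = list(self._source)
        for fn in self._transforms:
            rows = [fn(r) for r in rows]
        return rows

calls = []

def lift(isin):
    calls.append(isin)  # record every invocation to make recomputation visible
    return {"fibo:hasSecurityIdentifier": isin}

lifted = LazyDataset(["CA639832AA25", "DE000A1G85B4"]).map(lift)
lifted.collect()
lifted.collect()               # recomputed on the fly: lift has now run 4 times
recomputed_calls = len(calls)
lifted.persist()               # runs lift 2 more times, then caches the result
lifted.collect()
lifted.collect()               # served from the cache, no further calls
print(recomputed_calls, len(calls))
```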
(Data flow diagram: Corporate Bonds → Data Lifting 1 (Apache Spark RDD); Internal Ratings → Data Lifting 2 (Apache Spark RDD); External Ratings → Data Lifting 3 (Apache Spark RDD); the lifted datasets feed Data Linking (Apache Spark RDD) in eccenca Corporate Memory, which serves the Data Consumer. Output labels shown: SQL, CSV, Excel, Spark, API.)
DEMO
Contact
Dr. Robert Isele
Tel: +49 151 17238616
email: robert.isele@eccenca.com
eccenca
Command your Data!

 
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...Experimental transformation of  ABS data into Data Cube Vocabulary (DCV) form...
Experimental transformation of ABS data into Data Cube Vocabulary (DCV) form...
 

Último

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Último (20)

Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes

  • 1. WWW.LEDS-PROJEKT.DE ECCENCA CORPORATE MEMORY SEMANTICALLY INTEGRATED ENTERPRISE DATA LAKES ROBERT ISELE September 26, 2016 1
  • 2. MOTIVATION Enterprise Data Management Objective: “Ensure all data is aligned to a common meaning in order to achieve automation in performing complex analytics and generating trusted reports.” Source: 2015 Data Management Industry Benchmark, EDM Council. In 2015, only 7% of respondents claimed to already be using shared and unambiguous definitions of data across the firm and to have them accessible as operational metadata.
  • 3. ARCHITECTURE (diagram labels) Management Accounting; Risk Management; Regulatory Reporting; Treasury; Marketing; Accounting; Corporate Memory; Inbound Data Sources; Outbound and Consumption; Inbound Raw Data Store; Knowledge Graph for Meta Data, KPI Definition and Data Models; Frontend to Access Relationship and KPI Definition / Documentation; Frontend to Access (ad hoc) Reports; Outbound Data Delivery to Target Systems; Big Data DWH-Infrastructure
  • 4. ARCHITECTURE Data Ingestion • Files in the data lake (CSV, XML, Excel) • (Relational) databases
  • 5. ARCHITECTURE Data Lake • Emerging approach to handle large amounts of data • Cost-effective storage • Data is held in its native format • Good: does not force an up-front integration of the ingested data sets • Bad: retaining an overview of disparate data silos in the lake without a coherent shared view is challenging
  • 6. ARCHITECTURE Data Warehouses • Existing infrastructure • Typically relational databases
  • 7. ARCHITECTURE Metadata Layer • Dataset Metadata • Ontologies • Integration Rules
  • 8. ARCHITECTURE Graphical User Interface; Customer Applications
  • 9. INTEGRATION PROCESS Dataset Management (Catalog Datasets, Catalog Ontologies, Manage Metadata) → Dataset Discovery (Data Profiling, Dataset Exploration) → Dataset Integration (Dataset Lifting, Dataset Linking, Data Quality Validation) → Data Access (Domain Specific Consolidated Views, Execution on Hadoop)
  • 10. DATASET MANAGEMENT • Catalog Datasets • Catalog Ontologies • Manage Metadata
  • 11. DATASET CATALOG • Enables the user to explore and manage datasets in the data lake • Files in the data lake (CSV, XML, Excel) • Databases (Apache Hive or external databases)
  • 12. MANAGING METADATA • Exploring and editing dataset metadata • Semantic content information, like textual descriptions, tags and related persons • Technical information and parameters, like formats, data model and encoding • Access information, like access path or URL, source system or API call • Organizational provenance, like organizational units owning or maintaining the dataset
  • 13. DATASET DISCOVERY • Data Profiling • Dataset Exploration
  • 14. DATASET DISCOVERY • Goal: augment a dataset with data from related datasets • Automatic discovery of datasets with overlapping information • Explorative interface • Discovery is based on two kinds of data: business metadata and the profiling summary
  • 15. DISCOVERY VIEW • Datasets are matched based on their metadata (profiling + business data)
  • 16. DATASET PROFILING • Datasets often contain implicit and explicit schema information: column names, data formats, enumerated values, etc. • Example: a column contains formatted dates • Idea: extract a dataset summary • For each column / property, the summary contains: 1. Data type (e.g., number, date, industry classification) 2. Data format (e.g., date format) 3. Data statistics (e.g., range, distribution, most frequent values) • Materialized as RDF with a UI view
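
The per-column summary described on slide 16 could be sketched roughly as follows. This is a minimal illustration in plain Python, not eccenca's implementation: the function names, the fixed date format, and the `coupon` and `maturity` sample columns are assumptions made for the example, and the result is a dictionary rather than RDF.

```python
from collections import Counter
from datetime import datetime

ASSUMED_DATE_FORMAT = "%d-%m-%Y"  # assumption: only this one date format is tried


def detect_type(values):
    """Guess a coarse type for a column: 'number', 'date', or 'string'."""
    def all_parse(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    if all_parse(lambda v: datetime.strptime(v, ASSUMED_DATE_FORMAT)):
        return "date"
    return "string"


def profile_column(values, top_n=3):
    """Build a summary with type, format, and simple statistics for one column."""
    non_empty = [v for v in values if v != ""]
    dtype = detect_type(non_empty) if non_empty else "string"
    summary = {
        "type": dtype,
        "format": ASSUMED_DATE_FORMAT if dtype == "date" else None,
        "count": len(non_empty),
        "distinct": len(set(non_empty)),
        "most_frequent": Counter(non_empty).most_common(top_n),
    }
    if dtype == "number":
        numbers = [float(v) for v in non_empty]
        summary["range"] = (min(numbers), max(numbers))
    return summary


def profile_table(rows):
    """Profile every column of a table given as a list of dictionaries."""
    columns = list(rows[0].keys()) if rows else []
    return {col: profile_column([row[col] for row in rows]) for col in columns}


# ISIN and country values come from the lifting example slide;
# coupon and maturity values are made up for illustration.
rows = [
    {"isin": "CA639832AA25", "country": "Canada", "coupon": "5.2", "maturity": "01-06-2025"},
    {"isin": "DE000A1G85B4", "country": "Germany", "coupon": "1.5", "maturity": "15-03-2020"},
    {"isin": "USF2893TAB29", "country": "France", "coupon": "6.5", "maturity": "26-01-2019"},
]
profile = profile_table(rows)
```

A production profiler would likely try many formats and user-defined types per column; the sketch only shows the shape of the summary.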
  • 17. DETECTING DATA TYPES • Detects common data types as well as user-defined types • Common data types: numbers; dates / times; geographic locations (geo-coordinates, states, countries) • User-defined data types can be integrated by adding an ontology / taxonomy: usually a SKOS taxonomy, managed as another dataset in the dataset management • Example: industry taxonomy, either a standard taxonomy (NACE, SIC, NAICS) or a company-specific one
  • 18. FORMATS AND STATISTICS • For some types, the data format is detected • Example: dates are formatted as DD-MM-YYYY • Two functions are generated: 1. A parser that is able to read the detected representation 2. A normalizer that converts the parsed values into a configurable, organization-wide target representation • Statistics summarize the values: value range and distribution; most frequent values; data selectivity
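
The parser / normalizer pair from slide 18 might look like the sketch below. It assumes a small, hand-picked list of candidate date formats and ISO 8601 (`%Y-%m-%d`) as the organization-wide target; both are illustrative choices, and the function names are hypothetical.

```python
from datetime import datetime

# Assumption: the candidate formats are tried in this order.
CANDIDATE_FORMATS = ["%d-%m-%Y", "%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y"]


def detect_date_format(values, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses every value, or None."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value does not fit; try the next format
        return fmt
    return None


def make_parser(fmt):
    """Generate a parser that reads the detected representation."""
    return lambda text: datetime.strptime(text, fmt).date()


def make_normalizer(target_fmt="%Y-%m-%d"):
    """Generate a normalizer that writes the configurable target representation."""
    return lambda date_value: date_value.strftime(target_fmt)


column = ["26-09-2016", "01-12-2015", "15-03-2017"]
detected = detect_date_format(column)
parse = make_parser(detected)
normalize = make_normalizer()
normalized = [normalize(parse(value)) for value in column]
```

Note that a column such as `01-02-2016` alone fits both day-first and month-first readings once the separators match, so the order of the candidate list decides; a real detector would need column statistics or user confirmation to break such ties.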
  • 19. DISCOVERY VIEW • Datasets are matched based on their metadata (profiling + business data)
  • 20. INTEGRATION PROCESS (process overview repeated from slide 9)
  • 21. DATA INTEGRATION • The integration process is driven by a set of rules • Lifting rules map the source datasets to an ontology • Linking rules connect different datasets to a knowledge graph • Rules are operator trees consisting of four types of operators: data access operators, transformation operators, similarity operators, aggregation operators • Rules can be learned using genetic programming algorithms • Rules are human-understandable and can be edited
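
To make the operator-tree idea from slide 21 concrete, here is a small sketch with one class per operator type. The class names, the `difflib`-based string similarity, and the `min` aggregation are assumptions chosen for illustration; the slides do not show the actual rule language.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Any, Callable, List


@dataclass
class Access:
    """Data access operator: reads one property of an entity."""
    path: str

    def evaluate(self, entity: dict) -> str:
        return entity[self.path]


@dataclass
class Transform:
    """Transformation operator: applies a function to its child's value."""
    func: Callable[[str], str]
    child: Any

    def evaluate(self, entity: dict) -> str:
        return self.func(self.child.evaluate(entity))


@dataclass
class Similarity:
    """Similarity operator: compares a value from each of the two entities."""
    measure: Callable[[str, str], float]
    left: Any
    right: Any

    def score(self, a: dict, b: dict) -> float:
        return self.measure(self.left.evaluate(a), self.right.evaluate(b))


@dataclass
class Aggregation:
    """Aggregation operator: combines the scores of several child operators."""
    combine: Callable[[List[float]], float]
    children: List[Any]

    def score(self, a: dict, b: dict) -> float:
        return self.combine([child.score(a, b) for child in self.children])


def string_similarity(x: str, y: str) -> float:
    return SequenceMatcher(None, x, y).ratio()


def equality(x: str, y: str) -> float:
    return 1.0 if x == y else 0.0


# Hypothetical linkage rule: names must be similar AND ISINs must be equal.
rule = Aggregation(min, [
    Similarity(string_similarity,
               Transform(str.lower, Access("bond")),
               Transform(str.lower, Access("name"))),
    Similarity(equality, Access("isin"), Access("isin")),
])

bond = {"bond": "NEDWBK CAD 5,2%25", "isin": "CA639832AA25"}
rating_match = {"name": "nedwbk cad 5,2%25", "isin": "CA639832AA25"}
rating_other = {"name": "nedwbk cad 5,2%25", "isin": "DE000A1G85B4"}

match_score = rule.score(bond, rating_match)
other_score = rule.score(bond, rating_other)
```

Because the rule is an explicit tree of small operators, it stays readable and editable by a person, and a learning algorithm can vary it by swapping or recombining subtrees.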
  • 22. DATASET LIFTING • Objective: map the datasets in the data lake to a consistent vocabulary • A lifting rule consists of a number of mappings • Each mapping assigns a term in the original dataset (such as a column for tabular data) to a term in the target ontology (such as a property provided by an ontology) • Multiple mappings can be managed for each dataset to allow different views on the same data • Initial mappings are generated automatically from the profiling results, and the user can then build on them
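
A lifting rule as a set of column-to-property mappings can be sketched as below. The property names are the ones visible on the lifting example slide; which column maps to which property is this sketch's reading of that slide, and the `ex:bond/` subject prefix and function name are hypothetical.

```python
from urllib.parse import quote

# Assumed mapping of source columns to ontology properties.
LIFTING_RULE = {
    "ISIN": "fibo:hasSecurityIdentifier",
    "Country": "fibo:legallyRecordedIn",
    "Industry": "fibo:industrySector",
}


def lift_row(row, mappings, subject_column="ISIN", base="ex:bond/"):
    """Turn one tabular row into (subject, property, value) triples."""
    subject = base + quote(row[subject_column])
    return [
        (subject, prop, row[column])
        for column, prop in mappings.items()
        if row.get(column)  # skip columns that are missing or empty
    ]


row = {
    "Bond": "NEDWBK CAD 5,2%25",
    "ISIN": "CA639832AA25",
    "Country": "Canada",
    "Industry": "Banking",
}
triples = lift_row(row, LIFTING_RULE)
```

Keeping the rule as plain data is what makes it possible to hold several alternative mappings for the same dataset, one per desired view.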
  • 23. LIFTING EXAMPLE
      Bond | ISIN | Country | Industry
      NEDWBK CAD 5,2%25 | CA639832AA25 | Canada | Banking
      SIEMENSF1.50%03/20 | DE000A1G85B4 | Germany | Electrical Equipment
      Electricite de France (EDF), 6,5% 26jan2019 | USF2893TAB29 | France | Utilities
      Diagram labels: NEDWBK CAD 5,2%25; fibo:hasSecurityIdentifier “CA639832AA25”; fibo:legallyRecordedIn; fibo:industrySector; Country Ontology (France, Germany, EMEA); Industry Ontology (Utilities, Banking)
  • 24. LINKING • Goal: connect individual datasets to a knowledge graph • Identify related entities in different datasets and link them • Links connect either entities describing the same real-world object or entities in another relation • Diagram labels: NEDWBK CAD 5,2%25; hasRating; Rating CAD 5,2%25; ratingScore “AAA”; fibo:legallyRecordedIn; fibo:industrySector; Country Ontology (EMEA); Industry Ontology
  • 25. LINKAGE RULES • Linking is based on domain-specific rules • Rules specify the conditions that must hold true for two entities to be linked
  • 26. LEARNING LINKAGE RULES • Problem: manually writing rules is time-consuming and requires expertise • Approach: interactive machine learning algorithm for generating rules • Generates a rule based on a number of user-confirmed link candidates • Link candidates are actively selected by the learning algorithm so that they yield a high information gain • The user does not need any knowledge of the characteristics of the dataset or of any particular similarity computation techniques
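
The active selection step on slide 26 can be illustrated with a query-by-committee heuristic: ask the user about the link candidate on which the current candidate rules disagree most. This is a stand-in technique named plainly, not necessarily the algorithm used in the product; the rules, feature names, and candidate values below are invented for the example.

```python
def disagreement(candidate, rules):
    """Return 1.0 when the rules split evenly on the candidate, 0.0 when unanimous."""
    votes = sum(1 for rule in rules if rule(candidate))
    share = votes / len(rules)
    return 1 - abs(2 * share - 1)


def select_query(candidates, rules):
    """Pick the link candidate whose label would be most informative."""
    return max(candidates, key=lambda c: disagreement(c, rules))


# Hypothetical population of candidate rules over precomputed features.
rules = [
    lambda c: c["name_sim"] > 0.9,
    lambda c: c["isin_equal"],
    lambda c: c["name_sim"] > 0.5 and c["isin_equal"],
    lambda c: c["name_sim"] > 0.7,
]

candidates = [
    {"id": "A", "name_sim": 0.95, "isin_equal": True},   # all rules say "link"
    {"id": "B", "name_sim": 0.20, "isin_equal": False},  # all rules say "no link"
    {"id": "C", "name_sim": 0.80, "isin_equal": False},  # one of four says "link"
    {"id": "D", "name_sim": 0.60, "isin_equal": True},   # rules split two to two
]

next_question = select_query(candidates, rules)
```

After the user confirms or rejects the selected candidate, rules that contradict the answer would be penalized or replaced, and the selection repeats on the remaining candidates.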
  • 27. INTEGRATION PROCESS (process overview repeated from slide 9)
  • 28. VIEW GENERATION • The user selects a set of lifted and linked datasets
  • 29. DATA ACCESS (Hadoop Data Lake) • Data flows are generated based on Apache Spark • The data flows utilize Resilient Distributed Datasets (RDDs) • RDDs derive new data sets from existing data sets by applying a chain of transformations • A derived data set can either be recomputed on the fly or persisted on stable storage • Data flows can be executed efficiently on Hadoop clusters • Diagram labels: Corporate Bonds; Data Lifting 1 (Apache Spark RDD); Data Linking (Apache Spark RDD); Internal Ratings; Data Lifting 2 (Apache Spark RDD); External Ratings; Data Lifting 3 (Apache Spark RDD); eccenca Corporate Memory; Data Consumer; SQL; CSV; Excel; Spark API
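
The recompute-versus-persist trade-off on slide 29 can be shown with a toy, single-machine stand-in for an RDD. The `LazyDataset` class below is plain Python written for this illustration and is not the Spark API; in Spark the analogous calls would be `RDD.map`, `RDD.persist`, and `RDD.collect`. The rating value is sample data.

```python
class LazyDataset:
    """Toy stand-in for an RDD: remembers how to derive itself from its parent."""

    def __init__(self, compute, name):
        self._compute = compute
        self.name = name
        self._persisted = False
        self._cache = None
        self.computations = 0  # counts how often the data was actually derived

    @classmethod
    def from_records(cls, records, name):
        return cls(lambda: list(records), name)

    def map(self, func, name):
        """Derive a new dataset lazily; nothing is computed yet."""
        return LazyDataset(lambda: [func(x) for x in self.collect()], name)

    def persist(self):
        """Mark the dataset so its first result is kept instead of recomputed."""
        self._persisted = True
        return self

    def collect(self):
        if self._cache is not None:
            return self._cache
        self.computations += 1
        result = self._compute()
        if self._persisted:
            self._cache = result
        return result


bonds = LazyDataset.from_records(
    [{"ISIN": "CA639832AA25", "Bond": "NEDWBK CAD 5,2%25"}], "corporate bonds")
ratings = {"CA639832AA25": "AAA"}  # sample internal ratings keyed by ISIN

# Lifting step is persisted; linking step is recomputed on every access.
lifted = bonds.map(
    lambda r: {"fibo:hasSecurityIdentifier": r["ISIN"], "label": r["Bond"]},
    "lifting").persist()
linked = lifted.map(
    lambda r: {**r, "ratingScore": ratings.get(r["fibo:hasSecurityIdentifier"])},
    "linking")

first = linked.collect()
second = linked.collect()
```

Persisting the lifted dataset pays off when several linking or view steps read it; leaving a step unpersisted saves storage at the cost of recomputing it on each access.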
  • 30. DEMO
  • 31. Contact: Dr. Robert Isele, Tel: +49 151 17238616, email: robert.isele@eccenca.com. eccenca: Command your Data!