WWW.LEDS-PROJEKT.DE
ECCENCA CORPORATE MEMORY
SEMANTICALLY INTEGRATED ENTERPRISE DATA LAKES
September 29, 2016
MOTIVATION
Enterprise Data Management Objective:
“Ensure all data is aligned to a common
meaning in order to achieve automation in
performing complex analytics and generating
trusted reports.”
Source:
2015 Data Management Industry Benchmark -
EDM Council
In 2015, only 7% of respondents claimed to already use shared and unambiguous definitions of data across the firm and to have them accessible as operational metadata.
ARCHITECTURE
[Architecture diagram. Labels recovered from the figure:]
• Management Accounting, Risk Management, Regulatory Reporting, Treasury, Marketing, Accounting
• Corporate Memory, placed between Inbound Data Sources and Outbound and Consumption
• Inbound Raw Data Store
• Knowledge Graph for Meta Data, KPI Definition and Data Models
• Frontend to Access Relationship and KPI Definition / Documentation
• Frontend to Access (ad hoc) Reports
• Outbound Data Delivery to Target Systems
• Big Data DWH-Infrastructure
Data Ingestion
• Files in the data lake (CSV, XML, Excel)
• (relational) Databases
Data Lake
• Emerging approach to handle large amounts of data
• Cost-effective storage
• Data is held in its native format
Good: Does not force an up-front integration of the ingested data sets
Bad: Retaining an overview of disparate data silos in the lake without a coherent shared view is challenging
Data Warehouses
• Existing infrastructure
• Typically relational databases
Metadata Layer
• Dataset Metadata
• Ontologies
• Integration Rules
Graphical User Interface
Customer Applications
INTEGRATION PROCESS
Dataset Management
• Catalog Datasets
• Catalog Ontologies
• Manage Metadata
Dataset Discovery
• Data Profiling
• Dataset Exploration
Dataset Integration
• Dataset Lifting
• Dataset Linking
• Data Quality Validation
Data Access
• Domain Specific Consolidated Views
• Execution on Hadoop
DATASET MANAGEMENT
DATASET CATALOG
• Enables the user to explore and manage datasets in the data lake
• Files in the data lake (CSV, XML, Excel)
• Databases (Apache Hive or external databases)
MANAGING METADATA
• Exploring and editing dataset metadata
• Semantic content information, like textual descriptions, tags and related persons
• Technical information and parameters, like formats, data model and encoding
• Access information, like access path or URL, source system or API call
• Organizational provenance, like organizational units owning or maintaining the dataset
DATASET DISCOVERY
• Goal: Augment a dataset with data from related datasets
• Automatic discovery of datasets with overlapping information
• Explorative interface
• Discovery is based on two kinds of data
• Business metadata
• Profiling summary
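As an illustration of matching on business metadata plus a profiling summary, the sketch below ranks catalog entries by how much their tags and detected column types overlap with a target dataset. The catalog contents, the Jaccard measure and the equal weighting are assumptions made for this example; the slides do not state how the product scores matches.

```python
def jaccard(a, b):
    """Overlap of two collections as |intersection| / |union|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def related_datasets(target, catalog, weight=0.5):
    """Rank the other catalog entries by tag and column-type overlap with `target`."""
    reference = catalog[target]
    scored = []
    for name, meta in catalog.items():
        if name == target:
            continue
        score = (weight * jaccard(reference["tags"], meta["tags"])
                 + (1 - weight) * jaccard(reference["column_types"], meta["column_types"]))
        scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]


# Hypothetical catalog: tags stand for business metadata,
# column_types for the profiling summary.
catalog = {
    "corporate_bonds": {"tags": {"bonds", "finance"},
                        "column_types": {"isin", "country", "industry"}},
    "internal_ratings": {"tags": {"ratings", "finance"},
                         "column_types": {"isin", "rating"}},
    "hr_staff": {"tags": {"hr"},
                 "column_types": {"person", "date"}},
}
ranking = related_datasets("corporate_bonds", catalog)
```

Here `internal_ratings` ranks first because it shares a tag and an ISIN column with the bond dataset.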
DISCOVERY VIEW
• Datasets are matched based on their metadata (profiling + business data)
DATASET PROFILING
• Datasets often contain implicit and explicit schema information
• Column names, data formats, enumerated values etc.
• Example: column contains formatted dates
• Idea: Extract a dataset summary
• For each column / property the summary contains:
1. Data type (e.g., number, date, industry classification)
2. Data format (e.g., date format)
3. Data statistics (e.g., range, distribution, most frequent values)
• Materialized as RDF with UI view
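A minimal sketch of such a per-column summary, assuming the table is already loaded as lists of strings. The function names, the sample values and the use of a distinct-value ratio are illustrative assumptions; the RDF materialization mentioned above is not shown.

```python
from collections import Counter
from datetime import datetime


def guess_type(values):
    """Return 'number', 'date' or 'string' for a non-empty list of raw cell strings."""
    def all_parse(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    # Only one date format is tried here to keep the sketch short.
    if all_parse(lambda value: datetime.strptime(value, "%d-%m-%Y")):
        return "date"
    return "string"


def profile_column(values):
    """Build a small summary: type, most frequent values, distinct ratio, range."""
    kind = guess_type(values)
    summary = {
        "type": kind,
        "most_frequent": Counter(values).most_common(2),
        "distinct_ratio": len(set(values)) / len(values),
    }
    if kind == "number":
        numbers = [float(value) for value in values]
        summary["range"] = (min(numbers), max(numbers))
    return summary


# Made-up sample data for illustration.
table = {
    "Country": ["Canada", "Germany", "France", "France"],
    "Coupon": ["5.2", "1.5", "6.5", "6.5"],
}
profile = {name: profile_column(column) for name, column in table.items()}
```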
DETECTING DATA TYPES
• Detecting common datatypes as well as user-defined types
• Common datatypes
• Numbers
• Dates / Times
• Geographic locations (geo-coordinates, states, countries)
• User-defined data types can be integrated by adding an ontology /
taxonomy
• Usually a SKOS taxonomy
• Managed as another dataset in the dataset management
• Example: Industry taxonomy
• Standard taxonomy (NACE, SIC, NAICS) or company specific
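The following sketch shows one way a user-defined type could sit next to a built-in detector. A plain set of labels stands in for a SKOS taxonomy, and the 80% coverage threshold and the coordinate pattern are assumptions chosen for this example.

```python
import re

# Hypothetical stand-in for a SKOS taxonomy: lower-cased preferred labels.
INDUSTRY_TAXONOMY = {"banking", "utilities", "electrical equipment"}

# Simple "lat, lon" pattern; real coordinate detection would be stricter.
GEO_COORDINATE = re.compile(r"^-?\d{1,2}(\.\d+)?,\s*-?\d{1,3}(\.\d+)?$")


def detect_type(values, taxonomies, threshold=0.8):
    """Return the first type whose detector covers at least `threshold` of the values."""
    normalized = [value.strip().lower() for value in values]
    detectors = {"geo-coordinate": lambda v: bool(GEO_COORDINATE.match(v))}
    for name, labels in taxonomies.items():
        # Bind `labels` per iteration so each detector checks its own taxonomy.
        detectors[name] = lambda v, labels=labels: v in labels
    for name, matches in detectors.items():
        hits = sum(1 for value in normalized if matches(value))
        if hits / len(normalized) >= threshold:
            return name
    return "unknown"


taxonomies = {"industry": INDUSTRY_TAXONOMY}
industry_column = ["Banking", "Electrical Equipment", "Utilities"]
detected = detect_type(industry_column, taxonomies)
```

Adding a company-specific taxonomy then only means passing another entry in `taxonomies`.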
FORMATS AND STATISTICS
• For some types, the data format is detected
• Example: Dates are formatted in DD-MM-YYYY
• Two functions are generated:
1. Parser that is able to read the detected representation
2. Normalizer that converts the parsed values into a configurable, organization-wide
target representation
• Statistics summarize the values:
• Value range and distribution
• Most frequent values
• Data selectivity
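The parser and normalizer pair can be sketched as two closures generated from the detected format. The candidate format list and the ISO 8601 target representation are assumptions for this example, not settings taken from the product.

```python
from datetime import datetime

# Candidate formats to try, in order; this list is an assumption.
CANDIDATE_FORMATS = ["%d-%m-%Y", "%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"]


def detect_date_format(values):
    """Return the first candidate format that parses every value, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def make_functions(values, target_format="%Y-%m-%d"):
    """Generate a parser and a normalizer for a column of date strings."""
    fmt = detect_date_format(values)
    if fmt is None:
        raise ValueError("no candidate date format matches all values")

    def parser(text):
        # Reads the detected source representation.
        return datetime.strptime(text, fmt)

    def normalizer(parsed):
        # Writes the configurable target representation.
        return parsed.strftime(target_format)

    return parser, normalizer


column = ["29-09-2016", "26-01-2019", "03-12-2020"]
parser, normalizer = make_functions(column)
normalized = [normalizer(parser(value)) for value in column]
```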
DATA INTEGRATION
• The integration process is driven by a set of rules
• Lifting Rules map the source datasets to an ontology
• Linking Rules connect different datasets to a knowledge graph
• Rules are operator trees, consisting of four types of operators
• Data Access Operators
• Transformation Operators
• Similarity Operators
• Aggregation Operators
• Rules can be learned using genetic programming algorithms
• Rules are human understandable and can be edited
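To make the operator-tree idea concrete, the sketch below models the four operator types as small classes and composes them into one rule. Class names, field names and the choice of a string-similarity measure are assumptions for illustration; they are not the product's rule language.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class Path:
    """Data access operator: reads one field of an entity."""
    field: str

    def evaluate(self, entity):
        return entity[self.field]


@dataclass
class Transform:
    """Transformation operator: applies a function to the value below it."""
    function: Callable[[str], str]
    source: object

    def evaluate(self, entity):
        return self.function(self.source.evaluate(entity))


@dataclass
class Compare:
    """Similarity operator: scores two values between 0.0 and 1.0."""
    left: object
    right: object

    def score(self, a, b):
        return SequenceMatcher(None, self.left.evaluate(a), self.right.evaluate(b)).ratio()


@dataclass
class Aggregate:
    """Aggregation operator: combines the scores of its children."""
    function: Callable[[List[float]], float]
    children: list

    def score(self, a, b):
        return self.function([child.score(a, b) for child in self.children])


# Hypothetical rule: both the ISIN and the name must be similar (minimum of both scores).
rule = Aggregate(min, [
    Compare(Transform(str.lower, Path("isin")), Transform(str.lower, Path("isin"))),
    Compare(Transform(str.lower, Path("name")), Transform(str.lower, Path("label"))),
])

bond = {"isin": "CA639832AA25", "name": "NEDWBK CAD 5,2%25"}
rating = {"isin": "ca639832aa25", "label": "Rating CAD 5,2%25"}
other = {"isin": "DE000A1G85B4", "label": "SIEMENSF1.50%03/20"}
match_score = rule.score(bond, rating)
other_score = rule.score(bond, other)
```

Because the rule is an explicit tree, a person can read and edit it, and a learning algorithm can mutate or recombine subtrees.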
DATASET LIFTING
• Objective: Map the datasets in the data lake to a consistent vocabulary.
• A lifting rule consists of a number of mappings
• Each mapping assigns a term in the original dataset (such as a column for tabular data) to a term in the target ontology (such as a property).
• Multiple mappings can be managed for each dataset to allow different views on the same data.
• Initial mappings are generated automatically from the profiling results; the user can then refine them.
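A lifting rule of this kind can be sketched as a column-to-property mapping applied row by row. The property names follow the lifting example in this deck; the base URI, the function name and the choice to emit plain literal values (instead of links to ontology concepts) are simplifying assumptions.

```python
from urllib.parse import quote

# One mapping per source column; property names as used in the lifting example.
LIFTING_RULE = {
    "ISIN": "fibo:hasSecurityIdentifier",
    "Country": "fibo:legallyRecordedIn",
    "Industry": "fibo:industrySector",
}


def lift(rows, rule, key_column, base_uri="http://example.org/bond/"):
    """Turn tabular rows into (subject, property, value) triples."""
    triples = []
    for row in rows:
        # The subject URI is built from the key column; the base URI is invented.
        subject = base_uri + quote(row[key_column])
        for column, prop in rule.items():
            value = row.get(column, "")
            if value != "":
                triples.append((subject, prop, value))
    return triples


rows = [
    {"Bond": "NEDWBK CAD 5,2%25", "ISIN": "CA639832AA25",
     "Country": "Canada", "Industry": "Banking"},
    {"Bond": "SIEMENSF1.50%03/20", "ISIN": "DE000A1G85B4",
     "Country": "Germany", "Industry": "Electrical Equipment"},
]
triples = lift(rows, LIFTING_RULE, key_column="ISIN")
```

Keeping a second mapping dictionary for the same rows would give a different view on the same data.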
LIFTING EXAMPLE
Bond                                         ISIN          Country  Industry
NEDWBK CAD 5,2%25                            CA639832AA25  Canada   Banking
SIEMENSF1.50%03/20                           DE000A1G85B4  Germany  Electrical Equipment
Electricite de France (EDF), 6,5% 26jan2019  USF2893TAB29  France   Utilities
[Figure: the bond NEDWBK CAD 5,2%25 lifted to a graph. Edge labels: fibo:hasSecurityIdentifier (with the literal “CA639832AA25”), fibo:legallyRecordedIn (into the Country Ontology; nodes shown: France, Germany, EMEA), fibo:industrySector (into the Industry Ontology; nodes shown: Utilities, Banking).]
LINKING
• Goal: Connect individual datasets to a knowledge graph
• Identify related entities in different datasets and link them
• Linked entities either describe the same real-world object or stand in some other relation
[Figure: the bond NEDWBK CAD 5,2%25 and the entity Rating CAD 5,2%25 in one graph. Edge labels: hasRating, ratingScore (with the literal “AAA”), fibo:legallyRecordedIn and fibo:industrySector (each shown twice). Other nodes: Industry Ontology, Country Ontology, EMEA.]
LINKAGE RULES
• Linking is based on domain-specific rules
• Specify the conditions that must hold true for two entities to be linked
LEARNING LINKAGE RULES
Problem: Manually writing rules is time-consuming and requires expertise
Approach: Interactive machine learning algorithm for generating rules
• Generates a rule based on a number of user-confirmed link candidates.
• The learning algorithm actively selects the link candidates that yield a high information gain.
• The user does not need any knowledge of the characteristics
of the dataset or any particular similarity computation techniques.
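The selection step can be illustrated with query-by-committee, a common active-learning technique that is used here as a stand-in: the candidate on which a set of candidate rules disagrees most is shown to the user next. The committee rules, the sample entities and the vote-entropy measure are assumptions; the slides do not specify how information gain is computed.

```python
import math


def vote_entropy(votes):
    """Binary entropy of the committee's link / no-link votes (0.0 means full agreement)."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_candidate(candidates, committee):
    """Pick the entity pair whose label would be most informative."""
    return max(candidates,
               key=lambda pair: vote_entropy([rule(*pair) for rule in committee]))


# Hypothetical committee of three candidate linkage rules.
committee = [
    lambda a, b: a["isin"] == b["isin"],
    lambda a, b: a["country"] == b["country"],
    lambda a, b: a["name"][:6] == b["name"][:6],
]

bond = {"isin": "CA639832AA25", "country": "Canada", "name": "NEDWBK CAD 5,2%25"}
same = {"isin": "CA639832AA25", "country": "Canada", "name": "NEDWBK CAD 5,2%25"}
unrelated = {"isin": "DE000A1G85B4", "country": "Germany", "name": "SIEMENSF1.50%03/20"}
unclear = {"isin": "CA0000000000", "country": "Canada", "name": "Other CAD bond"}

candidates = [(bond, same), (bond, unrelated), (bond, unclear)]
next_question = select_candidate(candidates, committee)
```

The user only confirms or rejects `next_question`; the rules are then regenerated with the new label.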
VIEW GENERATION
• The user selects a set of lifted and linked datasets
[Figure: Hadoop Data Lake]
DATA ACCESS
• Generate data flows based on Apache Spark
• The data flows utilize Resilient Distributed Datasets (RDDs)
• RDDs derive new data sets from existing data sets by applying a chain of transformations
• A derived data set can either be recomputed on the fly or persisted on stable storage
• Data flows can be executed efficiently on Hadoop clusters
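The trade-off between recomputing and persisting a derived data set can be shown with a toy model in plain Python. The `Dataset` class below is a hypothetical stand-in written for this example and is not the Spark API; it only mimics the idea that a derived data set remembers its chain of transformations. The rating values are made up.

```python
class Dataset:
    """Toy stand-in for an RDD: stores how to derive its rows, not the rows themselves."""

    def __init__(self, compute):
        self._compute = compute
        self._keep = False
        self._rows = None

    @classmethod
    def from_rows(cls, rows):
        rows = list(rows)
        return cls(lambda: rows)

    def map(self, fn):
        return Dataset(lambda: [fn(row) for row in self.collect()])

    def filter(self, predicate):
        return Dataset(lambda: [row for row in self.collect() if predicate(row)])

    def persist(self):
        """Keep the rows after the first computation instead of recomputing them."""
        self._keep = True
        return self

    def collect(self):
        if self._rows is not None:
            return self._rows
        rows = self._compute()
        if self._keep:
            self._rows = rows
        return rows


calls = {"lift": 0}


def lift(row):
    """Counts its invocations so the two strategies can be compared."""
    calls["lift"] += 1
    return {"isin": row[0], "rating": row[1]}


ratings = Dataset.from_rows([("CA639832AA25", "AAA"), ("DE000A1G85B4", "A")])
lifted = ratings.map(lift)
top = lifted.filter(lambda row: row["rating"] == "AAA")

top.collect()
top.collect()
recomputed_calls = calls["lift"]   # lifting ran again for every collect

calls["lift"] = 0
lifted.persist()
top.collect()
top.collect()
persisted_calls = calls["lift"]    # lifting ran once, then the stored rows were reused
```

Persisting trades storage for compute, which pays off when several views are built on the same lifted data set.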
[Figure: data flow. Inputs: Corporate Bonds, Internal Ratings, External Ratings. Processing steps in eccenca Corporate Memory: Data Lifting 1, Data Lifting 2, Data Lifting 3 and Data Linking, each as an Apache Spark RDD. Output to the Data Consumer via SQL, CSV, Excel, Spark API.]
DEMO
Contact
Dr. Robert Isele
Tel: +49 151 17238616
email: robert.isele@eccenca.com
eccenca Command your Data!

Mais conteúdo relacionado

Mais procurados

Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Martin Voigt
 
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...semanticsconference
 
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for EnterpriseChalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprisesemanticsconference
 
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...semanticsconference
 
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...semanticsconference
 
Open Data and News Analytics Demo
Open Data and News Analytics DemoOpen Data and News Analytics Demo
Open Data and News Analytics DemoOntotext
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsLinked Enterprise Date Services
 
Building Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsBuilding Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsOntotext
 
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla AirII-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla AirDr. Haxel Consult
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...semanticsconference
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphCambridge Semantics
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesSrinath Srinivasa
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsOntotext
 
Smarter content with a Dynamic Semantic Publishing Platform
Smarter content with a Dynamic Semantic Publishing PlatformSmarter content with a Dynamic Semantic Publishing Platform
Smarter content with a Dynamic Semantic Publishing PlatformOntotext
 
Solution architecture
Solution architectureSolution architecture
Solution architectureRajat Agrawal
 
Adding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryAdding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryOntotext
 

Mais procurados (20)

Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016Ontos NLP Stack, Sep. 2016
Ontos NLP Stack, Sep. 2016
 
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
Kerstin Diwisch | Towards a holistic visualization management for knowledge g...
 
Solution architecture for big data projects
Solution architecture for big data projectsSolution architecture for big data projects
Solution architecture for big data projects
 
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for EnterpriseChalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera | Cross Media Concept and Entity Driven Search for Enterprise
 
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...David Kuilman | Creating a Semantic Enterprise Content model to support conti...
David Kuilman | Creating a Semantic Enterprise Content model to support conti...
 
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
Joe Pairman | Multiplying the Power of Taxonomy with Granular, Structured Con...
 
Open Data and News Analytics Demo
Open Data and News Analytics DemoOpen Data and News Analytics Demo
Open Data and News Analytics Demo
 
Semantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web ApplicationsSemantic E-Commerce - Use Cases in Enterprise Web Applications
Semantic E-Commerce - Use Cases in Enterprise Web Applications
 
Building Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 stepsBuilding Knowledge Graphs in 10 steps
Building Knowledge Graphs in 10 steps
 
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla AirII-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
 
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
Nicoletta Fornara and Fabio Marfia | Modeling and Enforcing Access Control Ob...
 
The Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge GraphThe Business Case for Semantic Web Ontology & Knowledge Graph
The Business Case for Semantic Web Ontology & Knowledge Graph
 
Big Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and OpportunitiesBig Data and the Semantic Web: Challenges and Opportunities
Big Data and the Semantic Web: Challenges and Opportunities
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic Technologies
 
How to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk AnalyticsHow to Reveal Hidden Relationships in Data and Risk Analytics
How to Reveal Hidden Relationships in Data and Risk Analytics
 
Semantic Technology in Publishing & Finance
Semantic Technology in Publishing & FinanceSemantic Technology in Publishing & Finance
Semantic Technology in Publishing & Finance
 
Smarter content with a Dynamic Semantic Publishing Platform
Smarter content with a Dynamic Semantic Publishing PlatformSmarter content with a Dynamic Semantic Publishing Platform
Smarter content with a Dynamic Semantic Publishing Platform
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Solution architecture
Solution architectureSolution architecture
Solution architecture
 
Adding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryAdding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to Delivery
 

Destaque

Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...semanticsconference
 
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...semanticsconference
 
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...semanticsconference
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...semanticsconference
 
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...semanticsconference
 
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...semanticsconference
 
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...semanticsconference
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...semanticsconference
 
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...semanticsconference
 
Victor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of ThingsVictor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of Thingssemanticsconference
 
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...semanticsconference
 
Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Datasemanticsconference
 
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...semanticsconference
 
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...semanticsconference
 
Thomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old DataThomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old Datasemanticsconference
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Povedasemanticsconference
 
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...semanticsconference
 
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINEFelix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINEsemanticsconference
 
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...semanticsconference
 

Destaque (19)

Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...Michael Fuchs | How to compute semantic relationships between entities and fa...
Michael Fuchs | How to compute semantic relationships between entities and fa...
 
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
Camilo Thorne, Stefano Faralli and Heiner Stuckenschmidt | Entity Linking for...
 
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
Jörg Waitelonis, Henrik Jürges and Harald Sack | Don't compare Apples to Oran...
 
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
Sebastian Bader | Semantic Technologies for Assisted Decision-Making in Indus...
 
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
Philippe Martin and Jérémy Bénard | Importing, Translating and Exporting Know...
 
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
Vladimir Alexiev | Semantic Enrichment of Twitter Microposts Helps Understand...
 
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
Adam Bartusiak and Jörg Lässig | Semantic Processing for the Conversion of Un...
 
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
Phil Ritchie | Putting Standards into Action: Multilingual and Semantic Enric...
 
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
Shuangyong Song, Qingliang Miao and Yao Meng | Linking Images to Semantic Kno...
 
Victor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of ThingsVictor Charpenay | Standardized Semantics for an Open Web of Things
Victor Charpenay | Standardized Semantics for an Open Web of Things
 
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAid...
 
Kostas Kastrantas | Business Opportunities with Linked Open Data
Kostas Kastrantas  | Business Opportunities with Linked Open DataKostas Kastrantas  | Business Opportunities with Linked Open Data
Kostas Kastrantas | Business Opportunities with Linked Open Data
 
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
Kolawole John Adebayo, Luigi Di Caro and Guido Boella | A Supervised Keyphras...
 
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
OWL-based validation by Gavin Mendel Gleasonand Bojan Bozic, Trinity College,...
 
Thomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old DataThomas Vavra | New Ways of Handling Old Data
Thomas Vavra | New Ways of Handling Old Data
 
OOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria PovedaOOPS!: on-line ontology diagnosis by Maria Poveda
OOPS!: on-line ontology diagnosis by Maria Poveda
 
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
Georgios Meditskos and Stamatia Dasiopoulou | Question Answering over Pattern...
 
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINEFelix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
Felix Burkhardt | ARCHITECTURE FOR A QUESTION ANSWERING MACHINE
 
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...
Miroslav Líška | Methodology data.gov.sk-semanticweb, LOD Slovakia and Slovpe...
 

Robert Isele | eccenca CorporateMemory - Semantically integrated Enterprise Data Lakes

  • 1. WWW.LEDS-PROJEKT.DE. ECCENCA CORPORATE MEMORY: SEMANTICALLY INTEGRATED ENTERPRISE DATA LAKES. September 29, 2016
  • 2. MOTIVATION. Enterprise Data Management Objective: “Ensure all data is aligned to a common meaning in order to achieve automation in performing complex analytics and generating trusted reports.” Source: 2015 Data Management Industry Benchmark, EDM Council. In 2015, only 7% of respondents claimed to already be using shared and unambiguous definitions of data across the firm and to have them accessible as operational metadata.
  • 3. ARCHITECTURE. Diagram labels: Management Accounting; Risk Management; Regulatory Reporting; Treasury; Marketing; Accounting; Corporate Memory; Inbound Data Sources; Outbound and Consumption; Inbound Raw Data Store; Knowledge Graph for Meta Data, KPI Definition and Data Models; Frontend to Access Relationship and KPI Definition / Documentation; Frontend to Access (ad hoc) Reports; Outbound Data Delivery to Target Systems; Big Data DWH-Infrastructure
  • 4. ARCHITECTURE Management Accounting Risk Management Regulatory Reporting Treasury Marketing Accounting Inbound Raw Data Store Knowledge Graph for Meta Data, KPI Definition and Data Models Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems Big Data DWH-Infrastructure Data Ingestion • Files in the data lake (CSV, XML, Excel) • (relational) Databases
  • 5. ARCHITECTURE Management Accounting Risk Management Regulatory Reporting Treasury Marketing Accounting Inbound Raw Data Store Knowledge Graph for Meta Data, KPI Definition and Data Models Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems Big Data DWH-Infrastructure Data Lake • Emerging approach to handle large amounts of data • Cost-effective storage • Data is held in its native format • Good: Does not force an up-front integration of the ingested data sets • Bad: Retaining an overview of disparate data silos in the lake is challenging without a coherent shared view
  • 6. ARCHITECTURE Management Accounting Risk Management Regulatory Reporting Treasury Marketing Accounting Inbound Raw Data Store Knowledge Graph for Meta Data, KPI Definition and Data Models Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems Big Data DWH-Infrastructure Data Warehouses • Existing infrastructure • Typically relational databases
  • 7. ARCHITECTURE Management Accounting Risk Management Regulatory Reporting Treasury Marketing Accounting Inbound Raw Data Store Knowledge Graph for Meta Data, KPI Definition and Data Models Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems Big Data DWH-Infrastructure Metadata Layer • Dataset Metadata • Ontologies • Integration Rules
  • 8. ARCHITECTURE Management Accounting Risk Management Regulatory Reporting Treasury Marketing Accounting Inbound Raw Data Store Knowledge Graph for Meta Data, KPI Definition and Data Models Frontend to Access Relationship and KPI Definition / Documentation Frontend to Access (ad hoc) Reports Outbound Data Delivery to Target Systems Big Data DWH-Infrastructure Graphical User Interface • Customer Applications
  • 9. INTEGRATION PROCESS Dataset Management • Catalog Datasets • Catalog Ontologies • Manage Metadata Dataset Discovery • Data Profiling • Dataset Exploration Dataset Integration • Dataset Lifting • Dataset Linking • Data Quality Validation Data Access • Domain Specific Consolidated Views • Execution on Hadoop
  • 10. DATASET MANAGEMENT Dataset Management • Catalog Datasets • Catalog Ontologies • Manage Metadata Dataset Discovery • Data Profiling • Dataset Exploration Dataset Integration • Dataset Lifting • Dataset Linking • Data Quality Validation Data Access • Domain Specific Consolidated Views • Execution on Hadoop
  • 11. DATASET CATALOG • Enables the user to explore and manage datasets in the data lake • Files in the data lake (CSV, XML, Excel) • Databases (Apache Hive or external databases)
  • 12. MANAGING METADATA • Exploring and editing dataset metadata • Semantic content information, like textual descriptions, tags and related persons • Technical information and parameters, like formats, data model and encoding • Access information, like access path or URL, source system or API call • Organizational provenance, like organizational units owning or maintaining the dataset
  • 13. DATASET DISCOVERY Dataset Management • Catalog Datasets • Catalog Ontologies • Manage Metadata Dataset Discovery • Data Profiling • Dataset Exploration Dataset Integration • Dataset Lifting • Dataset Linking • Data Quality Validation Data Access • Domain Specific Consolidated Views • Execution on Hadoop
  • 14. DATASET DISCOVERY • Goal: Augment a dataset with data from related datasets • Automatic discovery of datasets with overlapping information • Explorative interface • Discovery is based on two kinds of data • Business metadata • Profiling summary
  • 15. DISCOVERY VIEW • Datasets are matched based on their metadata (profiling + business data)
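
The slides do not say how the match between two datasets is scored. As a minimal sketch, assuming each dataset is summarized by a set of business tags and a set of sampled values per column, one simple option is a Jaccard overlap. The function names, the dictionary layout and the equal weighting of both parts are hypothetical, not taken from eccenca Corporate Memory.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(summary_a, summary_b):
    """Hypothetical score: mean of tag overlap and best column value overlap."""
    # Business metadata part: overlap of the tags assigned to each dataset.
    tag_overlap = jaccard(summary_a["tags"], summary_b["tags"])
    # Profiling part: the best overlap between any pair of columns.
    column_overlap = max(
        (
            jaccard(values_a, values_b)
            for values_a in summary_a["columns"].values()
            for values_b in summary_b["columns"].values()
        ),
        default=0.0,
    )
    return (tag_overlap + column_overlap) / 2


bonds = {
    "tags": {"bonds", "finance"},
    "columns": {"ISIN": {"CA639832AA25", "DE000A1G85B4"}},
}
ratings = {
    "tags": {"ratings", "finance"},
    "columns": {"isin_code": {"CA639832AA25"}, "score": {"AAA"}},
}
score = match_score(bonds, ratings)
```

A real discovery component would likely weight the two parts differently and compare detected data types before comparing values.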
  • 16. DATASET PROFILING • Datasets often contain implicit and explicit schema information • Column names, data formats, enumerated values etc. • Example: column contains formatted dates • Idea: Extract a dataset summary • For each column / property the summary contains: 1. Data type (e.g., number, date, industry classification) 2. Data format (e.g., date format) 3. Data statistics (e.g., range, distribution, most frequent values) • Materialized as RDF with UI view
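
The three parts of the per-column summary on slide 16 can be illustrated with a small sketch. It assumes raw values arrive as strings and only distinguishes numbers, dates in one fixed format and plain strings. The names `detect_type` and `profile_column` and the summary keys are hypothetical; the actual product materializes the summary as RDF, which is omitted here.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%d-%m-%Y"  # assumed single date format for this sketch


def detect_type(values):
    """Return 'number', 'date' or 'string' for a list of raw string values."""
    if not values:
        return "string"

    def all_parse(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "number"
    if all_parse(lambda value: datetime.strptime(value, DATE_FORMAT)):
        return "date"
    return "string"


def profile_column(values, top_n=3):
    """Build a summary with data type, data format and basic statistics."""
    data_type = detect_type(values)
    summary = {
        "data_type": data_type,
        "data_format": DATE_FORMAT if data_type == "date" else None,
        "distinct": len(set(values)),
        "most_frequent": Counter(values).most_common(top_n),
    }
    if data_type == "number":
        numbers = [float(value) for value in values]
        summary["range"] = (min(numbers), max(numbers))
    return summary


date_summary = profile_column(["29-09-2016", "01-01-2015", "29-09-2016"])
number_summary = profile_column(["1", "2.5", "10"])
```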
  • 17. DETECTING DATA TYPES • Detecting common datatypes as well as user-defined types • Common datatypes • Numbers • Dates / Times • Geographic locations (geo-coordinates, states, countries) • User-defined data types can be integrated by adding an ontology / taxonomy • Usually a SKOS taxonomy • Managed as another dataset in the dataset management • Example: Industry taxonomy • Standard taxonomy (NACE, SIC, NAICS) or company-specific
  • 18. FORMATS AND STATISTICS • For some types, the data format is detected • Example: Dates are formatted in DD-MM-YYYY • Two functions are generated: 1. Parser that is able to read the detected representation 2. Normalizer that converts the parsed values into a configurable, organization-wide target representation • Statistics summarize the values: • Value range and distribution • Most frequent values • Data selectivity
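
The parser and normalizer pair from slide 18 might look like the following sketch for dates. The list of candidate formats, the ISO target format and all function names are assumptions made for illustration; the slides do not describe how the detection works internally.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order.
CANDIDATE_FORMATS = ["%d-%m-%Y", "%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y"]


def detect_date_format(values):
    """Return the first candidate format that parses every value, else None."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def make_parser(fmt):
    """Generate a parser that reads the detected representation."""
    return lambda text: datetime.strptime(text, fmt).date()


def make_normalizer(target_fmt="%Y-%m-%d"):
    """Generate a normalizer that writes the configurable target representation."""
    return lambda date_value: date_value.strftime(target_fmt)


detected = detect_date_format(["29-09-2016", "01-12-2015"])
parse = make_parser(detected)
normalize = make_normalizer()
normalized = normalize(parse("29-09-2016"))
```

Keeping the parser and the normalizer separate means the target representation can be changed for the whole organization without re-running format detection.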
  • 19. DISCOVERY VIEW • Datasets are matched based on their metadata (profiling + business data)
  • 20. INTEGRATION PROCESS Dataset Management • Catalog Datasets • Catalog Ontologies • Manage Metadata Dataset Discovery • Data Profiling • Dataset Exploration Dataset Integration • Dataset Lifting • Dataset Linking • Data Quality Validation Data Access • Domain Specific Consolidated Views • Execution on Hadoop
  • 21. DATA INTEGRATION • The integration process is driven by a set of rules • Lifting Rules map the source datasets to an ontology • Linking Rules connect different datasets to a knowledge graph • Rules are operator trees, consisting of four types of operators • Data Access Operators • Transformation Operators • Similarity Operators • Aggregation Operators • Rules can be learned using genetic programming algorithms • Rules are human-understandable and can be edited
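
To make the idea of a rule as an operator tree more concrete, here is a minimal sketch with one class per operator type named on slide 21. The class names, the use of `difflib.SequenceMatcher` as the similarity measure and the example properties are assumptions for illustration, not the operators shipped with the product.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class PathInput:
    """Data access operator: reads one property of an entity."""
    prop: str

    def evaluate(self, entity):
        return entity.get(self.prop, "")


@dataclass
class Transform:
    """Transformation operator: applies a function to its child's value."""
    function: Callable[[str], str]
    child: object

    def evaluate(self, entity):
        return self.function(self.child.evaluate(entity))


@dataclass
class Similarity:
    """Similarity operator: compares a source value with a target value."""
    source: object
    target: object

    def score(self, a, b):
        left, right = self.source.evaluate(a), self.target.evaluate(b)
        return SequenceMatcher(None, left, right).ratio()


@dataclass
class Aggregation:
    """Aggregation operator: combines the scores of its children."""
    function: Callable[[List[float]], float]
    children: List[object]

    def score(self, a, b):
        return self.function([child.score(a, b) for child in self.children])


# Rule: names must match ignoring case AND the ISINs must match.
rule = Aggregation(min, [
    Similarity(Transform(str.lower, PathInput("name")),
               Transform(str.lower, PathInput("label"))),
    Similarity(PathInput("isin"), PathInput("isin")),
])

bond = {"name": "NEDWBK CAD 5,2%25", "isin": "CA639832AA25"}
same = {"label": "nedwbk cad 5,2%25", "isin": "CA639832AA25"}
other = {"label": "nedwbk cad 5,2%25", "isin": "DE000A1G85B4"}
```

Because the rule is a plain tree of small operators, it can be shown to a user, edited by hand, or recombined by a learning algorithm.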
  • 22. DATASET LIFTING • Objective: Map the datasets in the data lake to a consistent vocabulary. • A lifting rule consists of a number of mappings • Each mapping assigns a term in the original data set (such as a column for tabular data) to a term in the target ontology (such as a property provided by an ontology). • Multiple mappings for each dataset can be managed to allow different views on the same data. • Initial mappings are generated automatically from the profiling results, and the user can then build on them.
  • 23. LIFTING EXAMPLE
        Bond | ISIN | Country | Industry
        NEDWBK CAD 5,2%25 | CA639832AA25 | Canada | Banking
        SIEMENSF1.50%03/20 | DE000A1G85B4 | Germany | Electrical Equipment
        Electricite de France (EDF), 6,5% 26jan2019 | USF2893TAB29 | France | Utilities
        [Figure: graph fragment with node labels NEDWBK CAD 5,2%25, “CA639832AA25”, Industry Ontology (Banking, Utilities) and Country Ontology (France, Germany, EMEA), and edge labels fibo:hasSecurityIdentifier, fibo:legallyRecordedIn, fibo:industrySector]
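
A lifting rule for the table on slide 23 can be sketched as a plain column-to-property mapping. The three property names appear on the slide, but pairing each with a specific column is an assumption, as are the function name `lift_row` and the use of the Bond column as subject. A real lifting step would also resolve Country and Industry values to resources in the respective ontologies rather than keep them as strings.

```python
# Assumed mapping from source columns to ontology properties.
MAPPINGS = {
    "ISIN": "fibo:hasSecurityIdentifier",
    "Country": "fibo:legallyRecordedIn",
    "Industry": "fibo:industrySector",
}


def lift_row(row, subject_column="Bond", mappings=MAPPINGS):
    """Turn one tabular row into (subject, property, value) triples."""
    subject = row[subject_column]
    return [
        (subject, prop, row[column])
        for column, prop in mappings.items()
        if row.get(column)  # skip columns that are missing or empty
    ]


row = {
    "Bond": "NEDWBK CAD 5,2%25",
    "ISIN": "CA639832AA25",
    "Country": "Canada",
    "Industry": "Banking",
}
triples = lift_row(row)
```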
  • 24. LINKING • Goal: Connect individual datasets to a knowledge graph • Identify related entities in different datasets and link them • Linked entities either describe the same real-world object or stand in another relation to each other
        [Figure: knowledge graph fragment with node labels NEDWBK CAD 5,2%25, Rating CAD 5,2%25, Industry Ontology, Country Ontology, EMEA and “AAA”, and edge labels hasRating, ratingScore, fibo:legallyRecordedIn, fibo:industrySector]
  • 25. LINKAGE RULES • Linking is based on domain-specific rules • Specify the conditions that must hold true for two entities to be linked
  • 26. LEARNING LINKAGE RULES Problem: Manually writing rules is time-consuming and requires expertise Approach: Interactive machine learning algorithm for generating rules • Generates a rule based on a number of user-confirmed link candidates. • Link candidates are actively selected by the learning algorithm to include link candidates that yield a high information gain. • The user does not need any knowledge of the characteristics of the dataset or any particular similarity computation techniques.
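
The slide does not specify how information gain is estimated. One common stand-in is query by committee: ask the user about the candidate on which the current candidate rules disagree most. The sketch below uses vote entropy for that purpose; the functions `vote_entropy` and `select_candidate` and the two toy rules are hypothetical and only illustrate the selection step, not the rule learner itself.

```python
import math


def vote_entropy(votes):
    """Binary entropy of the share of rules that vote 'link'."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0  # full agreement, nothing to learn from this pair
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_candidate(candidates, committee):
    """Pick the candidate pair on which the committee disagrees most."""
    return max(
        candidates,
        key=lambda pair: vote_entropy([rule(*pair) for rule in committee]),
    )


# Two toy rules standing in for the current population of learned rules.
def same_isin(a, b):
    return a["isin"] == b["isin"]


def same_name(a, b):
    return a["name"].lower() == b["name"].lower()


agreeing_pair = (
    {"name": "Bond A", "isin": "CA639832AA25"},
    {"name": "bond a", "isin": "CA639832AA25"},
)
disputed_pair = (
    {"name": "Bond A", "isin": "CA639832AA25"},
    {"name": "Bond A (old)", "isin": "CA639832AA25"},
)
chosen = select_candidate([agreeing_pair, disputed_pair], [same_isin, same_name])
```

The user's answer on the disputed pair tells the learner which of the competing rules to favor, which is why such pairs are worth asking about first.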
  • 27. INTEGRATION PROCESS Dataset Management • Catalog Datasets • Catalog Ontologies • Manage Metadata Dataset Discovery • Data Profiling • Dataset Exploration Dataset Integration • Dataset Lifting • Dataset Linking • Data Quality Validation Data Access • Domain Specific Consolidated Views • Execution on Hadoop
  • 28. VIEW GENERATION • The user selects a set of lifted and linked datasets
  • 29. DATA ACCESS • Generate data flows based on Apache Spark • The data flows utilize Resilient Distributed Datasets (RDDs) • RDDs derive new data sets from existing data sets by applying a chain of transformations • A derived data set can either be recomputed on the fly or persisted on stable storage • Data flows can be executed efficiently on Hadoop clusters.
        [Figure: data flow in the Hadoop data lake. Corporate Bonds, Internal Ratings and External Ratings each pass through a Data Lifting step (Apache Spark RDD) into a shared Data Linking step (Apache Spark RDD) within eccenca Corporate Memory; data consumers access the result via SQL, CSV, Excel or the Spark API.]
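
The difference between recomputing a derived data set and persisting it can be shown without a cluster. The following is a plain-Python toy, not Spark code: the class `LazyDataset` is hypothetical, and its method names only loosely follow the RDD methods `map`, `filter`, `persist` and `collect`.

```python
class LazyDataset:
    """Toy derived data set: defined by a computation, evaluated on demand."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function returning a list
        self._persist = False
        self._cache = None
        self.evaluations = 0      # how often the computation actually ran

    @classmethod
    def from_records(cls, records):
        return cls(lambda: list(records))

    def map(self, fn):
        # Derive a new data set; nothing is computed yet.
        return LazyDataset(lambda: [fn(item) for item in self.collect()])

    def filter(self, predicate):
        return LazyDataset(lambda: [item for item in self.collect() if predicate(item)])

    def persist(self):
        # Keep the result after the first evaluation instead of recomputing.
        self._persist = True
        return self

    def collect(self):
        if self._persist and self._cache is not None:
            return self._cache
        self.evaluations += 1
        result = self._compute()
        if self._persist:
            self._cache = result
        return result


def lift(record):
    """Stand-in for a lifting step: one record becomes one triple."""
    return (record["bond"], "hasRating", record["rating"])


ratings = LazyDataset.from_records([
    {"bond": "NEDWBK CAD 5,2%25", "rating": "AAA"},
    {"bond": "SIEMENSF1.50%03/20", "rating": "A"},
])

recomputed = ratings.map(lift)
recomputed.collect()
recomputed.collect()   # the chain of transformations runs a second time

persisted = ratings.map(lift).persist()
persisted.collect()
persisted.collect()    # served from the stored result
```

In Spark the same trade-off applies at cluster scale: recomputation saves storage, while persisting saves time when a derived data set feeds several downstream consumers.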
  • 30. DEMO
  • 31. Contact Dr. Robert Isele Tel: +49 151 17238616 email: robert.isele@eccenca.com eccenca Command your Data!

Editor's Notes

  1. TODO more details on linkage rules or rules in general (operators etc.)
  2. Explain why manually writing a rule is hard?