SlideShare uma empresa Scribd logo
1 de 16
Baixar para ler offline
FREE TO SHARE
Sheetal Pratik - Saxo Bank
August 2020
Data Workbench
About Saxo
We are leading fintech and regtech
specialists, connecting traders, investors and
partners to more than 35,000 instruments –
across all asset classes – from a single
account.
What we do
We build digital platforms to facilitate multi-
asset market access and provide clients of all
sizes with professional-grade tools, industry-
leading prices and best-in-class service.
Data ForScale: Transforming DataAccess,DataGovernance and DataQuality 2
• A data driven organization need to have multi-level Data Governance. Most of the tools are designed to fix the
fact e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin to prevent the
“butterfly effect” in the downstream systems.
• The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, clearly emphasizes on how
data platform with a centralized architecture can lead to failures by being bottleneck at certain point and have
impact to stability. Also with ownership of data at the domain level, it becomes a failed attempt to manage the
data dictionary centrally or duplicate the effort of creating and maintaining such data assets.
• Considering this, it is imperative that the solution has to be more futuristic and a straight implementation of any
of the COTS products for Data Cataloguing might not be the right answer to Saxo’s Data Governance
implementation.
• The preferred strategy for tooling is to fix forward rather than attempting to fix the past by using some kind of
crawler and using ML to extract the metadata from various data sources.
DATA GOVERNANCE FOR A DIGITAL NATIVE
3
For DomainTeams
Who need visibility onthe availability,meaning, usage,ownershipand quality of data
The Data Workbench(Owner's pride Neighbor's envy)
Is a one-stopdata shop
That provides transparency of Saxo’s data ecosystem
Unlike our current state whichis becomingincreasingly complex as we grow
Our product will help Saxo to improve time to market andunlock new insights.
The Data Workbench is designedto be part of the new data architecture.It consists oftwomaincomponentsa Data Catalogue anda Data
QualitySolution.
1. The Data Catalogue captures andexposes metadata. This provides transparency into the meaningand ownershipof ourdata. The Data
Catalogue is built on DataHub a data catalogue open-sourcedby LinkedIn. LinkedInis very supportive and are workingclosely withus
helping withthe adoptionof the tool.
2. The Data Quality Solutionis built onthe opensource solution Great Expectations supportedbySuperConductive. Great Expectations is
a declarative,flexible,andextensible data quality solution. It allows teams to define dataquality rules and actively monitor the quality
of their data.
VISION
4
• Federated Data Governance model is an industry trend where the enterprise governance team facilitates the
monitoring and management of the quality of enterprise critical data, with assistance from the business unit.
• LinkedIn’s journey of its shift of approach from initial version of Data Governance solution called WhereHows to
DataHub , is a typical example of the paradigm shift from “a central metadata repository” solution to a more
decentralized architecture that puts domains before anything else to support the possibility of self-service data
platform.
• We realized that a practical way of implementation would be to stay lean and agile and iteratively work with data
domains while establishing the Data Governance framework and thus create a platform that is self-serviced, scalable
and more relevant to stakeholders.
• We had a discussion with LinkedIn to understand their journey, learnings and lessons learnt that motivated them to
evolve from WhereHows to Datahub. We acknowledged that, Saxo Bank is on a similar journey and we can fast
forward the implementation by adopting Datahub open sources that best relates to the ecosystem of Saxo Bank.
• The LinkedIn datahub team has been extremely responsive.
• Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have
built their specific solutions.
MOTIVATION FOR THE SOLUTION
5
PERSONAS: GOALS AND PAIN POINTSGoalPainPoints
Data Asset
Owner
Data
Steward
Data Governance
Committee Member
Data Scientist Data Consumer
(Reporting)
To find data, its owner of
data and anything else
that helps compliance
To solve business
problems based on
data
To get an overview
of whole data in
the org
To define and
document data
standards
To be responsible
and accountable
for data
Lack of clarity on
ownership of data
New role in Saxo
Bank, so yet to
own full
responsibility
Lack of clarity on
what to mandate
at what level
(federated or data
domain level)
Data missing/
incomplete
See who can
explain data
elements
Not sure if the
data can be
trusted for making
the right decisions
6
DATA MESH APPROACH - PRODUCT THINKING
7
DIRECT AGENT/
FLEET ENGAGEMENT
SPEED OF
DELIVERY
Domain
Polyglot
Output Data
Ports
Polyglot Data
Input Ports
Control
Ports
Logs,
metrics
Self-serve
description
Discoverable
Addressable
Self-describing
Trustworthy
Interoperable
7
TARGET METADATA MODEL
8
STREAM PROCESSING
DATA PLATFORM - HIGH LEVEL OVERVIEW
9
ConsumptionDomain
Data Source
Business
Capabilities
Business
Capabilities
Reports
Trading
Instruments
Prices
Customers/Parti
es
Product
RX
Data Workbench
BUSINESS EVENTS
STREAM PROCESSING
(Enrich/Transform/Aggregate)
DATA STORAGE& MANAGEMENT
DW
DL Storage
DATA PRODUCTS
Native Data Products Domain Data
Products
Aggregated Data
Products
Fit for Purpose Data
Products
Confluent Kafka Platform
Data Catalog:
DATAHUB
DATA QUALITY
Great
Expectations:
OTHER PROCESSING FRAMEWORKS
● SAXO features list
● SAXO initial evaluation params
● TW extended feature list
● Product documentation
● Software: Local installations
● Vendor questionnaire
● Includes initial & shortlisted list
● Data Catalog
● Data Quality
Evaluation Criteria Definition Shortlisted tools Evaluation Process
TOOLS EVALUATION PROCESS
10
11
DATA CATALOG TOOL EVALUATION
11
Prioritized Feature List
● Full Text search on dataset
name, attributes and tags
● Extensiblesearch model
Metadata Search
● Web-based UI to show
metadata, governance
attributes, tags and lineage
● Ability to edit and enrich
attributes
Metadata UI
● Push-based REST API
● Pull-based adapters for
Snowflake and CRM dynamics
● Extensibility
Metadata Ingestion
● Metadata entity for datasets,
its users and attributes
● Business glossary &
documentation
● Extensibility
Metadata Modelling
● Dataset lineagewith upstream
and downstream provenance
● Integration with data
processing/orchestration tools
Data Lineage
● Support for metadata
enrichment and tagging
● Ability to flaga dataset
Data Stewardship
● Shows related Quality Attributes in
UI
● Extensibleto integrate with any DQ
tool
Data Quality Integration
● Cloud-native(Scalable& High
Availability)
● Configurable
● Extensible
Architecture
● Data as a Product
● Distributed Domain Driven
Architecture
● Self-serviceplatform
Alignment with Data Mesh
● Authentication / LDAP
● Authorization / RBAC
Security ● Metadata Versioning
● Data Virtualization
● ML/AI capabilities
Deprioritized
● Export API
Metadata Export
● LicensingCost
● Customization / Development cost
Total Cost of Ownership
● Release cycle
● Community support
● Commercial Support
● Documentation
Support
11
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alatian
Data.World
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
Linkedin DataHub
Amundsen
Marquez
Commercial
Open Source
In House
12
TOOLS LANDSCAPE
Data Catalog
Collibra
Informatica EDC
Alatian
Data.World
Azure Data Catalog - Prev2
Zeenea
Apache Atlas
Linkedin DataHub
Amundsen
Marquez
Commercial
Open Source
In House
13
DATA CATALOG TOOL EVALUATION
Deep-dive analysis of the capabilities of shortlisted tools purely as per the teams understaning in Saxo’s context.
Datahub Marquez Amundsen Collibra Zeenea
Metadata Search *
Metadata UI Editable
Metadata Ingestion *
Metadata Modelling *
Data Lineage *
Metadata Export *
Data Stewardship
Data Quality Integration
Architecture * Only supports
AWS
Security *
Alignment with Data Mesh *
Support
Total Cost of Ownership *
Completelysuitable
Partiallysuitable
No/Minimal suitability
Commercial
Open Source
• Push Based approach that supports Event Driven
architecture. The solutionbuilt onthe principle of
self-service andproducers know theirdata better
and they canprovide the rich metadata so that it
helps in discover the data-assets andencourages
consumption.
• DetailedEvaluationwascarriedout fromSaxo
perspective.
• Possibility of evolution
• Extensibility with the open sourceand evolve
the toolas per needs. Right fit from feature
perspectivein terms of Data Governance
maturity of the organization.
• Leverage from larger community needs and
also influence internal process when needed.
• Reputation of LinkedIn, their success in Kafka
and datahub scaling internally to LinkedIn
volume
• Promising Roadmap 14
**Disclaimer**:Based on Saxo evaluation criteriaand interpretationofproduct capabilities
DATASET ONBOARDING - RESPONSIBILITIES
Metadata
● Dataset/Data Product Metadata
○ Ownership Information
○ Reader Information
○ Topic Configuration Details
○ Dataset Structure (AVRO Schema)
○ Business Term mapping
○ Source Dataset Definition (Optional)
● Quality Rules
Data Engineering
● Domain Transformations
● (Kafka Stream)
PRODUCERS
Metadata
● Consumer Details
● Usage Details
● Target Dataset details
CONSUMERS
Engineering Capabilities
● Supporting New Domains
● Metadata Integration
● DQ Integration
Kafka PLATFORM-Lean Team
15
Thank You
Q&A?
16

Mais conteúdo relacionado

Mais procurados

Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
IICS_Capabilities.pptx
IICS_Capabilities.pptxIICS_Capabilities.pptx
IICS_Capabilities.pptxNandan Kumar
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
Data Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceData Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceDATAVERSITY
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Data catalog
Data catalogData catalog
Data catalogiamtodor
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Differencejeetendra mandal
 

Mais procurados (20)

Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
IICS_Capabilities.pptx
IICS_Capabilities.pptxIICS_Capabilities.pptx
IICS_Capabilities.pptx
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-ServiceData Catalogues - Architecting for Collaboration & Self-Service
Data Catalogues - Architecting for Collaboration & Self-Service
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Data catalog
Data catalogData catalog
Data catalog
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Batch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing DifferenceBatch Processing vs Stream Processing Difference
Batch Processing vs Stream Processing Difference
 

Semelhante a LinkedInSaxoBankDataWorkbench

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsJane Roberts
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta DataDigikrit
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Denodo
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US InformationJulian Tong
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationDenodo
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Denodo
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricNathan Bijnens
 

Semelhante a LinkedInSaxoBankDataWorkbench (20)

5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta Data
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow Presentation
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)Data Virtualization: Introduction and Business Value (UK)
Data Virtualization: Introduction and Business Value (UK)
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 

Último

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 

Último (20)

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 

LinkedInSaxoBankDataWorkbench

  • 1. FREE TO SHARE Sheetal Pratik - Saxo Bank August 2020 Data Workbench
  • 2. About Saxo We are leading fintech and regtech specialists, connecting traders, investors and partners to more than 35,000 instruments – across all asset classes – from a single account. What we do We build digital platforms to facilitate multi- asset market access and provide clients of all sizes with professional-grade tools, industry- leading prices and best-in-class service. Data ForScale: Transforming DataAccess,DataGovernance and DataQuality 2
  • 3. • A data driven organization need to have multi-level Data Governance. Most of the tools are designed to fix the fact e.g. before a data warehouse load. What is needed is to ensure data integrity at the origin to prevent the “butterfly effect” in the downstream systems. • The article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, clearly emphasizes on how data platform with a centralized architecture can lead to failures by being bottleneck at certain point and have impact to stability. Also with ownership of data at the domain level, it becomes a failed attempt to manage the data dictionary centrally or duplicate the effort of creating and maintaining such data assets. • Considering this, it is imperative that the solution has to be more futuristic and a straight implementation of any of the COTS products for Data Cataloguing might not be the right answer to Saxo’s Data Governance implementation. • The preferred strategy for tooling is to fix forward rather than attempting to fix the past by using some kind of crawler and using ML to extract the metadata from various data sources. DATA GOVERNANCE FOR A DIGITAL NATIVE 3
  • 4. For DomainTeams Who need visibility onthe availability,meaning, usage,ownershipand quality of data The Data Workbench(Owner's pride Neighbor's envy) Is a one-stopdata shop That provides transparency of Saxo’s data ecosystem Unlike our current state whichis becomingincreasingly complex as we grow Our product will help Saxo to improve time to market andunlock new insights. The Data Workbench is designedto be part of the new data architecture.It consists oftwomaincomponentsa Data Catalogue anda Data QualitySolution. 1. The Data Catalogue captures andexposes metadata. This provides transparency into the meaningand ownershipof ourdata. The Data Catalogue is built on DataHub a data catalogue open-sourcedby LinkedIn. LinkedInis very supportive and are workingclosely withus helping withthe adoptionof the tool. 2. The Data Quality Solutionis built onthe opensource solution Great Expectations supportedbySuperConductive. Great Expectations is a declarative,flexible,andextensible data quality solution. It allows teams to define dataquality rules and actively monitor the quality of their data. VISION 4
  • 5. • Federated Data Governance model is an industry trend where the enterprise governance team facilitates the monitoring and management of the quality of enterprise critical data, with assistance from the business unit. • LinkedIn’s journey of its shift of approach from initial version of Data Governance solution called WhereHows to DataHub , is a typical example of the paradigm shift from “a central metadata repository” solution to a more decentralized architecture that puts domains before anything else to support the possibility of self-service data platform. • We realized that a practical way of implementation would be to stay lean and agile and iteratively work with data domains while establishing the Data Governance framework and thus create a platform that is self-serviced, scalable and more relevant to stakeholders. • We had a discussion with LinkedIn to understand their journey, learnings and lessons learnt that motivated them to evolve from WhereHows to Datahub. We acknowledged that, Saxo Bank is on a similar journey and we can fast forward the implementation by adopting Datahub open sources that best relates to the ecosystem of Saxo Bank. • The LinkedIn datahub team has been extremely responsive. • Other digital natives have also recognized that the incumbent solutions are not fit for the modern age and have built their specific solutions. MOTIVATION FOR THE SOLUTION 5
  • 6. PERSONAS: GOALS AND PAIN POINTSGoalPainPoints Data Asset Owner Data Steward Data Governance Committee Member Data Scientist Data Consumer (Reporting) To find data, its owner of data and anything else that helps compliance To solve business problems based on data To get an overview of whole data in the org To define and document data standards To be responsible and accountable for data Lack of clarity on ownership of data New role in Saxo Bank, so yet to own full responsibility Lack of clarity on what to mandate at what level (federated or data domain level) Data missing/ incomplete See who can explain data elements Not sure if the data can be trusted for making the right decisions 6
  • 7. DATA MESH APPROACH - PRODUCT THINKING 7 DIRECT AGENT/ FLEET ENGAGEMENT SPEED OF DELIVERY Domain Polyglot Output Data Ports Polyglot Data Input Ports Control Ports Logs, metrics Self-serve description Discoverable Addressable Self-describing Trustworthy Interoperable 7
  • 9. STREAM PROCESSING DATA PLATFORM - HIGH LEVEL OVERVIEW 9 ConsumptionDomain Data Source Business Capabilities Business Capabilities Reports Trading Instruments Prices Customers/Parti es Product RX Data Workbench BUSINESS EVENTS STREAM PROCESSING (Enrich/Transform/Aggregate) DATA STORAGE& MANAGEMENT DW DL Storage DATA PRODUCTS Native Data Products Domain Data Products Aggregated Data Products Fit for Purpose Data Products Confluent Kafka Platform Data Catalog: DATAHUB DATA QUALITY Great Expectations: OTHER PROCESSING FRAMEWORKS
  • 10. ● SAXO features list ● SAXO initial evaluation params ● TW extended feature list ● Product documentation ● Software: Local installations ● Vendor questionnaire ● Includes initial & shortlisted list ● Data Catalog ● Data Quality Evaluation Criteria Definition Shortlisted tools Evaluation Process TOOLS EVALUATION PROCESS 10
  • 11. 11 DATA CATALOG TOOL EVALUATION 11 Prioritized Feature List ● Full Text search on dataset name, attributes and tags ● Extensiblesearch model Metadata Search ● Web-based UI to show metadata, governance attributes, tags and lineage ● Ability to edit and enrich attributes Metadata UI ● Push-based REST API ● Pull-based adapters for Snowflake and CRM dynamics ● Extensibility Metadata Ingestion ● Metadata entity for datasets, its users and attributes ● Business glossary & documentation ● Extensibility Metadata Modelling ● Dataset lineagewith upstream and downstream provenance ● Integration with data processing/orchestration tools Data Lineage ● Support for metadata enrichment and tagging ● Ability to flaga dataset Data Stewardship ● Shows related Quality Attributes in UI ● Extensibleto integrate with any DQ tool Data Quality Integration ● Cloud-native(Scalable& High Availability) ● Configurable ● Extensible Architecture ● Data as a Product ● Distributed Domain Driven Architecture ● Self-serviceplatform Alignment with Data Mesh ● Authentication / LDAP ● Authorization / RBAC Security ● Metadata Versioning ● Data Virtualization ● ML/AI capabilities Deprioritized ● Export API Metadata Export ● LicensingCost ● Customization / Development cost Total Cost of Ownership ● Release cycle ● Community support ● Commercial Support ● Documentation Support 11
  • 12. TOOLS LANDSCAPE Data Catalog Collibra Informatica EDC Alatian Data.World Azure Data Catalog - Prev2 Zeenea Apache Atlas Linkedin DataHub Amundsen Marquez Commercial Open Source In House 12
  • 13. TOOLS LANDSCAPE Data Catalog Collibra Informatica EDC Alatian Data.World Azure Data Catalog - Prev2 Zeenea Apache Atlas Linkedin DataHub Amundsen Marquez Commercial Open Source In House 13
  • 14. DATA CATALOG TOOL EVALUATION Deep-dive analysis of the capabilities of shortlisted tools purely as per the teams understaning in Saxo’s context. Datahub Marquez Amundsen Collibra Zeenea Metadata Search * Metadata UI Editable Metadata Ingestion * Metadata Modelling * Data Lineage * Metadata Export * Data Stewardship Data Quality Integration Architecture * Only supports AWS Security * Alignment with Data Mesh * Support Total Cost of Ownership * Completelysuitable Partiallysuitable No/Minimal suitability Commercial Open Source • Push Based approach that supports Event Driven architecture. The solutionbuilt onthe principle of self-service andproducers know theirdata better and they canprovide the rich metadata so that it helps in discover the data-assets andencourages consumption. • DetailedEvaluationwascarriedout fromSaxo perspective. • Possibility of evolution • Extensibility with the open sourceand evolve the toolas per needs. Right fit from feature perspectivein terms of Data Governance maturity of the organization. • Leverage from larger community needs and also influence internal process when needed. • Reputation of LinkedIn, their success in Kafka and datahub scaling internally to LinkedIn volume • Promising Roadmap 14 **Disclaimer**:Based on Saxo evaluation criteriaand interpretationofproduct capabilities
  • 15. DATASET ONBOARDING - RESPONSIBILITIES Metadata ● Dataset/Data Product Metadata ○ Ownership Information ○ Reader Information ○ Topic Configuration Details ○ Dataset Structure (AVRO Schema) ○ Business Term mapping ○ Source Dataset Definition (Optional) ● Quality Rules Data Engineering ● Domain Transformations ● (Kafka Stream) PRODUCERS Metadata ● Consumer Details ● Usage Details ● Target Dataset details CONSUMERS Engineering Capabilities ● Supporting New Domains ● Metadata Integration ● DQ Integration Kafka PLATFORM-Lean Team 15