Amundsen at Brex
Why Amundsen?
(Why a data catalog at all?)
The Problems
Documentation and discoverability
1. Data scientists and analysts found it difficult to learn what a table is, or how and where it’s used
2. Documentation often doesn’t exist!
Governance
3. We have no unified way to tag and identify sensitive data/PII
Data catalog to the rescue
Documentation and discoverability
1. Data scientists and analysts found it difficult to learn what a table is, or how and where it’s used
a. A catalog is a single searchable source of documentation
b. Data catalogs can display dependencies between tables/dashboards
2. Documentation often doesn’t exist!
a. The catalog’s existence provides motivation to write documentation
b. The catalog can be used to track down popular but undocumented tables
Governance
3. We have no unified way to tag and identify sensitive data/PII
a. A catalog provides a unified tag database for tags from all integrated data sources
b. Tags can be auto-generated by PII-detection tools, as well as manually added
We chose Amundsen because ❤ open source!
Amundsen is:
● 🙂 Simple - exactly the feature set we needed
● 🎉 Open source =
○ less expensive
○ highly customizable
○ can be integrated with anything
○ part of a community
Implementing Amundsen
● SSO with Okta
● Custom Looker integration
● K8s pods for each service, running behind a VPN
● Airflow ETL DAG runs once per day
● Neo4J 4.x
We decided not to let people edit descriptions (at least, for now)
We wanted documentation to be version controlled, and we wanted the code to be the single source of truth.
Since people work in code a lot of the time, we felt that it was important for Amundsen to work with code documentation, not against it.
We intend to revisit this, though!
Challenges we faced
Package Requirements Issues
We fixed these!
Amundsen Databuilder had a pretty strict requirements.txt, making it difficult to:
1. Integrate it into our existing Airflow codebase
2. Combine it with other libraries
Eventually we unpinned most of the dependencies, and contributed this upstream!
Neo4J 4.x
We’d love to fix this upstream!
We decided to use Neo4J 4.x because:
1. Backups are easier
2. It would give us the opportunity to potentially switch to a SaaS offering, e.g. Neo4j Aura
Amundsen does not support Neo4J 4.x, so we made a few tweaks!
As a result, we’re currently using some forks.
Fun stuff we did!
Looker Integration
[Diagram: Looker’s sources of truth, and how Amundsen could read them]
● LookML files in GitHub: Models and Views; SQL queries
● Dashboards via API
● Dashboards reference Models and Views
● Amundsen: git clone?
Looker Integration
1. Build Docker images which convert LookML files to JSON lineage data
2. Use the Docker images in the source repository’s build pipeline to create artifacts and upload them to S3
3. Consumers download the artifacts at run-time and combine them with dashboard data from the API
[Diagram: Looker data lineage extraction code → Docker image for Artifact Builder (in ECR), built in CI/CD → CI/CD uses the Docker image to build lineage.json from the Models/Views in the LookML repo → lineage.json (in S3) → Amundsen Airflow DAG combines LookML view data with dashboard data from the Looker API]
Looker Integration - Stage 1
[Diagram: Model/View files (in Looker GitHub repo) → parse LookML files and build lineage graph during CI → lineage.json (in S3); Neo4J]
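A minimal sketch of what the Stage 1 parsing step could look like, assuming each view declares its source table with `sql_table_name`. The sample LookML, the regular expressions, and the `{"views": ...}` JSON layout are illustrative assumptions, not the format actually used at Brex; a real build would more likely use a proper LookML parser and also handle derived tables.

```python
import json
import re

# Hypothetical LookML text; in CI this would be read from the LookML repo checkout.
SAMPLE_LOOKML = """
view: orders {
  sql_table_name: analytics.orders ;;
  dimension: id { type: number }
}

view: customers {
  sql_table_name: analytics.customers ;;
}
"""

# Matches the start of a view block, e.g. "view: orders {".
VIEW_RE = re.compile(r"^\s*view:\s*(\w+)\s*\{", re.MULTILINE)
# Matches "sql_table_name: some.table ;;" and captures the table name.
TABLE_RE = re.compile(r"sql_table_name:\s*([^;]+?)\s*;;")


def extract_view_lineage(lookml_text):
    """Map each view name to the table named in its sql_table_name, if any."""
    lineage = {}
    matches = list(VIEW_RE.finditer(lookml_text))
    for i, match in enumerate(matches):
        # A view's body runs until the next view starts (or the text ends).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(lookml_text)
        table = TABLE_RE.search(lookml_text[match.end():end])
        if table:
            lineage[match.group(1)] = table.group(1).strip()
    return lineage


def build_lineage_json(lookml_text):
    """Serialize the view-to-table mapping (assumed layout for lineage.json)."""
    return json.dumps({"views": extract_view_lineage(lookml_text)}, sort_keys=True)
```

The resulting string is what a CI job would upload to S3 as lineage.json.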
Looker Integration - Stage 2
[Diagram: Model/View files (in Looker GitHub repo) → parse LookML files and build lineage graph during CI → lineage.json (in S3) → Amundsen Airflow pipeline downloads lineage data from S3, queries dashboards (from the Looker API) using the Looker SDK, and combines LookML view data with API dashboard data → Neo4J]
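The combining step in Stage 2 could be sketched roughly as follows. This assumes the lineage data maps view names to tables and that each dashboard record lists the views it references; both shapes, and the names `combine_lineage`, `title` and `views`, are hypothetical. The S3 download and the Looker SDK calls are left out so the sketch stays self-contained.

```python
def combine_lineage(view_lineage, dashboards):
    """Join view->table lineage with dashboard->view references.

    Returns sorted (table, dashboard title) pairs, i.e. which dashboards
    depend on which tables. Views with no known table are skipped.
    """
    edges = set()
    for dashboard in dashboards:
        for view in dashboard["views"]:
            table = view_lineage.get(view)
            if table is not None:
                edges.add((table, dashboard["title"]))
    return sorted(edges)


# Hypothetical inputs: lineage as downloaded from S3, dashboards as fetched from the API.
example_lineage = {"orders": "analytics.orders", "customers": "analytics.customers"}
example_dashboards = [
    {"title": "Revenue", "views": ["orders", "customers"]},
    {"title": "Churn", "views": ["customers", "unknown_view"]},
]
example_edges = combine_lineage(example_lineage, example_dashboards)
```

Each pair would then become a table-to-dashboard relationship loaded into Neo4J.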
Looker Integration
Data lineage - Analysts love this!
Description from Looker
Looker Integration
Looker lineage works in reverse
Custom Taggers
● Documented/Undocumented tags
● Category from RBAC system
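As a rough illustration of the taggers above, a sketch might look like this. The record shapes, the tag names, and the `categories` lookup standing in for the RBAC system are all assumptions made for the example, not the actual implementation.

```python
def documentation_tag(description):
    """Return 'documented' if the description has any non-whitespace text."""
    return "documented" if description and description.strip() else "undocumented"


def tag_tables(tables, categories):
    """Build a table name -> sorted tag list mapping.

    `tables` is a list of {"name": ..., "description": ...} records and
    `categories` maps table names to a category taken from an RBAC system
    (a hypothetical lookup here).
    """
    tags = {}
    for table in tables:
        table_tags = {documentation_tag(table.get("description"))}
        category = categories.get(table["name"])
        if category:
            table_tags.add(category)
        tags[table["name"]] = sorted(table_tags)
    return tags


# Hypothetical example data.
example_tables = [
    {"name": "analytics.orders", "description": "One row per order."},
    {"name": "analytics.tmp_export", "description": "   "},
]
example_tags = tag_tables(example_tables, {"analytics.orders": "finance"})
```

Tagging undocumented tables explicitly makes them searchable, which is one way to track down popular but undocumented tables.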
Customized Snowflake Integration
One query can pull all the metadata we need from many tables across multiple databases:
● Columns
● Last update time
● Documentation
● Table usage
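One way such a query could be assembled is sketched below, assuming the metadata comes from each database's `INFORMATION_SCHEMA` views joined with `UNION ALL`. The function name is hypothetical, and the selected columns (`comment`, `last_altered`, and so on) reflect our understanding of Snowflake's `INFORMATION_SCHEMA` and should be checked against the Snowflake documentation. Table usage is not covered here, since it is not available from these views.

```python
def build_metadata_query(databases):
    """Build one SQL statement covering columns, docs and last update time
    for every table in the given databases."""
    if not databases:
        raise ValueError("at least one database name is required")
    selects = []
    for db in databases:
        # Conservative check so a database name cannot inject SQL.
        if not db.isidentifier():
            raise ValueError(f"unexpected database name: {db!r}")
        selects.append(
            "SELECT c.table_catalog, c.table_schema, c.table_name, "
            "c.column_name, c.data_type, c.comment AS column_comment, "
            "t.comment AS table_comment, t.last_altered "
            f"FROM {db}.information_schema.columns c "
            f"JOIN {db}.information_schema.tables t "
            "ON c.table_schema = t.table_schema AND c.table_name = t.table_name"
        )
    return "\nUNION ALL\n".join(selects)


# Hypothetical database names.
example_query = build_metadata_query(["ANALYTICS", "RAW"])
```

The generated statement can be run once per ETL run, so the extractor makes a single round trip instead of one query per table.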
What’s next for Amundsen at Brex?
Collect user feedback and usage statistics to drive adoption and documentation
Use Amundsen for PII tagging
Contribute upstream 🤞
● Looker integration
● Neo4J 4.x support
● “Enriching Transformers”
● Useful taggers
Thanks for your time!
Any questions?
