Amundsen at Brex and Looker integration

Why Amundsen?
(Why a data catalog at all?)

Documentation and discoverability
1. Data scientists and analysts found it difficult to learn what a table is, or how and where it’s used
2. Documentation often doesn’t exist!
Governance
3. We have no unified way to tag and identify sensitive data/PII
The Problems

Documentation and discoverability
1. Data scientists and analysts found it difficult to learn what a table is, or how and where it’s used
a. Single searchable source of documentation
b. Data catalogs can display dependencies between tables/dashboards
2. Documentation often doesn’t exist!
a. The catalog’s existence provides motivation to write documentation
b. The catalog can be used to track down popular but undocumented tables
Governance
3. We have no unified way to tag and identify sensitive data/PII
a. Provides a unified tag database for tags from all integrated data sources
b. Tags can be auto-generated by PII-detection tools, as well as manually added
Data catalog to the rescue

Amundsen is:
● 🙂 Simple - exactly the feature set we needed
● 🎉 Open source =
○ less expensive
○ highly customizable
○ can be integrated with anything
○ part of a community
We chose Amundsen because
❤ open source!

SSO with Okta
Custom Looker integration
K8s pods for each service, running behind a VPN
Airflow ETL DAG runs once per day
Neo4j 4.x

We decided not to let people edit descriptions
(At least, for now)
We wanted documentation to be version controlled, and we wanted the code to be the single source of truth.
Since people work in code a lot of the time, we felt that it is important for Amundsen to work with code documentation, not against it.
We intend to revisit this though!

Package Requirements Issues
We fixed these!
Amundsen Databuilder had a pretty strict requirements.txt, making it difficult to:
1. Integrate it into our existing Airflow codebase
2. Combine it with other libraries
Eventually we unpinned most of the dependencies and contributed this upstream!

Neo4j 4.x
We’d love to fix this upstream!
We decided to use Neo4j 4.x, because:
1. Backups are easier
2. It would give us the opportunity to potentially switch to a SaaS offering, e.g. Neo4j Aura
Amundsen does not support Neo4j 4.x, so we made a few tweaks!
As a result, we’re currently using some forks.

Looker Integration
Looker’s sources of truth:
LookML files in GitHub (Models and Views; SQL queries)
Dashboards via API (Dashboards reference Models and Views)
Amundsen
?Git clone?

Looker Integration
Looker data lineage extraction code
Docker image for Artifact Builder (in ECR)
Models/Views in LookML repo
Amundsen Airflow DAG: combines LookML view data with API dashboard data
Lineage.json (in S3)
CI/CD uses the Docker image to build lineage.json
Docker image built in CI/CD
1. Build Docker images which convert LookML files to JSON lineage data
2. Use the Docker images in the source repository’s build pipeline to create artifacts and upload them to S3
3. Consumers download the artifacts at runtime and combine them with dashboard data from the API
Dashboards from Looker API

Looker Integration - Stage 1
Model/View files (in Looker GitHub repo)
Lineage.json (in S3)
Parse LookML files and build lineage graph during CI.
Neo4j
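As a rough illustration of the "parse LookML files and build lineage graph" step, the sketch below pulls the sql_table_name out of each view and writes the result as JSON. It is not the extraction code used at Brex: the function name extract_view_lineage, the sample LookML, and the {"views": ...} JSON layout are all assumptions, and a regex is a simplification that ignores derived tables, extends, and explores. A real implementation would likely use a proper LookML parser.

```python
import json
import re

# Simplified patterns (assumption): one "view: <name> {" per view and an
# optional "sql_table_name: <table> ;;" inside its body.
VIEW_RE = re.compile(r"\bview:\s*(\w+)\s*\{")
TABLE_RE = re.compile(r"\bsql_table_name:\s*([^;]+?)\s*;;")


def extract_view_lineage(lookml_text):
    """Map each view name to the table named by its sql_table_name, if any."""
    lineage = {}
    matches = list(VIEW_RE.finditer(lookml_text))
    for i, match in enumerate(matches):
        # Treat a view's body as running until the next "view:" or end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(lookml_text)
        body = lookml_text[match.end():end]
        table = TABLE_RE.search(body)
        if table:
            lineage[match.group(1)] = table.group(1)
    return lineage


# Hypothetical LookML used only for this example.
SAMPLE = """
view: orders {
  sql_table_name: ANALYTICS.ORDERS ;;
  dimension: id { sql: ${TABLE}.id ;; }
}
view: derived_only {
  derived_table: { sql: SELECT 1 ;; }
}
"""

lineage = extract_view_lineage(SAMPLE)
lineage_json = json.dumps({"views": lineage}, indent=2, sort_keys=True)
print(lineage_json)
```

In a CI job, the printed JSON would be the artifact that gets uploaded to S3; the upload itself is left out here.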

Looker Integration - Stage 2
Model/View files (in Looker GitHub repo)
Lineage.json (in S3)
Amundsen Airflow Pipeline: combines LookML view data with API dashboard data
Dashboards (from Looker API)
Neo4j
Parse LookML files and build lineage graph during CI.
Query dashboards using Looker SDK
Download lineage data from S3
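The combining step in Stage 2 might look roughly like the sketch below: it joins a view-to-table mapping (as it could be read from the lineage file) with dashboard records, and also builds the reverse table-to-dashboards index. The record shape ("id", "title", "views") and the function name join_dashboards_to_tables are assumptions made for illustration; objects returned by the Looker SDK are structured differently and would need to be flattened first.

```python
from collections import defaultdict


def join_dashboards_to_tables(view_to_table, dashboards):
    """Return (dashboard id -> sorted tables, table -> sorted dashboard ids)."""
    dashboard_tables = {}
    table_dashboards = defaultdict(set)
    for dashboard in dashboards:
        tables = {
            view_to_table[view]
            for view in dashboard["views"]
            if view in view_to_table  # skip views with no known physical table
        }
        dashboard_tables[dashboard["id"]] = sorted(tables)
        for table in tables:
            table_dashboards[table].add(dashboard["id"])
    reverse = {table: sorted(ids) for table, ids in table_dashboards.items()}
    return dashboard_tables, reverse


# Hypothetical inputs: lineage loaded from S3 and dashboards from the API.
view_to_table = {"orders": "analytics.orders", "users": "analytics.users"}
dashboards = [
    {"id": "12", "title": "Revenue", "views": ["orders", "users"]},
    {"id": "34", "title": "Signups", "views": ["users", "derived_only"]},
]

forward, reverse = join_dashboards_to_tables(view_to_table, dashboards)
print(forward)
print(reverse)
```

The reverse index is what lets a table page list the dashboards that depend on it.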

Looker Integration Data lineage - Analysts love this!
Description from Looker

Looker Integration
Looker lineage works in reverse

Custom Taggers
Documented/Undocumented tags
Category from RBAC system
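A tagger of this kind can be sketched as a small function that looks at a table's description and an RBAC lookup. This is only a minimal sketch under assumptions: the dict-based table records, the rbac_categories mapping, and the tag_table name are hypothetical and do not reflect Amundsen Databuilder's transformer interface or the actual Brex taggers.

```python
def tag_table(table, rbac_categories):
    """Return the tags for one table record.

    table: dict with "name" and an optional "description" (assumed shape).
    rbac_categories: dict mapping table name -> category from the RBAC system.
    """
    description = (table.get("description") or "").strip()
    tags = ["documented" if description else "undocumented"]
    category = rbac_categories.get(table["name"])
    if category:
        tags.append(category)
    return tags


# Hypothetical metadata used only for this example.
tables = [
    {"name": "analytics.orders", "description": "One row per order."},
    {"name": "raw.events", "description": ""},
    {"name": "raw.tmp", "description": None},
]
rbac_categories = {"analytics.orders": "finance"}

tags_by_table = {t["name"]: tag_table(t, rbac_categories) for t in tables}
print(tags_by_table)
```

Filtering for the "undocumented" tag and sorting by usage is one way to find popular tables that still need documentation.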

Customized Snowflake Integration
One query can pull all the metadata we need from many tables across multiple databases:
● Columns
● Last update time
● Documentation
● Table usage
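To make the "one query across multiple databases" idea concrete, the sketch below builds a single UNION ALL statement over each database's INFORMATION_SCHEMA. It is not the query used at Brex. The database names are hypothetical, last_altered is used as a stand-in for last update time (an assumption), and table usage is omitted because it would have to come from a separate source such as query history.

```python
import re

# Only plain unquoted identifiers are accepted, since names are interpolated
# directly into the SQL text.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")

SELECT_TEMPLATE = """\
SELECT c.table_catalog, c.table_schema, c.table_name,
       c.column_name, c.data_type, c.comment AS column_comment,
       t.comment AS table_comment, t.last_altered
FROM {db}.information_schema.columns AS c
JOIN {db}.information_schema.tables AS t
  ON c.table_catalog = t.table_catalog
 AND c.table_schema = t.table_schema
 AND c.table_name = t.table_name"""


def build_metadata_query(databases):
    """Build one UNION ALL query covering every database in the list."""
    if not databases:
        raise ValueError("at least one database is required")
    for db in databases:
        if not IDENTIFIER_RE.fullmatch(db):
            raise ValueError(f"unsafe database name: {db!r}")
    return "\nUNION ALL\n".join(SELECT_TEMPLATE.format(db=db) for db in databases)


# Hypothetical database names.
query = build_metadata_query(["RAW", "ANALYTICS"])
print(query)
```

Running one combined statement keeps the extraction to a single round trip, at the cost of a longer query as databases are added.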

What’s next for Amundsen at Brex?

Collect user feedback and usage statistics to drive adoption and documentation

Contribute upstream 🤞
● Looker integration
● Neo4j 4.x support
● “Enriching Transformers”
● Useful taggers

Thanks for your time!
Any questions?
