Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake

Databricks
We Connect Active People With Their Passions
COLUMBIA SPORTSWEAR – SPARK + AI SUMMIT 2020
JUNE 2020
LARA MINOR AND BILAL OBEIDAT
WHAT IS EIM?
VISION
Enterprise Information Management’s (EIM) vision is to connect the RIGHT PEOPLE with the RIGHT DATA at the RIGHT TIME to support informed business decisions.
Increase Columbia’s ability to:
• Be data-driven in support of business strategy and operations
• Deliver enterprise data assets that meet global information needs
• Scale, share, and grow using governed data, aligned processes, and shared products
DATA DELIVERY
Technology
• Azure Data Management Stack:
–Azure Data Factory
–Azure Data Lake
–Azure Databricks
–Azure Synapse Data Warehouse
• SAP BW / HANA
Development
• Integration:
–Columbia source systems into Azure
–Integration with third-party analytic systems and applications
–Partnered with Columbia’s integration team
• Data Models:
–Relational and dimensional models for business reporting and analytics
–Data models for data science / analytics
WHERE WE STARTED
BUSINESS ACCESS
User groups:
• Info Consumer: Business users (4k+)
• Data Analysts (<15)
• Info Consumer: Business SMEs (50–100)
Platforms:
• Databricks / Data Lake
• Data Warehouse
• Azure Analysis Services
• Power BI
Access levels:
• Internal / Open Restricted
• Internal / Select Restricted
• Internal / Select Restricted
DATA LAKE LAYOUT AND SECURITY
Raw source
• Internal:
–Source system name
–Object name (table name)
–Type (full, incremental)
–Partition (date)
• Restricted_domain:
–Source system name
–Object name (table name)
–Type (full, incremental)
–Partition (date)
Curated
• Internal:
–Data domain (e.g., sales)
–Schema
–Table name
• Restricted_domain:
–Schema
–Table name
Computed (analysts’ directory)
• Dtc_restricted:
–Structure determined by the analyst
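The folder levels on this slide amount to a path convention plus a folder-based security split. The following is a minimal sketch of that idea, not Columbia’s implementation: it builds raw and curated paths in the order the slide lists the levels and checks read access by the domain folder. The zone prefixes (`raw`, `curated`, `computed`), the lowercase folder names, the Hive-style `date=YYYY-MM-DD` partition folder, and the group names in `RESTRICTED_READERS` are all hypothetical; the slides give only the levels, and real enforcement would sit in storage ACLs or table permissions rather than application code.

```python
from __future__ import annotations

from datetime import date

# Hypothetical mapping from restricted folders to the security group allowed
# to read them; the slides name the folders but not the groups.
RESTRICTED_READERS = {
    "restricted_domain": "lake-restricted-readers",
    "dtc_restricted": "dtc-restricted-analysts",
}

LOAD_TYPES = ("full", "incremental")


def raw_path(domain: str, source_system: str, object_name: str,
             load_type: str, partition_date: date) -> str:
    """Raw zone: <domain>/<source system>/<object>/<full|incremental>/<date partition>."""
    if domain not in ("internal", "restricted_domain"):
        raise ValueError(f"unknown raw domain: {domain}")
    if load_type not in LOAD_TYPES:
        raise ValueError(f"load type must be one of {LOAD_TYPES}")
    # Hive-style "date=YYYY-MM-DD" partition folder (an assumption, not from the slides).
    return "/".join(["raw", domain, source_system, object_name, load_type,
                     f"date={partition_date.isoformat()}"])


def curated_path(domain: str, schema: str, table: str,
                 data_domain: str | None = None) -> str:
    """Curated zone: internal data adds a data-domain level (e.g. sales); restricted data does not."""
    if domain == "internal":
        if data_domain is None:
            raise ValueError("internal curated data needs a data domain, e.g. 'sales'")
        return "/".join(["curated", "internal", data_domain, schema, table])
    if domain == "restricted_domain":
        return "/".join(["curated", "restricted_domain", schema, table])
    raise ValueError(f"unknown curated domain: {domain}")


def can_read(path: str, user_groups: set[str]) -> bool:
    """Internal folders are open to internal users; restricted folders need group membership."""
    domain = path.split("/")[1]  # second level: internal, restricted_domain, dtc_restricted
    required_group = RESTRICTED_READERS.get(domain)
    return required_group is None or required_group in user_groups
```

For example, `raw_path("internal", "sap", "orders", "incremental", date(2020, 6, 1))` returns `raw/internal/sap/orders/incremental/date=2020-06-01` (the source system and object names are made up). Keeping the security boundary at a single folder level is what lets one group grant cover a whole restricted domain.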
POSITIVE OUTCOMES
• Bringing in data from source systems, which used to take a week, now takes a day
• All computed data is on the lake and available for use; a microservice drops data into the lake for real-time reporting through Databricks streaming
• The Databricks external metastore allows sharing with EIM
• Everything we do goes through CI/CD
• Cloud-based and elastic: speed and scalability have enabled growth and efficient, low-cost data processing
• Expanded data access for the business: self-service reporting and business analysis
• Prepared for data science resources to engage
• Easy team expansion and onboarding: we’ve grown the dev team from 8 to 20 in 1.5 years
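The external metastore outcome refers to keeping table definitions in a metastore outside the Databricks workspace so that another team (here, EIM) can resolve the same tables. On Databricks this is commonly set through the cluster’s Spark configuration. The sketch below assembles such a configuration for an external Hive metastore; the slides do not say which database backs it, so Azure SQL Database (and its SQL Server JDBC driver) is an assumption, as are the server, database, secret scope, and secret key names. The `{{secrets/<scope>/<key>}}` form is Databricks’ syntax for referencing a secret in Spark config; the metastore version and jars values shown are examples only and must be checked against the Databricks documentation for your runtime.

```python
from __future__ import annotations

# Driver for SQL Server / Azure SQL Database (assumed metastore backend).
SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def external_metastore_conf(jdbc_url: str, secret_scope: str,
                            metastore_version: str, metastore_jars: str) -> dict[str, str]:
    """Spark settings pointing a Databricks cluster at an external Hive metastore."""
    def secret(key: str) -> str:
        # Databricks replaces {{secrets/<scope>/<key>}} with the secret value,
        # so credentials are not stored in plain text in the cluster config.
        return "{{secrets/" + secret_scope + "/" + key + "}}"

    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": jdbc_url,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": SQLSERVER_DRIVER,
        "spark.hadoop.javax.jdo.option.ConnectionUserName": secret("metastore-user"),  # hypothetical key
        "spark.hadoop.javax.jdo.option.ConnectionPassword": secret("metastore-password"),  # hypothetical key
        "spark.sql.hive.metastore.version": metastore_version,
        "spark.sql.hive.metastore.jars": metastore_jars,
    }


def to_cluster_spark_config(conf: dict[str, str]) -> str:
    """Render one 'key value' pair per line, the format of the cluster Spark config box."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    conf = external_metastore_conf(
        # Hypothetical server and database names.
        "jdbc:sqlserver://example-metastore.database.windows.net:1433;database=hive_metastore",
        secret_scope="eim",
        metastore_version="2.3.9",  # example value; must match the runtime
        metastore_jars="builtin",   # example value; see the Databricks docs
    )
    print(to_cluster_spark_config(conf))
```

Pointing every cluster (the data lake team’s and EIM’s) at the same connection settings is what makes a table registered by one team visible to the other.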
LESSONS LEARNED
General
• Security groups
• Security Model
• Security Audit
• Costs
Vendor Engagement
• Leverage vendors, but know there’s a limit
• Professional services
• Vendor experts and agreements
Data Lake
• Security
• Organization, Enterprise
• Audit, Monitoring
• Backup, DR
Team
• A solid leader or two
• Keep the team open to change, open to chaos
• Allocate time for discovery
• Manage expectations with senior leadership
Questions?
Confidential
