Confidential
Unifying Analytics with Apache Spark for the Enterprise
Jonathan Gole
Sr. Director, Product Management & Business Analytics
US Card, Data Products
About Capital One
▸What we do
▸Diversified Financial Services
▸A great place to work
We’re innovating to disrupt an industry
“We founded Capital One on the belief that information and technology would revolutionize
financial services. Two decades later, our belief is even stronger.”
– Rich Fairbank, Co-founder & CEO, Capital One
How We Work
Who am I and why am I here?
▸US Card, Data Products team lead
▸Manage the largest Apache Spark projects at Capital One
Infrastructure challenges have slowed down digital transformation at banks
Typical Bank Challenges
▸Mainframes
▸Slow batch data processes
▸Overly complex and redundant systems
▸Limited support for public cloud and open-source
Harm to the Business
▸Less efficient marketing targeting strategies
▸Limitations with new underwriting techniques
▸Greater process and operational complexity
▸Disconnected or incomplete digital experiences
These infrastructure limitations often constrain and frustrate associates
▸Data Engineer: Repetitive work building and fixing ETL pipelines
▸Business Analyst: Difficulty analyzing all data at scale; frustrated by the inability to get new insights to customers
▸Software Engineer: Reliant on data engineers to build workloads; limited access to data sources
▸Data Scientist: Limited by compute, access to open source, and the time it takes to get new models to production
Associates were isolated by separate technology paradigms.
We needed to improve our technology AND our culture.
Solution: Unifying data and AI through Apache Spark
▸Test & Prototype
▸Decision to build around Apache Spark
▸Apply a Product Lens and Learn through Doing
Learning from initial challenges
▸Infrastructure & Resilience
▸Associate Learning Curve
▸Diversity & Complexity of Use Cases
We saw value in centrally managed products to enable rapid innovation.
We realized we didn't have the expertise to build it all ourselves.
Next step: deploy distinct, optimized strategies to meet the needs of all users

Operations
Personas: Data Engineers, Software Engineers
Defining Needs:
• Well-tested, production-ready code
• Deploy workloads/apps independently
• Develop primarily in Scala, Java
Solution: Shared "Quantum" application framework and code libraries

Analytics
Personas: Data Scientists, Analysts
Defining Needs:
• Low barrier to getting started, easy automation
• Emphasis on fast iteration and collaboration
• Work using SQL, Python, ML libraries
Solution: Unified Analytics Platform, leveraging a notebook UI & infrastructure automation
Continuous investment in improving our products and our ecosystem
Product Philosophy: Make it Work! Make it Scale! Make it Easy!

Analytics
▸Deploy a POC of a Unified Analytics Platform (UAP)
▸Opened platform to data scientists
▸Deployed new architecture to scale
▸Integrations with common enterprise apps and data sources
▸Build a "stockpile" of well-written code for common use cases
▸Mass user training through live & self-paced courses
▸Open to the enterprise!

Operations
▸Developed "Quantum" 1.0 framework and deployment tools
▸Created an "inner-source" model inside the enterprise
▸Built JSON-DSL abstraction & common code libraries
▸Enable Spark 2.0, Structured Streaming, and always-on streaming operations
▸Spark on Kubernetes for unified DevOps
▸Custom application for visually editing and testing jobs
Creating a vast ecosystem around Apache Spark
▸Infrastructure
▸Data Integrations
▸System Integrations
▸Quantum Application Framework
▸Unified Analytics Platform
▸User Interfaces: Notebooks, Custom Workflow Editor, BI Products
Enabling innovation across a wide range of data-intensive use cases
▸ETL
▸Marketing Campaigns
▸Account Management
▸External Data Sharing
▸Feature Calculation
▸Models & ML
▸Streaming Alerts
▸Cloud SQL analytics
▸Business reporting
▸…
Creating GREAT jobs for our associates
▸Data Engineer: Transforming operations via new data sources, real-time streams, & machine learning
▸Business Analyst: Deriving better insights more quickly; partnering more closely with engineering and data science
▸Software Engineer: Operating as a full-stack team, quickly adding data operations to the application stack
▸Data Scientist: Using advanced ML techniques, easily deploying new models, and accessing valuable new data
A more effective and collaborative culture around data
Lessons learned for unifying analytics within the enterprise
Focus on your customers
Start small, prove value, and iterate
Embrace a community
Take a unified approach

Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 

Último (20)

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 

Democratizing Apache Spark for the Enterprise with Jonathan Gole

  • 1. Confidential. Unifying Analytics with Apache Spark for the Enterprise. Jonathan Gole, Sr. Director, Product Management & Business Analytics, US Card, Data Products
  • 2. About Capital One. What we do: diversified financial services. A great place to work.
  • 3. We’re innovating to disrupt an industry. “We founded Capital One on the belief that information and technology would revolutionize financial services. Two decades later, our belief is even stronger.” – Rich Fairbank, Co-founder & CEO, Capital One. How We Work.
  • 4. Who am I and why am I here? US Card, Data Products team lead. I manage the largest Apache Spark projects at Capital One.
  • 5. Infrastructure challenges have slowed down digital transformation at banks. Typical bank challenges: ▸ Mainframes ▸ Slow batch data processes ▸ Overly complex and redundant systems ▸ Limited support for public cloud and open source. Harm to the business: ▸ Less efficient marketing targeting strategies ▸ Limitations with new underwriting techniques ▸ Greater process and operational complexity ▸ Disconnected or incomplete digital experiences
  • 6. These infrastructure limitations often constrain and frustrate associates. Associates were isolated by separate technology paradigms. Data Engineer: repetitive work building and fixing ETL pipelines. Business Analyst: difficulty analyzing all data at scale; frustrated by the inability to get new insights to customers. Software Engineer: reliant on data engineers to build workloads; limited access to data sources. Data Scientist: limited by compute, by access to open source, and by the time it takes to get new models to production. We needed to improve our technology AND our culture.
  • 7. Solution: unifying data and AI through Apache Spark. Test & prototype. Decision to build around Apache Spark. Apply a product lens and learn through doing.
  • 8. Learning from initial challenges: infrastructure & resilience; associate learning curve; diversity & complexity of use cases. We saw value in centrally managed products to enable rapid innovation, and realized we didn’t have the expertise to build it all ourselves.
  • 9. Next step: deploy distinct, optimized strategies to meet the needs of all users. Operations (personas: Data Engineers, Software Engineers). Needs: well-tested, production-ready code; deploy workloads/apps independently; develop primarily in Scala and Java. Solution: shared “Quantum” application framework and code libraries. Analytics (personas: Data Scientists, Analysts). Needs: low barrier to getting started and easy automation; emphasis on fast iteration and collaboration; work in SQL, Python, and ML libraries. Solution: Unified Analytics Platform, leveraging a notebook UI & infrastructure automation.
  • 10. Continuous investment in improving our products and our ecosystem. Product philosophy: Make it Work! Make it Scale! Make it Easy! Analytics: deployed a POC of a Unified Analytics Platform (UAP); opened the platform to data scientists; deployed a new architecture to scale; built integrations with common enterprise apps and data sources; built a “stockpile” of well-written code for common use cases; ran mass user training through live & self-paced courses; opened to the enterprise! Operations: developed the “Quantum” 1.0 framework and deployment tools; created an “inner-source” model inside the enterprise; built a JSON-DSL abstraction & common code libraries; enabled Spark 2.0, Structured Streaming, and always-on streaming operations; adopted Spark on Kubernetes for unified DevOps; built a custom application for visually editing and testing jobs.
  • 11. Creating a vast ecosystem around Apache Spark. Infrastructure; data integrations; system integrations; Quantum Application Framework; Unified Analytics Platform; user interfaces: notebooks, custom workflow editor, BI products, and more.
  • 12. Enabling innovation across a wide range of data-intensive use cases: ETL; marketing campaigns; account management; external data sharing; feature calculation; models & ML; streaming alerts; cloud SQL analytics; business reporting; …
  • 13. Creating GREAT jobs for our associates. Data Engineer: transforming operations via new data sources, real-time streams, & machine learning. Business Analyst: deriving better insights more quickly; partnering more closely with engineering and data science. Software Engineer: operating as a full-stack team, quickly adding data operations to the application stack. Data Scientist: using advanced ML techniques, easily deploying new models, and accessing valuable new data. The result: a more effective and collaborative culture around data.
  • 14. Lessons learned for unifying analytics within the enterprise: focus on your customers; start small, prove value, and iterate; embrace a community; take a unified approach.

Editor's Notes

  1. Hello! Thank you to Databricks and the Spark Summit organizers for having us here. I want to share our story of unifying analytics by successfully spreading Apache Spark across our enterprise.
  2. You may know us for our credit cards and our catchy commercials. Sorry I couldn’t bring Samuel L. Jackson! However, we are a diversified financial services leader, with corporate offices nationwide: headquartered outside Washington, DC, with an office one block away in San Francisco. Capital One is a relatively new company; it IPO’d in 1994. The first FinTech unicorn. To understand us, look to our long-term CEO and co-founder, Rich Fairbank.
  3. Founded to disrupt the industry by leveraging the technology and data revolution. Embraced rapid experimentation and rigorous data-driven decision making. Data technology was, and still is, at the heart of Capital One’s success. Apache Spark is the latest technology revolution transforming the way we use data to drive our businesses.
  4. My product management team delivers new products and services that transform our ability to succeed with data. In this role, my team has built the largest Spark projects in Capital One. I’ll tell you the story of how we transformed our business through the unifying capabilities of Apache Spark.
  5. Like other financial companies, we built a mature, growing business on prior technology paradigms. As data systems aged, they limited ingenuity and innovation in our core businesses. Because the company uses data at the heart of everything, the cost was meaningful, and we needed to change this paradigm.
  6. Associates were clearly impacted: less efficient, less able to focus on innovation, less able to test and learn. Testing and experimentation were core to the foundation of Capital One, so technology limitations were challenging the nature of our culture. We also had shortcomings as a team: people were isolated from each other, working in different technology paradigms. We needed to create a new paradigm that both eliminates technology limitations and changes our culture around data.
  7. Typical evaluation process: work with experts, get our hands dirty. We chose to build around Spark: the most active project, it unifies batch and streaming, and it offers flexible APIs and support for multiple languages. More important was a product-centric focus: unified ownership and real, high-value use cases rather than abstract technology principles. Focus on our associates as customers. Learn through doing: learn quickly and alter the approach as needed. Do not start by building an enterprise platform; first build use cases ourselves to deeply understand needs.
  8. Challenges in our first use case: we rolled our own infrastructure, which took several months to master. Associates were working in new paradigms: AWS and public cloud, distributed systems, Scala, and the separation of storage and compute. We quickly realized we could not design one system upfront that would meet everyone’s needs, given the diversity and complexity of our business: 80M customers, long-term relationships, regulatory burdens. Aware of our own ignorance, we needed to keep learning through use cases and our customers.
  9. Operations: business-critical applications that fulfill our customer and regulatory promises. Enable a high degree of flexibility and customization by engineers, yet maintain efficiency in development and the standards needed to govern our critical apps well. Answer: build a custom in-house shared application framework, leveraging shared code libraries and tooling. Analytics: insight generation, data science, and machine learning. Iterate on new ideas quickly and collaborate effectively. Ignore infrastructure: “It should just work, and let me be creative.” Answer: deploy a shared data science platform, a Unified Analytics Platform. We knew we didn’t have the expertise, so we partnered with Databricks.
  10. We iteratively expanded on that strategy and invested continuously in improving the products. A core product philosophy, similar to Suffering-Oriented Programming, drove our decision making. First, focus on enabling and proving out new, transformative functionality, using real business use cases to validate progress (highlight Spark 2.0). Next, set the foundation for growth: improve performance and scalability, reduce technical debt, and build tooling to attract future customers (highlight the JSON DSL and inner-sourcing). Finally, expand through a good user experience: invest in ease of use and efficiency (ensure a great onboarding experience for the UAP; create self-service training in notebooks).
  11. Over time, this product and customer focus organically led to a large ecosystem operating around our Apache Spark products, with Spark as a primary data compute layer for many of our systems. We didn’t have control over this whole ecosystem, but we embedded flexibility in our platforms and allowed for emergent design. Apache Spark reflects this ecosystem approach and continues to deepen connectivity across the data and software ecosystem.
  12. This truly shows the massive impact of Spark: a design that unifies domains can have a massive ROI. To call out a few accomplishments of our developer and data science community: rebuilding hundreds of ETL jobs in the cloud; a self-service marketing automation platform; improving our fraud defenses and protecting customer identities; enabling rapid testing of ML approaches, detecting changing credit trends, and automating manual processes.
  13. Transforming what our associates could achieve and leveling up their productivity, which enables a shift to more creative ideas. Given the shared technology paradigm, associates organically work together and collaborate more effectively. Small, empowered, cross-functional or cross-skilled teams are growing in importance.
  14. I want to highlight and re-emphasize several learnings from our efforts with Spark: 1) See colleagues as customers; understand and solve for broad platform needs while also enabling unique problem solving. 2) Learn quickly through doing instead of abstract evaluation. 3) Enable community development, collect feedback, and provide training. 4) Take a unified approach: accelerate innovation by removing barriers. Thank you for having me here; I’m proud to participate in such a great Spark community.