SlideShare uma empresa Scribd logo
1 de 33
Baixar para ler offline
WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
Josh McNutt and Keria Bermúdez-Hernández
Showtime Networks Inc.
Data-Driven Transformation:
Leveraging Big Data at
SHOWTIME with Apache Spark
#UnifiedAnalytics #SparkAISummit
Premium cable network known for bold and wholly
unique programming
In 2018 SHOWTIME had four of the top six scripted
hour-long series on premium cable: HOMELAND,
SHAMELESS, RAY DONOVAN and BILLIONS
Launch of SHOWTIME standalone streaming
service in July 2015 made available an
unprecedented level of user-level data
Showtime Networks Inc.
- Viewing History
- Customer Journey Information
- CRM interactions
...
Billions of records
Extremely high level of granularity
Big Data at SHOWTIME
Direct relationship with our
customers
SHOWTIME standalone streaming service
Rich data captured on
every customer interaction:
QuestionsBig Data at SHOWTIME
What is the lifetime value of a subscriber who
signs up to watch Ray Donovan?
How many free trial signups have been
generated by Shameless?
What is the probability a subscriber will begin
watching Billions for the first time in the next 7
days?
Questions
Capacity to Answer
Big Data at SHOWTIME
Big Data at SHOWTIME
Strong data science skills needed to
interact directly with data lake
Research/Business
Analyst
Basic questions answered via
operational reporting
Complex questions required bespoke
analyses, sometimes taking weeks or
longer to develop
Raw
Data
Questions
Capacity to Answer
How to reduce this
gap?
Big Data at SHOWTIME
Small team of data scientists working to…
Democratize data and analytics
Understand and predict subscriber behaviors
Support data-driven programming & scheduling
Data Strategy Team
We’re working to bring the data closer to the users and the users closer to the data
Business/Research
Analyst
Raw
Data
Augmenting Data
Supply Chain
Making the data
more suitable for
analysis
Democratizing Data and Analytics
Master Viewers Table
Foundational customer data
representation
Captures 1000s of metrics and
behaviors in Free Trial, Paid Month 1,
lifetime
Tracks relationship between each
user and series
Supports several use cases:
• machine learning
• reporting/dashboarding
• ad hoc analysis
Democratizing Data and Analytics
We’re working to bring the data closer to the users and the users closer to the data
Business/Research
Analyst
Raw
Data
Augmenting Data
Supply Chain
Data Skills
Training
Creating conditions
leading to a chain reaction
of curiosity, exploration,
analysis and insight
Making the data
more suitable for
analysis
Tableau
SQL
Democratizing Data and Analytics
We build actionable machine learning models to gain an intimate understanding of
how our users behave and how to strengthen their relationship to SHOWTIME
XGBoost
Random
Forests
Churn Propensity
Resubscription Propensity
Future Customer Value
Series Viewership Propensity
$
Understanding and Predicting Subscriber Behaviors
Put into production:
We are employing data and analytics to attribute revenue and subscribers to content
Free trial signups
Resubscribers
Reduction in Churn
among viewers
Renewal
Decisions
Schedule
Optimization
Subscriber
Growth
Revenue Growth
1 3 5 7 9 11
Churn % by Paid
Month
Mean subscriber
lifetime value (LTV)
Supporting Data-Driven Programming and Scheduling
How to stitch
everything
together?
FLASHBACK: In order to build this capability, we needed to
get basic infrastructure in place
• Team comprised exclusively of data scientists without a dedicated DevOps engineer
• Lot of time spent troubleshooting cluster configuration
• Launching clusters was slow and clumsy
• Debugging our bootstrap script (to install python libraries) was nontrivial
• Any software or hardware upgrades were scary and generally avoided for as long as
possible
#struggle
Configuration Challenges
Back to focusing on data science!
• Booting and managing clusters is now easy, fast and
reliable
• Simple to attach/detach notebooks to clusters
• Databricks handles installation of python libraries
• Easy to connect data sources to Tableau
• We can confidently experiment with new software or
hardware in an effort to improve our workflows
Databricks Unified Analytics Platform
Data Pipeline
19#UnifiedAnalytics #SparkAISummit
Optimizing Pipeline
Solutions
• Apache Airflow
• Code optimization tactics:
– Replacing RDD and Pandas
transformations with Pyspark
dataframe transformations
– Multithreading using futures
module
20#UnifiedAnalytics #SparkAISummit
Apache Airflow
• Produced different tables by using
dbutils.notebook API
– While technically you can run notebooks
concurrently, complex dependencies and
concurrent jobs are harder to do in a
single notebook and job
job1
job2
job3
job4
job5
job6
Workflow in a single Notebook Workflow in Apache Airflow
job1
job2
job3
job4
job5
job6
dbutils.notebook.run("notebook-name", 60,
{"argument": "data", "argument2": "data2",
...})
notebook_task = DatabricksSubmitRunOperator(
task_id='notebook_task’,
dag=dag,
json=notebook_task_params)
• Apache Airflow was the solution for
scheduling and managing pipelines
– Airflow and Databricks integration
– Manages dependencies
– Task job can be run in a different cluster
Code Optimization Tactics
22
• User level RDD with subscriber status and viewership records
• Combinations of methods from custom classes
Code Optimization Tactics
23
Code Optimization Tactics
24
Code Optimization Tactics
25
• To reduce tasks complexity we saved intermediate
tables
Different Levels of Concurrency
26
BEFORE AFTER
Tracking Data Quality
27#UnifiedAnalytics #SparkAISummit
28#UnifiedAnalytics #SparkAISummit
Tracking Data Quality
Using Delta to Update Tables
• Without Delta:
– Ingestion daily
data
– Updating fields
– Rewriting data
3.03
0.47
0
0.5
1
1.5
2
2.5
3
3.5
Without Delta With Delta
Hours
29#UnifiedAnalytics #SparkAISummit
MERGE INTO viewership AS hv
USING temp_tc AS t
ON hv.title = t.title
WHEN MATCHED AND hv.category IS NULL OR
hv.category != t.category THEN
UPDATE SET
hv.category = t.category
• With Delta
– Ingestion daily
data
– Updating fields
where necessary
– Rewriting data
Optimization Summary
• Apache Airflow
• Code optimizations
• MLflow for tracking data quality
• Delta for updating tables
30#UnifiedAnalytics #SparkAISummit
Unified platform allows us to democratize data and
analytics
We continue to engage in ongoing innovation and
optimization
Exciting cultural shift
Final Thoughts
Questions?
32#UnifiedAnalytics #SparkAISummit
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

Mais conteúdo relacionado

Mais procurados

Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeDatabricks
 
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...Databricks
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleDatabricks
 
Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?Databricks
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowDatabricks
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsAutomating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsDatabricks
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkDatabricks
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureDatabricks
 
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...Databricks
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Databricks
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkDatabricks
 
Powering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationPowering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationDatabricks
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 

Mais procurados (20)

Accelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks RuntimeAccelerating Machine Learning on Databricks Runtime
Accelerating Machine Learning on Databricks Runtime
 
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...
Moving a Fraud-Fighting Random Forest from scikit-learn to Spark with MLlib, ...
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
 
Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?Koalas: How Well Does Koalas Work?
Koalas: How Well Does Koalas Work?
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsAutomating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySpark
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...
Yelp Ad Targeting at Scale with Apache Spark with Inaz Alaei-Novin and Joe Ma...
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache Spark
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 
Powering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationPowering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script Transformation
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
 

Semelhante a Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark

Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
 
Roadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyRoadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyNeo4j
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSAWS User Group Kochi
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDatabricks
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Neo4j
 
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATION
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATIONLogitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATION
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATIONAvinash Deshpande
 
Designing a Future-proof API Program
Designing a Future-proof API ProgramDesigning a Future-proof API Program
Designing a Future-proof API ProgramPronovix
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceCambridge Semantics
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...BigDataEverywhere
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 

Semelhante a Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark (20)

Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Roadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph StrategyRoadmap for Enterprise Graph Strategy
Roadmap for Enterprise Graph Strategy
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWSACDKOCHI19 - Next Generation Data Analytics Platform on AWS
ACDKOCHI19 - Next Generation Data Analytics Platform on AWS
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at NationwideDeploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
Deploying Enterprise Scale Deep Learning in Actuarial Modeling at Nationwide
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATION
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATIONLogitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATION
Logitech - LOGITECH ACCELERATES CLOUD ANALYTICS USING DATA VIRTUALIZATION
 
Designing a Future-proof API Program
Designing a Future-proof API ProgramDesigning a Future-proof API Program
Designing a Future-proof API Program
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Knowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data ScienceKnowledge Graph for Machine Learning and Data Science
Knowledge Graph for Machine Learning and Data Science
 
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
Big Data Everywhere Chicago: Platfora - Practices for Customer Analytics on H...
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 

Último (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 

Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark

  • 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  • 2. Josh McNutt and Keria Bermúdez-Hernández Showtime Networks Inc. Data-Driven Transformation: Leveraging Big Data at SHOWTIME with Apache Spark #UnifiedAnalytics #SparkAISummit
  • 3. Premium cable network known for bold and wholly unique programming In 2018 SHOWTIME had four of the top six scripted hour-long series on premium cable: HOMELAND, SHAMELESS, RAY DONOVAN and BILLIONS Launch of SHOWTIME standalone streaming service in July 2015 made available an unprecedented level of user-level data Showtime Networks Inc.
  • 4. - Viewing History - Customer Journey Information - CRM interactions ... Billions of records Extremely high level of granularity Big Data at SHOWTIME Direct relationship with our customers SHOWTIME standalone streaming service Rich data captured on every customer interaction:
  • 6. What is the lifetime value of a subscriber who signs up to watch Ray Donovan? How many free trial signups have been generated by Shameless? What is the probability a subscriber will begin watching Billions for the first time in the next 7 days?
  • 8. Big Data at SHOWTIME Strong data science skills needed to interact directly with data lake Research/Business Analyst Basic questions answered via operational reporting Complex questions required bespoke analyses, sometimes taking weeks or longer to develop Raw Data
  • 9. Questions Capacity to Answer How to reduce this gap? Big Data at SHOWTIME
  • 10. Small team of data scientists working to… Democratize data and analytics Understand and predict subscriber behaviors Support data-driven programming & scheduling Data Strategy Team
  • 11. We’re working to bring the data closer to the users and the users closer to the data Business/Research Analyst Raw Data Augmenting Data Supply Chain Making the data more suitable for analysis Democratizing Data and Analytics
  • 12. Master Viewers Table Foundational customer data representation Captures 1000s of metrics and behaviors in Free Trial, Paid Month 1, lifetime Tracks relationship between each user and series Supports several use cases: • machine learning • reporting/dashboarding • ad hoc analysis Democratizing Data and Analytics
  • 13. We’re working to bring the data closer to the users and the users closer to the data Business/Research Analyst Raw Data Augmenting Data Supply Chain Data Skills Training Creating conditions leading to a chain reaction of curiosity, exploration, analysis and insight Making the data more suitable for analysis Tableau SQL Democratizing Data and Analytics
  • 14. We build actionable machine learning models to gain an intimate understanding of how our users behave and how to strengthen their relationship to SHOWTIME XGBoost Random Forests Churn Propensity Resubscription Propensity Future Customer Value Series Viewership Propensity $ Understanding and Predicting Subscriber Behaviors Put into production:
  • 15. We are employing data and analytics to attribute revenue and subscribers to content Free trial signups Resubscribers Reduction in Churn among viewers Renewal Decisions Schedule Optimization Subscriber Growth Revenue Growth 1 3 5 7 9 11 Churn % by Paid Month Mean subscriber lifetime value (LTV) Supporting Data-Driven Programming and Scheduling
  • 16. How to stitch everything together? FLASHBACK: In order to build this capability, we needed to get basic infrastructure in place
  • 17. • Team comprised exclusively of data scientists without a dedicated DevOps engineer • Lot of time spent troubleshooting cluster configuration • Launching clusters was slow and clumsy • Debugging our bootstrap script (to install python libraries) was nontrivial • Any software or hardware upgrades were scary and generally avoided for as long as possible #struggle Configuration Challenges
  • 18. Back to focusing on data science! • Booting and managing clusters is now easy, fast and reliable • Simple to attach/detach notebooks to clusters • Databricks handles installation of python libraries • Easy to connect data sources to Tableau • We can confidently experiment with new software or hardware in an effort to improve our workflows Databricks Unified Analytics Platform
  • 20. Optimizing Pipeline Solutions • Apache Airflow • Code optimization tactics: – Replacing RDD and Pandas transformations with Pyspark dataframe transformations – Multithreading using futures module 20#UnifiedAnalytics #SparkAISummit
  • 21. Apache Airflow • Produced different tables by using dbutils.notebook API – While technically you can run notebooks concurrently, complex dependencies and concurrent jobs are harder to do in a single notebook and job job1 job2 job3 job4 job5 job6 Workflow in a single Notebook Workflow in Apache Airflow job1 job2 job3 job4 job5 job6 dbutils.notebook.run("notebook-name", 60, {"argument": "data", "argument2": "data2", ...}) notebook_task = DatabricksSubmitRunOperator( task_id='notebook_task’, dag=dag, json=notebook_task_params) • Apache Airflow was the solution for scheduling and managing pipelines – Airflow and Databricks integration – Manages dependencies – Task job can be run in a different cluster
  • 22. Code Optimization Tactics 22 • User level RDD with subscriber status and viewership records • Combinations of methods from custom classes
  • 25. Code Optimization Tactics 25 • To reduce tasks complexity we saved intermediate tables
  • 26. Different Levels of Concurrency 26 BEFORE AFTER
  • 29. Using Delta to Update Tables • Without Delta: – Ingestion daily data – Updating fields – Rewriting data 3.03 0.47 0 0.5 1 1.5 2 2.5 3 3.5 Without Delta With Delta Hours 29#UnifiedAnalytics #SparkAISummit MERGE INTO viewership AS hv USING temp_tc AS t ON hv.title = t.title WHEN MATCHED AND hv.category IS NULL OR hv.category != t.category THEN UPDATE SET hv.category = t.category • With Delta – Ingestion daily data – Updating fields where necessary – Rewriting data
  • 30. Optimization Summary • Apache Airflow • Code optimizations • MLflow for tracking data quality • Delta for updating tables 30#UnifiedAnalytics #SparkAISummit
  • 31. Unified platform allows us to democratize data and analytics We continue to engage in ongoing innovation and optimization Exciting cultural shift Final Thoughts
  • 33. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT