ANSELMO SILVA
Building a Solid Foundation
Expedia Partner Solutions
Data Platform
The world’s travel platform
600M+ monthly visits in 75+ countries
20k employees worldwide
90B USD yearly travel sales
10k affiliates
Expedia Group is all about travel; our secret is that we are also all about technology and data science.
600M+ API hotel searches a day
1000+ EPS powered partners
30%+ YOY growth
10TB data processed every day
A FEW OF OUR PARTNERS
Travel Industry challenges & opportunities
Augmented Reality and VR
Market growth and consolidation
source: skift.com
Voice search and personalisation
AI - ML AT EXPEDIA PARTNER SOLUTIONS
Sorting
Image Classification
Forecast & Anomaly Detection
Voice & Bots
Recommendations & Cross-Sell
Data Challenges
DATA-SCIENCE
Heterogeneity (Partners and Supply)
Supply size (> 500K Properties)
Partners size (> 1K Partners)
Content size (> 1M Images)
Data size (10TB per day, PBs data lake)
Guiding principles
Data Platform
CLOUD MIGRATION
Follow data producers path
Improve security, scalability and resilience
Promote technology innovation
Separate computing from storage
[Diagram: Data Lake (Hive Metastore) and On-Premises (Hive Metastore)]
Solid Foundation
[Diagram: Data Lake (Hive Metastore) and On-Premises (Hive Metastore)]
DATA REPLICATION { CIRCUS-TRAIN }
Replicates Hive tables between clusters on request. It replicates both the table's data and metadata.
It has a light touch, requiring no direct integration with Hive's core services.
It can copy either entire unpartitioned tables or user-defined sets of partitions on partitioned tables.
It is not event driven and does not know how tables differ between sites.
SOLID FOUNDATION
https://github.com/hotelsdotcom/circus-train
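The replication behaviour described on this slide can be pictured with a small in-memory model. The sketch below is not Circus Train's code or API: `Table`, `replicate` and the partition filter callable are hypothetical names, used only to illustrate copying data plus metadata on request, either for a whole table or for a user-selected set of partitions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Table:
    """Toy stand-in for a Hive table: metadata plus rows keyed by partition."""
    name: str
    columns: List[str]
    # An unpartitioned table would keep all rows under the single key "".
    partitions: Dict[str, List[tuple]] = field(default_factory=dict)


def replicate(source: Table,
              replica_catalog: Dict[str, Table],
              partition_filter: Optional[Callable[[str], bool]] = None) -> Table:
    """Copy metadata and the selected partitions of `source` into the replica.

    Runs only when called (on request). It does not diff the two sites;
    it simply overwrites whichever partitions were selected.
    """
    replica = replica_catalog.setdefault(
        source.name, Table(source.name, list(source.columns)))
    replica.columns = list(source.columns)  # metadata is replicated too
    for key, rows in source.partitions.items():
        if partition_filter is None or partition_filter(key):
            replica.partitions[key] = list(rows)  # data copy
    return replica


bookings = Table(
    name="bookings",
    columns=["id", "amount"],
    partitions={
        "day=2018-03-01": [(1, 120.0)],
        "day=2018-03-02": [(2, 80.0), (3, 45.5)],
    },
)
cloud_catalog: Dict[str, Table] = {}
copied = replicate(bookings, cloud_catalog, lambda p: p >= "day=2018-03-02")
```

Passing no filter copies every partition, which is the toy equivalent of replicating an entire table.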
SOLID FOUNDATION
circus_train.yml
source-catalog:
  name: on_prem_dw
  disable-snapshots: true
  hive-metastore-uris: ${on-prem-dw-foo-params.source-thrift-uris}
replica-catalog:
  name: usw2_foo
  hive-metastore-uris: ${usw2-foo-params.replica-thrift-uris}
copier-options:
  tmp-dir: hdfs:///tmp/circus-train/
  region: us-west-2
table-replications:
  - copier-options:
      task-count: ${usw2-foo-params.task-count}
    source-table:
      database-name: ${usw2-foo-params.database-name}
      table-name: ${usw2-foo-params.table-name}
      partition-filter: ${usw2-foo-params.partition-filter}
    replica-table:
      database-name: ${usw2-foo-params.target-database-name}
      table-location: s3://${usw2-foo-params.s3-bucket-name} …
[Diagram: Data Lake (Hive Metastore) and On-Premises (Hive Metastore)]
SOLID FOUNDATION
[Diagram: Data Lake, Data Lake #2 and Data Lake #3, each with its own Hive Metastore]
DATA FEDERATION { WAGGLE-DANCE }
Waggle Dance is a request-routing Hive metastore proxy that allows tables to be concurrently accessed across multiple Hive deployments.
It was created to tackle the appearance of dataset silos that arose as our large organization gradually migrated from monolithic on-premises clusters to cloud-based platforms.
https://github.com/hotelsdotcom/waggle-dance
SOLID FOUNDATION
[Diagram: Data Lake (Hive Metastore) and Data Lake #3 (Hive Metastore)]
waggle_dance_federation.yml
primary-meta-store:
  access-control-type: READ_AND_WRITE_ON_DATABASE_WHITELIST
  name: primary
  remote-meta-store-uris: ${ON_PREM_HIVE_METASTORE_URI}
  writable-database-white-list:
    - foo_user_.*
federated-meta-stores:
  - name: zed-bar-prod
    access-control-type: READ_ONLY
    remote-meta-store-uris: ${USW2_6623552_PROD_HIVE_METASTORE_URI}
    mapped-databases:
      - foo_transaction
      - bar_stream
      - zed_common
      - opp_charles
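To make the routing idea behind this configuration concrete, here is a simplified sketch of how a proxy could pick a metastore per database and decide whether writes are allowed. It is an illustration under assumptions, not Waggle Dance's implementation: the `Metastore` class, `route` and `can_write` are hypothetical, and the real resolution rules are more involved than an exact name lookup.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Metastore:
    name: str
    access_control: str            # value copied from the YAML configuration
    mapped_databases: List[str]    # databases served by this metastore
    writable_whitelist: List[str]  # regex patterns; used by the primary only


PRIMARY = Metastore(
    name="primary",
    access_control="READ_AND_WRITE_ON_DATABASE_WHITELIST",
    mapped_databases=[],
    writable_whitelist=[r"foo_user_.*"],
)
FEDERATED = [
    Metastore(
        name="zed-bar-prod",
        access_control="READ_ONLY",
        mapped_databases=["foo_transaction", "bar_stream",
                          "zed_common", "opp_charles"],
        writable_whitelist=[],
    )
]


def route(database: str) -> Metastore:
    """Pick the metastore that should serve requests for `database`."""
    for store in FEDERATED:
        if database in store.mapped_databases:
            return store
    # Assumption: anything not mapped elsewhere falls through to the primary.
    return PRIMARY


def can_write(database: str) -> bool:
    """Allow writes only on the primary, and only for whitelisted names."""
    store = route(database)
    if store.access_control != "READ_AND_WRITE_ON_DATABASE_WHITELIST":
        return False
    return any(re.fullmatch(p, database) for p in store.writable_whitelist)
```

With this model, `bar_stream` is read from `zed-bar-prod` and is read-only, while a database such as `foo_user_alice` stays on the primary and remains writable.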
SOLID FOUNDATION
[Diagram: Data Lake (Hive Metastore) and On-Premises (Hive Metastore)]
DATA QUALITY FRAMEWORK
Manage core data-assets like any other product, promoting instrumentation, observability and alerting.
“First to Know” culture and process, measuring how data-assets are accessible, fresh, complete, accurate, enriched, integrated.
#BKG-MART #USR-TABLE #CLK-STREAM
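A "first to know" check on a data asset could look roughly like the sketch below. The deck does not describe the framework's internals, so everything here is assumed for illustration: `AssetSnapshot`, `quality_alerts`, the thresholds, and the choice of freshness, completeness and null-key ratio as the measured dimensions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class AssetSnapshot:
    """Hypothetical summary of a data asset at check time."""
    name: str
    last_updated: datetime
    row_count: int
    expected_min_rows: int
    null_key_rows: int


def quality_alerts(snapshot: AssetSnapshot,
                   now: datetime,
                   max_staleness: timedelta,
                   max_null_ratio: float = 0.01) -> List[str]:
    """Return one alert message per failed check (empty list means healthy)."""
    alerts = []
    if now - snapshot.last_updated > max_staleness:
        alerts.append(f"{snapshot.name}: stale")
    if snapshot.row_count < snapshot.expected_min_rows:
        alerts.append(f"{snapshot.name}: incomplete")
    if (snapshot.row_count
            and snapshot.null_key_rows / snapshot.row_count > max_null_ratio):
        alerts.append(f"{snapshot.name}: too many null keys")
    return alerts


now = datetime(2018, 5, 1, 12, 0)
snapshot = AssetSnapshot(
    name="BKG-MART",
    last_updated=datetime(2018, 5, 1, 3, 0),
    row_count=900,
    expected_min_rows=1000,
    null_key_rows=5,
)
alerts = quality_alerts(snapshot, now, max_staleness=timedelta(hours=6))
```

In this example the asset was last updated nine hours ago and has fewer rows than expected, so two alerts are raised before any consumer has to report the problem.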
Easy to produce data
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Data Lake (Hive Metastore), On-Premises (Hive Metastore)]
NRT SERVICE
One way to produce data
Scalability - perf/efficiency (Kafka)
Simplified schema management
Support on all environments
Strive for a full hands-off service
EASY TO PRODUCE DATA
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Data Lake (Hive Metastore), On-Premises (Hive Metastore)]
PRODUCER CONTRACT
Own the data schema
Own produced data (e2e)
Stream events in realtime
Obfuscate sensitive information
Document and update data assets
Monitor data in production
EASY TO PRODUCE DATA
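Two items of the producer contract, owning the schema and obfuscating sensitive information, can be sketched on the producer side as follows. This is an illustration only: the schema, the field names, the salted SHA-256 hashing and `build_event` are assumptions, and the deck does not say how EPS producers implement either step. The resulting bytes are what a producer would hand to the streaming service.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical schema owned by the producing team: field name -> Python type.
BOOKING_SCHEMA: Dict[str, type] = {
    "booking_id": str,
    "partner_id": str,
    "email": str,
    "amount": float,
}
SENSITIVE_FIELDS = {"email"}


def obfuscate(value: str, salt: str) -> str:
    """One-way salted hash, so the raw value never leaves the producer."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def build_event(payload: Dict[str, Any], salt: str) -> bytes:
    """Validate against the schema, obfuscate sensitive fields, serialise."""
    if set(payload) != set(BOOKING_SCHEMA):
        raise ValueError(f"fields do not match schema: {sorted(payload)}")
    for name, expected in BOOKING_SCHEMA.items():
        if not isinstance(payload[name], expected):
            raise TypeError(f"{name} should be {expected.__name__}")
    event = {
        name: obfuscate(value, salt) if name in SENSITIVE_FIELDS else value
        for name, value in payload.items()
    }
    event["produced_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(event, sort_keys=True).encode("utf-8")


message = build_event(
    {"booking_id": "b-1", "partner_id": "p-9",
     "email": "a@example.com", "amount": 99.5},
    salt="demo-salt",
)
decoded = json.loads(message)
```

Rejecting payloads that drift from the declared schema at the source keeps schema ownership with the producer instead of pushing the problem to consumers.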
Easy to consume data
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution)]
CONSUMER CONTRACT
Consume documented data-assets
Use approved access layers/libs
Report back any data quality issue
Annotate outputs with data-sources
Follow data governance guidelines
Adopt schema changes
*Do not duplicate data-assets*
EASY TO CONSUME DATA
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore)]
QUERY ENGINES + TOOLS
Hive, Presto, Spark
EMR (data processing)
Databricks (data science)
Qubole (query, insights)
Athena (operational support)
EASY TO CONSUME DATA
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution)]
ANALYTICS API (METRICS/DIMS STORE)
Programmatic access to analytical data with granular ACL on data-sets, columns, rows.
Metadata, search, breakdown, filter, timeseries, comparison, forecast on key data-sets (sub-second response time).
EASY TO CONSUME DATA
ANALYTICS API
curl -g 'analytics.eps/bookings?dateField=created_day&date_range=2018-03-01,2018-05-01|2018-01-01,2018-03-01&groupby=partner[top=10,by=foo]&fields=foo,zed,bar&interval=hour'
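The same request can be assembled from structured arguments rather than by hand. The sketch below only builds the URL; the `http://` scheme, the function name `bookings_query` and the reading of `|` as a separator between date ranges to compare are assumptions, since the slide shows just the query string.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the slide only shows the host "analytics.eps".
BASE_URL = "http://analytics.eps/bookings"


def bookings_query(date_field, date_ranges, group_by, fields, interval):
    """Build the query string shown on the slide from structured arguments."""
    params = {
        "dateField": date_field,
        # Each range is "start,end"; several ranges are joined with "|"
        # (assumed here to request a comparison between periods).
        "date_range": "|".join(f"{start},{end}" for start, end in date_ranges),
        "groupby": group_by,
        "fields": ",".join(fields),
        "interval": interval,
    }
    # Keep the separators used on the slide instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode(params, safe=',|[]=')}"


url = bookings_query(
    date_field="created_day",
    date_ranges=[("2018-03-01", "2018-05-01"), ("2018-01-01", "2018-03-01")],
    group_by="partner[top=10,by=foo]",
    fields=["foo", "zed", "bar"],
    interval="hour",
)
```

Building the query from a dictionary keeps parameter names in one place and avoids quoting mistakes when ranges or fields are generated programmatically.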
Data Science pushes the envelope
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Algorithm Training, Model Config, ML Model Store]
DS DEVELOPMENT CYCLE
Models Tuning
Algorithm Training
ML Model storage
DATA SCIENCE PUSHES THE ENVELOPE
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Algorithm Training, Model Config, ML Model Store]
DATA SCIENCE PUSHES THE ENVELOPE
FEATURES PIPELINE
Training sets
Validation sets
Parameters
Configuration
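One common way for a features pipeline to produce training and validation sets from time-ordered data is a date cutoff, so that validation rows are always later than training rows. The sketch below assumes that approach; the deck does not say how EPS splits its data, and `split_by_date`, the row layout and the `config` keys are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Dict[str, object]


def split_by_date(rows: List[Row], cutoff: date,
                  date_key: str = "day") -> Tuple[List[Row], List[Row]]:
    """Rows before `cutoff` form the training set, the rest the validation set."""
    training = [r for r in rows if r[date_key] < cutoff]
    validation = [r for r in rows if r[date_key] >= cutoff]
    return training, validation


# Ten days of made-up feature rows.
rows = [
    {"day": date(2018, 3, d), "searches": 100 + d, "bookings": 10 + d % 3}
    for d in range(1, 11)
]
# Parameters and configuration travel with the sets so a run can be repeated.
config = {"cutoff": date(2018, 3, 8), "features": ["searches"], "label": "bookings"}
training_set, validation_set = split_by_date(rows, config["cutoff"])
```

Splitting on time rather than at random avoids leaking future information into training when the model is later used for forecasting.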
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Algorithm Training, Model Config, ML Model Store]
DATA SCIENCE PUSHES THE ENVELOPE
BATCH EXECUTION
Prediction backtesting
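Backtesting replays history: at each past point the model predicts the next value from earlier data only, and the prediction is compared with what actually happened. A minimal rolling-origin sketch follows, assuming a naive last-value baseline and mean absolute error as the metric; the function names and the sample series are made up for illustration.

```python
from typing import Callable, List, Sequence


def naive_forecast(history: Sequence[float]) -> float:
    """Baseline model: the next value equals the most recent one."""
    return history[-1]


def backtest(series: Sequence[float],
             model: Callable[[Sequence[float]], float],
             min_history: int) -> float:
    """Rolling-origin backtest: predict each point from the points before it.

    Returns the mean absolute error over all backtested points.
    """
    errors: List[float] = []
    for t in range(min_history, len(series)):
        prediction = model(series[:t])  # only data before t is visible
        errors.append(abs(prediction - series[t]))
    return sum(errors) / len(errors)


daily_bookings = [100.0, 110.0, 105.0, 120.0, 115.0]
mae = backtest(daily_bookings, naive_forecast, min_history=2)
```

Any candidate model with the same call signature can be passed in place of `naive_forecast`, which makes the baseline a convenient yardstick.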
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Performance Set, Algorithm Training, Model Config, ML Model Store]
DATA SCIENCE PUSHES THE ENVELOPE
MODEL PERFORMANCE
Performance evaluation
Observability
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Features Store, ML Service, CI/CD, EPS API (book), Partner(s) Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Performance Set, Algorithm Training, Model Config, ML Model Store, Model Performance / Monitoring]
DATA SCIENCE PUSHES THE ENVELOPE
ONLINE SERVICE
CI/CD
Online features store
Model serialisation
Model serving
{ Custom, MLeap, TensorFlow, PMML }
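As an illustration of the "Custom" option, the sketch below serialises a small linear model to JSON offline and rebuilds a scoring function from it in the serving path. It is a toy under assumptions: the JSON layout, `serialise_model`, `load_scorer` and the feature names are hypothetical, and it does not represent MLeap, TensorFlow or PMML formats.

```python
import json
from typing import Callable, Dict


def serialise_model(weights: Dict[str, float], bias: float, version: str) -> str:
    """Write a linear model as plain JSON so any service can load it."""
    return json.dumps({"version": version, "bias": bias, "weights": weights},
                      sort_keys=True)


def load_scorer(serialised: str) -> Callable[[Dict[str, float]], float]:
    """Rebuild a scoring function from the serialised form."""
    model = json.loads(serialised)

    def score(features: Dict[str, float]) -> float:
        # Features would come from an online features store;
        # a missing feature is treated as 0.0 in this sketch.
        return model["bias"] + sum(
            weight * features.get(name, 0.0)
            for name, weight in model["weights"].items()
        )

    return score


artifact = serialise_model({"price": -0.5, "rating": 2.0}, bias=1.0, version="v1")
score = load_scorer(artifact)
result = score({"price": 4.0, "rating": 3.0})
```

Keeping the artifact as versioned data lets a CI/CD pipeline promote a new model without redeploying the serving code.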
[Diagram: Online / Offline / Development environments; Data Producers, NRT Streaming Service, Features Store, ML Service, CI/CD, EPS API (book), Partner(s) Service, Orchestrator, Data Lake (Hive Metastore), On-Premises (Hive Metastore), Data Exploration + Pipelines Setup, DS Development (SQL execution), Batch Model Execution, Prediction + Backtesting, Training Set, Validation Set, Performance Set, Algorithm Training, Model Config, ML Model Store, Model Performance / Monitoring]
“It Takes a Village … ”
IT TAKES A VILLAGE ...
CROSS FUNCTIONAL TEAMS
PROMOTE BEST ENGINEERING PRACTICES
CRITICAL EXECUTION PATH
MEASURE OPERATIONAL COSTS
SOLID PLATFORM TO BUILD ON TOP
#lifeatexpedia

Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESMatt Stubbs
 
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALE
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALEBig Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALE
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALEMatt Stubbs
 

Mais de Matt Stubbs (20)

Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability ArchitecturesBlueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
Blueprint Series: Banking In The Cloud – Ultra-high Reliability Architectures
 
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
Speed Up Your Apache Cassandra™ Applications: A Practical Guide to Reactive P...
 
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
Big Data LDN 2018: DATA, WHAT PEOPLE THINK AND WHAT YOU CAN DO TO BUILD TRUST.
 
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCEBig Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
Big Data LDN 2018: DATABASE FOR THE INSTANT EXPERIENCE
 
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQLBig Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
Big Data LDN 2018: BIG DATA TOO SLOW? SPRINKLE IN SOME NOSQL
 
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTSBig Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
Big Data LDN 2018: ENABLING DATA-DRIVEN DECISIONS WITH AUTOMATED INSIGHTS
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
Big Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPRBig Data LDN 2018: AI VS. GDPR
Big Data LDN 2018: AI VS. GDPR
 
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
Big Data LDN 2018: REALISING THE PROMISE OF SELF-SERVICE ANALYTICS WITH DATA ...
 
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
Big Data LDN 2018: TURNING MULTIPLE DATA LAKES INTO A UNIFIED ANALYTIC DATA L...
 
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
 
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICSBig Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
 
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSEBig Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
 
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNINGBig Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
 
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
 
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATEBig Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
 
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALE
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALEBig Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALE
Big Data LDN 2018: FIGHTING DATA CHAOS: CONNECTING USERS TO DATA AT SCALE
 

Último

RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 

Último (20)

RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 

Blueprint Series: Expedia Partner Solutions, Data Platform

  • 1. Building a Solid Foundation: Expedia Partner Solutions Data Platform (Anselmo Silva)
  • 3. 600M+ monthly visits in 75+ countries; 20k employees worldwide; 90B USD yearly travel sales; 10k affiliates
  • 4. Expedia Group is all about travel. Our secret is that we are also all about technology and data science.
  • 5. 600M+ API hotel searches a day; 1000+ EPS-powered partners; 30%+ YoY growth; 10TB of data processed every day
  • 6. A FEW OF OUR PARTNERS
  • 7. TRAVEL INDUSTRY CHALLENGES & OPPORTUNITIES: Augmented reality and VR
  • 8. TRAVEL INDUSTRY CHALLENGES & OPPORTUNITIES: Market growth and consolidation (source: skift.com)
  • 9. TRAVEL INDUSTRY CHALLENGES & OPPORTUNITIES: Voice search and personalisation
  • 10. AI - ML AT EXPEDIA PARTNER SOLUTIONS: Sorting
  • 11. AI - ML AT EXPEDIA PARTNER SOLUTIONS: Image Classification
  • 12. AI - ML AT EXPEDIA PARTNER SOLUTIONS: Forecast & Anomaly Detection
  • 13. AI - ML AT EXPEDIA PARTNER SOLUTIONS: Voice & Bots
  • 14. AI - ML AT EXPEDIA PARTNER SOLUTIONS: Recommendations & Cross-Sell
  • 15. DATA-SCIENCE: Data Challenges. Heterogeneity (partners and supply); supply size (> 500K properties); partner size (> 1K partners); content size (> 1M images); data size (10TB per day, PBs in the data lake)
  • 17. SOLID FOUNDATION: CLOUD MIGRATION. Follow the data producers' path; improve security, scalability and resilience; promote technology innovation; separate computing from storage. [Diagram: Data Lake Hive Metastore; On-Premises Hive Metastore]
  • 18. SOLID FOUNDATION: DATA REPLICATION { CIRCUS-TRAIN }. Replicates Hive tables between clusters on request. It replicates both the table's data and metadata. It has a light touch, requiring no direct integration with Hive's core services. It can copy either entire unpartitioned tables or user-defined sets of partitions on partitioned tables. It is not event driven and does not know how tables differ between sites. https://github.com/hotelsdotcom/circus-train [Diagram: Data Lake Hive Metastore; On-Premises Hive Metastore]
  • 19. SOLID FOUNDATION: circus_train.yml [Diagram: Data Lake Hive Metastore; On-Premises Hive Metastore]

        source-catalog:
          name: on_prem_dw
          disable-snapshots: true
          hive-metastore-uris: ${on-prem-dw-foo-params.source-thrift-uris}
        replica-catalog:
          name: usw2_foo
          hive-metastore-uris: ${usw2-foo-params.replica-thrift-uris}
        copier-options:
          tmp-dir: hdfs:///tmp/circus-train/
          region: us-west-2
        table-replications:
          - copier-options:
              task-count: ${usw2-foo-params.task-count}
            source-table:
              database-name: ${usw2-foo-params.database-name}
              table-name: ${usw2-foo-params.table-name}
              partition-filter: ${usw2-foo-params.partition-filter}
            replica-table:
              database-name: ${usw2-foo-params.target-database-name}
              table-location: s3://${usw2-foo-params.s3-bucket-name}
        …
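
As a rough illustration of the on-request, partition-level replication described on slides 18 and 19, the sketch below plans which partition directories would be copied from a source location to a replica location. It is not Circus Train code: `Partition`, `CopyTask`, `plan_replication`, the paths and the bucket name are all hypothetical, and a Python predicate stands in for the `partition-filter` expression. The real tool also replicates the table metadata, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Partition:
    """A source partition (hypothetical model, not a Circus Train type)."""
    values: Dict[str, str]   # partition key -> value
    relative_path: str       # directory under the table location


@dataclass(frozen=True)
class CopyTask:
    """One directory copy from the source cluster to the replica."""
    source: str
    replica: str


def plan_replication(
    partitions: List[Partition],
    partition_filter: Callable[[Dict[str, str]], bool],
    source_base: str,
    replica_base: str,
) -> List[CopyTask]:
    """Select partitions with the filter and pair source and replica paths."""
    src = source_base.rstrip("/")
    dst = replica_base.rstrip("/")
    return [
        CopyTask(f"{src}/{p.relative_path}", f"{dst}/{p.relative_path}")
        for p in partitions
        if partition_filter(p.values)
    ]


# Hypothetical table with three daily partitions; replicate from 2018-05-01 on.
partitions = [
    Partition({"local_date": d}, f"local_date={d}")
    for d in ("2018-04-30", "2018-05-01", "2018-05-02")
]
tasks = plan_replication(
    partitions,
    lambda v: v["local_date"] >= "2018-05-01",
    "hdfs:///warehouse/foo/bookings/",
    "s3://example-bucket/foo/bookings",
)
```

Because nothing here is event driven, the plan is recomputed from the filter on every run, which mirrors the "on request" behaviour the slide describes.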
  • 20. SOLID FOUNDATION: DATA FEDERATION { WAGGLE-DANCE }. Waggle Dance is a request-routing Hive metastore proxy that allows tables to be concurrently accessed across multiple Hive deployments. It was created to tackle the appearance of dataset silos that arose as our large organization gradually migrated from monolithic on-premises clusters to cloud-based platforms. https://github.com/hotelsdotcom/waggle-dance [Diagram: Data Lake Hive Metastore; Data Lake #2 Hive Metastore; Data Lake #3 Hive Metastore]
  • 21. SOLID FOUNDATION: waggle_dance_federation.yml [Diagram: Data Lake Hive Metastore; Data Lake #3 Hive Metastore]

        primary-meta-store:
          access-control-type: READ_AND_WRITE_ON_DATABASE_WHITELIST
          name: primary
          remote-meta-store-uris: ${ON_PREM_HIVE_METASTORE_URI}
          writable-database-white-list:
            - foo_user_.*
        federated-meta-stores:
          - name: zed-bar-prod
            access-control-type: READ_ONLY
            remote-meta-store-uris: ${USW2_6623552_PROD_HIVE_METASTORE_URI}
            mapped-databases:
              - foo_transaction
              - bar_stream
              - zed_common
              - opp_charles
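
The routing idea behind the federation config on slide 21 can be sketched in a few lines. This is a simplified model and not Waggle Dance's actual resolution logic: the `Metastore` class and the `route` and `can_write` functions are hypothetical, and the sketch assumes that a database listed under `mapped-databases` is served by that federated metastore while everything else falls through to the primary.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Metastore:
    """Simplified view of one metastore entry from the YAML (hypothetical)."""
    name: str
    access_control_type: str
    mapped_databases: List[str] = field(default_factory=list)
    writable_whitelist: List[str] = field(default_factory=list)


def route(database: str, primary: Metastore, federated: List[Metastore]) -> Metastore:
    """Pick the metastore that should serve requests for this database."""
    for store in federated:
        if database in store.mapped_databases:
            return store
    return primary


def can_write(database: str, store: Metastore) -> bool:
    """Writes are allowed only on whitelisted databases of a writable store."""
    if store.access_control_type != "READ_AND_WRITE_ON_DATABASE_WHITELIST":
        return False
    return any(re.fullmatch(pattern, database) for pattern in store.writable_whitelist)


# Values copied from the slide's configuration.
primary = Metastore(
    name="primary",
    access_control_type="READ_AND_WRITE_ON_DATABASE_WHITELIST",
    writable_whitelist=["foo_user_.*"],
)
zed_bar_prod = Metastore(
    name="zed-bar-prod",
    access_control_type="READ_ONLY",
    mapped_databases=["foo_transaction", "bar_stream", "zed_common", "opp_charles"],
)
```

With these values, a request for `bar_stream` would be routed to `zed-bar-prod` as read-only, while `foo_user_alice` would stay on the primary metastore and be writable.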
  • 22. SOLID FOUNDATION: DATA QUALITY FRAMEWORK. Manage core data assets like any other product, promoting instrumentation, observability and alerting. “First to Know” culture and process, measuring how data assets are accessible, fresh, complete, accurate, enriched and integrated. #BKG-MART #USR-TABLE #CLK-STREAM [Diagram: Data Lake Hive Metastore; On-Premises Hive Metastore]
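
Two of the dimensions named on slide 22, "fresh" and "complete", lend themselves to a small worked example. The sketch below is an assumption about how such checks could look, not the framework from the talk: `AssetStatus`, `check_asset`, the thresholds and the row counts are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class AssetStatus:
    """Observed state of one data asset (hypothetical structure)."""
    name: str
    last_updated: datetime
    expected_rows: int
    actual_rows: int


def check_asset(
    status: AssetStatus,
    now: datetime,
    max_age: timedelta,
    min_completeness: float = 0.99,
) -> List[str]:
    """Return alert messages for freshness and completeness violations."""
    alerts = []
    if now - status.last_updated > max_age:
        alerts.append(f"{status.name}: stale")
    ratio = status.actual_rows / status.expected_rows if status.expected_rows else 0.0
    if ratio < min_completeness:
        alerts.append(f"{status.name}: incomplete ({ratio:.1%})")
    return alerts


# Made-up numbers: the asset is three hours old and 5% of rows are missing.
now = datetime(2018, 5, 1, 12, 0)
status = AssetStatus("BKG-MART", datetime(2018, 5, 1, 9, 0), 1000, 950)
alerts = check_asset(status, now, max_age=timedelta(hours=1))
```

Running checks like these on a schedule and routing the alerts to the owning team is one way to support a "First to Know" process.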
  • 23. Easy to produce data. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore]
  • 24. EASY TO PRODUCE DATA: NRT SERVICE. One way to produce data; scalability, perf/efficiency (Kafka); simplified schema management; support on all environments; strive for a full hands-off service. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore]
  • 25. EASY TO PRODUCE DATA: PRODUCER CONTRACT. Own the data schema; own produced data (e2e); stream events in real time; obfuscate sensitive information; document and update data assets; monitor data in production. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore]
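
One item of the producer contract, "obfuscate sensitive information", can be illustrated with a minimal sketch. The talk does not say how obfuscation is done; salted SHA-256 hashing is used here only as an example, and `obfuscate_event`, the field names and the salt are hypothetical.

```python
import hashlib
from typing import Any, Dict, Set


def obfuscate_event(event: Dict[str, Any], sensitive_fields: Set[str], salt: str) -> Dict[str, Any]:
    """Return a copy of the event with sensitive fields replaced by a salted hash."""
    cleaned = dict(event)  # shallow copy so the caller's event is left untouched
    for name in sensitive_fields:
        if cleaned.get(name) is not None:
            raw = (salt + str(cleaned[name])).encode("utf-8")
            cleaned[name] = hashlib.sha256(raw).hexdigest()
    return cleaned


# Hypothetical booking event emitted by a producer before streaming.
event = {"booking_id": 42, "email": "traveller@example.com", "partner": "foo"}
safe_event = obfuscate_event(event, {"email"}, salt="example-salt")
```

Hashing keeps the field usable as a join key downstream while hiding the raw value; whether that is acceptable depends on the governance rules in force.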
  • 26. Easy to consume data. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL]
  • 27. EASY TO CONSUME DATA: CONSUMER CONTRACT. Consume documented data assets; use approved access layers/libs; report back any data quality issue; annotate outputs with data sources; follow data governance guidelines; adopt schema changes; *do not duplicate data assets*. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator]
  • 28. EASY TO CONSUME DATA: QUERY ENGINES + TOOLS. Hive, Presto, Spark; EMR (data processing); Databricks (data science); Qubole (query, insights); Athena (operational support). [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL]
  • 29. EASY TO CONSUME DATA: ANALYTICS API (METRICS/DIMS STORE). Programmatic access to analytical data with granular ACL on data sets, columns and rows. Metadata, search, breakdown, filter, time series, comparison and forecast on key data sets (sub-second response time). Example: curl -o analytics.eps/bookings?dateField=created_day&date_range=2018-03-01,2018-05-01|2018-01-01,2018-03-01&groupby=partner[top=10,by=foo]&fields=foo,zed,bar&interval=hour [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; ANALYTICS API]
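
The query on slide 29 packs two date ranges, a grouping clause and a field list into one URL. A minimal sketch of assembling such a request with Python's standard library follows. The parameter names come from the slide; the `https://` scheme, the `bookings_query_url` helper and the reading of `|` as a separator between the two compared ranges are assumptions.

```python
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def bookings_query_url(
    base_url: str,
    date_field: str,
    date_ranges: List[Tuple[str, str]],
    group_by: str,
    fields: List[str],
    interval: str,
) -> str:
    """Build the query string; urlencode percent-escapes ',', '|' and '[...]'."""
    params = {
        "dateField": date_field,
        "date_range": "|".join(f"{start},{end}" for start, end in date_ranges),
        "groupby": group_by,
        "fields": ",".join(fields),
        "interval": interval,
    }
    return f"{base_url}?{urlencode(params)}"


url = bookings_query_url(
    "https://analytics.eps/bookings",  # scheme is an assumption
    "created_day",
    [("2018-03-01", "2018-05-01"), ("2018-01-01", "2018-03-01")],
    "partner[top=10,by=foo]",
    ["foo", "zed", "bar"],
    "hour",
)
# Decoding the query again shows the values survive the escaping unchanged.
decoded = parse_qs(urlsplit(url).query)
```

Escaping matters here because characters such as `|`, `[` and `&` are interpreted by most shells when the URL is passed unquoted on a command line.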
  • 30. Data Science pushes the envelope. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting]
  • 31. DATA SCIENCE PUSHES THE ENVELOPE: DS DEVELOPMENT CYCLE. Models tuning; algorithm training; ML model storage. [Diagram: Online, Offline and Development lanes; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting]
  • 32. DATA SCIENCE PUSHES THE ENVELOPE: FEATURES PIPELINE. Training sets; validation sets; parameters; configuration. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting]
  • 33. DATA SCIENCE PUSHES THE ENVELOPE: BATCH EXECUTION. Prediction backtesting. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting]
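
Backtesting, as named on the batch execution slide, generally means replaying a model over history and scoring each prediction against the value that actually followed. The sketch below shows a rolling-origin version of that idea with a naive "last value" forecaster. It is a generic illustration, not the pipeline from the talk; `backtest`, `last_value` and the sample series are made up.

```python
from typing import Callable, Sequence


def backtest(
    series: Sequence[float],
    predict: Callable[[Sequence[float]], float],
    min_history: int,
) -> float:
    """Mean absolute error of one-step-ahead forecasts over the series."""
    if min_history < 1 or min_history >= len(series):
        raise ValueError("min_history must leave at least one point to evaluate")
    errors = []
    for t in range(min_history, len(series)):
        forecast = predict(series[:t])  # the model only sees data before time t
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


def last_value(history: Sequence[float]) -> float:
    """Naive baseline: tomorrow looks like today."""
    return history[-1]


# Made-up daily values; errors are |11 - 12| = 1 and |15 - 11| = 4.
mae = backtest([10, 12, 11, 15], last_value, min_history=2)
```

A baseline like `last_value` gives a reference error that a trained model should beat before it is promoted to batch execution.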
  • 34. DATA SCIENCE PUSHES THE ENVELOPE: MODEL PERFORMANCE. Performance evaluation; observability. [Diagram: Online, Offline and Development lanes; Data Producers; NRT Streaming Service; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting; Performance Set; Model Performance / Monitoring (50k, 23k)]
  • 35. DATA SCIENCE PUSHES THE ENVELOPE: ONLINE SERVICE. CI/CD; online features store; model serialisation; model serving { Custom, MLeap, Tensorflow, PMML }. [Diagram: Online, Offline and Development lanes; EPS API; book; Partner(s) Service; Data Producers; NRT Streaming Service; Features Store; ML Service; CI/CD; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting; Performance Set; Model Performance / Monitoring (50k, 23k)]
  • 36. [Diagram: Online, Offline and Development lanes; EPS API; book; Partner(s) Service; Data Producers; NRT Streaming Service; Features Store; ML Service; CI/CD; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting; Performance Set; Model Performance / Monitoring (50k, 23k)]
  • 37. “It Takes a Village … ” [Diagram: Online, Offline and Development lanes; EPS API; book; Partner(s) Service; Data Producers; NRT Streaming Service; Features Store; ML Service; CI/CD; Data Lake Hive Metastore; On-Premises Hive Metastore; Orchestrator; Data Exploration + Pipelines Setup; DS Development; Execute > SQL; Training Set; Validation Set; Model Config; Algorithm Training; ML Model Store; Batch Model Execution; Prediction + Backtesting; Performance Set; Model Performance / Monitoring (50k, 23k)]
  • 38. IT TAKES A VILLAGE ... CROSS FUNCTIONAL TEAMS; PROMOTE BEST ENGINEERING PRACTICES; CRITICAL EXECUTION PATH; MEASURE OPERATIONAL COSTS; SOLID PLATFORM TO BUILD ON TOP