Gimel and PayPal Notebooks
Agenda
Introduction
PayPal Scale
Data Scientist Challenges
Gimel – Data Access Simplified
PayPal Notebooks – Analytics For Everyone
Open Source
Q&A
©2018 PayPal Inc. Confidential and proprietary.
Romit Mehta
Product manager, data processing products at PayPal
20 years in data and analytics across networking, semiconductors, telecom, security and fintech industries
Data warehouse developer, BI program manager, data product manager
romehta@paypal.com
https://www.linkedin.com/in/romit-mehta
PayPal Context
From: PayPal’s Q3 2018 Investor Update
PayPal Customers, Transactions and Growth
PayPal Data & Analytics Ecosystem
Volume, Velocity, Variety
Over 250PB of data
Operate across many zones and regions
Compute choice
5,000+ analytics users
One of the largest deployments of Oracle, Aerospike, Teradata and Hortonworks
250,000+ batch jobs a day
8,000+ replications
Polyglot Datastores
Customer challenges
Dataset Challenges
Data access tied to compute and data store versions
Hard to find available data sets
Storage-specific dataset creation results in duplication and increased latency
No audit trail for dataset access
No standards for onboarding data sets for others to discover
No statistics on data set usage and access trends
Datasets
Application development challenges
Onboarding Big Data Apps: Learn, Code, Optimize, Build, Deploy, Run
Compute Engine Changed: Learn, Code, Optimize, Build, Deploy, Run
Compute Version Upgraded: Learn, Code, Optimize, Build, Deploy, Run
Storage API Changed: Learn, Code, Optimize, Build, Deploy, Run
Storage Connector Upgraded: Learn, Code, Optimize, Build, Deploy, Run
Storage Hosts Migrated: Learn, Code, Optimize, Build, Deploy, Run
Storage Changed: Learn, Code, Optimize, Build, Deploy, Run
*********************: Learn, Code, Optimize, Build, Deploy, Run
Analytics lifecycle challenges
Objective
Reduce time to market for our end customers by reducing data latency, simplifying access, increasing discoverability of data sets and streamlining development.
Data Latency Data Access Development
Before
Latency: Hours to days
Onboarding: Weeks
Discoverability: Minimal
Data Access: Fragmented
Now
Latency: Near real-time
Onboarding: Minutes
Discoverability: 100% of data sets cataloged instantly
Data Access: Unified API and SQL
Access to all data
CLI based interactive access
Edge-nodes based development
Access to near real-time data
REST-based job servers
Days → Sec
[Figure: Gimel architecture diagram. Labels: Consumption, Discovery, Metadata Services, Unified Data Catalog, Gimel, PayPal Notebooks, Gimel SDK, BI Tools, SQL Clients, Control Plane, Streaming Services, Consumer, Custom Router, CDH, Sources, Data Access, Processing]
Gimel
Data processing ecosystem
Elastic compute: Intelligent compute including dynamic environments
Hybrid dataset: Intelligent data persistence
In-memory store and cache: Reduce connection flood on underlying systems
Unified Data Catalog: Find datasets across data stores
Cross-cluster Data API: Eliminate ad-hoc data movement
Self-service notebooks deployment: DevOps for analysts
[Figure: platform stack diagram. Users: Developer, Data scientist, Analyst, Operator. Layers: User Experience and Access (Gimel SDK, Notebooks, UDC, Data API, R Studio, BI tools); Gimel Data Platform; Compute Framework and APIs; Application Lifecycle Management; Logging, Monitoring, Alerting, Security. Infrastructure services leveraged for elasticity and redundancy: Multi-DC, Public cloud, Predictive resource allocation]
We’re not in SQL-land anymore
Spark Read From HBase
More stores? More complexity!
Spark Read From HBase
Spark Read From Elasticsearch
Spark Read From Aerospike
Spark Read From Druid
Gimel Data API
Spark Read From HBase
Spark Read From Elasticsearch
Spark Read From Aerospike
Spark Read From Druid
With Data API
✔
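The idea behind a unified Data API can be illustrated with a minimal Python sketch. This is not Gimel's actual API: the `DataAPI` class, the `CatalogEntry` record, the reader functions and the dataset names are all hypothetical, and the readers return canned rows instead of connecting to real stores. The point is only that the caller names a logical dataset and a catalog decides which store-specific reader runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Reader = Callable[[Dict[str, str]], List[Row]]


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: which store backs a dataset, and how to reach it."""
    store_type: str
    properties: Dict[str, str]


class DataAPI:
    """Hypothetical facade: one read() call, dispatched by catalog lookup."""

    def __init__(self, catalog: Dict[str, CatalogEntry]) -> None:
        self._catalog = catalog
        self._readers: Dict[str, Reader] = {}

    def register_reader(self, store_type: str, reader: Reader) -> None:
        self._readers[store_type] = reader

    def read(self, dataset_name: str) -> List[Row]:
        entry = self._catalog[dataset_name]          # logical name -> physical details
        reader = self._readers[entry.store_type]     # store type -> store-specific code
        return reader(entry.properties)


# Stand-in readers; real ones would wrap each store's connector.
def read_hbase(props: Dict[str, str]) -> List[Row]:
    return [{"store": "hbase", "table": props["table"]}]


def read_elasticsearch(props: Dict[str, str]) -> List[Row]:
    return [{"store": "elasticsearch", "index": props["index"]}]


catalog = {
    "demo.hbase_dataset": CatalogEntry("hbase", {"table": "t1"}),
    "demo.es_dataset": CatalogEntry("elasticsearch", {"index": "i1"}),
}
api = DataAPI(catalog)
api.register_reader("hbase", read_hbase)
api.register_reader("elasticsearch", read_elasticsearch)

# The calling code is identical for both stores.
hbase_rows = api.read("demo.hbase_dataset")
es_rows = api.read("demo.es_dataset")
```

With this shape, a store migration or connector upgrade changes a catalog entry or a reader, not the application code that calls `read`.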
SQL everywhere
Spark Read From HBase
Spark Read From Elasticsearch
Spark Read From Aerospike
Spark Read From Druid
With Data API
✔
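As a rough illustration of "SQL everywhere", the Python sketch below runs one SQL join across data that originates in two differently shaped stores. It is an assumption-laden toy, not how Gimel executes SQL: the "key-value store" is a dict, the "search index" is a list of documents, and an in-memory SQLite database stands in for the query engine. The helper `load_table` and all table names are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Sequence


def load_table(conn: sqlite3.Connection, name: str,
               columns: List[str], rows: Iterable[Sequence]) -> None:
    """Expose rows from any source as a SQL table (names here are trusted constants)."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)


# Stand-in "key-value store": user_id -> country.
kv_store = {1: "US", 2: "DE", 3: "IN"}

# Stand-in "search index": a list of documents.
search_index = [
    {"user_id": 1, "amount": 20.0},
    {"user_id": 1, "amount": 5.0},
    {"user_id": 3, "amount": 7.5},
]

conn = sqlite3.connect(":memory:")
load_table(conn, "users", ["user_id", "country"], list(kv_store.items()))
load_table(conn, "payments", ["user_id", "amount"],
           [(doc["user_id"], doc["amount"]) for doc in search_index])

# One SQL statement, regardless of where each dataset came from.
query = """
    SELECT u.country, SUM(p.amount) AS total
    FROM users u
    JOIN payments p ON p.user_id = u.user_id
    GROUP BY u.country
    ORDER BY u.country
"""
totals = conn.execute(query).fetchall()
conn.close()
```

A production system would push the query down to the stores or to a distributed engine rather than copy data into one process; the sketch only shows the user-facing contract.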
New data development lifecycle
Onboarding Big Data Apps: Learn, Code, Optimize, Build, Deploy, Run
Compute Engine Changed: Run
Compute Version Upgraded: Run
Storage API Changed: Run
Storage Connector Upgraded: Run
Storage Hosts Migrated: Run
Storage Changed: Run
*********************: Run
Gimel – a powerful enabler
• Single unified data API to access any data store
• SQL capabilities against any data store
• Switch between interactive, batch and streaming modes
• Centralized metadata catalog (Unified Data Catalog) to abstract the physical complexities of accessing data
• Open sourced: gimel.io and UnifiedDataCatalog.io
• Integrated with Jupyter notebooks through GSQL
• Dataset browser in notebooks powered by Unified Data Catalog
PayPal Notebooks
From Jupyter to PayPal Notebooks
Jupyter deployed
PayPal Notebooks Beta
PayPal Notebooks Generally Available
PayPal Notebooks Today
Q3 2016 Q3 2017
~50 users
Feb 2018
~100 users
~1,500 users
SQL, Spark/PySpark, Python, R
2016
Zeppelin
Individual use
Notebooks deployed as a platform
GPU integration
• Enable deep learning through notebooks
• Distributed TensorFlow training enabled with dynamic GPU resource management
Standalone Docker
• Container image with all PPExtensions
• Required to deploy across various security zones at PayPal
• Foundation for open sourcing PPExtensions
Highly available JupyterHub
• Grid of JupyterHub hosts
• Highly available and distributed
SSO + 2FA integration
Kerberos + LDAP integration
PPExtensions: PPMagics
%hive, %teradata
• Query data from Hive (or Teradata)
• Insert data using csv/dataframes
• Publish to Tableau
%run, %run_pipeline
• Run any notebook from another notebook
• Run multiple notebooks in parallel
• Execute a pipeline of notebooks
%csv
• Run SQL on csv files
%presto
• Query data from Presto
• Publish to Tableau
%sts
• Query data from Spark Thrift Server
• Includes progress bar for SQL execution
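To make the "run SQL on csv files" idea concrete, here is a small standard-library Python sketch. It does not reproduce how the %csv magic is implemented in PPExtensions; the function `query_csv`, its parameters and the sample data are hypothetical, and an in-memory SQLite database is used as an assumed stand-in for the SQL engine.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def query_csv(csv_text: str, table: str, sql: str) -> List[Tuple]:
    """Load CSV text into an in-memory SQLite table, then run one SQL query on it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                       # first row supplies the column names
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


sample = "region,amount\nEMEA,10\nAPAC,4\nEMEA,2.5\n"

# CSV values arrive as text, so numeric columns are cast inside the query.
result = query_csv(
    sample,
    "sales",
    "SELECT region, SUM(CAST(amount AS REAL)) FROM sales "
    "GROUP BY region ORDER BY region",
)
```

In a notebook, a cell magic would typically wrap a function like this so the analyst types only the SQL.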
Collaboration, Publishing, Deployment
GitHub sharing
• Push notebook to common org-wide repo
• View full fidelity notebook on GitHub
• Share link to notebook instead of .ipynb file or code snippets
Tableau publishing
• Seamlessly publish to Tableau
• Download TDE to use Tableau Desktop, or directly publish as a data source
Project collaboration & ML
• Share notebook to personal and team repos
• Resolve conflicts between remote and local notebooks with nbdime
• Integrated TensorFlow for distributed model training
• Enabled TensorFlow with GPU
Deployment/scheduling
• Integrate with Airflow
• Set up frequency, alerts, optionally push to GitHub after every run
• Add Celery executor for scalability
Simplify & Empower
Data is truly democratized
PayPal Notebooks
• The next-generation development experience
• Rich, interactive data exploration and analysis
• Support for over 40 languages including SQL, Python, Scala and R
• Big data and machine learning support built-in
• Built for PayPal: Integrated with SSO+2FA, Kerberos, GitHub, Secure File Transfer, Tableau, Gimel
Gimel
• Single unified data API to access any data store
• SQL capabilities against any data store
• Switch between interactive, batch and streaming modes
• Centralized metadata catalog to abstract the physical complexities of accessing data
• Can be used in standalone or cluster mode
More information:
ppextensions.io
gimel.io
UnifiedDataCatalog.io
Open Source
Gimel & PPExtensions Open Sourced
Open Source
Home
ppextensions.io
gimel.io
Google Groups
groups.google.com/d/forum/ppextensions
groups.google.com/d/forum/gimel-dev
Slack
ppextensions.slack.com [Invite link]
gimel-dev.slack.com [Invite link]
Install
pip install ppextensions
try.gimel.io
Questions?

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando

  • 1. Gimel and PayPal Notebooks
  • 2. Agenda: Introduction; PayPal Scale; Data Scientist Challenges; Gimel – Data Access Simplified; PayPal Notebooks – Analytics For Everyone; Open Source; Q&A. ©2018 PayPal Inc. Confidential and proprietary.
  • 3. Romit Mehta: Product manager, data processing products at PayPal. 20 years in data and analytics across the networking, semiconductor, telecom, security and fintech industries. Data warehouse developer, BI program manager, data product manager. romehta@paypal.com, https://www.linkedin.com/in/romit-mehta
  • 5. PayPal Customers, Transactions and Growth (from PayPal’s Q3 2018 Investor Update)
  • 6. PayPal Data & Analytics Ecosystem
  • 7. Volume, Velocity, Variety: over 250PB of data; 5,000+ analytics users; 250,000+ batch jobs a day; 8,000+ replications; operations across many zones and regions; compute choice; polyglot datastores; one of the largest deployments of Oracle, Aerospike, Teradata and Hortonworks.
  • 9. Dataset Challenges:
      • Data access tied to compute and data store versions
      • Hard to find available data sets
      • Storage-specific dataset creation results in duplication and increased latency
      • No audit trail for dataset access
      • No standards for onboarding data sets for others to discover
      • No statistics on data set usage and access trends
  • 10. Application development challenges: each of the following events triggers the full Learn → Code → Optimize → Build → Deploy → Run cycle:
      • Onboarding Big Data Apps
      • Compute Engine Changed
      • Compute Version Upgraded
      • Storage API Changed
      • Storage Connector Upgraded
      • Storage Hosts Migrated
      • Storage Changed
      • *********************
  • 11. Analytics lifecycle challenges. Objective: reduce time to market for our end customers by reducing data latency, simplifying access, increasing discoverability of data sets and streamlining development.
      • Data Latency. Before: latency of hours to days, onboarding in weeks. Now: near real-time latency, onboarding in minutes.
      • Data Access. Before: minimal discoverability, fragmented data access. Now: 100% of data sets cataloged instantly, unified API and SQL access to all data.
      • Development. Before: CLI based interactive access, edge-nodes based development. Now: access to near real-time data, REST-based job servers.
      • [Architecture diagram labels: Days → Sec; Consumption; xDiscovery; Metadata Services; Unified Data Catalog; Gimel; PayPal Notebooks; Gimel SDK; BI Tools; SQL Clients; Control Plane; Streaming Services; Consumer; Custom Router; CDH Sources; Data Access; Processing]
  • 12. Gimel
  • 13. Data processing ecosystem
      • Elastic compute: intelligent compute including dynamic environments
      • Hybrid dataset: intelligent data persistence
      • In-memory store and cache: reduce connection flood on underlying systems
      • Unified Data Catalog: find datasets across data stores
      • Cross-cluster Data API: eliminate ad-hoc data movement
      • Self-service notebooks deployment: DevOps for analysts
      • Users: Developer, Data scientist, Analyst, Operator
      • Access points: Gimel SDK, Notebooks, UDC, Data API, R Studio, BI tools
      • Infrastructure services leveraged for elasticity and redundancy: Multi-DC, Public cloud, Predictive resource allocation
      • Cross-cutting services: Logging, Monitoring, Alerting, Security
      • Layers: Application Lifecycle Management; Compute Framework and APIs; Gimel Data Platform; User Experience and Access
  • 14. We’re not in SQL-land anymore: Spark read from HBase
  • 15. More stores? More complexity! Spark read from HBase; Spark read from Elasticsearch; Spark read from Aerospike; Spark read from Druid
  • 16. Gimel Data API: Spark read from HBase; Spark read from Elasticsearch; Spark read from Aerospike; Spark read from Druid. With Data API ✔
  • 17. SQL everywhere: Spark read from HBase; Spark read from Elasticsearch; Spark read from Aerospike; Spark read from Druid. With Data API ✔
  • 18. New data development lifecycle: only Onboarding Big Data Apps goes through the full Learn → Code → Optimize → Build → Deploy → Run cycle. Each later change needs only Run:
      • Compute Engine Changed
      • Compute Version Upgraded
      • Storage API Changed
      • Storage Connector Upgraded
      • Storage Hosts Migrated
      • Storage Changed
      • *********************
  • 19. Gimel – a powerful enabler
      • Single unified data API to access any data store
      • SQL capabilities against any data store
      • Switch between interactive, batch and streaming modes
      • Centralized metadata catalog (Unified Data Catalog) to abstract the physical complexities of accessing data
      • Open sourced: gimel.io and UnifiedDataCatalog.io
      • Integrated with Jupyter notebooks through GSQL
      • Dataset browser in notebooks powered by Unified Data Catalog
  • 21. From Jupyter to PayPal Notebooks
      • 2016: Zeppelin, individual use
      • Q3 2016: Jupyter deployed
      • Q3 2017: PayPal Notebooks Beta, ~50 users
      • Feb 2018: PayPal Notebooks Generally Available, ~100 users
      • Today: PayPal Notebooks, ~1,500 users; SQL, Spark/PySpark, Python, R
  • 22. Notebooks deployed as a platform
      • Highly available JupyterHub: grid of JupyterHub hosts; highly available and distributed
      • GPU integration: enable deep learning through notebooks; distributed TensorFlow training enabled with dynamic GPU resource management
      • Standalone Docker: container image with all PPExtensions; required to deploy across various security zones at PayPal; foundation for open sourcing PPExtensions
      • SSO + 2FA integration
      • Kerberos + LDAP integration
  • 23. PPExtensions: PPMagics
      • %hive, %teradata: query data from Hive (or Teradata); insert data using csv/dataframes; publish to Tableau
      • %run, %run_pipeline: run any notebook from another notebook; run multiple notebooks in parallel; execute a pipeline of notebooks
      • %csv: run SQL on csv files
      • %presto: query data from Presto; publish to Tableau
      • %sts: query data from Spark Thrift Server; includes progress bar for SQL execution
  • 24. Collaboration, Publishing, Deployment
      • GitHub sharing: push notebook to common org-wide repo; view full fidelity notebook on GitHub; share link to notebook instead of .ipynb file or code snippets
      • Project collaboration & ML: share notebook to personal and team repos; resolve conflicts between remote and local notebooks with nbdime; integrated TensorFlow for distributed model training; enabled TensorFlow with GPU
      • Tableau publishing: seamlessly publish to Tableau; download TDE to use Tableau Desktop, or directly publish as a data source
      • Deployment/scheduling: integrate with Airflow; set up frequency, alerts, optionally push to GitHub after every run; add Celery executor for scalability
  • 26. Data is truly democratized
      • PayPal Notebooks: the next-generation development experience; rich, interactive data exploration and analysis; support for over 40 languages including SQL, Python, Scala and R; big data and machine learning support built-in; built for PayPal: integrated with SSO+2FA, Kerberos, GitHub, Secure File Transfer, Tableau, Gimel
      • Gimel: single unified data API to access any data store; SQL capabilities against any data store; switch between interactive, batch and streaming modes; centralized metadata catalog to abstract the physical complexities of accessing data; can be used in standalone or cluster mode
      • More information: ppextensions.io, gimel.io, UnifiedDataCatalog.io
  • 28. Gimel & PPExtensions Open Sourced
      • Open Source Home: ppextensions.io; gimel.io
      • Google Groups: groups.google.com/d/forum/ppextensions; groups.google.com/d/forum/gimel-dev
      • Slack: ppextensions.slack.com [Invite link]; gimel-dev.slack.com [Invite link]
      • Install: pip install ppextensions; try.gimel.io
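
The idea behind slides 16 to 19 (one data API in front of many stores, with a central catalog hiding the physical details) can be illustrated with a small sketch. This is not Gimel's code or API: `UnifiedDataAPI`, `CatalogEntry`, the reader functions, the dataset name `payments.txn` and the host names are all hypothetical, and the "stores" are in-memory stand-ins. It only shows why application code can stay unchanged when a dataset moves between stores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class CatalogEntry:
    """Physical details of one dataset, kept out of application code (hypothetical)."""
    store_type: str
    properties: Dict[str, str]


Reader = Callable[[CatalogEntry], List[Row]]


class UnifiedDataAPI:
    """Hypothetical facade: resolves a logical dataset name through a catalog."""

    def __init__(self) -> None:
        self._catalog: Dict[str, CatalogEntry] = {}
        self._readers: Dict[str, Reader] = {}

    def register_reader(self, store_type: str, reader: Reader) -> None:
        self._readers[store_type] = reader

    def register_dataset(self, name: str, entry: CatalogEntry) -> None:
        self._catalog[name] = entry

    def read(self, name: str) -> List[Row]:
        # Look up where the dataset lives, then delegate to the matching reader.
        entry = self._catalog[name]
        return self._readers[entry.store_type](entry)


# Stand-in readers; a real connector would talk to the store instead.
def fake_hbase_reader(entry: CatalogEntry) -> List[Row]:
    p = entry.properties
    return [{"txn_id": 1, "served_by": f"hbase://{p['host']}/{p['table']}"}]


def fake_elasticsearch_reader(entry: CatalogEntry) -> List[Row]:
    p = entry.properties
    return [{"txn_id": 1, "served_by": f"elasticsearch://{p['host']}/{p['index']}"}]


def txn_ids(api: UnifiedDataAPI) -> List[object]:
    """Application code: knows only the logical dataset name."""
    return [row["txn_id"] for row in api.read("payments.txn")]


api = UnifiedDataAPI()
api.register_reader("hbase", fake_hbase_reader)
api.register_reader("elasticsearch", fake_elasticsearch_reader)

# The dataset initially lives in the HBase stand-in.
api.register_dataset(
    "payments.txn",
    CatalogEntry("hbase", {"host": "hbase-host-a", "table": "txn"}),
)
before = api.read("payments.txn")

# "Storage Changed": only the catalog entry is updated, txn_ids() is untouched.
api.register_dataset(
    "payments.txn",
    CatalogEntry("elasticsearch", {"host": "es-host-b", "index": "txn-index"}),
)
after = api.read("payments.txn")

print(before[0]["served_by"])
print(after[0]["served_by"])
print(txn_ids(api))
```

In this sketch the storage migration is a catalog update rather than a code change, which is the property slide 18 describes as needing only the Run step.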