Graph of Enterprise Metadata – Powered by Neo4j & Spark
Vladimir Bacvanski
Deepak Chandramouli
https://neo4j.com/nodes-2020/
AGENDA
• Prelude
• Introduction
• Architecture & Flow
• Scaling for enterprise
• Future
• Questions
About us
Vladimir Bacvanski
vbacvanski@paypal.com
Twitter: @OnSoftware
• Principal Architect, Strategic Architecture at PayPal
• In a previous life: CTO of a development and consulting firm
• PhD in Computer Science from RWTH Aachen, Germany
• O’Reilly author: courses on Big Data and Kafka
Deepak Chandramouli
dmohanakumarchan@paypal.com
LinkedIn: @deepakmc
• MT2 Software Engineer, Data Platform Services at PayPal
• Data Enthusiast
• Tech lead
• Gimel (Big Data Framework for Apache Spark)
• Unified Data Catalog – PayPal’s Enterprise Data Catalog
Prelude
2019 | Recommendation System – Unified Data Catalog
Building Recommendations for a Data Catalog
Abstraction Layer
Yeah, let’s expand on this!
The graph of metadata has much more to offer!
Graph of Enterprise Metadata
Introduction
The Enterprise Landscape
On-Premise
Adjacencies
Data Technologies
Cloud Services
Multi-DC / Hybrid Cloud
Org Structures
Collaboration
People
Identity & Access
Security Zones
Data Catalog
Security
Organization
Artifacts
Apps & Processes
Logs
Streams
ETL Apps
Connecting the Dots…
Effective Data Governance
Privacy
Regulation
Compliance
……
The whole is greater! Efficiency.
Connected Components
• Security
• Organization
• Data
• Business Metadata
• Technical Metadata
• Artifacts
• Code
• Operational Metadata
• Internal Social Metadata
• Data Classification (PII/PCI/PHI)
Connected components = Graph of Enterprise Metadata
So What? Top-Down View!
©2016 PayPal Inc. Confidential and proprietary.
Demo & Sandbox Code
Code Base | Playground
GitHub
https://github.com/Dee-Pac/GEM
Gitter
https://gitter.im/graph_of_enterprise_metadata/GEM-Ask-Us-Anything
Graph Build
Architecture & Data Flow
Metadata Data Sources...
• Data Classification
  • PII / PCI / PHI
• Operational Metadata
  • Jobs & Apps
  • Databases & App – Query Logs
• Org Structure
  • People, Geo, Hierarchies
• Technical Data Catalog
  • Data Stores
  • Databases / Containers
  • Tables / Objects
  • Columns / Fields
  • Business Metadata / Glossary / Annotation
• IAM
  • Access controls
  • Owners, Users
• Social Metadata (Internal)
  • Slack Chats, Threads, Annotations
• Artifacts
  • Code base – GitHub/SQL
  • Wiki – Documents, Mentions
  • JIRA – Issues, Mentions
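One way to picture how these sources connect is as a small property graph. The sketch below is an illustration only, not the GEM code base: the `MetadataGraph` class, the sample node ids, and every label or relationship name other than `dataset`, `column` and `DATASET_HAS_COLUMN` are assumptions made for the example.

```python
from collections import defaultdict


class MetadataGraph:
    """Tiny in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> (label, properties)
        self.edges = defaultdict(set)   # (source id, relationship type) -> target ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel_type, target):
        self.edges[(source, rel_type)].add(target)

    def neighbors(self, source, rel_type):
        return sorted(self.edges.get((source, rel_type), ()))


g = MetadataGraph()
# Technical catalog, classification and ownership land in the same graph.
g.add_node("ds:payments", "dataset", dataset="payments")
g.add_node("col:payments.email", "column", column="email")
g.add_node("col:payments.amount", "column", column="amount")
g.add_node("class:PII", "classification", name="PII")    # assumed label
g.add_node("user:alice", "user", name="alice")           # assumed label

g.add_edge("ds:payments", "DATASET_HAS_COLUMN", "col:payments.email")
g.add_edge("ds:payments", "DATASET_HAS_COLUMN", "col:payments.amount")
g.add_edge("col:payments.email", "CLASSIFIED_AS", "class:PII")  # assumed relationship
g.add_edge("user:alice", "OWNS", "ds:payments")                 # assumed relationship


def pii_columns(graph, dataset_id):
    """Columns of a dataset that carry the PII classification."""
    return [
        col
        for col in graph.neighbors(dataset_id, "DATASET_HAS_COLUMN")
        if "class:PII" in graph.neighbors(col, "CLASSIFIED_AS")
    ]
```

Calling `pii_columns(g, "ds:payments")` walks two relationship types in one question, which is the kind of cross-source query that separate metadata silos cannot answer on their own.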
High Level – Data Flow
Merge Graph
CRUD &
Insights
App Data Sources
Data Sourcing
Transform
Applications
Access Simplification
Distributed Processing for scale
(Future) Near Real Time - CDC
Graph Access
API Push
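The Data Sourcing, Transform and Merge Graph stages can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the pipeline from the talk: the record layout, the `transform` and `merge` helpers and the choice of `(storage_system_name, dataset)` as the node key are all hypothetical. The point it shows is that the merge step should be an upsert, so that re-running a source does not duplicate nodes.

```python
def transform(records):
    """Normalise raw source records into (key, properties) rows."""
    for record in records:
        key = (record["storage_system_name"].lower(), record["dataset"].lower())
        yield key, dict(record.get("props", {}))


def merge(graph, rows):
    """Upsert rows: create the node if it is missing, otherwise update its properties."""
    for key, props in rows:
        graph.setdefault(key, {}).update(props)
    return graph


# Two hypothetical sources describing the same dataset with different spellings.
catalog = [{"storage_system_name": "Hive", "dataset": "Payments",
            "props": {"owner": "alice"}}]
query_logs = [{"storage_system_name": "hive", "dataset": "payments",
               "props": {"reads_per_day": 42}}]

graph = {}
for source in (catalog, query_logs):
    merge(graph, transform(source))
merge(graph, transform(catalog))  # re-running a source adds no duplicate node
```

After the three merges the graph holds a single node for the dataset, carrying properties from both sources.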
neo4j + Apache Spark + Gimel + GraphQL
Multiple Sources Systems
Unified Access Pattern
• Abstracts graph complexity
• Supports multiple front-end clients
• Accelerates UI integration
• Reference | neo4j & GraphQL
• Spring-Data-neo4j [Scale]
Data Processing
• Distributed processing – SCALE
• Spark-ML + GraphX – COMPLEXITY
• GIMEL + Spark – data access simplified
• Kafka Streams – near-real-time CDC in the future
Connected Components
• Easy bootstrap
• Cypher & APOCs
• Robust libraries – Spark, GraphQL
• Rich graph algorithms
Data Processing @ Scale
© 2020 PayPal Inc. Confidential and proprietary.
Technical Metadata
>1000s of Data Stores
>100K Containers
>7 Million Datasets
>50 Million Fields
>10K PII/PCI Class elements
Operational Metadata
>100K Data Apps Per Day
>25 Data Tech Stacks
>API, Streams & Batch Apps
Artifacts & Social Metadata
Artifacts
>10K Scripts & App code
>1000s of Wiki, JIRA Annotations
Data Social
~ 12 Active Analytics Channels in Slack
~ Evolving Teams conversations for data
Security & Organization
>1000s of IAM roles
>20K Internal Users & Role Owners
Metadata Growth
Gimel + spark + neo4j
GIMEL Data API
NEO4J Connector
Gimel + spark + neo4j (Continued..)
Unified Data API
Unified Connector Config
Unified Data API
Gimel + spark + neo4j (Continued..)
Nodes & Relationships
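Writing nodes and relationships at this volume is usually done in batches of parameterised upserts. The sketch below is an assumption about how such a write could look, not the Gimel connector's actual behaviour: it only builds a Cypher statement and splits rows into batches, and does not call any driver or Spark API. The labels `dataset` and `column` and the relationship `DATASET_HAS_COLUMN` come from the GraphQL schema in this deck; the helper names and the batch size are hypothetical.

```python
def merge_dataset_columns_cypher():
    """Parameterised Cypher that upserts dataset and column nodes and links them."""
    return (
        "UNWIND $rows AS row\n"
        "MERGE (d:dataset {dataset: row.dataset, "
        "storage_system_name: row.storage_system_name})\n"
        "MERGE (c:column {column: row.column, dataset: row.dataset, "
        "storage_system_name: row.storage_system_name})\n"
        "MERGE (d)-[:DATASET_HAS_COLUMN]->(c)"
    )


def batches(rows, size):
    """Split rows into lists of at most `size` items, one list per write transaction."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical sample rows; each batch would be passed as the $rows parameter.
sample_rows = [
    {"dataset": "payments", "storage_system_name": "hive", "column": name}
    for name in ("id", "email", "amount", "currency", "created_at")
]
batch_sizes = [len(batch) for batch in batches(sample_rows, 2)]
```

Because every clause is a `MERGE`, replaying a batch after a failure leaves the graph unchanged, which keeps retries simple.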
Exposing Graph to Clients
GraphQL Query
GraphQL Query - Response
Define GraphQL Schema

    type column {
      column: String!
      storage_system_name: String!
      dataset: String!
    }

    type dataset {
      dataset: String!
      storage_system_name: String!
      column: [column] @relation(name: "DATASET_HAS_COLUMN", direction: OUT)
    }

    type Enterprise {
      name: String!
    }

Cypher -> GraphQL
Cypher
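To make the schema concrete, the sketch below mimics in plain Python what a query for a dataset and its columns might return. The query text, the sample rows and the `resolve_dataset` helper are hypothetical; in a real deployment a GraphQL-to-Cypher layer would generate the traversal over `DATASET_HAS_COLUMN` instead of this hand-written loop.

```python
# Hypothetical query against the schema: one dataset with its columns.
EXAMPLE_QUERY = """
{
  dataset(dataset: "payments") {
    dataset
    storage_system_name
    column { column }
  }
}
"""

# Hypothetical sample data standing in for nodes stored in the graph.
DATASETS = [{"dataset": "payments", "storage_system_name": "hive"}]
COLUMNS = [
    {"column": "email", "dataset": "payments", "storage_system_name": "hive"},
    {"column": "amount", "dataset": "payments", "storage_system_name": "hive"},
    {"column": "id", "dataset": "users", "storage_system_name": "hive"},
]


def resolve_dataset(name):
    """Return datasets named `name`, each with the columns it is related to."""
    result = []
    for ds in DATASETS:
        if ds["dataset"] != name:
            continue
        cols = [
            {"column": c["column"]}
            for c in COLUMNS
            if c["dataset"] == ds["dataset"]
            and c["storage_system_name"] == ds["storage_system_name"]
        ]
        result.append({**ds, "column": cols})
    return {"data": {"dataset": result}}


response = resolve_dataset("payments")
```

The response nests the columns under the dataset, mirroring the `column: [column]` field of the `dataset` type, so a front-end client never has to know the relationship name.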
Conclusion & Next Steps
Key Take-aways
• The whole is greater than the sum of its parts!
• Data awareness: data can live anywhere across the ecosystem, but we should know its shape and form of existence.
• Graphs are everywhere: their full potential is realized by connecting the dots across the board.
• Enriching metadata: it is a collective effort; silos rarely achieve effectiveness.
• Graph’s effectiveness: expose the data in a way that all tech stacks can consume alike!
What’s next?
• Data Classification
  • PII / PCI / PHI
• Org Structure
  • People, Geo, Hierarchies
• Technical Data Catalog
  • Data Stores
  • Databases / Containers
  • Tables / Objects
  • Columns / Fields
  • Business Metadata / Glossary / Annotation
• IAM
  • Access controls
  • Owners, Users
• Artifacts
  • Code base – GitHub/SQL
  • Wiki – Documents, Mentions
  • JIRA – Issues, Mentions
• Operational Metadata
  • Jobs & Apps
  • Databases & App – Query Logs
  • Lineage
• Social Metadata (Internal)
  • Slack Chats, Threads, Annotations
Connecting Further…
Scale – Millions of Nodes X Edges
Access – Simplification
Graph – Data Security & Multitenancy
Insights driven by Graph / ML Algorithms
Foundational Capabilities…
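Lineage is one place where a graph algorithm gives an immediate insight: a breadth-first walk answers "what is affected downstream if this dataset changes?". The sketch below is only an illustration of that idea; the dataset names, the adjacency-dict layout and the `downstream` helper are assumptions, and a production graph would more likely run the traversal inside the database.

```python
from collections import deque


def downstream(lineage, start):
    """Breadth-first walk over lineage edges; returns everything reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in lineage.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


# Hypothetical lineage: each key feeds the datasets listed in its value.
LINEAGE = {
    "raw.payments": ["clean.payments"],
    "clean.payments": ["mart.revenue", "mart.fraud"],
    "mart.revenue": ["dashboard.cfo"],
}

impacted = downstream(LINEAGE, "raw.payments")
```

The `seen` set keeps the walk finite even if the lineage contains cycles, which can happen when jobs read and write the same dataset.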
Questions?
Code base – GITHUB_FOR_GEM
Let’s collaborate - GITTER_FOR_GEM
Thank You!

Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
