Strongly Typed Data for Machine Learning
Interconnected tabular data

Three related tables T, A and B: a row t in T is linked to rows a1 and a2 in A and to rows b1 and b2 in B. There are two common ways to flatten this for an ML model.

Option 1: one wide row per example
t | a1 | a2 | b1 | b2
Problems:
- Order matters
- Each additional relation increases dimensionality

Option 2: one row per combination of related rows
t | a1 | b1
t | a1 | b2
t | a2 | b1
t | a2 | b2
Problems:
- Lots of repetition

Problems with either:
- It is hard to build a machine that "understands" the data rather than learning it by rote
- Both are normalisations of the data, not natural representations of it
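The two flattenings can be sketched in a few lines of Python. The table contents are the toy rows from the slide, the function names are hypothetical, and Option 2 is modelled here as a cross join of the related rows, which is one common way this kind of repetition arises.

```python
from itertools import product
from typing import Dict, List

# Toy rows from the slide: t in T relates to a1, a2 in A and b1, b2 in B.
related: Dict[str, List[str]] = {"T": ["t"], "A": ["a1", "a2"], "B": ["b1", "b2"]}


def option1_wide_row(rel: Dict[str, List[str]]) -> List[str]:
    """Option 1: concatenate all related rows into one wide row.

    Column positions are fixed by the order of concatenation, and every
    extra related row adds a column.
    """
    return rel["T"] + rel["A"] + rel["B"]


def option2_joined_rows(rel: Dict[str, List[str]]) -> List[List[str]]:
    """Option 2: one narrow row per (t, a, b) combination, i.e. a cross join.

    Rows stay short, but t and each a/b row are repeated.
    """
    return [list(row) for row in product(rel["T"], rel["A"], rel["B"])]


wide = option1_wide_row(related)
joined = option2_joined_rows(related)
print(wide)  # ['t', 'a1', 'a2', 'b1', 'b2']
for row in joined:
    print(row)
```

Adding a third A row, a3, lengthens the Option 1 row to six columns and grows Option 2 from four rows to six, which is the dimensionality and repetition problem in miniature.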
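ETL (Extract Transform Load)

- ETL is written ad hoc per project: nested loops (for … for … for …) walk the interconnected tables T, A and B and produce flat feature data, rows of numbers such as 1 ... 78 ... 98
- The flat feature data is fed to an ML model, which produces results
- Results not good enough? Fetch more context from the data, which means more ETL work

Workflow: Data → ETL → ML Model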
Abstraction in the Workflow

Each project builds its own pipeline:
- Project 1: Data → ETL → ML Model
- Project 2: the same data → ETL again → ML Model

There is lots of repetition when we change the model, even over the same data. Can we make an abstraction instead?

Data → ETL → ? → ML Model (Project 1) and ML Model (Project 2)

Data and ETL code are reused if we create an interface that any ML model can read from.
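To make the idea of a shared interface concrete, here is a minimal Python sketch. `GraphSource`, `InMemoryGraph` and the two feature functions are hypothetical names invented for illustration, not KGLIB's API: the ETL fills one source once, and two different "projects" read from it without any project-specific ETL.

```python
from typing import Dict, List, Protocol, Tuple

Edge = Tuple[str, str, str]  # (source node, edge label, target node)


class GraphSource(Protocol):
    """Hypothetical read-only interface that any ML model could consume."""

    def nodes(self) -> Dict[str, str]:
        """Map each node id to its type label."""
        ...

    def edges(self) -> List[Edge]:
        """List every labelled edge."""
        ...


class InMemoryGraph:
    """One concrete source, filled once by the shared ETL step."""

    def __init__(self, nodes: Dict[str, str], edges: List[Edge]) -> None:
        self._nodes = nodes
        self._edges = edges

    def nodes(self) -> Dict[str, str]:
        return dict(self._nodes)

    def edges(self) -> List[Edge]:
        return list(self._edges)


def degree_features(source: GraphSource) -> Dict[str, int]:
    """Project 1 input: number of edges touching each node."""
    counts = {node: 0 for node in source.nodes()}
    for src, _label, dst in source.edges():
        counts[src] += 1
        counts[dst] += 1
    return counts


def type_histogram(source: GraphSource) -> Dict[str, int]:
    """Project 2 input: number of nodes of each type."""
    hist: Dict[str, int] = {}
    for node_type in source.nodes().values():
        hist[node_type] = hist.get(node_type, 0) + 1
    return hist


graph = InMemoryGraph(
    nodes={"p1": "person", "tx1": "transaction", "tx2": "transaction"},
    edges=[("tx1", "buyer", "p1"), ("tx2", "seller", "p1")],
)
print(degree_features(graph))  # {'p1': 2, 'tx1': 1, 'tx2': 1}
print(type_histogram(graph))   # {'person': 1, 'transaction': 2}
```

Swapping the model only means writing a new function against `GraphSource`; the ETL that populated `InMemoryGraph` is untouched.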
Abstraction in the Workflow

- We know how to do ETL into a database: Data → ETL → Database, and each ML model (Project 1, Project 2) reads from it with queries
- Message passing! The models can pass messages along the connections in the data
- The message to send needs to be learned, but learned from what?
- How do we avoid learning from a flat representation, even when using a DB?
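For readers new to the term, this is one round of the simplest form of message passing, written in plain Python over a hypothetical three-node graph: each node sends its feature vector to its neighbours, then adds the mean of what it received to its own vector. It is a generic illustration, not KGLIB's implementation; in a learned model the fixed "copy and average" steps are replaced by trainable functions, which is the part the slide says must be learned. Note that the message carries no information about what kind of connection it travelled along.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing_round(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing on an undirected graph."""
    # Every node sends its current vector to each neighbour.
    inbox: Dict[str, List[Vector]] = {node: [] for node in features}
    for a, b in edges:
        inbox[b].append(features[a])
        inbox[a].append(features[b])

    # Each node averages its incoming messages and adds the average to itself.
    updated: Dict[str, Vector] = {}
    for node, vec in features.items():
        messages = inbox[node]
        if messages:
            mean = [sum(vals) / len(messages) for vals in zip(*messages)]
        else:
            mean = [0.0] * len(vec)
        updated[node] = [v + m for v, m in zip(vec, mean)]
    return updated


features = {"p1": [1.0, 0.0], "tx1": [0.0, 1.0], "tx2": [0.0, 3.0]}
edges = [("p1", "tx1"), ("p1", "tx2")]
print(message_passing_round(features, edges))
# {'p1': [1.0, 2.0], 'tx1': [1.0, 1.0], 'tx2': [1.0, 3.0]}
```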
Capturing Graph Context

What is the context of the messages?

[Figure: a person playing buyer in one transaction, seller in another, and teacher in a teaching. A message of unknown content (?) passes along each connection.]

First step: capture this information in the model.
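One simple way to put this context into the model, sketched here under the assumption that roles are available as edge labels, is to tag each message with the role the receiver plays. The `typed_messages` helper, the fixed role list and the feature values are hypothetical. Because tx1 and tx2 have identical features, the person could not tell its purchase from its sale without the role tag.

```python
from typing import Dict, List, Tuple

ROLES = ["buyer", "seller", "teacher"]  # edge labels from the example graph


def role_one_hot(role: str) -> List[float]:
    return [1.0 if r == role else 0.0 for r in ROLES]


def typed_messages(
    features: Dict[str, List[float]],
    role_edges: List[Tuple[str, str, str]],  # (relation, role, player)
) -> Dict[str, List[List[float]]]:
    """Collect the messages each player receives, tagged with its role.

    Each message is the relation's feature vector concatenated with a
    one-hot encoding of the role the player has in that relation.
    """
    inbox: Dict[str, List[List[float]]] = {node: [] for node in features}
    for relation, role, player in role_edges:
        inbox[player].append(features[relation] + role_one_hot(role))
    return inbox


features = {"p1": [0.5], "tx1": [1.0], "tx2": [1.0], "teach1": [2.0]}
role_edges = [("tx1", "buyer", "p1"), ("tx2", "seller", "p1"), ("teach1", "teacher", "p1")]
print(typed_messages(features, role_edges)["p1"])
# [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [2.0, 0.0, 0.0, 1.0]]
```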
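Capturing Graph Context

A flat person record has the columns name, address_1, address_2, email_1, email_2 and dob, each holding a <val>. One-to-many relations with properties are common, and numbered columns like these are how a flat table encodes them.

In the graph model, the person instead owns each value as its own attribute via has: has name, has address (twice), has email (twice) and has dob. Messages then pass between the person and these attributes.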
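A minimal sketch of that conversion in Python, assuming the record arrives as a dict keyed by column name. `row_to_has_edges`, the node-id scheme and the sample values are hypothetical. Numbered suffixes are stripped so that email_1 and email_2 both become email attributes, and empty cells produce no edge, so the number of columns no longer caps the number of values.

```python
import re
from typing import Dict, List, Optional, Tuple

Nodes = Dict[str, Tuple[str, Optional[str]]]  # node id -> (type, value)
Edges = List[Tuple[str, str, str]]            # (owner, "has", attribute node)


def row_to_has_edges(person_id: str, row: Dict[str, str]) -> Tuple[Nodes, Edges]:
    """Turn one flat person record into attribute nodes linked by has edges."""
    nodes: Nodes = {person_id: ("person", None)}
    edges: Edges = []
    for column, value in row.items():
        if not value:
            continue  # an empty cell simply produces no edge
        attr_type = re.sub(r"_\d+$", "", column)  # email_2 -> email
        # Keyed by type and value: in TypeDB an attribute with a given
        # type and value exists once and can be owned by many things.
        node_id = f"{attr_type}:{value}"
        nodes[node_id] = (attr_type, value)
        edges.append((person_id, "has", node_id))
    return nodes, edges


row = {  # hypothetical sample record
    "name": "Ann",
    "address_1": "1 High St",
    "address_2": "",
    "email_1": "ann@example.com",
    "email_2": "ab",
    "dob": "1990-01-01",
}
nodes, edges = row_to_has_edges("p1", row)
for edge in edges:
    print(edge)
```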
Data → ETL → Database (entity, relation, attribute) → Queries → ? → ML Model

What's missing? How do we get from DB queries to features for our ML models? We've missed a step: we can't go straight from DB queries to ML. 🤔
Graph Embedding

[Figure: the full example graph: a person playing buyer and seller in transactions and teacher in a teaching, and owning a name, two addresses, two emails and a dob via has.]

Consider the model we have now.
Graph Embedding

In KGLIB each node is a <thing type> with an Optional<value>, and each edge is has OR a <role type>.
- This simplified model captures everything entities, relations and attributes can do, and treats roles and has as edges
- Now we can arrive at a simple embedding structure, encoded as a NetworkX graph: <entity type> and <relation type> nodes linked by <role type> edges, and <attribute type> nodes carrying a <value>, linked to their owners by has edges
- Nodes can own an attribute via has
- Nodes can play a role via a <role type>
- Attributes are nodes and carry only one value
- Thanks to TypeDB's type system all types behave the same way, so we can embed them the same way
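A stdlib-only sketch of the representation described above. KGLIB builds this as a NetworkX graph; to stay dependency-free, the hypothetical `TypedGraph` class below stores the same information (node → thing type and optional value, edge → has or a role type) and enforces the bullet-point rules. With NetworkX the equivalent calls would be `G.add_node(node, type=..., value=...)` and `G.add_edge(u, v, type=...)` on a `nx.MultiDiGraph`; the attribute key names and the relation-to-player edge direction are assumptions, not KGLIB's documented schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TypedGraph:
    """Nodes are (thing type, optional value); edges are 'has' or a role type."""

    attribute_types: Set[str]
    role_types: Set[str]
    nodes: Dict[str, Tuple[str, Optional[object]]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: str, thing_type: str, value: Optional[object] = None) -> None:
        # Attribute nodes carry exactly one value; entities and relations carry none.
        if (thing_type in self.attribute_types) != (value is not None):
            raise ValueError(f"{node}: only attribute nodes carry a value")
        self.nodes[node] = (thing_type, value)

    def add_edge(self, source: str, edge_type: str, target: str) -> None:
        if edge_type == "has":
            # Any node may own an attribute, but the target must be an attribute.
            if self.nodes[target][0] not in self.attribute_types:
                raise ValueError("a has edge must point at an attribute node")
        elif edge_type not in self.role_types:
            raise ValueError(f"unknown edge type {edge_type!r}")
        self.edges.append((source, edge_type, target))


# Usage: a transaction whose buyer is a person owning an email.
g = TypedGraph(attribute_types={"email"}, role_types={"buyer", "seller"})
g.add_node("p1", "person")
g.add_node("tx1", "transaction")
g.add_node("e1", "email", "ab")
g.add_edge("p1", "has", "e1")
g.add_edge("tx1", "buyer", "p1")  # assumed direction: relation -> role player
print(g.edges)  # [('p1', 'has', 'e1'), ('tx1', 'buyer', 'p1')]
```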
Graph Embedding

Each node is a <thing type> with an Optional<value>; each edge is has OR a <role type>. Now we need to embed:
- thing types, role types and has (all with the same method)
- optional values

- Edge embedding: the type embedding of has or of the role subtype, from the Type Embedder
- Node embedding: the type embedding of the thing subtype, from the Type Embedder, together with a value embedding from the Categorical Value Embedder, the Continuous Value Embedder OR the String Value Embedder
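The composition above can be sketched in stdlib Python. Everything here is an assumption for illustration: the class names, the embedding sizes, concatenation as the way type and value embeddings are combined, a zero vector for nodes without a value, and the simple stand-ins used for each value embedder (a lookup table for categorical values, a linear map for continuous ones, hashed character counts for strings). A real model would learn these parameters.

```python
import random
from typing import Dict, List, Optional, Union

VALUE_DIM = 3  # assumed size of every value embedding
TYPE_DIM = 2   # assumed size of every type embedding


class LookupEmbedder:
    """One vector per label; used for types and for categorical values."""

    def __init__(self, labels: List[str], dim: int, seed: int):
        rng = random.Random(seed)
        self.index = {label: i for i, label in enumerate(labels)}
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in labels]

    def __call__(self, label: str) -> List[float]:
        return self.table[self.index[label]]


def continuous_embed(x: float, weights: List[float]) -> List[float]:
    # Linear map from one number to VALUE_DIM numbers (no bias).
    return [w * x for w in weights]


def string_embed(text: str) -> List[float]:
    # Stand-in: bag of characters bucketed via ord(), so it is deterministic.
    vec = [0.0] * VALUE_DIM
    for ch in text:
        vec[ord(ch) % VALUE_DIM] += 1.0
    return vec


class GraphEmbedder:
    def __init__(self, thing_types: List[str], edge_types: List[str],
                 categories: List[str], value_kinds: Dict[str, str]):
        # Thing types, role types and has share one Type Embedder.
        self.type_embedder = LookupEmbedder(thing_types + edge_types, TYPE_DIM, seed=1)
        self.categorical = LookupEmbedder(categories, VALUE_DIM, seed=2)
        self.continuous_weights = [0.5, -1.0, 2.0]
        self.value_kinds = value_kinds  # attribute type -> "categorical" | "continuous" | "string"

    def embed_edge(self, edge_type: str) -> List[float]:
        return self.type_embedder(edge_type)

    def embed_node(self, thing_type: str, value: Optional[Union[str, float]] = None) -> List[float]:
        type_vec = self.type_embedder(thing_type)
        kind = self.value_kinds.get(thing_type)
        if value is None:
            value_vec = [0.0] * VALUE_DIM  # entities and relations carry no value
        elif kind == "categorical":
            value_vec = self.categorical(str(value))
        elif kind == "continuous":
            value_vec = continuous_embed(float(value), self.continuous_weights)
        else:
            value_vec = string_embed(str(value))
        return type_vec + value_vec  # assumed combination: concatenation


emb = GraphEmbedder(
    thing_types=["person", "transaction", "email", "amount", "currency"],
    edge_types=["has", "buyer", "seller"],
    categories=["EUR", "GBP", "USD"],
    value_kinds={"email": "string", "amount": "continuous", "currency": "categorical"},
)
print(emb.embed_node("amount", 30)[TYPE_DIM:])  # [15.0, -30.0, 60.0]
print(emb.embed_node("email", "ab")[TYPE_DIM:])  # [0.0, 1.0, 1.0]
```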
Basic Type Embedding

How do we embed types?
- Use the out-of-the-box embeddings in TensorFlow & PyTorch
- Treat types as separate categories
- Index them when encoding the graph: vehicle 0, road-vehicle 1, car 2, motorbike 3, air-vehicle 4, plane 5
- Each index selects one row of an embedding matrix:

    node_representation = "motorbike"
    encoded = encode(node_representation)                     # index 3
    representation = embedding_lookup(embedding, encoded[0])  # row 3 of the matrix

- The representation is used downstream, and backprop updates the embedding

Implemented in KGLIB!
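The same steps in runnable stdlib Python. `encode` and `embedding_lookup` here are hypothetical stand-ins named after the slide; in TensorFlow the lookup is `tf.nn.embedding_lookup(params, ids)`, and in PyTorch a `torch.nn.Embedding` layer plays the role of the matrix. The toy squared-error loss and the learning rate are assumptions, chosen only to show that a gradient step updates just the row that was looked up.

```python
import random
from typing import List

TYPES = ["vehicle", "road-vehicle", "car", "motorbike", "air-vehicle", "plane"]
DIM = 4

rng = random.Random(0)
# One trainable row per type: an embedding matrix of shape (len(TYPES), DIM).
embedding = [[rng.gauss(0.0, 0.1) for _ in range(DIM)] for _ in TYPES]


def encode(node_representation: str) -> List[int]:
    """Map a type label to its category index, wrapped in a list."""
    return [TYPES.index(node_representation)]


def embedding_lookup(table: List[List[float]], index: int) -> List[float]:
    return table[index]


def sgd_step(table: List[List[float]], index: int, grad: List[float], lr: float = 0.1) -> None:
    """Gradient descent on the looked-up row; no other row received gradient."""
    table[index] = [w - lr * g for w, g in zip(table[index], grad)]


encoded = encode("motorbike")  # [3]
representation = embedding_lookup(embedding, encoded[0])

# Toy downstream loss 0.5 * ||representation - target||^2; its gradient
# with respect to the representation is (representation - target).
target = [1.0, 0.0, 0.0, 0.0]
grad = [r - t for r, t in zip(representation, target)]
before = [row[:] for row in embedding]
sgd_step(embedding, encoded[0], grad)
changed = [i for i in range(len(TYPES)) if embedding[i] != before[i]]
print(encoded, changed)  # [3] [3]
```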
Advanced Type Embedding

For a node whose node_representation is motorbike, also use its parent types. parent_types are the answers to the TypeQL query:

    match motorbike sub $t;

    { $t type motorbike sub road-vehicle; }
    { $t type road-vehicle sub vehicle; }
    { $t type vehicle sub entity; }
    { $t type entity sub thing; }
    { $t type thing; }

[Figure: many car and motorbike instances (isa), with car and motorbike both sub road-vehicle, which is sub vehicle.]

Transfer understanding! Because car and motorbike share parent types, what the model learns about one can carry over to the other.

Index the whole type hierarchy: thing 0, entity 1, vehicle 2, road-vehicle 3, car 4, motorbike 5, air-vehicle 6, plane 7. Look up an embedding for the node's type and for each parent type.

Combine how? A job for self-attention!

Not implemented in KGLIB... yet.
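Since KGLIB does not implement this, the following is only a sketch of one way it could work, in stdlib Python. The embedding values are random placeholders, and the pooling is single-query attention (the node's own type embedding attends over itself and its supertypes), a simplified form of the self-attention the slide suggests, with the learned query, key and value projections left out. The hypothetical `type_chain` helper mirrors the answers of the `match motorbike sub $t;` query.

```python
import math
import random
from typing import List, Tuple

# Type hierarchy from the slide, as child -> parent ("sub") links.
SUB = {
    "entity": "thing",
    "vehicle": "entity",
    "road-vehicle": "vehicle",
    "car": "road-vehicle",
    "motorbike": "road-vehicle",
    "air-vehicle": "vehicle",
    "plane": "air-vehicle",
}
TYPES = ["thing", "entity", "vehicle", "road-vehicle", "car", "motorbike", "air-vehicle", "plane"]
DIM = 4
rng = random.Random(0)
EMBEDDING = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in TYPES]


def type_chain(label: str) -> List[str]:
    """The type itself followed by all of its supertypes, up to thing."""
    chain = [label]
    while chain[-1] in SUB:
        chain.append(SUB[chain[-1]])
    return chain


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_supertypes(label: str) -> Tuple[List[str], List[float], List[float]]:
    """Scaled dot-product attention pooling over a type and its supertypes."""
    chain = type_chain(label)
    vectors = [EMBEDDING[TYPES.index(t)] for t in chain]
    query = vectors[0]  # the node's own type embedding
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(DIM) for key in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(DIM)]
    return chain, weights, pooled


chain, weights, pooled = attend_over_supertypes("motorbike")
print(chain)  # ['motorbike', 'road-vehicle', 'vehicle', 'entity', 'thing']
```

car and motorbike share four of the five entries in their chains, so their pooled vectors draw on mostly the same embedding rows, and training on one type also updates rows the other depends on.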
Modelling Choices

[Figure: a person playing teacher in a teaching and buyer and seller in transactions, shown alongside an alternative in which teacher and buyer are modelled as subtypes (sub) of person.]
What have we achieved?

- Your data: ETL once, into a generally useful KG in the database, made of typed things (<type>) and values (<value>)
- KGLIB: encoding of query results into a NetworkX graph (<thing type>, <value>, has OR <role type>), then embedding, then a KGCN
- Your code: the queries. The feedback loop changes queries only, for example:

    match
    $p isa person,
      has email "ab";
    $t (teacher: $p);
    $r (buyer: $p);

- Abstraction of the data means generic type & value embedding tools
- Improved machine understanding!
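To illustrate that only the query changes, here is a hypothetical helper that assembles the example query from a list of (variable, role) pairs. It only builds text; sending it to TypeDB and encoding the answers is left to the client and to KGLIB, whose APIs are not shown here. The email value is inserted without escaping, so this is for trusted input only.

```python
from typing import List, Tuple


def context_query(email: str, roles: List[Tuple[str, str]]) -> str:
    """Build a TypeQL match query for a person and the roles they play.

    Widening the context the model sees means adding a (variable, role)
    pair here; the ETL and the embedding code stay untouched.
    """
    lines = ["match", "$p isa person,", f'  has email "{email}";']
    for variable, role in roles:
        lines.append(f"${variable} ({role}: $p);")
    return "\n".join(lines)


print(context_query("ab", [("t", "teacher"), ("r", "buyer")]))
```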
