Heterogeneous data holds significant inherent context. We would like our machine learning models to understand this context and use this ancillary but critical information to improve their accuracy and versatility.
How can we systematically make use of context in machine learning?
We investigate knowledge modelling techniques which, applied with the right ML strategies, give us a promising approach for robustly handling heterogeneous data in large knowledge models. We aim to do this in a way that allows us to build any machine learning model, including graph learning models like our KGCN.
Speaker: James Fletcher, Vaticle
James comes from a background in Computer Vision, specialising in automated diagnostics. As Principal Scientist at Vaticle, his mission is to demonstrate to the world how traditional symbolic approaches to AI, built into TypeDB, can be combined with present-day research in machine learning.
2. Interconnected tabular data
Tables: T, A, B
Option 1
[Figure: the rows t, a1, a2, b1, b2 as feature vectors]
Problems:
- Order matters
- Increasing relations increases dimensionality
OR
Option 2
[Figure: the rows t, a1, a2, b1, b2 as feature vectors, with rows repeated]
Problems:
- Lots of repetition
Problems with either:
- Hard to build a machine that “understands” rather than learns by rote
- These are normalisations of the data, not natural representations
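One reading of the two options, sketched in plain Python. The rows, their values and both function names are hypothetical, and treating Option 2 as a join-style expansion is an assumption drawn from the "lots of repetition" remark, not something the slide states.

```python
# Hypothetical data: one row t from table T, linked to two rows in A and two in B.
t = [0.1, 0.2]
a_rows = [[1.0, 1.1], [2.0, 2.1]]  # a1, a2
b_rows = [[3.0], [4.0]]            # b1, b2


def flatten_concat(t, a_rows, b_rows):
    """Option 1 (assumed): concatenate every linked row into one long vector."""
    out = list(t)
    for row in a_rows + b_rows:
        out.extend(row)
    return out


def flatten_join(t, a_rows, b_rows):
    """Option 2 (assumed): one output row per (t, a, b) combination."""
    return [list(t) + a + b for a in a_rows for b in b_rows]


option_1 = flatten_concat(t, a_rows, b_rows)
option_2 = flatten_join(t, a_rows, b_rows)

print(len(option_1))  # vector length grows with every extra linked row
print(len(option_2))  # number of rows; t is copied into each of them
```

In the first form, swapping a1 and a2 yields a different vector for the same facts, and each new relation adds dimensions. In the second, t is copied into every output row.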
3. ETL (Extract, Transform, Load)
ETL is ad hoc per project:
Interconnected tabular data (T, A, B) → ETL (for … for … for …) → Flat feature data (1 ... 78 ... 98) → ML Model → Results
Results not good enough, fetch more context from the data → More ETL work
[Diagram: Data → ETL → ML Model]
4. Abstraction in the Workflow
Project 1: Data → ETL → ML Model
Project 2: (same data) → ETL → ML Model
Lots of repetition when we change the model, even over the same data.
Can we make an abstraction instead?
Data → ETL → ? → ML Model (Project 1), ML Model (Project 2)
Data and ETL code are reused if we create an interface that any ML model can read from.
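A minimal sketch of what such a shared interface could look like, assuming a single `fetch` method. `KnowledgeSource`, `InMemorySource` and both "models" are hypothetical names invented for illustration; the two functions are trivial placeholders standing in for real ML models.

```python
from typing import Iterable, List, Protocol


class KnowledgeSource(Protocol):
    """Hypothetical interface: answers a query with a collection of records."""

    def fetch(self, query: str) -> Iterable[dict]:
        ...


class InMemorySource:
    """Toy stand-in for a database that ETL has loaded once."""

    def __init__(self, records: List[dict]) -> None:
        self._records = list(records)

    def fetch(self, query: str) -> Iterable[dict]:
        # Here a "query" is just a type name; a real source would run a DB query.
        return [r for r in self._records if r["type"] == query]


def project_1_model(source: KnowledgeSource) -> int:
    """Project 1: counts people (placeholder for an ML model)."""
    return len(list(source.fetch("person")))


def project_2_model(source: KnowledgeSource) -> float:
    """Project 2: averages transaction amounts (another placeholder)."""
    amounts = [r["amount"] for r in source.fetch("transaction")]
    return sum(amounts) / len(amounts)


source = InMemorySource([
    {"type": "person", "name": "p1"},
    {"type": "transaction", "amount": 10.0},
    {"type": "transaction", "amount": 30.0},
])
people = project_1_model(source)
mean_amount = project_2_model(source)
```

Both projects read from the same source object, so the loading code is written once and only the consumer changes.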
5. Abstraction in the Workflow
Data → ETL → Database → Queries → ML Model (Project 1), ML Model (Project 2)
We know how to do ETL to a DB.
How do we avoid learning from a flat representation, even when using a DB?
Message Passing!
The message to send needs to be learned, but from what?
6. Capturing Graph Context
What’s the context of the messages?
[Figure: graph with nodes person, transaction, transaction and teaching, and role edges buyer, seller and teacher; each node vector is marked “?”, and one edge is labelled “message”]
First step: capture the information in the model
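To make the idea of context-dependent messages concrete, here is a toy single round of message passing in which the message depends on the role that links two nodes. The feature values, the per-role scalar weights (standing in for a learned message function) and the function name are all assumptions for illustration; this is not the KGCN implementation.

```python
# Hypothetical node features (2-dimensional for readability).
features = {
    "person": [1.0, 0.0],
    "transaction": [0.0, 1.0],
    "teaching": [0.5, 0.5],
}

# Edges as (sender, receiver, role): the person is a buyer in the transaction
# and a teacher in the teaching relation.
edges = [
    ("person", "transaction", "buyer"),
    ("person", "teaching", "teacher"),
]

# One scalar per role, standing in for parameters that would be learned.
role_weight = {"buyer": 0.5, "seller": 0.25, "teacher": 2.0}


def message_passing_step(features, edges, role_weight):
    """Each receiver adds the sender's features, scaled by the role's weight."""
    updated = {node: list(vec) for node, vec in features.items()}
    for sender, receiver, role in edges:
        weight = role_weight[role]
        for i, x in enumerate(features[sender]):
            updated[receiver][i] += weight * x
    return updated


updated = message_passing_step(features, edges, role_weight)
```

The same person sends a different message to each neighbour because the role differs, which is the kind of context a flat feature vector loses.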
8. What’s missing?
Data → ETL → Database (entity, relation, attribute) → Queries → ? → ML Model
How do we get from DB queries to features for our ML models?
We’ve missed a step: we can’t go straight from DB queries to ML 🤔
10. Graph Embedding
KGLIB, NetworkX
[Figure: full model with <entity type>, <relation type> and <attribute type> <value> nodes, connected by has and <role type> edges]
- Nodes can own an attribute via has
- Nodes can play a role via a <role type>
- Attributes are nodes and carry only one value
- Thanks to the type system of TypeDB, all types behave the same way, so we can embed them the same way
[Figure: simplified model with node <thing type>, Optional<value> and edge has OR <role type>]
- The simplified model captures everything entities, relations and attributes can do, and treats roles and has as edges
- Now we can arrive at a simple embedding structure
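The simplified structure can be sketched with two small record types: a node with a thing type and an optional value, and an edge labelled either `has` or a role type. This uses plain dataclasses as a stand-in rather than NetworkX itself; the class names, helper functions and the edge directions (owner to attribute, relation to role player) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Node:
    node_id: str
    thing_type: str                                # entity, relation or attribute type
    value: Optional[Union[str, float, int]] = None  # only attributes carry a value


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str  # "has" or a role type


nodes = [
    Node("p1", "person"),
    Node("e1", "email", value="ab"),
    Node("t1", "teaching"),
    Node("r1", "transaction"),
]
edges = [
    Edge("p1", "e1", "has"),      # person owns an email attribute
    Edge("t1", "p1", "teacher"),  # person plays teacher in teaching
    Edge("r1", "p1", "buyer"),    # person plays buyer in transaction
]


def owned_attributes(node_id: str, nodes: List[Node], edges: List[Edge]) -> List[Node]:
    """Attribute nodes reachable from node_id over a 'has' edge."""
    by_id = {n.node_id: n for n in nodes}
    return [by_id[e.target] for e in edges if e.source == node_id and e.label == "has"]


def role_players(relation_id: str, edges: List[Edge]) -> List[Tuple[str, str]]:
    """(role type, player id) pairs for a relation node."""
    return [(e.label, e.target) for e in edges if e.source == relation_id and e.label != "has"]
```

Because entities, relations and attributes are all plain nodes here, one embedding routine can serve all three.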
11. Graph Embedding
[Figure: node <thing type>, Optional<value>; edge has OR <role type>]
Now we need to embed:
- thing types, role types, has (same method)
- optional values
edge embedding:
- type embedding of has or role subtype (Type Embedder)
node embedding:
- type embedding of thing subtype (Type Embedder)
- value embedding (Categorical Value Embedder OR Continuous Value Embedder OR String Value Embedder)
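The routing between embedders might look like the sketch below. Real embedders would be learned; the fixed functions here (one-hot types, scaled numbers, a bag of characters) are deliberately simple stand-ins, and joining the type and value parts by concatenation is an assumption, since the slide does not say how they are combined.

```python
VALUE_DIM = 3  # hypothetical size of every value embedding


def embed_type(type_name, type_index):
    """Stand-in Type Embedder: one-hot vector over the known types."""
    vec = [0.0] * len(type_index)
    vec[type_index[type_name]] = 1.0
    return vec


def embed_categorical(value, categories):
    """Stand-in Categorical Value Embedder: one-hot over the categories."""
    vec = [0.0] * VALUE_DIM
    vec[categories.index(value)] = 1.0
    return vec


def embed_continuous(value, low, high):
    """Stand-in Continuous Value Embedder: min-max scaled number, zero padded."""
    scaled = (value - low) / (high - low)
    return [scaled] + [0.0] * (VALUE_DIM - 1)


def embed_string(value):
    """Stand-in String Value Embedder: character counts in VALUE_DIM buckets."""
    vec = [0.0] * VALUE_DIM
    for ch in value:
        vec[ord(ch) % VALUE_DIM] += 1.0
    return vec


def embed_node(type_name, type_index, value_vec=None):
    """Node embedding = type part followed by value part (zeros if no value)."""
    value_part = value_vec if value_vec is not None else [0.0] * VALUE_DIM
    return embed_type(type_name, type_index) + value_part


type_index = {"person": 0, "age": 1, "email": 2}
person_vec = embed_node("person", type_index)
age_vec = embed_node("age", type_index, embed_continuous(30, 0, 120))
email_vec = embed_node("email", type_index, embed_string("ab"))
```

Edges would go through `embed_type` alone, with `has` and each role type indexed in the same way as thing types.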
12. Basic Type Embedding
How do we embed types?
- Treat types as separate categories
- Index them when encoding the graph
- Use the out-of-the-box (OOTB) embedding from TensorFlow & PyTorch
Type index:
vehicle 0
road-vehicle 1
car 2
motorbike 3
air-vehicle 4
plane 5
[Figure: embedding table with one row per type index]
node_representation = motorbike
encoded = encode(node_representation)  # 3
representation = embedding_lookup(embedding, encoded[0])  # row 3 of the embedding table
The representation is passed downstream; backprop updates the embedding.
Implemented in KGLIB!
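The index-then-lookup steps can be sketched without any framework. This is a plain-Python illustration, not KGLIB's code: `encode`, `embedding_lookup` and `sgd_update` are written here by hand, the embedding size of 8 and the learning rate are assumptions, and in practice the table would be a framework embedding layer such as `torch.nn.Embedding` or `tf.keras.layers.Embedding`.

```python
import random

TYPES = ["vehicle", "road-vehicle", "car", "motorbike", "air-vehicle", "plane"]
TYPE_INDEX = {name: i for i, name in enumerate(TYPES)}
EMBED_DIM = 8  # assumed embedding size

rng = random.Random(0)
# One trainable row per type, randomly initialised.
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in TYPES]


def encode(type_name):
    """Map a type label to its integer index."""
    return TYPE_INDEX[type_name]


def embedding_lookup(table, index):
    """Select the row of the table that belongs to this type."""
    return table[index]


def sgd_update(table, index, grad, lr=0.1):
    """Backprop stand-in: only the looked-up row receives a gradient step."""
    table[index] = [w - lr * g for w, g in zip(table[index], grad)]


encoded = encode("motorbike")
representation = embedding_lookup(embedding, encoded)
```

Since every type owns an independent row, nothing learned about `car` carries over to `motorbike` in this basic scheme.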
13. Advanced Type Embedding
node_representation = motorbike
parent_types = match motorbike sub $t;
{ $t type motorbike sub road-vehicle; }
{ $t type road-vehicle sub vehicle; }
{ $t type vehicle sub entity; }
{ $t type entity sub thing; }
{ $t type thing; }
[Figure: type hierarchy with motorbike and car as subtypes (sub) of road-vehicle, itself a subtype of vehicle; motorbike and car instances are attached to their types via isa]
Transfer understanding!
Type index:
thing 0
entity 1
vehicle 2
road-vehicle 3
car 4
motorbike 5
air-vehicle 6
plane 7
[Figure: embedding table with one row per type index; the rows for motorbike and its parent types are looked up]
Combine how? A job for self-attention!
Not implemented in KGLIB... yet
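One possible way to combine a type's embedding with those of its parent types is a dot-product attention pool, sketched below. The slide only suggests self-attention and notes that nothing is implemented yet, so everything here is an assumption: the `SUPERTYPE` map mirrors the hierarchy on the slide, the table is randomly initialised, and using the type's own row as the query is just one choice among many.

```python
import math
import random

# Child -> parent, following the hierarchy on the slide.
SUPERTYPE = {
    "thing": None, "entity": "thing", "vehicle": "entity",
    "road-vehicle": "vehicle", "car": "road-vehicle", "motorbike": "road-vehicle",
    "air-vehicle": "vehicle", "plane": "air-vehicle",
}
TYPE_INDEX = {name: i for i, name in enumerate(SUPERTYPE)}
EMBED_DIM = 8  # assumed embedding size

rng = random.Random(0)
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in SUPERTYPE]


def type_chain(type_name):
    """The type followed by all of its parent types, up to the root."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = SUPERTYPE[type_name]
    return chain


def attention_pool(query, vectors):
    """Softmax over dot-product scores, then a weighted sum of the vectors."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors))
              for i in range(len(query))]
    return pooled, weights


chain = type_chain("motorbike")
vectors = [embedding[TYPE_INDEX[name]] for name in chain]
pooled, weights = attention_pool(vectors[0], vectors)
```

Because `car` and `motorbike` share the rows for road-vehicle, vehicle, entity and thing, gradients from one would also shape the representation of the other, which is the intended transfer.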
15. Graph Embedding
[Figure: node <thing type>, Optional<value>; edge has OR <role type>]
Now we need to embed:
- thing types, role types, has (same method)
- optional values
edge embedding:
- type embedding of has or role subtype (Type Embedder)
node embedding:
- type embedding of thing subtype (Type Embedder)
- value embedding (Categorical Value Embedder OR Continuous Value Embedder OR String Value Embedder)
17. What have we achieved?
[Diagram: Data → ETL → Database (<type>, <value>) → Queries → KGLIB (NetworkX, Encoding, Embedding) → KGCN; the diagram is labelled “Your data” and “Your code”]
- ETL once: a generally useful KG
- Feedback loop changes queries only
- Abstraction of data means generic type & value embedding tools
- Improved machine understanding!
Example query:
match
$p isa person,
has email "ab";
$t(teacher: $p);
$r(buyer: $p);
Encoded structure: <thing type>, <value>, has OR <role type>