© 2019 IBM Corporation
Graph generation using a graph grammar
Hiroshi Kajino
IBM Research - Tokyo
IBIS 2019
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
2
Contents
A molecular graph should satisfy some constraints to be valid
§Learning a generative model of a molecular graph
–Input: set of molecular graphs D = {G_1, G_2, …, G_N}
–Output: probability distribution p(G) such that G_i ∼ p
§Hard vs. soft constraints on p(G)'s support
–Hard constraint: valence condition
∃ rule-based classifier that judges this constraint
–Soft constraint: stability
∄ rule-based classifier, in general
3
Why should we care about a formal language?
!"
Formal language
can help
A formal language defines a set of strings with certain properties;
an associated grammar tells us how to generate them
§Formal language
Typically defined as a set of strings
–Language point of view
= a set of strings with certain properties
(= a subset of all possible strings)
–Generative point of view
A grammar is often associated with a language
= how to generate strings in ℒ
4
Why should we care about a formal language?
All possible strings Σ∗
Language ℒ
Grammar G
ℒ = {a^n b^n : n ≥ 1} ⊂ {a, b}* = Σ*
G = ({S}, {a, b}, {S → ab, S → aSb}, S)
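To make the "language point of view" concrete, here is a small Python sketch (my illustration, not part of the original slides) that checks membership in ℒ = {a^n b^n : n ≥ 1}:

```python
def in_language(s: str) -> bool:
    """Membership test for L = { a^n b^n : n >= 1 } over the alphabet {a, b}."""
    n = len(s) // 2
    return len(s) == 2 * n and n >= 1 and s == "a" * n + "b" * n

# The language is a proper subset of all strings over {a, b}.
assert in_language("aaabbb")    # a string generated by the grammar G
assert not in_language("abab")  # a string over {a, b} that is not in the language
```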
A formal language defines a set of graphs satisfying hard constraints;
an associated grammar tells us how to generate them
§Formal language
Typically defined as a set of graphs
–Language point of view
= a set of graphs satisfying the hard constraints
(= a subset of all possible graphs)
–Generative point of view
A grammar is often associated with a language
= how to generate graphs in ℒ
5
Why should we care about a formal language?
All possible graphs Σ∗
Language ℒ
Grammar G: ???
ℒ = {Molecules satisfying the valence conditions} ⊂ {All possible graphs}
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
6
Contents
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
7
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: S
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
8
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: S
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
9
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: aSb
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
10
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: aSb
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
11
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: aaSbb
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
12
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: aaSbb
CFG generates a string by repeatedly applying a production rule to a
non-terminal, until no non-terminal remains
§Context-free grammar G = (N, Σ, P, S)
– N: set of non-terminals
– Σ: set of terminals
– P: set of production rules
– S ∈ N: the start symbol
13
Context-free grammar
§ Example
– N = {S}
– Σ = {a, b}
– P = {S → ab, S → aSb}
– S
Derivation: aaabbb
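The derivation above can be mimicked in a few lines of Python (my illustration, not from the slides): keep rewriting the leftmost non-terminal with one of the rules in P = {S → ab, S → aSb} until no non-terminal remains.

```python
import random

RULES = {"S": ["ab", "aSb"]}  # P = {S -> ab, S -> aSb}; "S" is the only non-terminal

def derive(start: str = "S", seed: int = 0) -> list[str]:
    """Rewrite the leftmost non-terminal until none remains; return the derivation."""
    rng = random.Random(seed)
    history, current = [start], start
    while any(ch in RULES for ch in current):
        # Locate the leftmost non-terminal and replace it by a randomly chosen rule body.
        i = next(i for i, ch in enumerate(current) if ch in RULES)
        current = current[:i] + rng.choice(RULES[current[i]]) + current[i + 1:]
        history.append(current)
    return history

print(derive())  # e.g. ['S', 'aSb', 'aaSbb', 'aaabbb']
```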
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
14
Contents
A hypergraph is a generalization of a graph
§Hypergraph H = (V, E) consists of…
–Node v ∈ V
–Hyperedge e ∈ E ⊆ 2^V
: connects an arbitrary number of nodes
cf. an edge in a graph connects exactly two nodes
15
Hyperedge replacement grammar
[Figure: an example hypergraph, with a node and a hyperedge labeled]
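For concreteness, a hypergraph in this sense can be stored as a node set plus a list of labeled node subsets; the following Python sketch is my illustration (not from the slides):

```python
from dataclasses import dataclass, field

@dataclass
class Hypergraph:
    """H = (V, E): a node set plus hyperedges, each hyperedge a subset of V."""
    nodes: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)  # list of (label, frozenset of nodes)

    def add_hyperedge(self, label: str, members) -> None:
        members = frozenset(members)
        assert members <= self.nodes, "a hyperedge may only connect existing nodes"
        self.hyperedges.append((label, members))

# A hyperedge may connect any number of nodes; an ordinary edge connects exactly two.
h = Hypergraph(nodes={1, 2, 3, 4})
h.add_hyperedge("e1", {1, 2, 3})  # a 3-ary hyperedge
h.add_hyperedge("e2", {3, 4})     # an ordinary (binary) edge as a special case
```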
HRG generates a hypergraph by repeatedly replacing
non-terminal hyperedges with hypergraphs
§Hyperedge replacement grammar (HRG) G = (N, Σ, P, S)
–N: set of non-terminals
–Σ: set of terminals
–S: start symbol
–P: set of production rules
– A rule replaces a non-terminal hyperedge with a hypergraph
16
Hyperedge replacement grammar
[Figure: the production rules P. Left: the start symbol S is replaced by a ring of three non-terminal hyperedges labeled N. Right: a non-terminal N is replaced by a C hyperedge with two H hyperedges, keeping external nodes 1 and 2. Hyperedge labels: C, N, S]
[Feder, 71], [Pavlidis+, 72]
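The replacement step can be sketched as follows (an illustrative simplification, not the author's implementation): remove the chosen non-terminal hyperedge, copy in the right-hand-side hypergraph with fresh node names, and glue its external nodes onto the nodes the non-terminal was attached to.

```python
import itertools

_fresh = itertools.count()

def replace(hg, edge_index, rule):
    """Hyperedge replacement on hg = {"nodes": set, "edges": [(label, (node, ...)), ...]}.

    rule = {"lhs": label, "rhs": hypergraph dict, "external": (rhs_node, ...)}: the i-th
    external node of the rhs is glued onto the i-th attachment node of the replaced hyperedge.
    """
    label, attached = hg["edges"][edge_index]
    assert label == rule["lhs"] and len(attached) == len(rule["external"])
    mapping = dict(zip(rule["external"], attached))           # glue external nodes
    for v in rule["rhs"]["nodes"]:
        mapping.setdefault(v, f"n{next(_fresh)}")             # fresh names for internal nodes
    new_nodes = set(hg["nodes"]) | {mapping[v] for v in rule["rhs"]["nodes"]}
    new_edges = [e for i, e in enumerate(hg["edges"]) if i != edge_index]
    new_edges += [(lab, tuple(mapping[v] for v in vs)) for lab, vs in rule["rhs"]["edges"]]
    return {"nodes": new_nodes, "edges": new_edges}
```

Starting from a hypergraph that contains only the start symbol S and applying `replace` until no non-terminal hyperedge is left mirrors the walkthrough on the following slides.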
Start from start symbol S
17
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: the start symbol S]
The left rule is applicable
18
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: still the start symbol S, to which the left rule will be applied]
We obtain a hypergraph with three non-terminals
19
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: a ring of three non-terminal hyperedges labeled N]
Apply the right rule to one of the non-terminals
20
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: a ring of three N non-terminals, one of which is about to be rewritten]
Two non-terminals remain
21
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: one C hyperedge with two H hyperedges and two remaining N non-terminals]
Repeat the procedure until there is no non-terminal
22
Hyperedge replacement grammar
[Figure: same as the previous slide; the right rule is applied again]
Repeat the procedure until there is no non-terminal
23
Hyperedge replacement grammar
[Figure: the production rules P and the current hypergraph: two C hyperedges, four H hyperedges, and one remaining N non-terminal]
Repeat the procedure until there is no non-terminal
24
Hyperedge replacement grammar
[Figure: same as the previous slide; the right rule is applied to the last non-terminal]
Graph generation halts when there is no non-terminal
25
Hyperedge replacement grammar
[Figure: the production rules P and the final hypergraph: a ring of three C hyperedges, each with two H hyperedges; no non-terminals remain]
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
26
Contents
HRG inference algorithm outputs HRG that can reconstruct the input
§HRG inference algorithm [Aguiñaga+, 16]
–Input: Set of hypergraphs ℋ
–Output: HRG such that ℋ ⊆ ℒ(HRG)
Minimum requirement of the grammar’s expressiveness
–Idea: Infer production rules necessary to obtain each hypergraph
Decompose each hypergraph into a set of production rules
27
HRG inference algorithm
Language of the grammar
Tree decomposition discovers a tree-like structure in a graph
§Tree decomposition
–Every node and edge of the graph must be contained in some tree node (bag)
–For each node, the tree nodes (bags) that contain it must form a connected subtree
28
HRG inference algorithm
* Digits represent the node correspondence
[Figure: a small molecular graph (two carbons, each with two hydrogens) and its tree decomposition into bags; the digits label nodes shared between bags]
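Tree decompositions of ordinary graphs can be computed with standard heuristics, for example the approximate algorithms shipped with networkx; the sketch below is my illustration on a plain graph (the paper applies the same idea to hypergraphs):

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# A small molecular-graph-like example: two carbons bonded to each other,
# each also bonded to two hydrogens.
g = nx.Graph()
g.add_edges_from([("C1", "C2"), ("C1", "H1"), ("C1", "H2"), ("C2", "H3"), ("C2", "H4")])

width, tree = treewidth_min_degree(g)  # tree nodes are frozensets ("bags") of graph nodes
print("treewidth estimate:", width)
for bag in tree.nodes:
    print(sorted(bag))

# The two conditions on the slide can be checked directly:
# every edge lies inside some bag, and the bags containing a node form a connected subtree.
assert all(any({u, v} <= bag for bag in tree.nodes) for u, v in g.edges)
assert all(nx.is_connected(tree.subgraph([b for b in tree.nodes if v in b])) for v in g.nodes)
```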
Tree decomposition and (a syntax tree of) HRG are equivalent
§Relationship between tree decomposition and HRG
1. Connecting hypergraphs in tree recovers the original hypergraph
2. Connection ⇔ Hyperedge replacement
29
HRG inference algorithm
[Figure: the same molecular graph and tree decomposition as on the previous slide]
Tree decomposition and (a syntax tree of) HRG are equivalent
§Relationship between tree decomposition and HRG
1. Connecting hypergraphs in tree recovers the original hypergraph
2. Connection ⇔ Hyperedge replacement
31
HRG inference algorithm
[Figure: the tree decomposition from the previous slide; one tree edge is rewritten as a production rule, where the shared nodes become the external nodes and attaching the child bag corresponds to replacing a non-terminal hyperedge N]
Tree decomposition and (a syntax tree of) HRG are equivalent
§Relationship between tree decomposition and HRG
1. Connecting hypergraphs in tree recovers the original hypergraph
2. Connection ⇔ Hyperedge replacement
32
HRG inference algorithm
[Figure: same as the previous slide; the children of a tree node become non-terminal hyperedges in the production rule]
Tree decomposition and (a syntax tree of) HRG are equivalent
§Relationship between tree decomposition and HRG
1. Connecting hypergraphs in tree recovers the original hypergraph
2. Connection ⇔ Hyperedge replacement
33
HRG inference algorithm
[Figure: same as the previous slides, with the non-terminal hyperedges N marked in the extracted production rule]
HRG can be inferred from tree decompositions of input hypergraphs.
§HRG inference algorithm [Aguiñaga+, 16]
–Algorithm:
–Expressiveness: ℋ ⊆ ℒ(HRG)
The resultant HRG can generate all input hypergraphs.
(∵ clear from its algorithm)
34
HRG inference algorithm
1. Compute tree decompositions of input hypergraphs
2. Extract production rules
3. Compose HRG by taking their union
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
35
Contents
We want a graph grammar that guarantees hard constraints
§Objective
Construct a graph grammar that never violates the valence condition
§Application: Generative model of a molecule
–Grammar-based generation guarantees the valence condition
–Probabilistic model could learn soft constraints
36
Molecular hypergraph grammar
!"
A simple application to molecular graphs doesn’t work
§A simple application to molecular graphs
–Input: Molecular graphs
–Issue: Valence conditions can be violated
37
Molecular hypergraph grammar
[Figure: an input molecular graph (ethane), a tree decomposition of it, and the production rules extracted from it]
This rule increases
the degree of carbon
Our idea is to use a hypergraph representation of a molecule
§Conserved quantity
–HRG: # of nodes in a hyperedge
–Our grammar: # of bonds connected to each atom (valence)
∴ Atom should be modeled as a hyperedge
§Molecular hypergraph
– Atom = hyperedge
– Bond = node
38
Molecular hypergraph grammar
[Figure: a molecular graph and the corresponding molecular hypergraph; each atom becomes a hyperedge and each bond becomes a node]
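As an illustration of the conversion (a minimal sketch of the idea, not the released code), an RDKit molecule can be turned into this representation by making hydrogens explicit and indexing bonds:

```python
from rdkit import Chem

def to_molecular_hypergraph(smiles: str):
    """Return (nodes, hyperedges): each bond becomes a node, and each atom becomes a
    hyperedge over the bonds it participates in."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    nodes = {bond.GetIdx() for bond in mol.GetBonds()}     # bond = node
    hyperedges = []                                        # atom = hyperedge
    for atom in mol.GetAtoms():
        incident = frozenset(bond.GetIdx() for bond in atom.GetBonds())
        hyperedges.append((atom.GetSymbol(), incident))
    return nodes, hyperedges

nodes, hyperedges = to_molecular_hypergraph("CC")  # ethane
print(len(nodes), "bond-nodes;", [(s, len(e)) for s, e in hyperedges])
# Each C hyperedge covers 4 bond-nodes and each H hyperedge covers 1, i.e. its valence.
```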
The language of molecular hypergraphs is characterized by two properties
§Molecular hypergraph as a language
A set of hypergraphs with the following properties:
1. Each node has degree 2 (=2-regular)
2. Label on a hyperedge determines # of nodes it has (= valence)
39
Molecular hypergraph grammar
[Figure: example molecular hypergraphs; every node has degree 2, and each hyperedge has as many nodes as the valence of its atom label (C: 4, H: 1)]
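Both properties can be checked mechanically. A sketch, assuming the bond-as-node, atom-as-hyperedge encoding from the previous slide and a simplified valence table (an assumption for illustration):

```python
from collections import Counter

VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2}  # simplified valence table (assumption)

def is_molecular_hypergraph(nodes, hyperedges) -> bool:
    """hyperedges: list of (atom_label, frozenset of bond-nodes)."""
    # 1. Every node (bond) must have degree 2: it joins exactly two atoms.
    degree = Counter(v for _, members in hyperedges for v in members)
    two_regular = all(degree[v] == 2 for v in nodes)
    # 2. The label of each hyperedge (atom) determines its number of nodes (its valence).
    valence_ok = all(len(members) == VALENCE[label] for label, members in hyperedges)
    return two_regular and valence_ok
```

The counterexamples on the next slide each fail exactly one of these two checks.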
MHG, a grammar for the language, is defined as a subclass of HRG
§Molecular Hypergraph Grammar (MHG)
–Definition: HRG that generates molecular hypergraphs only
–Counterexamples:
40
Molecular hypergraph grammar
[Figure: MHG is a subclass of HRG (Venn diagram), with two counterexample hypergraphs outside MHG]
–Counterexample violating the valence condition ✗: this can be avoided by learning the HRG from data [Aguiñaga+, 16]
–Counterexample violating 2-regularity ✗: avoided by using an irredundant tree decomposition (our contribution)
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
41
Contents
A naive application of the existing algorithm doesn’t work
§Naive application of the HRG inference algorithm [Aguiñaga+, 16]
–Input: Set of hypergraphs
–Output: HRG w/ the following properties:
• All the input hypergraphs are in the language ✓
• Guarantee the valence conditions ✓
• No guarantee on 2-regularity ✗
42
MHG inference algorithm
[Figure: a hypergraph generated by the naively inferred HRG that violates 2-regularity]
This cannot be transformed
into a molecular graph
Irredundant tree decomposition is a key to guarantee 2-regularity
§Irredundant tree decomposition
–For each node, the connected subtree of tree nodes (bags) containing it must be a path
–Any tree decomposition can be made irredundant in poly-time
43
MHG inference algorithm
[Figure: an irredundant tree decomposition and a redundant one of the same graph; in the redundant one, the bags containing some node do not form a path]
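The path condition is easy to test. A sketch with networkx, assuming the decomposition is given as a tree whose nodes are frozensets (bags) of hypergraph nodes:

```python
import networkx as nx

def is_irredundant(tree: nx.Graph, graph_nodes) -> bool:
    """True iff, for every graph node, the bags containing it induce a path in `tree`."""
    for v in graph_nodes:
        bags = [bag for bag in tree.nodes if v in bag]
        if not bags:
            return False                      # every node must appear in some bag
        sub = tree.subgraph(bags)
        # In a tree, a connected subgraph whose vertices all have degree <= 2 is a path.
        if not nx.is_connected(sub) or any(deg > 2 for _, deg in sub.degree()):
            return False
    return True
```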
The MHG inference algorithm differs from the existing one in two steps
§MHG Inference algorithm [Kajino, 19]
–Input: Set of molecular graphs
–Output: MHG w/ the following properties:
• All the input hypergraphs are in the language ✓
• Guarantee the valence conditions ✓
• Guarantee 2-regularity ✓
44
MHG inference algorithm
1. Convert molecular graphs into molecular hypergraphs
2. Compute tree decompositions of molecular hypergraphs
3. Convert each tree decomposition to be irredundant
4. Extract production rules
5. Compose MHG by taking their union
(Steps 2, 4, and 5: thanks to HRG; Steps 1 and 3: our contribution)
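Putting the five steps together, the pipeline reads roughly as below. This is only an orchestration sketch; `to_molecular_hypergraph`, `tree_decompose`, `make_irredundant`, and `extract_rules` are placeholder names for the components described on this slide, not functions of the released code.

```python
def infer_mhg(molecular_graphs,
              to_molecular_hypergraph, tree_decompose, make_irredundant, extract_rules):
    """Sketch of the MHG inference pipeline; the four helpers are placeholders for the
    components described on this slide, passed in as callables."""
    production_rules = set()
    for graph in molecular_graphs:
        hypergraph = to_molecular_hypergraph(graph)       # 1. atoms -> hyperedges, bonds -> nodes
        decomposition = tree_decompose(hypergraph)        # 2. tree decomposition
        decomposition = make_irredundant(decomposition)   # 3. enforce the path condition
        production_rules |= set(extract_rules(decomposition))  # 4. rules from tree edges
    return production_rules                               # 5. the MHG is the union of the rules
```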
I will talk about an application of formal languages to a graph generation
problem
§Formal language
–Context-free grammar
–Hyperedge replacement grammar (HRG)
–HRG inference algorithm
§Application to molecular graph generation
–Molecular hypergraph grammar (a special case of HRG)
–MHG inference algorithm
–Combination with VAE
45
Contents
We obtain (Enc, Dec) between molecule and latent vector
by combining MHG and RNN-VAE
§MHG-VAE: (Enc, Dec) between molecule & latent vector
46
Combination with VAE
[Figure: MHG-VAE encoder. Molecular graph → molecular hypergraph → parse tree according to MHG → latent vector z ∈ ℝ^d; the graph-to-parse-tree part is handled by MHG, and the parse-tree-to-vector part by the encoder of an RNN-VAE]
First, we learn (Enc, Dec) between a molecule and its vector
representation using MHG-VAE
§Global molecular optimization [Gómez-Bombarelli+, 16]
–Find: Molecule that maximizes the target
–Method: VAE+BO
1. Obtain MHG from the input molecules
2. Train RNN-VAE on syntax trees
3. Obtain vector representations {z_i ∈ ℝ^d}_{i=1}^N
   – some of which have target values {y_i ∈ ℝ}
4. BO gives us candidates {z_j ∈ ℝ^d}_{j=1}^M that may maximize the target
5. Decode them to obtain molecules {G_j}_{j=1}^M
47
Combination with VAE
Image from [Gómez-Bombarelli+, 16]
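For intuition, steps 3 to 5 can be sketched with a Gaussian-process surrogate and an expected-improvement acquisition (one common choice; the actual procedure in the paper may differ). Everything below is an illustration; `encode`, `decode`, and `sample_latent_pool` are hypothetical stand-ins for the trained MHG-VAE.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_candidates(z_labeled, y, z_pool, n_candidates=5):
    """Fit a GP on latent vectors with known targets and rank a pool of latent
    vectors by expected improvement."""
    y = np.asarray(y)
    gp = GaussianProcessRegressor(normalize_y=True).fit(z_labeled, y)
    mu, sigma = gp.predict(z_pool, return_std=True)
    improvement = mu - np.max(y)
    z_score = improvement / (sigma + 1e-9)
    ei = improvement * norm.cdf(z_score) + sigma * norm.pdf(z_score)
    return z_pool[np.argsort(-ei)[:n_candidates]]

# Usage sketch (hypothetical helpers): encode molecules, propose latent candidates, decode back.
# z = np.stack([encode(G) for G in molecules]); y = targets
# candidates = propose_candidates(z[:len(y)], y, sample_latent_pool())
# new_molecules = [decode(z_new) for z_new in candidates]
```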
Given vector representations and their target values,
we use BO to obtain a vector that optimizes the target
§Global molecular optimization [Gómez-Bombarelli+, 16]
–Find: Molecule that maximizes the target
–Method: VAE+BO
1. Obtain MHG from the input molecules
2. Train RNN-VAE on syntax trees
3. Obtain vector representations {z_i ∈ ℝ^d}_{i=1}^N
   – some of which have target values {y_i ∈ ℝ}
4. BO gives us candidates {z_j ∈ ℝ^d}_{j=1}^M that may maximize the target
5. Decode them to obtain molecules {G_j}_{j=1}^M
48
Combination with VAE
Image from [Gómez-Bombarelli+, 16]
We evaluate the benefit of our grammar-based representation,
compared with existing ones
§Empirical study
–Purpose: How much does our representation facilitate VAE training?
–Baselines:
• {C,G,SD}VAE use SMILES (text repr.)
• JT-VAE assembles molecular components
– It requires NNs other than VAE for scoring
–Tasks:
• VAE reconstruction
• Valid prior ratio
• Global molecular optimization
49
Combination with VAE
Image from [Jin+, 18]
Our grammar-based representation achieves better scores.
This result empirically supports the effectiveness of our approach.
§Result
50
Combination with VAE
[Table: global molecular optimization results; the maximized target combines water solubility, the synthetic accessibility score, and a penalty for rings larger than six]
A graph grammar can be a building block for a graph generative model
§Classify constraints into hard ones and soft ones
ML for the soft ones, rules for the hard ones
§Define a language by encoding hard constraints
E.g., valence conditions
§Design a grammar for the language
Sometimes, w/ an inference algorithm
Code is now public on Github
https://github.com/ibm-research-tokyo/graph_grammar
51
Takeaways
[Aguiñaga+, 16] Aguiñaga, S., Palacios, R., Chiang, D., and Weninger, T.: Growing graphs from hyperedge
replacement graph grammars. In Proceedings of the 25th ACM International on Conference on Information and
Knowledge Management, pp. 469–478, 2016.
[Feder, 71] Feder, J.: Plex languages. Information Sciences, 3, pp. 225-241, 1971.
[Gómez-Bombarelli+, 16] Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-
Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A.: Automatic
chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2018. (ArXiv ver.
appears in 2016)
[Jin+, 18] Jin, W., Barzilay, R., and Jaakkola, T.: Junction tree variational autoencoder for molecular graph generation.
In Proceedings of the Thirty-fifth International Conference on Machine Learning, 2018.
[Kajino, 19] Kajino, H.: Molecular hypergraph grammar with its application to molecular optimization. In Proceedings of
the Thirty-sixth International Conference on Machine Learning, 2019.
[Pavlidis, 72] Pavlidis, T.: Linear and context-free graph grammars. Journal of the ACM, 19(1), pp.11-23, 1972.
[You+, 18] You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J.: Graph convolutional policy network for goal-directed
molecular graph generation. In Advances in Neural Information Processing Systems 31, pp. 6412–6422, 2018.
52
References
