SlideShare a Scribd company logo
1 of 13
2023. 03. 31
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
1
Contributions
 main contribution:
 a new design of the self-attention formula for molecular graphs
that carefully handles various input features to obtain
increased accuracy and robustness in numerous chemical
domains.
 Tackle the aforementioned issues with Relative Molecule Attention
Transformer (RMAT), our pre-trained transformer-based model,
shown in Figure 1.
2
Relative Molecule Self-Attention layer
 (a) neighbourhood embedding one-hot encodes graph distances (neighbourhood order)
from the source node marked with an arrow;
 (b) bond embedding one-hot encodes the bond order (numbers next to the graph
edges) and other bond features for neighbouring nodes;
 (c) distance embedding uses radial basis functions to encode pairwise distances in the 3D
space
3
RELATIVE MOLECULE SELF-ATTENTION
 Molecule Attention Transformer Our work is closely related to Molecule Attention
Transformer (MAT), a transformer-based model with self-attention tailored to processing
molecular data.
 In contrast to aforementioned approaches, MAT incorporates the distance information in
its self-attention module.
 MAT stacks N Molecule Self-Attention blocks followed by a mean pooling and a
prediction layer. For a D-dimensional state x ∈ R^D, the standard.
 Vanilla self-attention operation is defined as:
weights given to individual parts of the attention module
4
Neighbourhood embeddings
 Encode the neighbourhood order between two atoms as a 6 dimensional one hot
encoding
 How many other vertices are between nodes i and j in the original molecular graph
5
Distance embeddings
 Hypothesis: a much more flexible representation of the distance information should be
facilitated in MAT.
 To achieve this, a radial basis distance encoding proposed by:
(d is the distance between two atoms).
6
RELATIVE MOLECULE SELF-ATTENTION
 A hidden layer, shared between all attention heads and output layer, that create a
separate relative embedding for different attention heads.
 in Relative Molecule Self-Attention: e_{ij} as:
if there exists a bo
nd between atoms
i and j
7
RELATIVE MOLECULE ATTENTION TRANSFORMER
 Replace simple mean pooling with an attention-based pooling layer.
 After applying N self-attention layers, we use the following self-attention pooling to get
the graph-level embedding of the molecule:
 Where:
 H is the hidden state obtained from self-attention layers,
 P equal to the pooling hidden dimension and S equal to the number of pooling
attention heads.
 g is then passed to the two layer MLP, with leaky-ReLU activation in order to make
the prediction
8
Pretraining
 two-step pretraining procedure.
 network is trained with the contextual property prediction task where we mask not
only selected atoms, but also their neighbours.
 The goal of the task is to predict the whole atom context. This task is much more
demanding for the network than the classical masking approach presented by
 The second task is a graph-level prediction to predict a set of real-valued descriptors
of properties.
9
Results on molecule property prediction benchmark
 First two datasets are regression tasks (lower is better)
 other datasets are classification tasks (higher is better)
10
LARGE-SCALE EXPERIMENTS
 Models are fine-tuned under a large hyperparameters budget.
 Models fine-tuned with only tuning the learning rate are presented in the last group.
 The last dataset is classification task (higher is better), the remaining datasets are
regression tasks (lower is better)
11
Inconclusion
 Relative Molecule Attention Transformer, a model based on Relative Molecule
SelfAttention, achieves state-of-the-art or very competitive results across a wide range of
molecular property prediction tasks.
 R-MAT is a highly versatile model, showing state-of-the-art results in both quantum
property prediction tasks
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transformer", ICLR 2022

More Related Content

Similar to NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transformer", ICLR 2022

AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
IAEME Publication
 
MultiLevelROM2_Washinton
MultiLevelROM2_WashintonMultiLevelROM2_Washinton
MultiLevelROM2_Washinton
Mohammad Abdo
 
Finite_Element_Analysis_with_MATLAB_GUI
Finite_Element_Analysis_with_MATLAB_GUIFinite_Element_Analysis_with_MATLAB_GUI
Finite_Element_Analysis_with_MATLAB_GUI
Colby White
 

Similar to NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transformer", ICLR 2022 (20)

Development of Multi-level Reduced Order MOdeling Methodology
Development of Multi-level Reduced Order MOdeling MethodologyDevelopment of Multi-level Reduced Order MOdeling Methodology
Development of Multi-level Reduced Order MOdeling Methodology
 
Em molnar2015
Em molnar2015Em molnar2015
Em molnar2015
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
Macromodel of High Speed Interconnect using Vector Fitting Algorithm
Macromodel of High Speed Interconnect using Vector Fitting AlgorithmMacromodel of High Speed Interconnect using Vector Fitting Algorithm
Macromodel of High Speed Interconnect using Vector Fitting Algorithm
 
MultiLevelROM2_Washinton
MultiLevelROM2_WashintonMultiLevelROM2_Washinton
MultiLevelROM2_Washinton
 
International Journal of Advanced Smart Sensor Network Systems (IJASSN)
International Journal of Advanced Smart Sensor Network Systems (IJASSN)International Journal of Advanced Smart Sensor Network Systems (IJASSN)
International Journal of Advanced Smart Sensor Network Systems (IJASSN)
 
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTORARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
ARTIFICIAL NEURAL NETWORK APPROACH TO MODELING OF POLYPROPYLENE REACTOR
 
Identifying Critical Neurons in ANN Architectures using Mixed Integer Program...
Identifying Critical Neurons in ANN Architectures using Mixed Integer Program...Identifying Critical Neurons in ANN Architectures using Mixed Integer Program...
Identifying Critical Neurons in ANN Architectures using Mixed Integer Program...
 
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
Using Multi-layered Feed-forward Neural Network (MLFNN) Architecture as Bidir...
 
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network AlgorithmsWeb Spam Classification Using Supervised Artificial Neural Network Algorithms
Web Spam Classification Using Supervised Artificial Neural Network Algorithms
 
Theories of error back propagation in the brain review
Theories of error back propagation in the brain reviewTheories of error back propagation in the brain review
Theories of error back propagation in the brain review
 
Web spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithmsWeb spam classification using supervised artificial neural network algorithms
Web spam classification using supervised artificial neural network algorithms
 
Proximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile SystemsProximity Detection in Distributed Simulation of Wireless Mobile Systems
Proximity Detection in Distributed Simulation of Wireless Mobile Systems
 
Bistablecamnets
BistablecamnetsBistablecamnets
Bistablecamnets
 
Finite_Element_Analysis_with_MATLAB_GUI
Finite_Element_Analysis_with_MATLAB_GUIFinite_Element_Analysis_with_MATLAB_GUI
Finite_Element_Analysis_with_MATLAB_GUI
 
2006 ssiai
2006 ssiai2006 ssiai
2006 ssiai
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx
 
Molecular dynamics and namd simulation
Molecular dynamics and namd simulationMolecular dynamics and namd simulation
Molecular dynamics and namd simulation
 
Using spectral radius ratio for node degree
Using spectral radius ratio for node degreeUsing spectral radius ratio for node degree
Using spectral radius ratio for node degree
 
Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on Cooperative
 

More from ssuser4b1f48

More from ssuser4b1f48 (20)

NS-CUK Seminar: J.H.Lee, Review on "Graph Propagation Transformer for Graph R...
NS-CUK Seminar: J.H.Lee, Review on "Graph Propagation Transformer for Graph R...NS-CUK Seminar: J.H.Lee, Review on "Graph Propagation Transformer for Graph R...
NS-CUK Seminar: J.H.Lee, Review on "Graph Propagation Transformer for Graph R...
 
NS-CUK Seminar: H.B.Kim, Review on "Cluster-GCN: An Efficient Algorithm for ...
NS-CUK Seminar: H.B.Kim,  Review on "Cluster-GCN: An Efficient Algorithm for ...NS-CUK Seminar: H.B.Kim,  Review on "Cluster-GCN: An Efficient Algorithm for ...
NS-CUK Seminar: H.B.Kim, Review on "Cluster-GCN: An Efficient Algorithm for ...
 
NS-CUK Seminar: H.E.Lee, Review on "Weisfeiler and Leman Go Neural: Higher-O...
NS-CUK Seminar: H.E.Lee,  Review on "Weisfeiler and Leman Go Neural: Higher-O...NS-CUK Seminar: H.E.Lee,  Review on "Weisfeiler and Leman Go Neural: Higher-O...
NS-CUK Seminar: H.E.Lee, Review on "Weisfeiler and Leman Go Neural: Higher-O...
 
NS-CUK Seminar:V.T.Hoang, Review on "GRPE: Relative Positional Encoding for G...
NS-CUK Seminar:V.T.Hoang, Review on "GRPE: Relative Positional Encoding for G...NS-CUK Seminar:V.T.Hoang, Review on "GRPE: Relative Positional Encoding for G...
NS-CUK Seminar:V.T.Hoang, Review on "GRPE: Relative Positional Encoding for G...
 
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
 
Aug 22nd, 2023: Case Studies - The Art and Science of Animation Production)
Aug 22nd, 2023: Case Studies - The Art and Science of Animation Production)Aug 22nd, 2023: Case Studies - The Art and Science of Animation Production)
Aug 22nd, 2023: Case Studies - The Art and Science of Animation Production)
 
Aug 17th, 2023: Case Studies - Examining Gamification through Virtual/Augment...
Aug 17th, 2023: Case Studies - Examining Gamification through Virtual/Augment...Aug 17th, 2023: Case Studies - Examining Gamification through Virtual/Augment...
Aug 17th, 2023: Case Studies - Examining Gamification through Virtual/Augment...
 
Aug 10th, 2023: Case Studies - The Power of eXtended Reality (XR) with 360°
Aug 10th, 2023: Case Studies - The Power of eXtended Reality (XR) with 360°Aug 10th, 2023: Case Studies - The Power of eXtended Reality (XR) with 360°
Aug 10th, 2023: Case Studies - The Power of eXtended Reality (XR) with 360°
 
Aug 8th, 2023: Case Studies - Utilizing eXtended Reality (XR) in Drones)
Aug 8th, 2023: Case Studies - Utilizing eXtended Reality (XR) in Drones)Aug 8th, 2023: Case Studies - Utilizing eXtended Reality (XR) in Drones)
Aug 8th, 2023: Case Studies - Utilizing eXtended Reality (XR) in Drones)
 
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
NS-CUK Seminar: J.H.Lee, Review on "Learnable Structural Semantic Readout for...
 
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...NS-CUK Seminar: H.E.Lee,  Review on "Gated Graph Sequence Neural Networks", I...
NS-CUK Seminar: H.E.Lee, Review on "Gated Graph Sequence Neural Networks", I...
 
NS-CUK Seminar:V.T.Hoang, Review on "Augmentation-Free Self-Supervised Learni...
NS-CUK Seminar:V.T.Hoang, Review on "Augmentation-Free Self-Supervised Learni...NS-CUK Seminar:V.T.Hoang, Review on "Augmentation-Free Self-Supervised Learni...
NS-CUK Seminar:V.T.Hoang, Review on "Augmentation-Free Self-Supervised Learni...
 
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
NS-CUK Journal club: H.E.Lee, Review on " A biomedical knowledge graph-based ...
 
NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through L...
NS-CUK Seminar: H.E.Lee,  Review on "PTE: Predictive Text Embedding through L...NS-CUK Seminar: H.E.Lee,  Review on "PTE: Predictive Text Embedding through L...
NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through L...
 
NS-CUK Seminar: H.B.Kim, Review on "Inductive Representation Learning on Lar...
NS-CUK Seminar: H.B.Kim,  Review on "Inductive Representation Learning on Lar...NS-CUK Seminar: H.B.Kim,  Review on "Inductive Representation Learning on Lar...
NS-CUK Seminar: H.B.Kim, Review on "Inductive Representation Learning on Lar...
 
NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through L...
NS-CUK Seminar: H.E.Lee,  Review on "PTE: Predictive Text Embedding through L...NS-CUK Seminar: H.E.Lee,  Review on "PTE: Predictive Text Embedding through L...
NS-CUK Seminar: H.E.Lee, Review on "PTE: Predictive Text Embedding through L...
 
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
NS-CUK Seminar: J.H.Lee, Review on "Relational Self-Supervised Learning on Gr...
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
NS-CUK Seminar: H.E.Lee, Review on "Graph Star Net for Generalized Multi-Tas...
NS-CUK Seminar: H.E.Lee,  Review on "Graph Star Net for Generalized Multi-Tas...NS-CUK Seminar: H.E.Lee,  Review on "Graph Star Net for Generalized Multi-Tas...
NS-CUK Seminar: H.E.Lee, Review on "Graph Star Net for Generalized Multi-Tas...
 
NS-CUK Seminar: V.T.Hoang, Review on "Namkyeong Lee, et al. Relational Self-...
NS-CUK Seminar:  V.T.Hoang, Review on "Namkyeong Lee, et al. Relational Self-...NS-CUK Seminar:  V.T.Hoang, Review on "Namkyeong Lee, et al. Relational Self-...
NS-CUK Seminar: V.T.Hoang, Review on "Namkyeong Lee, et al. Relational Self-...
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transformer", ICLR 2022

  • 1. 2023. 03. 31 Van Thuy Hoang Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: hoangvanthuy90@gmail.com
  • 2. 1 Contributions  main contribution:  a new design of the self-attention formula for molecular graphs that carefully handles various input features to obtain increased accuracy and robustness in numerous chemical domains.  Tackle the aforementioned issues with Relative Molecule Attention Transformer (RMAT), our pre-trained transformer-based model, shown in Figure 1.
  • 3. 2 Relative Molecule Self-Attention layer  (a) neighbourhood embedding one-hot encodes graph distances (neighbourhood order) from the source node marked with an arrow;  (b) bond embedding one-hot encodes the bond order (numbers next to the graph edges) and other bond features for neighbouring nodes;  (c) distance embedding uses radial basis functions to encode pairwise distances in the 3D space
  • 4. 3 RELATIVE MOLECULE SELF-ATTENTION  Molecule Attention Transformer Our work is closely related to Molecule Attention Transformer (MAT), a transformer-based model with self-attention tailored to processing molecular data.  In contrast to aforementioned approaches, MAT incorporates the distance information in its self-attention module.  MAT stacks N Molecule Self-Attention blocks followed by a mean pooling and a prediction layer. For a D-dimensional state x ∈ R^D, the standard.  Vanilla self-attention operation is defined as: weights given to individual parts of the attention module
  • 5. 4 Neighbourhood embeddings  Encode the neighbourhood order between two atoms as a 6 dimensional one hot encoding  How many other vertices are between nodes i and j in the original molecular graph
  • 6. 5 Distance embeddings  Hypothesis: a much more flexible representation of the distance information should be facilitated in MAT.  To achieve this, a radial basis distance encoding proposed by: (d is the distance between two atoms).
  • 7. 6 RELATIVE MOLECULE SELF-ATTENTION  A hidden layer, shared between all attention heads and output layer, that create a separate relative embedding for different attention heads.  in Relative Molecule Self-Attention: e_{ij} as: if there exists a bo nd between atoms i and j
  • 8. 7 RELATIVE MOLECULE ATTENTION TRANSFORMER  Replace simple mean pooling with an attention-based pooling layer.  After applying N self-attention layers, we use the following self-attention pooling to get the graph-level embedding of the molecule:  Where:  H is the hidden state obtained from self-attention layers,  P equal to the pooling hidden dimension and S equal to the number of pooling attention heads.  g is then passed to the two layer MLP, with leaky-ReLU activation in order to make the prediction
  • 9. 8 Pretraining  two-step pretraining procedure.  network is trained with the contextual property prediction task where we mask not only selected atoms, but also their neighbours.  The goal of the task is to predict the whole atom context. This task is much more demanding for the network than the classical masking approach presented by  The second task is a graph-level prediction to predict a set of real-valued descriptors of properties.
  • 10. 9 Results on molecule property prediction benchmark  First two datasets are regression tasks (lower is better)  other datasets are classification tasks (higher is better)
  • 11. 10 LARGE-SCALE EXPERIMENTS  Models are fine-tuned under a large hyperparameters budget.  Models fine-tuned with only tuning the learning rate are presented in the last group.  The last dataset is classification task (higher is better), the remaining datasets are regression tasks (lower is better)
  • 12. 11 Inconclusion  Relative Molecule Attention Transformer, a model based on Relative Molecule SelfAttention, achieves state-of-the-art or very competitive results across a wide range of molecular property prediction tasks.  R-MAT is a highly versatile model, showing state-of-the-art results in both quantum property prediction tasks