HELM, originally developed by Pfizer, provides a way to represent molecules that are too large to represent atom by atom, or that contain non-natural chemical modifications that make a plain sequence representation impractical.
HELM's structure hierarchy consists of complex polymers, simple polymers, monomers, and atoms. Monomers are described in terms of atoms and bonds, simple (single-type) polymers as sequences of monomers, and complex (multi-type) polymers as sets of connected simple polymers.
A detailed description of HELM is available in a paper that was published in the Journal of Chemical Information and Modeling.
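The four levels of the hierarchy can be pictured as nested data structures. The following is only a minimal sketch: the class names, fields, and the `level_counts` helper are hypothetical and chosen for illustration, not taken from the HELM specification or any HELM toolkit. Hydrogens and bond orders are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Atom:
    element: str  # e.g. "C", "N", "O"


@dataclass
class Monomer:
    symbol: str                   # short identifier, e.g. "G" for glycine
    atoms: List[Atom]             # atom-level description (heavy atoms only here)
    bonds: List[Tuple[int, int]]  # pairs of indices into `atoms`


@dataclass
class SimplePolymer:
    polymer_type: str             # a single type, e.g. "PEPTIDE"
    monomers: List[Monomer]       # ordered sequence of monomers


@dataclass
class Connection:
    # Each end is (index of simple polymer, index of monomer within it).
    source: Tuple[int, int]
    target: Tuple[int, int]


@dataclass
class ComplexPolymer:
    polymers: List[SimplePolymer]
    connections: List[Connection] = field(default_factory=list)

    def level_counts(self) -> Dict[str, int]:
        """Count the entries at each level by walking down the hierarchy.

        The atom total sums the monomer definitions as given; it does not
        account for atoms lost when monomers are bonded together.
        """
        monomers = [m for p in self.polymers for m in p.monomers]
        return {
            "simple_polymers": len(self.polymers),
            "monomers": len(monomers),
            "atoms": sum(len(m.atoms) for m in monomers),
        }


# Illustrative data: glycine heavy atoms N-C-C(=O)-O.
glycine = Monomer(
    symbol="G",
    atoms=[Atom("N"), Atom("C"), Atom("C"), Atom("O"), Atom("O")],
    bonds=[(0, 1), (1, 2), (2, 3), (2, 4)],
)
peptide_a = SimplePolymer("PEPTIDE", [glycine, glycine])
peptide_b = SimplePolymer("PEPTIDE", [glycine])
example = ComplexPolymer(
    polymers=[peptide_a, peptide_b],
    connections=[Connection(source=(0, 1), target=(1, 0))],
)
counts = example.level_counts()
```

A tool can then work at whichever level suits the task, touching atoms only when it needs chemical detail.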
2. What is a “Biomolecule”?
Peptides
Therapeutic Proteins
ADCs
Antibodies
Vaccines
ASOs
siRNAs
For our purposes, anything that is not a small molecule is a biomolecule
Goal
• Eliminate biomolecule penalty
• Make these entities first-class citizens of the Informatics tool portfolio
3. So what’s the problem?
[Figure labels: GAP; Small Molecules; Sequences; Biomolecules; Small Molecule Tools; Sequence-Based Tools]
4. “Fit-for-Purpose” Structure Representation
We need to enable the representation, manipulation, and visualization of each molecule type in a way that is appropriate for its size and complexity
5. Fit for Purpose: “Monomer” Level
• While you could draw out an oligonucleotide like this:
• This monomer-level representation is likely more intuitive and practical:
6. Fit for Purpose: Sequence Level
• But even the monomer-level representation would not scale well to proteins with hundreds of amino acids. Larger molecules require a more sequence-oriented representation:
7. Fit for Purpose: Component Level
• For multi-component structures such as antibody drug conjugates, component-level representations are required to enable each component to be dealt with separately.
[Figure labels: “Collapsed” Antibody (Ab); Expanded Drug]
8. Hierarchical Editing Language for Macromolecules
– Hierarchical – Amenable to the various “levels”
• Complex Polymer ⇒ Simple Polymer ⇒ Monomer ⇒ Atom
– Extensible
• Allowing addition of new biopolymer types
– (Reasonably) comprehensive
• e.g. Allowing representation of oligonucleotide hybridization
– Canonicalizable
• Facilitating uniqueness checking
– (Somewhat) human-readable
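To show why a canonical form helps with uniqueness checking, here is a toy sketch. It is not HELM's actual canonicalization procedure, which these slides do not spell out. It assumes only that the order in which the simple polymers of a complex polymer are listed should not matter, while the monomer order inside each chain does. The `canonical_key` function and the example chains are hypothetical.

```python
from typing import Iterable, List, Tuple

# A chain is (polymer type, ordered monomer symbols). Hypothetical shape.
Chain = Tuple[str, List[str]]
Key = Tuple[Tuple[str, Tuple[str, ...]], ...]


def canonical_key(chains: Iterable[Chain]) -> Key:
    """Return an order-independent key for a set of chains.

    Chains are sorted so that listing order does not affect the key;
    the monomer order within each chain is preserved.
    """
    return tuple(sorted((ptype, tuple(monomers)) for ptype, monomers in chains))


# The same two chains, listed in different orders.
entry_a = [("PEPTIDE", ["A", "G", "C"]), ("RNA", ["R", "P"])]
entry_b = [("RNA", ["R", "P"]), ("PEPTIDE", ["A", "G", "C"])]
# A different molecule: the peptide sequence is reversed.
entry_c = [("PEPTIDE", ["C", "G", "A"]), ("RNA", ["R", "P"])]

same = canonical_key(entry_a) == canonical_key(entry_b)
different = canonical_key(entry_a) != canonical_key(entry_c)
```

With such a key, a registration system could detect duplicates with a simple set or dictionary lookup. A real canonicalizer would also have to handle connections between chains and equivalent ways of writing the same chain.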
11. Monomer Database
• Each monomer used in the notation needs to be predefined in a
monomer database
• The database includes the chemical structure of the monomer and
a description of all acceptable attachment points
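The two requirements above can be sketched as a small lookup table. This is an assumption-laden illustration, not the real monomer database schema: the `MonomerEntry` class, the `MONOMER_DB` dictionary, both helper functions, and the attachment point labels (`R1`, `R2`, `R3`) are used here as placeholders. The SMILES strings are simplified and omit stereochemistry.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class MonomerEntry:
    symbol: str                        # identifier used in the notation
    structure: str                     # chemical structure, here as SMILES
    attachment_points: FrozenSet[str]  # all acceptable attachment points


# Keyed by (polymer type, monomer symbol). Illustrative entries only.
MONOMER_DB: Dict[Tuple[str, str], MonomerEntry] = {
    ("PEPTIDE", "G"): MonomerEntry("G", "NCC(=O)O", frozenset({"R1", "R2"})),
    ("PEPTIDE", "A"): MonomerEntry("A", "NC(C)C(=O)O", frozenset({"R1", "R2"})),
    ("PEPTIDE", "C"): MonomerEntry("C", "NC(CS)C(=O)O", frozenset({"R1", "R2", "R3"})),
}


def undefined_monomers(polymer_type: str, symbols: List[str]) -> List[str]:
    """Return the symbols that are not predefined for this polymer type."""
    return [s for s in symbols if (polymer_type, s) not in MONOMER_DB]


def can_attach(polymer_type: str, symbol: str, point: str) -> bool:
    """Check whether `point` is an acceptable attachment point for a monomer."""
    entry = MONOMER_DB.get((polymer_type, symbol))
    return entry is not None and point in entry.attachment_points


missing = undefined_monomers("PEPTIDE", ["G", "C", "X"])
side_chain_ok = can_attach("PEPTIDE", "C", "R3")
glycine_side_chain_ok = can_attach("PEPTIDE", "G", "R3")
```

Checking a notation string against the database before accepting it means every monomer symbol resolves to a known structure, and every connection uses an attachment point that the monomer actually offers.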