Protein Threading Using Context-Specific Alignment Potential
Sheng Wang
http://raptorx.uchicago.edu
Toyota Technological Institute at Chicago,
Joint work with Jianzhu Ma, Feng Zhao and Jinbo Xu
ISMB 2013
Jul 22, ICC Berlin, Germany
Outline
• Where we are @ template-based modeling
• What’s our work
• What’s the problem
• What’s our solution
• Welcome to our server
Template-based Modeling (or, Threading)
• Observation
– ~50,000 non-redundant structures in PDB
– ~ 1,200 unique structure folds (SCOP)
• Methodology
– Use known structures to predict a new one
[Figure: a query sequence aligned against template sequences from the database; labels: Template sequence, Query sequence, database]
DDVYILDQAEEG
DE-FIVD-PDEH

DDVYILDQAEEG
SPCKR---ADEG

DDVYILDQAEEG
E--IFVDQADDS

DDVYILDQAEEG
NMCVFGQWERTY
Template-based Modeling Procedures
• Easy: similar sequences → similar structures
– Sequence-based methods, e.g., BLAST, FASTA
– Work only for close homologs (>70% sequence identity)
• Medium: similar profiles → similar structures
– A protein profile is a matrix that represents a multiple sequence alignment of similar proteins
– Profile-based methods, e.g., PSI-BLAST, HMMER, HHpred
– Work for relatively remote homologs (>40% sequence identity)
• Challenge: dissimilar profiles → similar structures
– Add structural or context-specific information to sequence/profile-based methods
– Threading methods, e.g., MUSTER, RAPTOR, CS-BLAST
– Work for distant homologs (<40% sequence identity)
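The identity thresholds above can be made concrete with a small calculation. The sketch below is a minimal illustration, not part of the original slides. It assumes identity is counted over columns where neither row has a gap (other definitions exist), and it takes DDVYILDQAEEG, the sequence repeated in every pair of the earlier figure, to be the query. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns in which both rows carry a residue.
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(a == b for a, b in aligned)
    return 100.0 * matches / len(aligned)


# Sequences copied from the figure on the "Template-based Modeling" slide.
query = "DDVYILDQAEEG"
templates = ["DE-FIVD-PDEH", "SPCKR---ADEG", "E--IFVDQADDS", "NMCVFGQWERTY"]
for template in templates:
    print(template, round(percent_identity(query, template), 1))
```

A pair scoring below the 40% mark under this definition would fall into the "Challenge" tier, where sequence or profile similarity alone is not expected to be enough.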
Our Work
• CNFpred: transforms the sequence-template alignment problem into a machine learning problem that calculates an alignment's probability.
• DeepAlign: prepares high-quality structural alignments as training data.
• CNF model: a combined machine learning model that incorporates a Conditional Random Field (CRF) and a Neural Network (NN).
Protein Alignment Model
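[Figure: alignment matrix with the sequence S A L R Q along one axis and the template L P L S E along the other; the alignment path is marked with match states (M)]
Template  L P L S - E
Sequence  S A - L R Q
States    M M Is M It M
• Match state (M)
• Insertion at sequence (Is)
• Insertion at template (It)
The structural alignments generated by DeepAlign are used as training data.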
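The three-state labelling can be sketched in a few lines. This is an illustration only: the function name `alignment_states` is hypothetical, and the convention (a gap in the sequence row gives Is, a gap in the template row gives It) is inferred from the slide's own example rather than stated by the authors.

```python
from typing import List


def alignment_states(template_row: str, sequence_row: str, gap: str = "-") -> List[str]:
    """Label each alignment column as M, Is or It, following the slide's example."""
    if len(template_row) != len(sequence_row):
        raise ValueError("aligned rows must have equal length")
    states = []
    for t_res, s_res in zip(template_row, sequence_row):
        if t_res == gap and s_res == gap:
            raise ValueError("a column cannot contain two gaps")
        if s_res == gap:
            states.append("Is")  # gap in the sequence row
        elif t_res == gap:
            states.append("It")  # gap in the template row
        else:
            states.append("M")   # both rows carry a residue
    return states


# The example from the slide: template L P L S - E, sequence S A - L R Q.
print(" ".join(alignment_states("LPLS-E", "SA-LRQ")))
```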
DeepAlign for Structure Alignment
• evolutionary information
• local sub-structure similarity
• angular similarity for hydrogen bonding
Score(i,j) = ( max(0, BLOSUM(i,j)) + CLESUM(i,j) ) * v(i,j) * d(i,j)
[Formula annotations: local similarity, global similarity]
where
– BLOSUM is the local amino acid substitution matrix;
– CLESUM is the local sub-structure substitution matrix;
– v(i,j) measures the angular similarity for hydrogen bonding;
– d(i,j) measures the spatial proximity of two aligned residues.
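As a worked instance of the score formula, the sketch below evaluates it for one residue pair. The input values are made up for illustration; in DeepAlign they would come from the BLOSUM and CLESUM matrices and from the two structures. The function name `deepalign_score` is hypothetical.

```python
def deepalign_score(blosum: float, clesum: float, v: float, d: float) -> float:
    """Score(i,j) = (max(0, BLOSUM(i,j)) + CLESUM(i,j)) * v(i,j) * d(i,j)."""
    # Negative substitution scores are clipped to zero before CLESUM is added.
    return (max(0.0, blosum) + clesum) * v * d


# Hypothetical inputs: an unfavourable substitution with similar local structure.
print(deepalign_score(blosum=-2.0, clesum=3.0, v=0.5, d=0.5))
# Hypothetical inputs: a favourable substitution.
print(deepalign_score(blosum=4.0, clesum=1.0, v=1.0, d=0.5))
```

Because v and d multiply the sum, a pair that is far apart in space or has dissimilar hydrogen-bonding geometry scores low however good its substitution score is.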
CNF-based Alignment Model
Given an alignment $A = \{a_1, a_2, \ldots, a_L\}$ with $a_i \in \{M, I_t, I_s\}$ between Sequence S and Template T, define a conditional probability

$$P(A \mid S, T) = \exp\Big(\sum_i E(a_i, a_{i-1}, S, T)\Big) \Big/ Z(S, T)$$

where
– E: a neural network estimating the log-likelihood of a state transition (annotated on the slide as context-specific)
– Z(S,T): normalization factor
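To make the conditional probability concrete, the sketch below enumerates every alignment of two very short proteins and normalizes by Z(S,T). Everything here is an assumption for illustration: `toy_E` is a hand-written stand-in for the neural network E, and brute-force enumeration replaces the dynamic programming that linear-chain models normally use, so it is only practical for tiny inputs.

```python
import math
from typing import Iterator, List, Optional, Tuple


def enumerate_alignments(m: int, n: int) -> Iterator[List[str]]:
    """Yield every state path consuming m template and n sequence residues."""
    def walk(i: int, j: int, path: List[str]) -> Iterator[List[str]]:
        if i == m and j == n:
            yield list(path)
            return
        if i < m and j < n:
            yield from walk(i + 1, j + 1, path + ["M"])
        if i < m:
            yield from walk(i + 1, j, path + ["Is"])  # template residue, gap in sequence
        if j < n:
            yield from walk(i, j + 1, path + ["It"])  # sequence residue, gap in template
    yield from walk(0, 0, [])


def toy_E(prev: Optional[str], cur: str, t_res: Optional[str], s_res: Optional[str]) -> float:
    """Hypothetical stand-in for the neural network E(a_i, a_{i-1}, S, T)."""
    if cur == "M":
        score = 1.0 if t_res == s_res else -0.5
    else:
        score = -1.0
    if prev == cur:
        score += 0.2  # small bonus for staying in the same state
    return score


def path_score(path: List[str], template: str, sequence: str) -> float:
    """Sum of toy_E over all state transitions of one alignment."""
    i = j = 0
    prev = None
    total = 0.0
    for state in path:
        t_res = template[i] if state != "It" else None
        s_res = sequence[j] if state != "Is" else None
        total += toy_E(prev, state, t_res, s_res)
        if state != "It":
            i += 1
        if state != "Is":
            j += 1
        prev = state
    return total


def alignment_probabilities(template: str, sequence: str) -> List[Tuple[List[str], float]]:
    """P(A|S,T) = exp(score(A)) / Z(S,T), with Z summed over all alignments."""
    paths = list(enumerate_alignments(len(template), len(sequence)))
    weights = [math.exp(path_score(p, template, sequence)) for p in paths]
    z = sum(weights)
    return [(p, w / z) for p, w in zip(paths, weights)]


best_path, best_prob = max(alignment_probabilities("AC", "AC"), key=lambda item: item[1])
print(best_path, round(best_prob, 3))
```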
Comprehensive Features
Sequence S  MTYKLILN--GKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE
Template T  MTYKLILNSTVRTKSDTVTDAVP---ADKICSFAQQLPWEREWSF--
• EAA: how similar the two aligned residues are
• Esp, Epp: how similar the query's sequence and profile are to the template's profile
• Ess3, Ess8: how similar the template's secondary structure is to the query's predicted secondary structure (3-class and 8-class)
• Esa: how similar the query's solvent accessibility is to the template's solvent accessibility
• Ediso: for disordered regions, no structure information is used
The total scoring function is a non-linear combination of:
E(a_i, a_{i-1}, EAA, Esp, Epp, Ediso, Ess3, Ess8, Esa)
What’s the problem?
• The model describes only the alignment probability, not a log-odds potential relative to a background.
• It incorporates only local information and lacks global information.
Our solution
Propose a protein alignment potential
• With a carefully designed reference state.
• Can be generalized to sequence-sequence, sequence-structure and structure-structure alignment.
Incorporate both local and global terms
• For the local term, the CNFpred potential is applied.
• For the global term, the EPAD potential is employed.
Protein alignment potential
Given two amino acids a and b, their mutation potential is defined as follows.

$$u(a, b) = \log \frac{P(a, b)}{P_{\mathrm{ref}}(a, b)} = \log \frac{P(a, b)}{P(a)\,P(b)}$$

Similarly, given one alignment A between sequence S and template T, we define the potential of A as follows.

$$u(A \mid S, T) = \log \frac{P(A \mid S, T)}{P_{\mathrm{ref}}(A)} = \log \frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x_i, y_i)}}$$

where each pair $x_i$, $y_i$ consists of two random proteins with the same lengths as S and T, respectively.

Assumption: the alignment maximizing the potential is the optimal one.
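Both log-odds definitions can be checked numerically. The sketch below is an illustration with made-up probabilities, not the authors' code. It reads the reference state as the geometric mean of the alignment's probability over N random protein pairs, which is the same as subtracting the mean log-probability. The function names are hypothetical.

```python
import math
from typing import Sequence


def mutation_potential(p_ab: float, p_a: float, p_b: float) -> float:
    """u(a, b) = log( P(a, b) / (P(a) * P(b)) )."""
    return math.log(p_ab / (p_a * p_b))


def alignment_potential(p_target: float, p_random: Sequence[float]) -> float:
    """u(A|S,T) = log P(A|S,T) minus the mean of log P(A|x_i, y_i) over random pairs."""
    if not p_random:
        raise ValueError("need at least one random protein pair")
    # Mean of logs equals the log of the geometric mean (the N-th root of the product).
    mean_log_ref = sum(math.log(p) for p in p_random) / len(p_random)
    return math.log(p_target) - mean_log_ref


# Made-up numbers: the pair (a, b) co-occurs twice as often as chance predicts.
print(mutation_potential(p_ab=0.02, p_a=0.1, p_b=0.1))
# Made-up numbers: the alignment is twice as probable as the reference geometric mean.
print(alignment_potential(p_target=0.04, p_random=[0.01, 0.04]))
```

A positive value means the alignment is more probable for the real pair (S, T) than for random proteins of the same lengths; a value near zero means it is no better than background.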
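Protein alignment potential
The alignment probability given sequence S and template T can be modeled as follows,

$$P(A \mid S, T) = \exp\big(F(A \mid S, T) + G(A \mid S, T)\big) \,/\, Z(S, T)$$

where $F(A \mid S, T)$ is the local term, $G(A \mid S, T)$ is the global term, and $Z(S, T)$ is the partition function,

$$Z(S, T) = \sum_{A} \exp\big(F(A \mid S, T) + G(A \mid S, T)\big)$$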
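Protein alignment potential

$$
\begin{aligned}
u(A \mid S, T) &= \log \frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x_i, y_i)}} \\
&= \log \frac{\exp\big(F(A \mid S, T) + G(A \mid S, T)\big) / Z(S, T)}{\sqrt[N]{\prod_{i=1}^{N} \exp\big(F(A \mid x_i, y_i) + G(A \mid x_i, y_i)\big) / Z(x_i, y_i)}} \\
&= F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y) + G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y) + c(S, T)
\end{aligned}
$$

• The $\mathrm{EXP}_{x,y}$ terms are expected scores and can be calculated in advance by sampling.
• $c(S, T)$ is independent of any specific alignment.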
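The step from the ratio of probabilities to the final sum can be spelled out. The block below is a sketch of that intermediate algebra, assuming $\mathrm{EXP}_{x,y}$ denotes the average over the N sampled pairs $(x_i, y_i)$. The explicit form of $c(S, T)$ follows from that assumption and is not quoted from the slides.

```latex
\begin{aligned}
\log P(A \mid S, T) &= F(A \mid S, T) + G(A \mid S, T) - \log Z(S, T) \\
\log \sqrt[N]{\textstyle\prod_{i=1}^{N} P(A \mid x_i, y_i)}
  &= \frac{1}{N} \sum_{i=1}^{N} \big[ F(A \mid x_i, y_i) + G(A \mid x_i, y_i) - \log Z(x_i, y_i) \big] \\
u(A \mid S, T)
  &= \Big[ F(A \mid S, T) - \frac{1}{N} \sum_{i=1}^{N} F(A \mid x_i, y_i) \Big]
   + \Big[ G(A \mid S, T) - \frac{1}{N} \sum_{i=1}^{N} G(A \mid x_i, y_i) \Big]
   + c(S, T) \\
c(S, T) &= \frac{1}{N} \sum_{i=1}^{N} \log Z(x_i, y_i) - \log Z(S, T)
\end{aligned}
```

Since $c(S, T)$ contains only partition functions, it takes the same value for every alignment A of a given pair (S, T), so it does not affect which alignment maximizes the potential.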
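Model the local potential
From CNFpred, we use a context-specific linear-chain model,

$$F(A \mid S, T) = \sum_i E(a_i, a_{i-1}, S, T)$$

The local potential is defined as

$$U_{\mathrm{local}}(A \mid S, T) = F(A \mid S, T) - \mathrm{EXP}_{x,y}\, F(A \mid x, y)$$

The expectation term can be calculated by uniformly sampling a few thousand protein pairs, so the local potential is

$$U_{\mathrm{local}}(A \mid S, T) = \sum_i \big( E(a_i, a_{i-1}, S, T) - E(a_i, a_{i-1}) \big)$$

where $E(a_i, a_{i-1})$, written without S and T, is the sampled expectation of the transition score.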
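The local potential can be sketched as a two-step computation: estimate the expected score of each state transition from sampled protein pairs, then subtract it from the observed score at every alignment position. The numbers and names below are hypothetical, and the sampled scores would in practice come from the CNFpred network rather than a hand-written table.

```python
from statistics import mean
from typing import Dict, List, Tuple

Transition = Tuple[str, str]  # (previous state, current state)


def expected_transition_scores(samples: Dict[Transition, List[float]]) -> Dict[Transition, float]:
    """Average the scores sampled from random protein pairs, per transition type."""
    return {transition: mean(scores) for transition, scores in samples.items()}


def local_potential(observed: List[Tuple[Transition, float]],
                    expected: Dict[Transition, float]) -> float:
    """Sum over alignment positions of (observed score - expected score)."""
    return sum(score - expected[transition] for transition, score in observed)


# Made-up sampled scores for two transition types.
samples = {("M", "M"): [0.2, 0.4], ("M", "Is"): [-1.0, -0.6]}
expected = expected_transition_scores(samples)

# Made-up observed scores along one alignment.
observed = [(("M", "M"), 1.3), (("M", "Is"), -0.8)]
print(local_potential(observed, expected))
```

Only the first position contributes here: the gap transition scores exactly what random pairs score on average, so it adds nothing to the potential.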
What's the difference between maximizing on probability and maximizing on potential?
[Figure: two sequence-template alignments, one per objective]
• Maximize on probability: long but less informative alignments with a high false-positive rate. Good for building models.
• Maximize on potential: short but relevant and highly significant alignments. Good for ranking templates.
Model the global potential
From EPAD, we use a context-specific distance-dependent model,

$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$

The global potential is defined as

$$U_{\mathrm{global}}(A \mid S, T) = G(A \mid S, T) - \mathrm{EXP}_{x,y}\, G(A \mid x, y)$$

The expectation term can be calculated by uniformly sampling a few thousand residue pairs from templates, so the global potential is

$$U_{\mathrm{global}}(A \mid S, T) = \sum_{i,j} \big( \log P(d^{T}_{ij} \mid s_i, s_j) - \log P(d^{T}_{ij}) \big)$$
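The global potential sums one log-odds term per aligned residue pair. The sketch below assumes distances are discretized into bins and that both probability tables are given as plain dictionaries. These tables, the bin indices and the function name are hypothetical placeholders for what EPAD would supply.

```python
import math
from typing import Dict, List, Tuple

ResiduePair = Tuple[str, str, int]  # (s_i, s_j, distance bin of d_ij in the template)


def global_potential(aligned_pairs: List[ResiduePair],
                     p_conditional: Dict[ResiduePair, float],
                     p_background: Dict[int, float]) -> float:
    """Sum of log P(d_ij | s_i, s_j) - log P(d_ij) over aligned residue pairs."""
    total = 0.0
    for s_i, s_j, d_bin in aligned_pairs:
        total += math.log(p_conditional[(s_i, s_j, d_bin)]) - math.log(p_background[d_bin])
    return total


# Made-up tables: bin 0 is a short distance, bin 1 a long one.
p_conditional = {("L", "V", 0): 0.4, ("K", "E", 1): 0.1}
p_background = {0: 0.2, 1: 0.2}

# One pair is more likely than background, the other less likely by the same factor.
pairs = [("L", "V", 0), ("K", "E", 1)]
print(global_potential(pairs, p_conditional, p_background))
```

A pair contributes positively when the template distance is more probable for those two sequence residues than for residue pairs in general.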
What’s global information given an
alignment?
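[Figure: residues s_i and s_j of Sequence S are aligned to residues i and j of Template T, which lie at distance d^T_ij in the template structure]

$$G(A \mid S, T) = \sum_{i,j} \log P(d^{T}_{ij} \mid s_i, s_j)$$

If the alignment is good, the distance between a pair of sequence residues should match well with that of their aligned template residues.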
Result on 1000*6000
[Figure: CNFpred (local+global potential) compared to HHpred and to CNFpred (local potential)]
Welcome to our server
http://raptorx.uchicago.edu/
Binding
Contact
Thank you
Jinbo Xu
Feng Zhao
Jianzhu Ma
National Institutes of Health (R01GM0897532)
National Science Foundation (DBI-0960390)
NSF CAREER award CCF-1149811
Alfred P. Sloan Research Fellowship

Mais conteúdo relacionado

Mais procurados

Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure predictionSamvartika Majumdar
 
Protein structure 2
Protein structure 2Protein structure 2
Protein structure 2Rainu Rajeev
 
Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysisKinza Irshad
 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure predictionSubin E K
 
Homology modeling and molecular docking
Homology modeling and molecular dockingHomology modeling and molecular docking
Homology modeling and molecular dockingRangika Munaweera
 
Protein Structure Alignment and Comparison
Protein Structure Alignment and ComparisonProtein Structure Alignment and Comparison
Protein Structure Alignment and ComparisonNatalio Krasnogor
 
HOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAYHOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAYShikha Popali
 
methods for protein structure prediction
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure predictionkaramveer prajapat
 
De novo str_prediction
De novo str_predictionDe novo str_prediction
De novo str_predictionShwetA Kumari
 
Protein Remote Homology Detection
Protein Remote Homology DetectionProtein Remote Homology Detection
Protein Remote Homology DetectionAlia Hamwi
 
Protien Structure Prediction
Protien Structure PredictionProtien Structure Prediction
Protien Structure PredictionSelimReza76
 
Molecular dynamics and Simulations
Molecular dynamics and SimulationsMolecular dynamics and Simulations
Molecular dynamics and SimulationsAbhilash Kannan
 
Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)Melvin Alex
 
Protein structure prediction (1)
Protein structure prediction (1)Protein structure prediction (1)
Protein structure prediction (1)Sabahat Ali
 

Mais procurados (20)

Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Homology modeling: Modeller
Homology modeling: ModellerHomology modeling: Modeller
Homology modeling: Modeller
 
Protein structure 2
Protein structure 2Protein structure 2
Protein structure 2
 
Protein computational analysis
Protein computational analysisProtein computational analysis
Protein computational analysis
 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure prediction
 
Molecular modelling (1)
Molecular modelling (1)Molecular modelling (1)
Molecular modelling (1)
 
Homology modeling and molecular docking
Homology modeling and molecular dockingHomology modeling and molecular docking
Homology modeling and molecular docking
 
Protein Structure Alignment and Comparison
Protein Structure Alignment and ComparisonProtein Structure Alignment and Comparison
Protein Structure Alignment and Comparison
 
HOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAYHOMOLOGY MODELING IN EASIER WAY
HOMOLOGY MODELING IN EASIER WAY
 
methods for protein structure prediction
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure prediction
 
Protein Threading
Protein ThreadingProtein Threading
Protein Threading
 
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
 
De novo str_prediction
De novo str_predictionDe novo str_prediction
De novo str_prediction
 
Protein Remote Homology Detection
Protein Remote Homology DetectionProtein Remote Homology Detection
Protein Remote Homology Detection
 
Protein Predictinon
Protein PredictinonProtein Predictinon
Protein Predictinon
 
Protien Structure Prediction
Protien Structure PredictionProtien Structure Prediction
Protien Structure Prediction
 
Molecular dynamics and Simulations
Molecular dynamics and SimulationsMolecular dynamics and Simulations
Molecular dynamics and Simulations
 
Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)Homology modeling of proteins (ppt)
Homology modeling of proteins (ppt)
 
Sir hussain
Sir hussainSir hussain
Sir hussain
 
Protein structure prediction (1)
Protein structure prediction (1)Protein structure prediction (1)
Protein structure prediction (1)
 

Semelhante a Protein threading using context specific alignment potential ismb-2013

lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Presentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali ShahPresentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali Shahguest5de83e
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerRothamsted Research, UK
 
So sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonSo sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonbomxuan868
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia岳華 杜
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.pptSilpa87
 
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAM
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAMOptimization of Test Pattern Using Genetic Algorithm for Testing SRAM
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAMIJERA Editor
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...IRJET Journal
 
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...Gota Morota
 
Foundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual SystemsFoundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual Systemsijtsrd
 
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories ISSEL
 
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού ΛογισμικούΕξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού ΛογισμικούISSEL
 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Melissa Moody
 

Semelhante a Protein threading using context specific alignment potential ismb-2013 (20)

lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
Seq alignment
Seq alignment Seq alignment
Seq alignment
 
Presentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali ShahPresentation 2007 Journal Club Azhar Ali Shah
Presentation 2007 Journal Club Azhar Ali Shah
 
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMinerA Preliminary survey of RDF/Neo4j as backends for KnetMiner
A Preliminary survey of RDF/Neo4j as backends for KnetMiner
 
So sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparisonSo sánh cấu trúc protein_Protein structure comparison
So sánh cấu trúc protein_Protein structure comparison
 
Introduction to Julia
Introduction to JuliaIntroduction to Julia
Introduction to Julia
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
PPT
PPTPPT
PPT
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.ppt
 
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAM
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAMOptimization of Test Pattern Using Genetic Algorithm for Testing SRAM
Optimization of Test Pattern Using Genetic Algorithm for Testing SRAM
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
 
Colombo14a
Colombo14aColombo14a
Colombo14a
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
 
Foundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual SystemsFoundation and Synchronization of the Dynamic Output Dual Systems
Foundation and Synchronization of the Dynamic Output Dual Systems
 
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories Abstract - Mining Source Code Change Patterns from Open-Source Repositories
Abstract - Mining Source Code Change Patterns from Open-Source Repositories
 
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού ΛογισμικούΕξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
Εξαγωγή Προτύπων Αλλαγών Κώδικα από Αποθετήρια Ανοικτού Λογισμικού
 
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...
 

Último

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Protein threading using context specific alignment potential ismb-2013

  • 1. Protein Threading Using Context- Specific Alignment Potential Sheng Wang http://raptorx.uchicago.edu Toyota Technological Institute at Chicago, Joint work with Jianzhu Ma, Feng Zhao and Jinbo Xu ISMB 2013 Jul 22, ICC Berlin, Germany
  • 2. Outline • Where we are @ template-based modeling • What’s our work • What’s the problem • What’s our solution • Welcome to our server
  • 3. Template-based Modeling (or, Threading) • Observation – ~50,000 non-redundant structures in PDB – ~ 1,200 unique structure folds (SCOP) • Methodology – Use known structures to predict a new one Template sequence Query sequence DDVYILDQAEEG DE-FIVD-PDEH DDVYILDQAEEG SPCKR---ADEG DDVYILDQAEEG E--IFVDQADDS DDVYILDQAEEG NMCVFGQWERTY database
  • 4. Template-based Modeling Procedures  Easy: similar sequences → similar structures  Sequence-based method, e.g., BLAST, FASTA  Works only for close homologous (>70% sequence identity)  Medium: similar profiles → similar structures  Protein profile is a matrix that represents a multiple sequence alignment of the similar proteins  Profile-based method, e.g., PSI-BLAST , HHMER, HHpred,  Works for relative remote homologous (>40% sequence identity)  Challenge: dissimilar profiles → similar structures  Adding structural information, or context-specific into sequence/profile based methods  Threading method, e.g., MUSTER, RAPTOR, CS-BLAST  Works for distant remote homologous (<40% sequence identity)
  • 5. Our Work • CNFpred: Transform a template-sequence alignment problem into a Machine Learning problem to calculate the alignment’s probability. • DeepAlign: Prepare for high quality training data of structural alignment. • CNF model: Combined Machine Learning model that incorporate Conditional Random Field (CRF) and Neural Network (NN).
  • 6. Protein Alignment Model S A L R Q L P L S E M M M M L P L S - E S A - L R Q Template Sequence Match states (M) M M Is M It M Insertion at sequence (Is) Insertion at template (It) The structural alignment generated by DeepAlign is used for training data
  • 7. DeepAlign for Structure Alignment • evolutionary information • local sub-structure similarity • angular similarity for hydrogen bonding BLOSUM is the local amino acid substitution matrix; CLESUM is the local sub-structure substitution matrix; v(i,j) measures the angular similarity for hydrogen bonding; d(i,j) measures the spatial proximity of two aligned residues. local similarity global similarity Score(i,j)=( max(0,BLOSUM(i,j) )+CLESUM(i,j) )*v(i,j)*d(i,j)
  • 8. CNF-based Alignment Model E: a neural network estimating the log-likelihood of state transition Z(S,T): normalization factor 1 2{ , ,..., }LA a a a { , , }i t sa M I IGiven an alignment Define a conditional probability between Sequence S and Template T Where, ),(/)),,,(exp(),|( 1 TSZTSaaETSAp i ii  Context-Specific
  • 9. Comprehensive Features MTYKLILN--GKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE How similar two residues : EAA How similar query’s sequence and profile and template’s profile: Esp, Epp How similar template’s secondary structure and sequence’s predicted second structure (3-class and 8-class): Ess3, Ess8 Sequence S How similar is the query’s solvent accessibility and template’s solvent accessibility: Esa Total scoring function is a non-linear combination of: E( ai, ai-1, EAA , Esp , Epp , Ediso, Ess3 , Ess8 , Esa ) Template T MTYKLILNSTVRTKSDTVTDAVP---ADKICSFAQQLPWEREWSF-- For disordered regions, Ediso, no structure information used.
  • 10. What’s the problem? • The model describes only the alignment probability, not a log-odds potential relative to a background. • It incorporates only local information and lacks global information.
  • 11. Our solution • Propose a protein alignment potential: it uses a carefully designed reference state, and it can be generalized to sequence-sequence, sequence-structure, and structure-structure alignment. • Incorporate both local and global terms: the CNFpred potential is used for the local term, and the EPAD potential is used for the global term.
  • 12. Protein alignment potential • Given two amino acids a and b, their mutation potential is defined as $u(a \sim b) = \log \frac{P(a \sim b)}{P_{ref}(a \sim b)} = \log \frac{P(a \sim b)}{P(a)\,P(b)}$ • Similarly, given one alignment A between sequence S and template T, we define the potential of A as $u(A \mid S, T) = \log \frac{P(A \mid S, T)}{P_{ref}(A)} = \log \frac{P(A \mid S, T)}{\sqrt[N]{\prod_{i=1}^{N} P(A \mid x, y)}}$, where x and y are two random proteins with the same lengths as S and T, respectively, and the product runs over N such random pairs • Assumption: the alignment maximizing the potential is the optimal one.
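  • 13. Protein alignment potential • The alignment probability given sequence S and template T can be modeled as $P(A \mid S, T) = \exp\left(F(A \mid S, T) + G(A \mid S, T)\right) / Z(S, T)$ • $F(A \mid S, T)$: local term • $G(A \mid S, T)$: global term • $Z(S, T)$: partition function, $Z(S, T) = \sum_A \exp\left(F(A \mid S, T) + G(A \mid S, T)\right)$, which makes the probabilities of all alignments A sum to one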
  • 15. Model the local potential • From CNFpred, we use a context-specific linear-chain model: $F(A \mid S, T) = \sum_i E(a_{i-1}, a_i, S, T)$ • The local potential is defined as $U_{local}(A \mid S, T) = F(A \mid S, T) - \mathrm{EXP}_{x,y}\left(F(A \mid x, y)\right)$ • The expectation term can be calculated by uniformly sampling a few thousand protein pairs, so the local potential is $U_{local}(A \mid S, T) = \sum_i \left(E(a_{i-1}, a_i, S, T) - E(a_{i-1}, a_i)\right)$, where $E(a_{i-1}, a_i)$ is the sampled expectation
  • 16. What’s the difference between maximizing on probability and maximizing on potential? • Maximize on probability: the alignment is long but less informative, with many false positives. Good for building models. • Maximize on potential: the alignment is short but relevant and highly significant. Good for ranking templates.
  • 17. Model the global potential • From EPAD, we use a context-specific distance-dependent model: $G(A \mid S, T) = \sum_{i,j} \log P(d^T_{ij} \mid s_i, s_j)$ • The global potential is defined as $U_{global}(A \mid S, T) = G(A \mid S, T) - \mathrm{EXP}_{x,y}\left(G(A \mid x, y)\right)$ • The expectation term can be calculated by uniformly sampling a few thousand residue pairs from templates, so the global potential is $U_{global}(A \mid S, T) = \sum_{i,j} \left(\log P(d^T_{ij} \mid s_i, s_j) - \log P(d^T_{ij})\right)$
  • 18. What’s global information given an alignment? • $G(A \mid S, T) = \sum_{i,j} \log P(d^T_{ij} \mid s_i, s_j)$ • [Figure: residues $s_i$ and $s_j$ of Sequence S aligned to two residues of Template T that are separated by distance $d^T_{ij}$] • If the alignment is good, the distance between a sequence residue pair should match well with that of their aligned template residue pair.
  • 19. Result on 1000*6000 • CNFpred (local + global potential) compared to HHpred and to CNFpred (local potential)
  • 20. Welcome to our server http://raptorx.uchicago.edu/ Binding Contact
  • 21. Thank you • Jinbo Xu, Feng Zhao, Jianzhu Ma • National Institutes of Health (R01GM0897532) • National Science Foundation (DBI-0960390) • NSF CAREER award CCF-1149811 • Alfred P. Sloan Research Fellowship
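
The state labeling on slide 6 can be sketched in Python. This is a minimal illustration rather than the authors' code: the function name `alignment_to_states` and the gap character `-` are assumptions, and the Is/It convention is read off the slide's own example, where a gap in the sequence row is labeled Is and a gap in the template row is labeled It.

```python
from itertools import product

STATES = ("M", "Is", "It")

def alignment_to_states(template_row, sequence_row, gap="-"):
    """Label each alignment column as M, Is, or It (convention taken from slide 6)."""
    if len(template_row) != len(sequence_row):
        raise ValueError("aligned rows must have equal length")
    states = []
    for t, s in zip(template_row, sequence_row):
        if t == gap and s == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if s == gap:
            states.append("Is")  # template residue facing a gap in the sequence row
        elif t == gap:
            states.append("It")  # sequence residue facing a gap in the template row
        else:
            states.append("M")   # both rows have a residue: match state
    return states

# All ordered pairs of adjacent states: 3 x 3 = 9 possible transitions.
TRANSITIONS = list(product(STATES, repeat=2))

states = alignment_to_states("LPLS-E", "SA-LRQ")
print(" ".join(states))
print(len(TRANSITIONS))
```

Run on the slide's example rows, the sketch reproduces the state path M M Is M It M, and the transition list has the nine entries that a linear-chain model over three states has to score.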

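The mutation potential on slide 12 is a log-odds ratio: the probability of seeing residues a and b aligned, divided by the product of their background probabilities. The Python sketch below shows that calculation under stated assumptions: the function name `mutation_potential` and the toy list of aligned pairs are hypothetical, and the probabilities are estimated by simple counting, which the slides do not specify.

```python
import math
from collections import Counter

def mutation_potential(aligned_pairs):
    """Return {(a, b): log(P(a,b) / (P(a) * P(b)))}, estimated by counting."""
    joint = Counter()
    for a, b in aligned_pairs:
        # Count both orders so the estimate is symmetric in a and b.
        joint[(a, b)] += 1
        joint[(b, a)] += 1
    total = sum(joint.values())

    # Background frequency of each residue: the marginal of the joint counts.
    background = Counter()
    for (a, _), count in joint.items():
        background[a] += count

    potential = {}
    for (a, b), count in joint.items():
        p_ab = count / total
        p_a = background[a] / total
        p_b = background[b] / total
        potential[(a, b)] = math.log(p_ab / (p_a * p_b))
    return potential

# Toy data (hypothetical): A aligns with A twice and with G once; G aligns with G once.
u = mutation_potential([("A", "A"), ("A", "A"), ("A", "G"), ("G", "G")])
print(u[("A", "A")] > 0)  # aligned more often than the background predicts
print(u[("A", "G")] < 0)  # aligned less often than the background predicts
```

A positive value means the pair is aligned more often than chance would predict and a negative value means less often. The alignment potential applies the same log-odds idea to whole alignments instead of residue pairs.
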
Editor's Notes

  1. Currently, template-based modeling is the mainstream approach in protein structure prediction. It is based on the observation that although there are around 50,000 non-redundant structures in PDB, the number of unique structure folds in SCOP is only about 1,200. Most importantly, since 2010 new unique folds have appeared less often, which implies that the number of naturally occurring protein folds is limited. This leads to the fundamental assumption that we can use known structures to predict the structure of an unknown query sequence. More formally, template-based modeling is defined as follows: given the one-dimensional amino acid sequence of a query protein and a template database of known three-dimensional structures, we align each template with the query to find the best match and build the query model on that template.
  2. Here we move into the first part: how to define the labels for protein alignment data. In detail, we convert an alignment path into a series of consecutive labels drawn from three states: M, Is, and It. So there are nine adjacent state transitions in total. After defining the labels, we can apply DeepAlign to generate the training data from structurally similar proteins.