Transcription Factor-DNA Binding Prediction
Tahmina Ahmed
Prosunjit Biswas
Iffat Sharmin Chowdhury
Badri Sampath




Motivation
• Build a model from the labeled DNA sequences and use it to
  label the unlabeled DNA sequences, gaining experience with
  real-world Machine Learning problems along the way.




Approaches
• K-mer based
     Fixed length K-mer
     K-mer with Mismatches
     Using Regular Expression
• PWM based
     MEME and MAST
• Combined Model
    Combines both models




K-mer Approach Based on Regular Expressions
Motivation
  2-mers are the K-mers that occur most often in the sequences,
  so this approach concentrates on 2-mers.

Strategy
  - For any two 2-mers X and Y, generate the regular expressions
  X(.*)Y and Y(.*)X.
  - Use these regular expressions as candidate attributes.
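
The strategy above can be sketched as follows. This is a minimal illustration, not the scripts used in the project: the function names, the restriction to distinct pairs (X different from Y), and the binary present/absent encoding are all assumptions.

```python
import re
from itertools import combinations, product


def two_mers(alphabet="ACGT"):
    """Return all 2-mers over the alphabet (16 for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=2)]


def candidate_patterns(kmers):
    """For every unordered pair of distinct 2-mers X, Y,
    emit the two patterns X(.*)Y and Y(.*)X."""
    patterns = []
    for x, y in combinations(kmers, 2):
        patterns.append(f"{x}(.*){y}")
        patterns.append(f"{y}(.*){x}")
    return patterns


def regex_features(sequence, patterns):
    """Binary attribute vector: 1 if the pattern matches anywhere
    in the sequence, 0 otherwise (assumed encoding)."""
    return [1 if re.search(p, sequence) else 0 for p in patterns]


if __name__ == "__main__":
    patterns = candidate_patterns(two_mers())
    print(len(patterns), "candidate attributes")
    print(regex_features("ACGGT", ["AC(.*)GT", "GT(.*)AC"]))
```

With 16 possible 2-mers there are 120 distinct pairs and therefore 240 candidate patterns, which is why a later attribute selection step matters.
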
Classifier Selection




Fig: Nine classifiers applied to the TF data sets
Algorithms are numbered as follows -
      (1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) Kstar (6) Bagging
      (7) LogitBoost (8) RandomForest (9) J48
Summary -
     * 9 classifiers were applied to 10 data sets; 3 of them are shown
     * choosing a single best classifier is not a trivial task
     * the same classifier behaves differently on different data sets
Change in Accuracy due to Different Classifiers




Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set
Fig: Performance of different classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set

Summary -
       * the choice of classifier has a large effect on accuracy
       * classifiers have to be chosen carefully
Change in Accuracy due to Different K-mer Length

Fig: Performance of different K-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set

Summary -
    * K-mer length also affects accuracy
    * finding the single best length is not trivial
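
To make the role of K concrete, here is a small sketch of fixed-length K-mer counting. The function names and the use of raw counts as attribute values are assumptions; the slides do not say how the attributes were encoded.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping window of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Fixed-order attribute vector with one entry per possible k-mer."""
    counts = kmer_counts(sequence, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


if __name__ == "__main__":
    for k in (4, 5, 6):
        vec = kmer_vector("ACGTACGTTGCA", k)
        print(k, len(vec), sum(vec))
```

The attribute space grows as 4^K (256, 1024, and 4096 attributes for K = 4, 5, 6), so each increase in K makes the vectors sparser for sequences of the same length.
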


Attribute Space Selection




        Fig: Performance for different numbers of selected k-mers on the TF_4 data set


Summary -
    * the number of attributes considered also affects accuracy
    * accuracy increases as more attributes are considered, but beyond a
   saturation point it decreases


PWM based Analysis on Accuracy (TF_1 data set)

Fig: J48, minW 6 - maxW 15, no. of sites 10
Fig: J48, minW 6 - maxW 15, no. of motifs 5
Summary -
      * accuracy increases with more motifs when the no. of sites is fixed
      * accuracy increases with more sites when the no. of motifs is fixed
      * what happens when we increase both?


PWM based Analysis




                            Fig: Accuracy varies with no. of motifs and no. of sites


* 1st bar shows the no. of sites
* 2nd bar shows the no. of motifs
* 3rd bar shows the accuracy
* the point is that accuracy decreases when we increase both the no. of motifs and the no. of sites.
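
For readers unfamiliar with PWMs, the sketch below shows how a position weight matrix can be built from aligned binding sites and used to score a sequence. It is a generic illustration and not the algorithm used by MEME or MAST, which were the tools used in this project. The function names, the pseudocount of 1, and the uniform background are assumptions.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0):
    """Log-odds PWM from equal-length aligned sites,
    against a uniform background (assumed)."""
    width = len(sites[0])
    background = 1.0 / len(alphabet)
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def best_score(sequence, pwm):
    """Highest log-odds score over all windows of the motif width.
    Assumes the sequence is at least as long as the motif."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    print(best_score("TTACGTTT", pwm), best_score("TTTTTTTT", pwm))
```

One best-window score per motif could then serve as a numeric attribute for a classifier such as J48, so the no. of motifs determines the number of attributes and the no. of sites determines how many sequences each matrix is estimated from.
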
Extra Work for TF_20


              Fig: Flow diagram of building the new model for TF-20
              Boxes in the diagram: K-mer + PWM; sequences identified by
              both models; sequences identified differently; biased 2-mer
              model; newly labeled sequences; the new model for TF-20


Summary -
    * we have done some extra work for TF_20
AUC based on the Feedback (bonus model)




                    Fig: AUC of 10 data sets based on the last submission


* accuracy improved over the first submission
* the PWM model did not give good results
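
As a reminder of what the AUC figure measures, the following sketch computes AUC as the fraction of (positive, negative) pairs that the model ranks correctly, counting ties as half. The function name and the toy labels and scores are illustrative assumptions, not project data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.
    labels: 1 for bound (positive), 0 for unbound (negative).
    Assumes at least one positive and one negative example."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.
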



Participation
Member                   Background Study           Working with Tools       Working with Models  Parameter Tuning  Automation
Badri Sampath            DNA, RNA, protein, motif   AlignAce, MEME, MAST     PWM                  K-mer             Arff writer, MAST output writer
Iffat Sharmin Chowdhury  Protein, Motif,            Weka, AlignAce, ScanAce  K-mer                PWM               Script for FASTA, Weka
                         Transcription
Prosunjit Biswas         DNA, Transcription, K-mer  MEME, MAST               K-mer                PWM               Script for RE, for new model
Tahmina Ahmed            MEME, MAST, PWM            MEME, MAST, Weka         PWM                  K-mer             Script for MEME, MAST

Acknowledgment




Questions ???
