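AI-Bio Convergence Professional Course
2022-8~10
H K Yoon (윤형기, hky@openwith.net)
Day 4
Topic / Details
Day 1: Welcome and course introduction
Greetings
Survey of participants and their goals for the course
Overview of healthcare/bio (technology/industry): trends in healthcare/bio technology and industry
Foundation technology (1-1) Python and analysis packages: analysis tools (1) (Python, SciPy, NumPy/pandas)
Day 2: Foundation technology (1-2) R and statistical analysis: analysis tools (2) (R and statistics)
Applied biostatistics (1): bioinformation and ANOVA, multivariate analysis, etc.
Genome analysis
Day 3: Applied biostatistics (2): meta-analysis
Genome analysis (Omics) (1)
Genome analysis
Transcriptome analysis
Day 4: Genome analysis (Omics) (2)
Epigenome analysis
Proteome analysis
Next-generation sequencing
GenBank and NCBI data
VCF data analysis, NGS data processing, etc.
Day 5: Foundation technology (3) Machine learning (1)
Modeling methodology (model concepts and cross-validation)
Supervised learning algorithms (linear models, classification)
Foundation technology (3) Machine learning (2): unsupervised learning algorithms (clustering, association analysis, etc.)
Day 6: Supervised learning applied to bioinformation
Predictive models on medical data
Linear models and classification of healthcare data
Unsupervised learning applied to bioinformation
Association analysis of clinical data
Comorbidity analysis
Understanding the healthcare/bio domain
Healthcare datasets and biostatistics
Bio data and machine learning
Schedule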
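Topic / Details
Day 7: Foundation technology (4) Deep learning (1): neural network training and deep learning models
Foundation technology (3) Deep learning (2)
TensorFlow
PyTorch
Day 8: Deep learning applied to bioinformation
Healthcare simulation using Bi-LSTM
Skin disease identification using deep learning
Ontologies applied to bioinformation
Semantic Web and ontologies
Bioinformatics applications of ontologies
Day 9: Foundation technology (3) Image processing: overview of image processing and computer vision
Medical image analysis (1)
Segmentation
Image registration
Day 10: Medical image analysis (2)
Electrocardiogram (ECG)
Rendering and surface models
MRI
Day 11: Foundation technology (4) Bioinformation and computational chemistry: overview of computational chemistry
Drug discovery (1)
Target identification
Reagent and assay development
ADME (absorption, distribution, metabolism, excretion)
Toxicology and machine learning applications
Day 12: Foundation technology (5) GAN: GANs (Generative Adversarial Networks) and VAEs
Drug discovery and GANs: recommending drug candidates with generative models
Wrap-up: overall summary
Medical image analysis
Drug analysis and drug design
Bio data and deep learning
Genome analysis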
Key topics in bioinformatics
• Sequence alignment
– Pairwise Sequence Alignment
– Database similarity searching
– Multiple Sequence Alignment
– Profiles and HMMs
– Protein Motifs and Domain Prediction
• Gene and promoter prediction
– Gene prediction
– Promoter and Regulatory Element Prediction
• Molecular Phylogenetics
– Phylogenetics Basics
– Phylogenetic Tree Construction Methods and Programs
• Structural Bioinformatics
– Protein structure visualization, comparison, and classification
– Protein structure prediction (secondary, tertiary)
– RNA structure prediction
• Genomics and Proteomics
– Genome mapping, assembly, and comparison
– Functional genomics
– Proteomics
• Genome rearrangements
• Motif finding
• Gene expression analysis
Sequence Alignment
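Supplement: the genetic code
• 1. Overview
– The set of rules that determines which amino acid each codon encodes
• 2. Codon
– A unit of genetic information in RNA that specifies an amino acid of a protein
– RNA bases: uracil (U), guanine (G), cytosine (C), adenine (A)
– Each codon consists of 3 bases, so in theory 4×4×4 = 64 distinct codons are possible.
• 3. Types
– 3.1. Start codon
• 5'-AUG-3' (some bacteria also use alternative start codons).
• Specifies methionine (Met) in eukaryotes and N-formylmethionine (fMet) in prokaryotes.
• It also serves as the signal for the ribosome bound to the mRNA to begin protein translation.
– 3.2. Stop codon (nonsense codon)
• Codons that signal the end of protein translation; there are three: UAA, UAG, and UGA.
• No tRNA corresponds to a stop codon. Instead a protein called a release factor binds, and when translation reaches a stop codon the two ribosomal subunits separate and translation terminates.
– 3.3. Anticodon
• A three-base sequence in a specific region of the tRNA chain that pairs with the complementary codon on the mRNA.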
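The codon rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a single reading frame that starts at the first AUG; the function name `translate` and the one-letter amino acid output are choices made for this example, not part of the course material.

```python
from itertools import product

# Standard genetic code with bases ordered U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate("GGAUGGCUUGGUAAGC")` reads AUG, GCU, UGG and then stops at UAA.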
Pairwise Sequence Alignment
• Background
• Sequence Homology vs. Sequence Similarity
• Sequence Similarity vs. Sequence Identity
• Methods
– Global Alignment and Local Alignment
– Alignment Algorithms
– Dot Matrix Method
– Dynamic Programming Method
• Gap Penalties
• Dynamic Programming for Global Alignment
• Dynamic Programming for Local Alignment
• Scoring matrices
– Amino acid scoring matrices
– PAM matrices
– BLOSUM matrices
– Comparison between PAM and BLOSUM
• Statistical significance of sequence alignment
• (Goal)
• Sequence comparison
→ finds common character patterns and residue–residue correspondences
• Background: evolution
• DNA and proteins are products of evolution
– The degree of sequence conservation in the alignment reveals
evolutionary relatedness of different sequences, whereas the
variation between sequences reflects the changes that have occurred
during evolution in the form of substitutions, insertions, and
deletions.
• sequence alignment
– can be used as basis for prediction of structure and function of
uncharacterized sequences.
– provides inference for the relatedness of two sequences under study.
Sequence Homology vs. Similarity
• (…)
– Distinguishing the terms
• Homologous relationship (the sequences "share homology")
– an inference or conclusion about a common ancestral relationship, drawn
from sequence similarity comparison when the two sequences
share a high enough degree of similarity. (qualitative)
• Sequence similarity
– is a direct result of observation from the sequence alignment.
– % of aligned residues that are similar in physicochemical properties
such as size, charge, and hydrophobicity. (quantitative)
– The issue is what level of sequence similarity supports homology
• Nucleotide sequences consist of only 4 characters → two unrelated
sequences are expected to be identical at 25% or more of aligned positions by chance.
• Protein sequences have 20 possible amino acid residues → two
unrelated sequences can match at about 5% of the residues by random
chance.
– However, % identity values provide only tentative guidance for
identifying homology.
3 zones of protein sequence alignments. (Source: Modified from Rost 1999).
Sequence Similarity vs. Sequence Identity
• (…)
• For nucleotide sequences, the two terms mean effectively the same thing
• For protein sequences, they must be distinguished
– sequence identity = % of matches of the same amino acid residues
between two aligned sequences.
– Similarity = % of aligned residues that have similar physicochemical
characteristics and can be more readily substituted for each other.
– Two ways to calculate sequence similarity and identity:
– one uses the overall sequence lengths of both sequences;
– the other normalizes by the length of the shorter sequence.
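The two normalizations can be compared on a small aligned pair. The sketch below assumes the inputs are already aligned, equal-length strings with `-` for gaps, and that "both lengths" means dividing twice the match count by the sum of the two ungapped lengths; the function name `percent_identity` is made up for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity over both lengths, % identity over the shorter sequence)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    over_both = 100.0 * 2 * matches / (len_a + len_b)
    over_shorter = 100.0 * matches / min(len_a, len_b)
    return over_both, over_shorter
```

With `percent_identity("ACGTACGT", "ACG-ACGA")` there are 6 identical columns, giving 80% by the first method and about 85.7% by the second, so the choice of denominator should always be reported with the value.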
Methods
• Global Alignment and Local Alignment
• Global Alignment
– Compares the sequences from beginning to end
» is more applicable for aligning two closely related sequences of
roughly the same length.
» For divergent sequences and sequences of variable lengths, this
method may not be able to generate optimal results because it
fails to recognize highly similar local regions between the two
sequences.
• Local alignment
– only finds local regions with the highest level of similarity between
the two sequences and aligns these regions without regard for the
alignment of the rest of the sequence regions
– Two sequences to be aligned can be of different lengths
Example of pairwise sequence comparison
• Alignment algorithms
– Dot Matrix Method (= dot plot method)
– Dynamic Programming Method
• Gap Penalties
• Dynamic Programming for Global Alignment
• Dynamic Programming for Local Alignment
– Word method
– Dot Matrix Method
Example of sequence comparison by dot plot. Lines linking the dots in diagonals indicate
sequence alignment. Diagonal lines above or below the main diagonal
represent internal repeats of either sequence
• Problem when comparing large sequences using the dot matrix
method
– high noise level.
» In most dot plots, dots are scattered all over the graph and obscure
the true alignment. The problem is particularly acute for DNA
sequences: DNA has only 4 possible characters, so each
residue has a 1-in-4 chance of matching a residue in
another sequence.
» To reduce noise, a filtering technique is applied: instead of
scanning single residues, it uses a “window” of fixed length
covering a stretch of residue pairs, and a dot is placed only when
the stretches in the window match.
• self comparison as a variation of using the dot plot method.
– the main diagonal shows the perfect match of each residue with itself → the
method is used to identify internal repeat elements
– If repeats are present, short parallel lines are observed above and
below the main diagonal.
» Self complementarity of DNA sequences (also called inverted
repeats) can also be identified using a dot plot.
» In this case, a DNA sequence is compared with its reverse-
complemented sequence.
– Parallel diagonals represent the inverted repeats.
– Advantages
» easy identification of the regions of greatest similarity.
– Disadvantages
» it is often up to the user to construct a full alignment with
insertions and deletions by linking nearby diagonals.
» it lacks statistical rigor in assessing the quality of the alignment.
» it is restricted to pairwise alignment and is difficult to
scale up to multiple alignment.
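A minimal sketch of the dot matrix method with window filtering. The parameters `window` and `min_matches` are names chosen for this illustration: a dot is drawn at (i, j) only when the two windows starting there share at least `min_matches` identical positions.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1, min_matches: int = 1) -> list[list[bool]]:
    """dots[i][j] is True when the windows starting at seq_a[i] and seq_b[j]
    have at least min_matches identical positions."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    dots = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            same = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            dots[i][j] = same >= min_matches
    return dots


def render(dots: list[list[bool]]) -> str:
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if d else "." for d in row) for row in dots)
```

Comparing `"ACGTACGT"` with itself at `window=1` gives 16 dots, most of them noise; with `window=4, min_matches=4` only 7 remain: the main diagonal plus the two off-diagonal dots that reveal the internal ACGT repeat.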
– Dynamic Programming Method
• (…)
– convert a dot matrix into a scoring matrix to account for matches
and mismatches between sequences. By searching for the set of
highest scores in this matrix, the best alignment can be accurately
obtained.
– construct a 2-D matrix.
» The residue matching is according to a particular scoring matrix.
The scores are calculated one row at a time. This starts with the
first row of one sequence, which is used to scan through the
entire length of the other sequence, followed by scanning of
the second row. The matching scores are calculated.
• Gap Penalties
– Gaps represent insertions and deletions.
– Opening a gap and extending an existing gap are penalized
differently.
» Extending a gap that has already been started is easier, so
gap opening has a much higher penalty → when insertions and
deletions occur, several adjacent residues are likely to have
been inserted or deleted together.
» These differential gap penalties are called affine gap penalties.
» Strategy: use preset gap penalty values for introducing and
extending gaps.
» The total gap penalty (W) is a linear function of gap length:
W = γ + δ × (k − 1)
» A constant gap penalty, which charges every gap position the same whether it opens or extends a gap, is less realistic.
γ = gap opening penalty,
δ = gap extension penalty,
k = length of the gap.
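A small sketch of the two penalty schemes. It assumes the common convention that the first gap position costs the opening penalty and each further position costs the extension penalty, i.e. W = γ + δ × (k − 1); the function names and the example values are illustrative only.

```python
def affine_gap_penalty(k: int, gap_open: float, gap_extend: float) -> float:
    """Affine penalty: gap_open for the first position, gap_extend for each further one."""
    if k <= 0:
        return 0.0
    return gap_open + gap_extend * (k - 1)


def constant_gap_penalty(k: int, per_position: float) -> float:
    """Constant penalty: every gap position costs the same amount."""
    return per_position * max(k, 0)
```

With `gap_open=10` and `gap_extend=1`, one gap of length 5 costs 14, while five separate gaps of length 1 cost 50. The affine scheme therefore favors a single long indel over many scattered ones, which is the behavior motivated above.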
• DP for Global Alignment (Needleman–Wunsch algorithm)
– an optimal alignment is obtained over the entire lengths of the two
sequences.
– Drawback = risk of missing the best local similarity → most suitable
for aligning two closely related sequences of roughly the same
length. (For divergent sequences or sequences with different domain
structures, the approach does not produce an optimal alignment.)
• DP for Local Alignment (Smith–Waterman algorithm)
– identification of regional sequence similarity
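The two dynamic programming variants differ only in initialization, in clamping scores at zero, and in where the traceback starts and stops. The sketch below puts both in one function to make that visible. It is a simplified illustration, not a production aligner: it assumes a plain match/mismatch score and a linear gap penalty (not the affine scheme), and the function name `align` and its defaults are made up for this example.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -2, local: bool = False) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b).

    local=False: global alignment (Needleman-Wunsch).
    local=True:  local alignment (Smith-Waterman).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts at 0.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(score[i - 1][j - 1] + sub(i, j),
                       score[i - 1][j] + gap,
                       score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # never extend a negative-scoring prefix
                if cell > best:
                    best, best_pos = cell, (i, j)
            score[i][j] = cell

    # Traceback: from the corner (global) or from the best cell (local).
    i, j = best_pos if local else (n, m)
    final = score[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and score[i][j] == 0:
            break
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return final, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `align("ACGT", "AGT")` aligns the full sequences with one gap, while `align("TTTACGTTT", "GGACGGG", local=True)` reports only the shared ACG region and ignores the unrelated flanks.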
Scoring Matrices
• (…) = a substitution matrix
• is derived from statistical analysis of residue substitution data
from sets of reliable alignments of highly related sequences.
– A positive value or high score is given for a match and a negative
value or low score for a mismatch.
– Assumption: the frequencies of mutation are equal for all bases.
This assumption is unrealistic, however.
• Scoring matrices for amino acids are more complicated
– → they must reflect the physicochemical properties of amino acid residues, as well as
the likelihood of certain residues being substituted among true
homologous sequences.
– Certain amino acids with similar physicochemical properties can be
more easily substituted than those without similar characteristics.
Substitutions among similar residues are likely to preserve the
essential functional and structural features. However, substitutions
between residues of different physicochemical properties are more
likely to cause disruptions to the structure and function.
• Amino acid scoring matrices
– 20 x 20 matrices that reflect the likelihood of residue substitutions
• 2 types of amino acid substitution matrices.
– (i) based on the interchangeability of the genetic code or on the
physicochemical properties of amino acids
» → less accurate
– (ii) derived from empirical studies of amino acid substitutions.
» → based on surveys of actual amino acid substitutions among related
proteins.
» PAM and BLOSUM matrices derived from actual alignments of
highly similar sequences. By analyzing the probabilities of
amino acid substitutions in these alignments, a scoring system
can be developed by giving a high score for a more likely
substitution and a low score for a rare substitution.
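The idea of scoring likely substitutions high and rare ones low is usually expressed as a log-odds ratio: the observed frequency of a residue pair divided by the frequency expected if the residues paired at random. The sketch below is an illustration under stated assumptions: the frequencies in the usage note are hypothetical numbers, not taken from any real matrix, and the default `scale=2.0` (half-bit units) is one common choice.

```python
import math


def log_odds_score(observed_pair_freq: float, freq_i: float, freq_j: float,
                   scale: float = 2.0) -> int:
    """Rounded log-odds score for aligning residue i with residue j.

    observed_pair_freq: frequency of the (i, j) pair in trusted alignments.
    freq_i, freq_j:     background frequencies of the two residues.
    """
    expected = freq_i * freq_j  # chance of the pair if residues were independent
    return round(scale * math.log2(observed_pair_freq / expected))
```

With hypothetical background frequencies of 0.1 for both residues, a pair seen at frequency 0.04 (four times more often than chance) scores +4, and a pair seen at 0.0025 (four times less often) scores -4.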
• PAM matrices (Dayhoff PAM matrices)
• PAM = point accepted mutation
Correspondence of PAM Numbers with Observed
Amino Acid Mutational Rates
• BLOSUM matrices
• the BLOSUM (blocks amino acid substitution matrices) series
– → (In PAM matrix construction, the only direct observation of
residue substitutions is in PAM1, based on a relatively small set of
extremely closely related sequences. Sequence alignment statistics
for more divergent sequences are not available. )
– all are derived based on direct observation for every possible amino
acid substitution in multiple sequence alignments.
• Instead of using an extrapolation function, the number in a BLOSUM matrix name is the actual % identity
level of the sequences selected for construction of that matrix (e.g. BLOSUM62 corresponds to a 62% identity level).
PAM250 amino acid substitution matrix. Residues are
grouped according to physicochemical similarities.
BLOSUM62 amino acid substitution matrix.
• Comparison of PAM and BLOSUM
• Main differences
– PAM matrices, except PAM1, are derived from an evolutionary model
– BLOSUM matrices consist of entirely direct observations.
» BLOSUM matrices are entirely derived from local sequence
alignments of conserved sequence blocks,
» PAM1 matrix is based on the global alignment of full-length
sequences composed of both conserved and variable regions. →
BLOSUM matrices are more advantageous in searching databases and
finding conserved domains in proteins.
• Results of several empirical comparisons
– BLOSUM matrices outperform the PAM matrices in terms of accuracy of
local alignment, largely because BLOSUM matrices are derived from a
much larger and more representative dataset than the one used to derive
the PAM matrices. → BLOSUM matrices more reliable.
– Revised matrices have also been devised, e.g. Gonnet matrices and Jones–Taylor–Thornton
matrices, which are particularly robust in phylogenetic tree construction.
Gumbel extreme value distribution for alignment scores.
Statistical Significance of Sequence Alignment
• Concept
• A statistical test for finding true evidence of homology
– Test procedure
• A P-value resulting from the test
– P-value < 10^-100 indicates an exact match between the two sequences.
– 10^-100 < P-value < 10^-50 → a nearly identical match.
– 10^-50 < P-value < 10^-5 → sequences having clear homology.
– 10^-5 < P-value < 10^-1 → possible distant homologs.
– 10^-1 < P-value → the two sequences may be randomly related.
– However, sometimes truly related protein sequences may lack the
statistical significance at the sequence level owing to fast divergence
rates. Their evolutionary relationships can nonetheless be revealed at
the three-dimensional structural level.
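A minimal sketch of how a P-value can be read off a Gumbel extreme value distribution and then mapped to the interpretation bands listed above. It assumes the scores of unrelated alignments follow a Gumbel distribution with location `location` and scale parameter `lam`; both would have to be fitted (for example from alignments of shuffled sequences), and the function names are made up for this example.

```python
import math


def gumbel_p_value(score: float, location: float, lam: float) -> float:
    """P(S >= score) = 1 - exp(-exp(-lam * (score - location)))."""
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny.
    return -math.expm1(-math.exp(-lam * (score - location)))


def interpret_p_value(p: float) -> str:
    """Map a P-value to the bands used on the slide."""
    if p < 1e-100:
        return "exact match"
    if p < 1e-50:
        return "nearly identical match"
    if p < 1e-5:
        return "clear homology"
    if p < 1e-1:
        return "possible distant homologs"
    return "may be randomly related"
```

A score equal to the fitted location gives P of about 0.63, i.e. no evidence of homology; the P-value then drops roughly exponentially as the score rises above the bulk of the random-score distribution.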
Database Similarity Searching
• Requirements for DB searching
• Heuristic searching
• Basic Local Alignment Search Tool (BLAST)
– Variants
– Statistical Significance
– Low Complexity Regions
– BLAST Output Format
• FASTA
– Statistical significance
• Comparison of FASTA and BLAST
• Searching with the Smith–Waterman method
General principles
• DB searching
• uses pairwise alignment to retrieve biological sequences from DBs based on
similarity.
– The query is compared pairwise with every individual sequence in the
database; database similarity searching is pairwise alignment on a large
scale.
– However, DP is slow and impractical to use in most cases. Special search
methods are needed to speed up the computational process.
• Requirements for DB searching
• Sensitivity: the ability to find as many correct hits (“true positives”) as possible
• Specificity: the ability to exclude incorrect hits (“false positives”)
• Speed
– Types of algorithms
• Exhaustive type: examines all mathematical combinations (e.g. DP)
• Heuristic type: finds an empirical or near-optimal solution using rules of
thumb
Heuristic Searching
• (…)
– BLAST
– FASTA
– word method
• Both BLAST and FASTA use a heuristic “word method” for fast
pairwise sequence alignment.
Basic Local Alignment Search Tool (BLAST)
• Purpose
– Find high-scoring ungapped segments. Segments scoring
above a given threshold indicate pairwise similarity beyond random
chance.
Example of alignment scoring with the BLOSUM62 matrix
• Variants
– BLASTN
– BLASTP
– BLASTX
– TBLASTX
• Statistical significance
– The larger the DB, the more alignments with unrelated sequences occur by chance.
→ A parameter is needed that takes into account the total number of sequence
alignments conducted, which is proportional to the size of the database.
• In BLAST searches, the E-value (expectation value)
– is the number of alignments with an equal or better score that are expected
to arise by random chance in a search of a DB of that size.
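– The E-value is related to the P-value used to assess the significance of a single
pairwise alignment. BLAST compares a query sequence against all
database sequences, and so the E-value is determined by:
E = m × n × P (m = total number of residues in the DB, n = number of residues in the query, P = probability that the alignment is a result of random chance)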
– (ex) …
• A bit score
– Measures sequence similarity independent of query sequence length
and DB size and is normalized based on the raw pairwise alignment
score
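The relationship between raw score, bit score, and E-value can be made concrete with the Karlin-Altschul formulas commonly cited for BLAST: E = K × m × n × e^(−λS) and S' = (λS − ln K) / ln 2, which together give E = m × n × 2^(−S'). This is a hedged sketch: λ (`lam`) and K (`k`) depend on the scoring matrix and gap penalties, and the values in the usage note are illustrative only.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score: S' = (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: float,
            lam: float, k: float) -> float:
    """Expected number of chance hits: E = K * m * n * exp(-lam * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value_from_e(e: float) -> float:
    """Probability of at least one chance hit: P = 1 - exp(-E)."""
    return -math.expm1(-e)
```

With the illustrative values `lam=0.267` and `k=0.041`, a raw score of 100 for a 250-residue query against 10^9 database residues gives an E-value of roughly 0.03. Doubling the database size doubles the E-value while the bit score stays the same, which is why bit scores can be compared across searches and E-values cannot.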
• Low Complexity Regions (LCRs)
• For both protein and DNA sequences, there may be regions that
contain highly repetitive residues, such as short segments of
repeats, or segments that are overrepresented by a small number
of residues.
– LCRs are rather prevalent in DB sequences: about 15% of the total
protein sequences in public databases. → They cause spurious DB matches and
lead to artificially high alignment scores with unrelated sequences.
• To avoid high similarity scores caused by matching
LCRs, filter out the problematic regions in both query and DB
sequences to improve the signal-to-noise ratio (= masking).
• 2 types of masking: hard and soft.
• SEG detects and masks repetitive elements before DB
searches are executed.
– SEG has been integrated into the web-based BLAST program.
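To illustrate masking, the sketch below flags windows with low Shannon entropy and then masks them either hard (residues replaced by an ambiguity character such as X or N) or soft (residues converted to lowercase). This is a simplified stand-in, not the SEG algorithm; the function names, window size, and threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    total = len(window)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        soft: bool = False, mask_char: str = "X") -> str:
    """Mask every residue covered by a window whose entropy is below threshold."""
    low = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for pos in range(start, start + window):
                low[pos] = True
    if soft:
        # Soft masking: keep the residues but mark them in lowercase.
        return "".join(ch.lower() if flag else ch for ch, flag in zip(seq, low))
    # Hard masking: replace the residues with an ambiguity character.
    return "".join(mask_char if flag else ch for ch, flag in zip(seq, low))
```

For `"ACDEFGSSSSSSSSHIKLMN"` with `window=6, threshold=0.5`, only the serine run is masked: hard masking yields `ACDEFGXXXXXXXXHIKLMN` and soft masking yields `ACDEFGssssssssHIKLMN`.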
• BLAST Output Format
FASTA
• (…)
• The first DB similarity search tool
• finds matches for a short stretch of identical residues of
length k (the “hashing” approach).
– string of residues (= ktuples or ktups) are equivalent to words in
BLAST, but are normally shorter than words. Typically, a ktup is
composed of two residues for protein sequences and six residues for
DNA sequences.
• Similar to BLAST, FASTA has a number of subprograms.
Procedure of ktup identification using the hashing strategy by FASTA. Identical
offset values between residues of the two sequences allow the formation of ktups.
Steps of the FASTA alignment procedure. In step 1 (left ), all possible ungapped
alignments are found between two sequences with the hashing method. In step 2
(middle), the alignments are scored according to a particular scoring matrix. Only
the ten best alignments are selected. In step 3 (right ), the alignments in the same
diagonal are selected and joined to form a single gapped alignment, which is
optimized using the dynamic programming approach.
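The hashing step can be sketched as follows: index every ktup of the query by position, look up each ktup of the subject, and count hits per offset (query position minus subject position). Hits that share an offset lie on the same diagonal. This covers only the first step of the procedure; the function name `ktup_diagonals` and the example sequences are made up for illustration.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count identical ktup matches per diagonal (offset = query pos - subject pos)."""
    # Hash table: ktup -> list of start positions in the query.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)

    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[i - j] += 1
    return diagonals
```

For `ktup_diagonals("AMPSDGL", "GPSDNTL")` the words PS and SD both match with offset 1, so diagonal 1 collects two hits and would be the first candidate for scoring and later gapped extension.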
• Statistical significance
• FASTA also uses E-values and bit scores.
– essentially the same as in BLAST, but the FASTA output provides one
more statistical parameter, the Z-score.
» Because most of the alignments with the query sequence are
with unrelated sequences, the higher the Z-score for a reported
match, the further away from the mean of the score distribution,
hence, the more significant the match.
» For a Z-score > 15, the match can be considered extremely
significant, with certainty of a homologous relationship.
» If Z is in the range of 5 to 15, the sequence pair can be
described as highly probable homologs.
» If Z < 5, their relationship is described as less certain.
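A Z-score expresses how many standard deviations a match score lies above the mean of the background score distribution. The sketch below assumes the background scores are supplied by the caller (for example, scores against unrelated or shuffled sequences) and applies the thresholds listed above; it does not reproduce how FASTA itself estimates the distribution, and the function names are hypothetical.

```python
import statistics


def z_score(score: float, background_scores: list[float]) -> float:
    """Number of standard deviations the score lies above the background mean."""
    mean = statistics.mean(background_scores)
    sd = statistics.stdev(background_scores)  # sample standard deviation
    return (score - mean) / sd


def interpret_z(z: float) -> str:
    """Map a Z-score to the bands used on the slide."""
    if z > 15:
        return "extremely significant"
    if z >= 5:
        return "highly probable homologs"
    return "less certain"
```

With background scores `[10, 12, 14, 16, 18]` (mean 14, standard deviation about 3.16), a match scoring 50 has Z of about 11.4 and falls in the "highly probable homologs" band.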
Comparison of FASTA and BLAST
• (…)
• BLAST and FASTA perform equally well in regular DB searching.
• differences (Notably seeding step)
– BLAST uses a substitution matrix to find matching words
» use of low-complexity masking in BLAST → higher specificity
than FASTA because potential false positives are reduced.
» BLAST sometimes gives multiple best-scoring alignments from
the same sequence;
– FASTA identifies identical matching word using hashing procedure.
» By default, FASTA scans smaller window sizes. → more sensitive
results than BLAST, with a better coverage rate for homologs.
However, it is usually slower than BLAST.
» FASTA returns only one final alignment.
Multiple Sequence Alignment
• Scoring function
• Exhaustive Algorithms
• Heuristic Algorithms
– Progressive Alignment Method
– Drawbacks and Solutions
– Iterative Alignment
– Block-Based Alignment
• Considerations
– Protein-Coding DNA Sequences
– Editing
– Format Conversion
• Concept
• generation of multiple matching sequence pairs → convert
numerous pairwise alignments into a single alignment → arrange
sequences in such a way that evolutionarily equivalent positions
across all sequences are matched.
• Advantages
– reveals more biological information than pairwise alignments can.
– applications in designing degenerate PCR primers based on multiple
related sequences.
• DP vs. Heuristic
– the amount of computing time and memory DP requires increases
exponentially as the number of sequences increases. In practice,
heuristic approaches are most often used.
Scoring Function
• (…)
• MSA arranges sequences in such a way that the maximum number of
residues from each sequence are matched up according to a
particular scoring function.
» = sum of pairs (SP). (= sum of scores of all possible pairs of sequences in
a multiple alignment based on a particular scoring matrix).
– In calculating SP scores, each column is scored by summing the
scores for all possible pairwise matches, mismatches and gap costs.
The score of the entire alignment is the sum of all of column scores.
– The purpose of most multiple sequence alignment algorithms is to
achieve maximum SP scores.
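A minimal sketch of the sum-of-pairs score. It assumes a simple match/mismatch/gap scheme instead of a full substitution matrix and scores a gap aligned with a gap as 0; the function name and default values are chosen for this example only.

```python
from itertools import combinations


def sum_of_pairs(alignment: list[str], match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Sum the pairwise scores of every column of a multiple alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        # Score every pair of sequences within this column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap against gap contributes nothing (assumed convention)
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total
```

For the alignment `["ACG", "A-G", "ATG"]` the three columns score 3, -5 and 3, so the SP score is 1. The number of pairs per column grows quadratically with the number of sequences, which is part of why exhaustive optimization of the SP score becomes impractical.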
Exhaustive Algorithms
Heuristic Algorithms
• (3 categories)
– Progressive Alignment Method
– Iterative Alignment
– Block-Based Alignment
• Progressive Alignment Method
– Drawbacks and Solutions
Schematic of a typical progressive alignment procedure (e.g., Clustal).
Angled wavy lines represent consensus sequences for sequence pairs A/B
and C/D. Curved wavy lines represent a consensus for A/B/C/D.
Conversion of a sequence alignment into a graphical profile in
the Poa algorithm. Identical residues in the alignment are
condensed as nodes in the partial order graph.
• Iterative Alignment
• Block-Based Alignment
Schematic of iterative alignment procedure for PRRN, which
involves two sets of iterations.
Lab (1) Python
• Source
Lab (2) R
• Source

Top 20 Famous Indian Female Pornstars Name List 2024Sheetaleventcompany
 
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...Sheetaleventcompany
 
💚 Low Rate Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...
💚 Low Rate  Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...💚 Low Rate  Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...
💚 Low Rate Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...Sheetaleventcompany
 
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...Sheetaleventcompany
 
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...shallyentertainment1
 
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...Sheetaleventcompany
 
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...India Call Girls
 
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...India Call Girls
 
2024 PCP #IMPerative Updates in Rheumatology
2024 PCP #IMPerative Updates in Rheumatology2024 PCP #IMPerative Updates in Rheumatology
2024 PCP #IMPerative Updates in RheumatologySidney Erwin Manahan
 
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...Sheetaleventcompany
 
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...Sheetaleventcompany
 
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...dilpreetentertainmen
 
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...Sheetaleventcompany
 
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...India Call Girls
 
❤️ Zirakpur Call Girl Service ☎️9878799926☎️ Call Girl service in Zirakpur ☎...
❤️ Zirakpur Call Girl Service  ☎️9878799926☎️ Call Girl service in Zirakpur ☎...❤️ Zirakpur Call Girl Service  ☎️9878799926☎️ Call Girl service in Zirakpur ☎...
❤️ Zirakpur Call Girl Service ☎️9878799926☎️ Call Girl service in Zirakpur ☎...daljeetkaur2026
 
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...Sheetaleventcompany
 

Último (20)

💸Cash Payment No Advance Call Girls Nagpur 🧿 9332606886 🧿 High Class Call Gir...
💸Cash Payment No Advance Call Girls Nagpur 🧿 9332606886 🧿 High Class Call Gir...💸Cash Payment No Advance Call Girls Nagpur 🧿 9332606886 🧿 High Class Call Gir...
💸Cash Payment No Advance Call Girls Nagpur 🧿 9332606886 🧿 High Class Call Gir...
 
Independent Call Girls Service Chandigarh | 8868886958 | Call Girl Service Nu...
Independent Call Girls Service Chandigarh | 8868886958 | Call Girl Service Nu...Independent Call Girls Service Chandigarh | 8868886958 | Call Girl Service Nu...
Independent Call Girls Service Chandigarh | 8868886958 | Call Girl Service Nu...
 
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
Independent Call Girls Service Chandigarh Sector 17 | 8868886958 | Call Girl ...
 
Low Rate Call Girls Nagpur {9xx000xx09} ❤️VVIP NISHA Call Girls in Nagpur Mah...
Low Rate Call Girls Nagpur {9xx000xx09} ❤️VVIP NISHA Call Girls in Nagpur Mah...Low Rate Call Girls Nagpur {9xx000xx09} ❤️VVIP NISHA Call Girls in Nagpur Mah...
Low Rate Call Girls Nagpur {9xx000xx09} ❤️VVIP NISHA Call Girls in Nagpur Mah...
 
Top 20 Famous Indian Female Pornstars Name List 2024
Top 20 Famous Indian Female Pornstars Name List 2024Top 20 Famous Indian Female Pornstars Name List 2024
Top 20 Famous Indian Female Pornstars Name List 2024
 
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...
Call Girls In Indore 📞9235973566📞Just Call Inaaya📲 Call Girls Service In Indo...
 
💚 Low Rate Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...
💚 Low Rate  Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...💚 Low Rate  Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...
💚 Low Rate Call Girls In Chandigarh 💯Lucky 📲🔝8868886958🔝Call Girl In Chandig...
 
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...
❤️Chandigarh Escort Service☎️9814379184☎️ Call Girl service in Chandigarh☎️ C...
 
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...
❤️Amritsar Call Girls Service☎️98151-129OO☎️ Call Girl service in Amritsar☎️ ...
 
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...
❤️Amritsar Escort Service☎️9815674956☎️ Call Girl service in Amritsar☎️ Amrit...
 
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...
💸Cash Payment No Advance Call Girls Hyderabad 🧿 9332606886 🧿 High Class Call ...
 
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...
💸Cash Payment No Advance Call Girls Pune 🧿 9332606886 🧿 High Class Call Girl ...
 
2024 PCP #IMPerative Updates in Rheumatology
2024 PCP #IMPerative Updates in Rheumatology2024 PCP #IMPerative Updates in Rheumatology
2024 PCP #IMPerative Updates in Rheumatology
 
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...
Independent Call Girls Bangalore {7304373326} ❤️VVIP POOJA Call Girls in Bang...
 
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...
💚Trustworthy Call Girls Chandigarh 💯Niamh 📲🔝8868886958🔝Call Girls In Chandiga...
 
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
🍑👄Ludhiana Escorts Service☎️98157-77685🍑👄 Call Girl service in Ludhiana☎️Ludh...
 
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...
Low Rate Call Girls Udaipur {9xx000xx09} ❤️VVIP NISHA CCall Girls in Udaipur ...
 
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
💞 Safe And Secure Call Girls Coimbatore 🧿 9332606886 🧿 High Class Call Girl S...
 
❤️ Zirakpur Call Girl Service ☎️9878799926☎️ Call Girl service in Zirakpur ☎...
❤️ Zirakpur Call Girl Service  ☎️9878799926☎️ Call Girl service in Zirakpur ☎...❤️ Zirakpur Call Girl Service  ☎️9878799926☎️ Call Girl service in Zirakpur ☎...
❤️ Zirakpur Call Girl Service ☎️9878799926☎️ Call Girl service in Zirakpur ☎...
 
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...
Call Girls Service 11 Phase Mohali {7435815124} ❤️ MONA Call Girl in Mohali P...
 

AI 바이오 (4일차).pdf

  • 1. AI-Bio Convergence Specialist Course, 2022-8~10, H K Yoon (윤형기, hky@openwith.net), Day 4
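  • 2. Schedule (topic: details) • Day 1: Greetings and course introduction (greetings; survey of participants and their goals for the course); Medical/bio overview, technology and industry (medical/bio technology and industry trends); Foundation technology (1-1) Python and analysis packages (analysis tools (1): Python, SciPy, NumPy/pandas) • Day 2: Foundation technology (1-2) R and statistical analysis (analysis tools (2): R and statistics); Biostatistics applications (1) (bioinformatics and ANOVA, multivariate analysis, etc.); Genome analysis • Day 3: Biostatistics applications (2) (meta-analysis); Genome analysis (Omics) (1) (genome analysis; transcriptome analysis) • Day 4: Genome analysis (Omics) (2) (epigenome analysis; proteome analysis); Next-generation sequencing (GenBank and NCBI data; VCF data analysis, NGS data processing, etc.) • Day 5: Foundation technology (3) Machine learning (1) (modeling methodology: model concepts and cross-validation; supervised learning algorithms: linear models, classification); Foundation technology (3) Machine learning (2) (unsupervised learning algorithms: clustering, association analysis, etc.) • Day 6: Supervised learning and bioinformatics applications (predictive models for medical data; linear models and classification of healthcare data); Unsupervised learning and bioinformatics applications (association analysis of clinical data; comorbidity analysis) • Course parts: Understanding the medical/bio domain; Healthcare datasets and biostatistics; Bio data and machine learning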
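  • 3. Schedule (topic: details) • Day 7: Foundation technology (4) Deep learning (1) (neural network training and deep learning models); Foundation technology (3) Deep learning (2) (TensorFlow; PyTorch) • Day 8: Deep learning and bioinformatics applications (healthcare simulation with Bi-LSTM; skin disease identification with deep learning); Ontologies and bioinformatics applications (Semantic Web and ontologies; bioinformatics applications of ontologies) • Day 9: Foundation technology (3) Image processing (overview of image processing and computer vision); Medical image analysis (1) (segmentation; image registration) • Day 10: Medical image analysis (2) (electrocardiogram (ECG); rendering and surface models; MRI) • Day 11: Foundation technology (4) Bioinformatics and computational chemistry (overview of computational chemistry); Drug discovery (1) (target identification; reagent and assay development; ADME: absorption, distribution, metabolism, excretion; toxicology and machine learning applications) • Day 12: Foundation technology (5) GAN (GANs (Generative Adversarial Networks) and VAEs); Drug discovery and GANs (recommending drug candidates with generative models); Wrap-up (summary) • Course parts: Medical image analysis; Drug analysis and drug design; Bio data and deep learning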
  • 5. Major topics in bioinformatics • Sequence alignment – Pairwise sequence alignment – Database similarity searching – Multiple sequence alignment – Profiles and HMMs – Protein motif and domain prediction • Gene and promoter prediction – Gene prediction – Promoter and regulatory element prediction • Molecular phylogenetics – Phylogenetics basics – Phylogenetic tree construction methods and programs • Structural bioinformatics – Protein structure visualization, comparison, and classification – Protein structure prediction (secondary, tertiary) – RNA structure prediction • Genomics and proteomics – Genome mapping, assembly, and comparison – Functional genomics – Proteomics • Genome rearrangements • Motif finding • Gene expression analysis
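  • 7. Supplement: the genetic code • 1. Overview – The set of rules that determines which amino acid each codon encodes • 2. Codon – A unit of genetic information in RNA that specifies an amino acid of a protein – RNA bases: uracil, guanine, cytosine, adenine – One codon consists of 3 bases, so in theory 4×4×4 = 64 distinct codons can be specified. • 3. Types – 3.1. Start codon • 5'-AUG-3' (some bacteria use variant start codons). • Specifies methionine (Met) in eukaryotes and N-formylmethionine (fMet) in prokaryotes. • It also serves to let the mRNA bind the ribosome and begin protein translation – 3.2. Stop codon (nonsense codon) • A codon that signals the end of protein translation; there are three: UAA, UAG, and UGA • No tRNA corresponds to a stop codon; instead a protein called a release (termination) factor binds, and when translation reaches a stop codon the two ribosomal subunits separate and translation terminates. – 3.3. Anticodon • The base sequence of a specific region of the tRNA chain, which pairs with the complementary codon on the mRNA.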
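The codon rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `translate` is hypothetical, the table is the standard genetic code written in U/C/A/G order, and the sketch simply starts at the first AUG and stops at the first in-frame stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)
```

The table has 4×4×4 = 64 entries, three of which are stop codons, matching the counts given on the slide.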
  • 8. Pairwise Sequence Alignment • Background • Sequence homology vs. sequence similarity • Sequence similarity vs. sequence identity • Methods – Global alignment and local alignment – Alignment algorithms – Dot matrix method – Dynamic programming method • Gap penalties • Dynamic programming for global alignment • Dynamic programming for local alignment • Scoring matrices – Amino acid scoring matrices – PAM matrices – BLOSUM matrices – Comparison between PAM and BLOSUM • Statistical significance of sequence alignment
  • 9. • (Goal) • Sequence comparison → find “common character patterns” and residue–residue correspondences • Background – Evolution • DNA and proteins are products of evolution – The degree of sequence conservation in the alignment reveals evolutionary relatedness of different sequences, whereas the variation between sequences reflects the changes that have occurred during evolution in the form of substitutions, insertions, and deletions. • Sequence alignment – can be used as a basis for predicting the structure and function of uncharacterized sequences. – provides inference for the relatedness of two sequences under study.
  • 10. Sequence Homology vs. Similarity • (…) – Distinguishing the terms • Homologous relationship (the sequences share homology) – an inference or conclusion about a common ancestral relationship, drawn from sequence similarity comparison when the two sequences share a high enough degree of similarity. (qualitative) • Sequence similarity – is a direct result of observation from the sequence alignment. – % of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. (quantitative) – The issue is the level of sequence similarity • Nucleotide sequences consist of only 4 characters → unrelated sequences have at least a 25% chance of being identical. • Protein sequences have 20 possible amino acid residues → two unrelated sequences can match at about 5% of the residues by random chance.
  • 11. – However, % identity values provide only tentative guidance for homology identification. Figure: the three zones of protein sequence alignments (Source: modified from Rost 1999).
  • 12. Sequence Similarity vs. Sequence Identity • (…) • For nucleotide sequences the two terms mean essentially the same thing • For protein sequences they must be distinguished – Sequence identity = % of matches of the same amino acid residues between two aligned sequences. – Similarity = % of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other. – Two ways to calculate sequence similarity and identity – One uses the overall lengths of both sequences – The other normalizes by the length of the shorter sequence.
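The two normalizations named on this slide can be compared directly. The sketch below is an illustration under stated assumptions: the function name `percent_identity` is hypothetical, gaps are written as "-", the first value uses one common formulation, 2 × matches / (length A + length B), and the second divides the matches by the length of the shorter sequence.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identity over both lengths, identity over the shorter sequence), in %."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both sequences carry the same residue (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    len_a = len(aligned_a.replace("-", ""))  # ungapped length of sequence A
    len_b = len(aligned_b.replace("-", ""))  # ungapped length of sequence B
    overall = 100.0 * 2 * matches / (len_a + len_b)
    by_shorter = 100.0 * matches / min(len_a, len_b)
    return overall, by_shorter
```

For sequences of unequal length the second value is always at least as large as the first, which is why the normalization used should be reported alongside the percentage.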
  • 13. Methods • Global Alignment and Local Alignment • Global alignment – compares the two sequences from beginning to end » is more applicable for aligning two closely related sequences of roughly the same length. » For divergent sequences and sequences of variable lengths, this method may not generate optimal results because it fails to recognize highly similar local regions between the two sequences. • Local alignment – finds only the local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions – The two sequences to be aligned can be of different lengths
  • 15. • Alignment algorithms – Dot matrix method (= dot plot method) – Dynamic programming method • Gap penalties • Dynamic programming for global alignment • Dynamic programming for local alignment – Word method
  • 16. – Dot Matrix Method. Figure: example of sequence comparison with a dot plot. Lines linking the dots in diagonals indicate sequence alignment. Diagonal lines above or below the main diagonal represent internal repeats of either sequence.
  • 17. • Problem when comparing large sequences with the dot matrix method – high noise level. » In most dot plots, dots are plotted all over the graph, obscuring identification of the true alignment. This is particularly acute for DNA sequences because DNA has only 4 possible characters, so each residue has a 1-in-4 chance of matching a residue in another sequence. » To reduce noise, instead of using a single residue to scan for similarity, a filtering technique is applied that uses a “window” of fixed length covering a stretch of residue pairs.
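The effect of window filtering on dot plot noise can be sketched as follows. This is a minimal illustration, not the slides' own code: `dot_plot`, `window`, and `min_matches` are hypothetical names, and a dot is recorded at (i, j) only when at least `min_matches` of the `window` residue pairs starting there are identical.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=None):
    """Return the set of (i, j) positions where a window of residues matches."""
    if min_matches is None:
        min_matches = window  # by default require every pair in the window to match
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


# Self-comparison of a short DNA string: single-residue dots are noisy,
# while a window of 3 leaves only the main diagonal.
noisy = dot_plot("GATTACA", "GATTACA", window=1)
filtered = dot_plot("GATTACA", "GATTACA", window=3)
```

Comparing a sequence against its own reverse complement with the same function would highlight inverted repeats, the self-complementarity case described on the next slide.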
  • 18. • Self-comparison as a variation of the dot plot method – The main diagonal shows the perfect match of each residue with itself → used to identify internal repeat elements – If repeats are present, short parallel lines are observed above and below the main diagonal. » Self-complementarity of DNA sequences (also called inverted repeats) can also be identified using a dot plot. » In this case, a DNA sequence is compared with its reverse-complemented sequence. – Parallel diagonals represent the inverted repeats.
  • 19. – Advantages » easy identification of the regions of greatest similarity. – Disadvantages » it is often up to the user to construct a full alignment with insertions and deletions by linking nearby diagonals. » it lacks statistical rigor in assessing the quality of the alignment. » it is restricted to pairwise alignment; it is difficult for the method to scale up to multiple alignment.
  • 20. – Dynamic Programming Method • (…) – convert a dot matrix into a scoring matrix to account for matches and mismatches between sequences. By searching for the set of highest scores in this matrix, the best alignment can be accurately obtained. – construct a 2-D matrix. » The residue matching is according to a particular scoring matrix. The scores are calculated one row at a time. This starts with the first row of one sequence, which is used to scan through the entire length of the other sequence, followed by scanning of the second row. The matching scores are calculated.
  • 22. • Gap Penalties – Gaps are introduced to represent insertions and deletions. – There is a cost difference between opening a gap and extending an existing gap. » It is easier to extend a gap that has already been started, so gap opening should carry a much higher penalty than gap extension, because if insertions and deletions occur at all, several adjacent residues are likely to have been inserted or deleted together. » These differential gap penalties are called affine gap penalties. » Strategy: use preset gap penalty values for introducing and extending gaps. » The total gap penalty (W) is a linear function of gap length, with γ = gap opening penalty, δ = gap extension penalty, k = length of the gap. » A constant gap penalty is less realistic.
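A small sketch makes the difference between the two penalty schemes concrete. The slide's formula did not survive extraction, so this assumes one common convention, W = γ + δ × (k − 1), in which the opening penalty covers the first gap position. The function names and the numeric values are illustrative only.

```python
def affine_gap_penalty(k, gap_open, gap_extend):
    """Assumed convention: W = gap_open + gap_extend * (k - 1) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * (k - 1)


def constant_gap_penalty(k, per_position):
    """Same cost for every gap position, whether opening or extending."""
    return per_position * k


# With an opening cost of 11 and an extension cost of 1 (illustrative values),
# one gap of length 5 is far cheaper under the affine scheme.
affine_cost = affine_gap_penalty(5, gap_open=11, gap_extend=1)   # 11 + 1 * 4
constant_cost = constant_gap_penalty(5, per_position=11)         # 11 * 5
```

Under the affine scheme a single long gap costs much less than several scattered short gaps of the same total length, which matches the rationale that adjacent residues tend to be inserted or deleted together.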
  • 23. • DP for Global Alignment (Needleman–Wunsch algorithm) – an optimal alignment is obtained over the entire lengths of the two sequences. – Drawback = risk of missing the best local similarity → only suitable for aligning two closely related sequences of roughly the same length. (For divergent sequences or sequences with different domain structures, the approach does not produce an optimal alignment.) • DP for Local Alignment (Smith–Waterman algorithm) – identifies regions of local sequence similarity
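The contrast between the two dynamic programming variants can be sketched with a score-only implementation. This is a simplified illustration under stated assumptions: the function name `alignment_score` is hypothetical, scoring uses a plain match/mismatch scheme with a constant per-position gap penalty (not a substitution matrix or affine gaps), and no traceback is performed.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized along the first row and column.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diagonal, up, left)
            if local:
                cell = max(cell, 0)         # local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    # Global score is read from the bottom-right cell, local score from the best cell.
    return best if local else score[-1][-1]


global_score = alignment_score("CCCGATTACACCC", "GATTACA")
local_score = alignment_score("CCCGATTACACCC", "GATTACA", local=True)
```

Here the global score is dragged down by the six end gaps, while the local score reflects only the perfectly matching core, which illustrates why global alignment can miss the best local similarity.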
  • 24. Scoring Matrices • (…) = substitution matrices • A scoring matrix is derived from statistical analysis of residue substitution data from sets of reliable alignments of highly related sequences. • Scoring matrices for nucleotide sequences are relatively simple – A positive value or high score is given for a match and a negative value or low score for a mismatch. – Assumption: the frequencies of mutation are equal for all bases. This assumption is, however, unrealistic. • Scoring matrices for amino acids are more complicated – They have to reflect the physicochemical properties of amino acid residues, as well as the likelihood of certain residues being substituted among true homologous sequences. – Certain amino acids with similar physicochemical properties can be more easily substituted than those without similar characteristics. Substitutions among similar residues are likely to preserve the essential functional and structural features. However, substitutions between residues of different physicochemical properties are more likely to cause disruptions to the structure and function.
  • 26. • Amino Acid Scoring Matrices – 20 × 20 matrices that reflect the likelihood of residue substitutions • 2 types of amino acid substitution matrices – (i) based on interchangeability of the genetic code or on amino acid properties » built from the genetic code or the physicochemical features of amino acids → less accurate – (ii) derived from empirical studies of amino acid substitutions » based on surveys of actual amino acid substitutions among related proteins. » PAM and BLOSUM matrices are derived from actual alignments of highly similar sequences. By analyzing the probabilities of amino acid substitutions in these alignments, a scoring system can be developed that gives a high score to a more likely substitution and a low score to a rare substitution.
  • 27. • PAM matrices (Dayhoff PAM matrices) • PAM = point accepted mutation. Table: Correspondence of PAM Numbers with Observed Amino Acid Mutational Rates
  • 28. • BLOSUM matrices • the series of blocks amino acid substitution matrices (BLOSUM) – Motivation: in PAM matrix construction, the only direct observation of residue substitutions is in PAM1, based on a relatively small set of extremely closely related sequences. Sequence alignment statistics for more divergent sequences are not available. – All BLOSUM matrices are derived from direct observation of every possible amino acid substitution in multiple sequence alignments. • Instead of relying on an extrapolation function, the number in a BLOSUM matrix name is the actual % identity value of the sequences selected for constructing that matrix.
  • 29. PAM250 amino acid substitution matrix. Residues are grouped according to physicochemical similarities.
  • 30. BLOSUM62 amino acid substitution matrix.
  • 31. • Comparison of PAM and BLOSUM • Main differences – PAM matrices, except PAM1, are derived from an evolutionary model – BLOSUM matrices consist entirely of direct observations. » BLOSUM matrices are derived entirely from local sequence alignments of conserved sequence blocks, » whereas the PAM1 matrix is based on the global alignment of full-length sequences composed of both conserved and variable regions. → BLOSUM matrices are more advantageous for searching databases and finding conserved domains in proteins. • Results of several empirical comparisons – BLOSUM matrices outperform the PAM matrices in terms of accuracy of local alignment, largely because BLOSUM matrices are derived from a much larger and more representative dataset than the one used to derive the PAM matrices. → BLOSUM matrices are more reliable. – Revised matrices have also been devised, e.g., Gonnet matrices and Jones–Taylor–Thornton matrices, which are particularly robust in phylogenetic tree construction.
  • 32. Gumbel extreme value distribution of alignment scores.
  • 33. Statistical Significance of Sequence Alignment • Concept • A statistical test for finding true evidence of homology – Test procedure • Interpreting the P-value resulting from the test – P-value < 10^-100 indicates an exact match between the two sequences. – 10^-100 < P-value < 10^-50 → a nearly identical match. – 10^-50 < P-value < 10^-5 → sequences with clear homology. – 10^-5 < P-value < 10^-1 → possible distant homologs. – P-value > 10^-1 → the two sequences may be only randomly related. – However, truly related protein sequences sometimes lack statistical significance at the sequence level owing to fast divergence rates. Their evolutionary relationships can nonetheless be revealed at the three-dimensional structural level.
  • 34. Database Similarity Searching • Requirements for DB searching • Heuristic searching • Basic Local Alignment Search Tool (BLAST) – Variants – Statistical significance – Low complexity regions – BLAST output format • FASTA – Statistical significance • Comparison of FASTA and BLAST • Searching with the Smith–Waterman method
  • 35. General Remarks • DB searching • uses pairwise alignment to retrieve biological sequences from DBs based on similarity. – The query is compared pairwise with every individual sequence in a database; database similarity searching is pairwise alignment on a large scale. – However, DP is slow and impractical to use in most cases. Special search methods are needed to speed up the computational process. • Requirements for DB searching • Sensitivity → the ability to find as many “true positives” as possible • Specificity → the ability to exclude “false positives” • Speed – Types of algorithms • Exhaustive type – examines all mathematical combinations (e.g., DP) • Heuristic type – finds an empirical or near-optimal solution using rules of thumb
  • 36. Heuristic Searching • (…) – BLAST – FASTA – word method • Both BLAST and FASTA use a heuristic “word method” for fast pairwise sequence alignment.
  • 37. Basic Local Alignment Search Tool (BLAST) • Purpose – to find high-scoring ungapped segments; segments scoring above a given threshold indicate pairwise similarity beyond random chance. Figure: example of alignment scoring with the BLOSUM62 matrix.
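Scoring an ungapped segment with a substitution matrix amounts to summing one matrix entry per aligned residue pair. The sketch below is an illustration only: `BLOSUM62_SUBSET` holds just a handful of entries taken from the standard BLOSUM62 table (a real search would load the full 20 × 20 matrix from a library), and `pair_score` and `segment_score` are hypothetical helper names.

```python
# A few entries from the standard BLOSUM62 table; the full matrix is not reproduced here.
BLOSUM62_SUBSET = {
    ("W", "W"): 11, ("L", "L"): 4, ("I", "L"): 2, ("K", "K"): 5,
    ("K", "R"): 2, ("D", "E"): 2, ("A", "A"): 4, ("A", "W"): -3,
}


def pair_score(x, y, matrix=BLOSUM62_SUBSET):
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]  # raises KeyError if the pair is not in the subset


def segment_score(seg_a, seg_b, matrix=BLOSUM62_SUBSET):
    """Sum of substitution scores over an ungapped pair of equal-length segments."""
    if len(seg_a) != len(seg_b):
        raise ValueError("ungapped segments must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(seg_a, seg_b))


score = segment_score("WLKDA", "WIREA")  # 11 + 2 + 2 + 2 + 4
```

A segment pair whose score exceeds a chosen threshold is what the slide calls pairwise similarity beyond random chance.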
  • 38. • Variants – BLASTN – BLASTP – BLASTX – TBLASTX
  • 39. • Statistical significance – The larger the DB, the more alignments with unrelated sequences occur. → A new parameter is needed that takes into account the total number of sequence alignments conducted, which is proportional to the size of the database. • In BLAST searches this is the E-value (expectation value) – indicates how likely it is that the alignments resulting from a DB search are caused by random chance (the expected number of chance hits). – The E-value is related to the P-value used to assess the significance of a single pairwise alignment. BLAST compares a query sequence against all database sequences, and so the E-value is determined by: – (ex) … • A bit score – measures sequence similarity independent of query sequence length and DB size; it is normalized from the raw pairwise alignment score
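The relationship between P-value, database size, and E-value can be sketched numerically. The slide's own formula did not survive extraction, so this assumes the commonly used relation E = m × n × P (m = residues in the database, n = residues in the query) together with the usual bit-score normalization S' = (λS − ln K) / ln 2 and E = m × n × 2^(−S'). The function names and parameter values are illustrative, not taken from the slides.

```python
import math


def e_value_from_p(db_residues, query_residues, p_value):
    """Assumed relation: E = m * n * P."""
    return db_residues * query_residues * p_value


def bit_score(raw_score, lam, k):
    """Normalize a raw score with the scoring-system parameters lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(db_residues, query_residues, bits):
    """Assumed relation: E = m * n * 2 ** (-bit score)."""
    return db_residues * query_residues * 2.0 ** (-bits)


# The same P-value becomes less significant as the database grows.
small_db = e_value_from_p(1_000_000, 100, 1e-9)
large_db = e_value_from_p(1_000_000_000, 100, 1e-9)
```

Because the bit score already absorbs λ and K, bit scores from searches with different scoring systems can be compared, whereas raw scores cannot.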
  • 40. • Low Complexity Regions (LCRs) • In both protein and DNA sequences, there may be regions that contain highly repetitive residues, such as short segments of repeats, or segments in which a small number of residues are overrepresented. – LCRs are rather prevalent in DB sequences: about 15% of the total protein sequences in public databases. → They cause spurious DB matches and lead to artificially high alignment scores with unrelated sequences. • To avoid high similarity scores that arise from matching LCRs, the problematic regions are filtered out of both query and DB sequences to improve the signal-to-noise ratio (= masking). • 2 types of masking: hard and soft. • SEG detects and masks repetitive elements before DB searches are executed. – SEG has been integrated into the web-based BLAST program. • BLAST Output Format
  • 42. FASTA • (…) • The first DB similarity search tool • finds matches for a short stretch of identical residues of length k (a “hashing” strategy). – These strings of residues (= ktuples or ktups) are equivalent to words in BLAST, but are normally shorter than words. Typically, a ktup is composed of two residues for protein sequences and six residues for DNA sequences. • Like BLAST, FASTA has a number of subprograms.
  • 43. Procedure of ktup identification using the hashing strategy by FASTA. Identical offset values between residues of the two sequences allow the formation of ktups.
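The hashing step described in this caption can be sketched as follows. This is a minimal illustration rather than FASTA's actual implementation: `ktup_offsets` is a hypothetical name, the query ktups are stored in a hash table keyed by the ktup string, and each shared ktup contributes one count to its offset (query position minus target position), so hits on the same diagonal accumulate.

```python
from collections import Counter, defaultdict


def ktup_offsets(query, target, k=2):
    """Count shared k-tuples per offset (query position minus target position)."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)        # hash table: ktup -> query positions
    offsets = Counter()
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            offsets[i - j] += 1                # same offset = same diagonal
    return offsets


offsets = ktup_offsets("AMPSDGL", "GPSDNTY", k=2)
best_offset, hits = offsets.most_common(1)[0]
```

In this toy pair the ktups PS and SD both fall on offset 1, so that diagonal would be the candidate region passed on to the scoring step.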
  • 44. Steps of the FASTA alignment procedure. In step 1 (left), all possible ungapped alignments are found between two sequences with the hashing method. In step 2 (middle), the alignments are scored according to a particular scoring matrix. Only the ten best alignments are selected. In step 3 (right), the alignments in the same diagonal are selected and joined to form a single gapped alignment, which is optimized using the dynamic programming approach.
  • 45. • Statistical significance • FASTA also uses E-values and bit scores. – These are essentially the same as in BLAST, but the FASTA output provides one more statistical parameter, the Z-score. » Because most of the alignments with the query sequence are with unrelated sequences, the higher the Z-score for a reported match, the further it lies from the mean of the score distribution and hence the more significant the match. » For a Z-score > 15, the match can be considered extremely significant, with certainty of a homologous relationship. » If Z is in the range of 5 to 15, the sequence pair can be described as highly probable homologs. » If Z < 5, their relationship is described as less certain.
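The Z-score thresholds above can be sketched in code. This is an illustration under stated assumptions: the Z-score is taken as the number of standard deviations a match score lies above the mean of a background score distribution, the population standard deviation is used, the function names are hypothetical, and the background scores are made-up numbers rather than real search output.

```python
import statistics


def z_score(score, background_scores):
    """Number of standard deviations between a score and the background mean."""
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)  # population standard deviation (assumption)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (score - mean) / sd


def interpret_z(z):
    """Map a Z-score to the three categories listed on the slide."""
    if z > 15:
        return "extremely significant"
    if z >= 5:
        return "highly probable homologs"
    return "less certain"


background = [10, 12, 14, 16, 18]  # made-up scores against unrelated sequences
labels = [interpret_z(z_score(s, background)) for s in (70, 35, 20)]
```

In practice the background would come from the scores of the many unrelated database sequences (or from shuffled sequences), not from five hand-picked values.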
  • 46. Comparison of FASTA and BLAST • (…) • BLAST and FASTA perform almost equally well in regular DB searching. • Differences (notably in the seeding step) – BLAST uses a substitution matrix to find matching words » the use of low-complexity masking in BLAST → higher specificity than FASTA because potential false positives are reduced. » BLAST sometimes gives multiple best-scoring alignments from the same sequence. – FASTA identifies identical matching words using the hashing procedure. » By default, FASTA scans smaller window sizes. → more sensitive results than BLAST, with a better coverage rate for homologs. However, it is usually slower than BLAST. » FASTA returns only one final alignment.
  • 47. Multiple Sequence Alignment • Scoring function • Exhaustive Algorithms • Heuristic Algorithms – Progressive Alignment Method – Drawbacks and Solutions – Iterative Alignment – Block-Based Alignment • Practical considerations – Protein-Coding DNA Sequences – Editing – Format Conversion
  • 48. • Concept • generate multiple matching sequence pairs → convert the numerous pairwise alignments into a single alignment → arrange the sequences so that evolutionarily equivalent positions across all sequences are matched. • Advantages – reveals more biological information than pairwise alignments can. – useful for designing degenerate PCR primers based on multiple related sequences. • DP vs. heuristic – the computing time and memory that dynamic programming (DP) requires increase exponentially with the number of sequences. In practice, heuristic approaches are most often used.
  • 49. Scoring function • (…) • The goal of MSA is to arrange sequences so that the maximum number of residues from each sequence are matched up according to a particular scoring function. – The scoring function is the sum of pairs (SP): the sum of the scores of all possible pairs of sequences in a multiple alignment, based on a particular scoring matrix. – In calculating SP scores, each column is scored by summing the scores for all possible pairwise matches, mismatches and gap costs. The score of the entire alignment is the sum of all column scores. – The purpose of most multiple sequence alignment algorithms is to achieve the maximum SP score.
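The sum-of-pairs calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions: instead of a real substitution matrix such as BLOSUM62 it uses a flat match/mismatch score and a fixed gap cost, and it scores a gap paired with a gap as 0, which is a common convention but not the only one. The function name `sp_score` and the default values are hypothetical.

```python
from itertools import combinations


def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment given as equal-length strings.

    Scoring values are illustrative assumptions, not a standard matrix.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    total = 0
    # Score column by column; each column contributes the sum over all pairs.
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue          # gap-gap pair scored as 0 (assumed convention)
            elif a == "-" or b == "-":
                total += gap      # residue aligned with a gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    toy_alignment = ["AC-", "ACG", "A-G"]  # made-up three-sequence alignment
    print(sp_score(toy_alignment))
```

For the toy alignment, the first column scores 3 (three matching pairs), and the second and third columns each score 1 - 2 - 2 = -3, so the alignment's SP score is -3.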
  • 51. Heuristic Algorithms • (3 categories) – Progressive Alignment Method – Iterative Alignment – Block-Based Alignment • Progressive Alignment Method – Drawbacks and Solutions Schematic of a typical progressive alignment procedure (e.g., Clustal). Angled wavy lines represent consensus sequences for sequence pairs A/B and C/D. Curved wavy lines represent a consensus for A/B/C/D.
  • 52.
  • 53. Conversion of a sequence alignment into a graphical profile in the POA (partial order alignment) algorithm. Identical residues in the alignment are condensed as nodes in the partial order graph.
  • 54. • Iterative Alignment • Block-Based Alignment Schematic of iterative alignment procedure for PRRN, which involves two sets of iterations.