Complex Multi-omics Data Analysis and Integration Platform

Typical Mass-use Pipelines
NGS (Next Generation Sequencing)
1. Total-RNA Analysis (RNA-seq, Non-Coding RNA, Repeats)
2. Epigenetics (ChIP-seq and Bisulfite-seq)
3. Variant Calling
4. Microbiome (Metagenomics)
Mass Spec
1. Proteomics
2. Metabolomics
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
Machine Learning
1. Phenotypic Analysis and Modeling
2. Analysis of visual data
3. Standard Statistical methods
4. Integration of heterogeneous data sets

Complex Challenges and Workflows
CirSeq Mutation Analysis
1. Analysis of viral CirSeq data for precise mutation identification
2. Fitness of mutations reflecting viral adaptation
3. Identification of viral quasi-species
Mass Spec
1. Protein-protein interactions between host and viral proteins
2. Post-translational modifications of host proteins
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
NGS host data
1. Host gene expression variations in response to infectious quasi-species
T-BioInfo is a user-friendly computational platform that enables analysis and integration of big data. Mining -omics data for meaningful
patterns that can be applied in biomedical and agricultural research is a growing challenge as sequencing becomes cheaper and more precise. At the same time, the complex
networks of dependencies that define many conditions tend to require integration of huge heterogeneous data sets: SNPs, gene expression, epigenetic
markers, proteomic and metabolomic profiles, even structural biology data. Our company has developed innovative and user-friendly workflows for analysis
and integration of these different datasets. Now we are looking to test and commercialize a platform that provides web access to these workflows.

Simple, Flexible and Consistent Interface Across All Sections
Integration of analysis types
One environment for all types of data and analysis
“One-button” approach to most areas of analysis
• Flexible analysis pipelines in the platform sections and easy-to-perform data input
• The platform assists the user in constructing meaningful algorithmic pipelines for processing data:
modules for pipeline continuation are highlighted by a black background and a yellow title.

Analysis of Total RNA
Concept: raw total transcriptome reads contain information not only about
expressed splice variants (isoforms) of genes, but also about expressed
transposons and regulatory non-coding RNAs. The complete analysis consists
of three steps. First, the reads are mapped to isoforms to obtain
isoform expression levels. Second, the previously unmapped reads are mapped
to known repetitive elements (RE) and non-coding RNAs to obtain their
expression levels. Third, the remaining reads are processed by a special clustering
method (BiClustering) to find new expressed RE and non-coding RNAs, as well
as their expression levels under the applied biological conditions. At the next
stage, data integration can be performed: the interplay between expressed
isoforms, transposons, and regulatory RNAs.
T-Bioinfo RNA-seq/chip section
Algorithmic Approaches:
1 Detection of expressed isoforms and their expression levels by mapping the reads on constructed transcripts √
2 For unmapped reads: √
3 Detection of most expressed repeats and regulatory RNA from databases √
4 BiClustering: associations of kmers and reads as a bicluster, and generation of Kchains of biclusters √
5 Extensions of Kchains ±
6 Mapping of NGS reads on found Kchains: detection of most expressed novel transposons and regulatory RNAs √
Example: Expression of Repeats
Analysis of “Junk” RNA
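
The tiered mapping idea in the Total RNA concept (map to isoforms first, then send only the unmapped reads to known repeats and non-coding RNAs, then hand the remainder to clustering) can be sketched in a few lines. This is a toy illustration only: it uses exact substring matching in place of a real aligner, and the function name `map_in_tiers`, the reference names, and the sequences are all hypothetical, not part of T-Bioinfo.

```python
from collections import Counter

def map_in_tiers(reads, tiers):
    """Assign each read to the first tier that contains it as an exact substring.

    `tiers` is an ordered list of (tier_name, {reference_name: sequence}).
    Returns per-tier read counts per reference, plus the reads left unmapped.
    Exact matching stands in for a real read aligner (assumption).
    """
    counts = {name: Counter() for name, _ in tiers}
    remaining = list(reads)
    for name, references in tiers:
        still_unmapped = []
        for read in remaining:
            # First reference whose sequence contains the read, if any.
            hit = next((ref for ref, seq in references.items() if read in seq), None)
            if hit is None:
                still_unmapped.append(read)
            else:
                counts[name][hit] += 1
        # Only reads unmapped so far are passed on to the next tier.
        remaining = still_unmapped
    return counts, remaining

# Hypothetical toy references and reads.
isoforms = {"geneA.1": "ACGTACGTTAGC", "geneA.2": "ACGTTTGACC"}
repeats = {"Alu-like": "GGCCGGGCGCGG"}
reads = ["ACGTAC", "TTGACC", "GGGCGC", "TTTTTT", "ACGTAC"]

counts, leftover = map_in_tiers(
    reads, [("isoforms", isoforms), ("known_RE_ncRNA", repeats)]
)
print(counts)
print(leftover)  # reads that would go on to the clustering step
```

The leftover list plays the role of the input to the BiClustering step; the clustering itself is not sketched here because the slides do not describe it in enough detail.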

Epigenetic Analysis: Bisulfite DNA Methylation and ChIP-seq
Bisulfite Concept: bisulfite treatment converts unmethylated C to U, so a read shows T instead of C
at a genomic site (such as CpG) when the C is unmethylated, while a methylated C is still read as C.
Thus, detection of methylated sites and of genome fragments enriched or depleted in methylation
is based on a special type of read mapping and on segmentation of the whole-genome methylation profile.
The analysis objectives include special mapping algorithms that tolerate the T-to-C mismatch,
statistical estimation of the per-site methylation level, allele specificity of DNA methylation,
and detection of over-methylated and under-methylated genomic regions.
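
As a rough illustration of mapping with tolerance for the T-to-C mismatch, the sketch below scores an ungapped placement of a read while ignoring positions where the read has T and the genome has C. It is a minimal sketch under stated assumptions: forward strand only, brute-force scan, and hypothetical function names and toy sequences; it is not the platform's mapping algorithm.

```python
def bisulfite_mismatches(read, genome_window):
    """Count mismatches, not penalizing T in the read where the genome has C."""
    if len(read) != len(genome_window):
        raise ValueError("read and genome window must have equal length")
    return sum(
        1
        for r, g in zip(read, genome_window)
        if r != g and not (r == "T" and g == "C")
    )

def best_bisulfite_hit(read, genome, max_mismatches=0):
    """Return (position, mismatches) of the best ungapped placement, or None.

    Brute-force scan of the forward strand only (assumption made for brevity).
    """
    best = None
    for pos in range(len(genome) - len(read) + 1):
        mm = bisulfite_mismatches(read, genome[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best

# Hypothetical toy data: the read is a converted copy of genome[2:8] = "ACGCCG",
# with the first and second C of that window read as T.
genome = "TTACGCCGAT"
read = "ATGTCG"
print(best_bisulfite_hit(read, genome))
```

A plain mismatch count would reject this read at zero allowed mismatches; the tolerant score accepts it, and the positions where C survived are the candidate methylated sites.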
ChIP-seq Concept: detection of epigenetic signals, such as histone modifications of different types and DNA
methylation events, as well as determination of protein/DNA binding sites (TF binding sites), is performed by ChIP-seq
and ChIP-chip experiments. Profiles of these whole-genome signals are analyzed by genome
segmentation algorithms. The analysis objectives include identifying signal-enriched genome fragments as putative
epigenetic events, and identifying a combination of enriched fragments on the positive and negative strands, separated by a certain
distance, as a TF binding event. At the next analysis stage, data integration can be performed:
the interplay between genome mutations and epigenetic signals on one side and expressed isoforms, transposons, and
regulatory RNAs on the other. The network of gene regulation by a transcription factor can be reconstructed
from the whole-genome TF binding positions and the expression of the downstream genes. Microarray datasets are
transformed into pseudo NGS reads and analyzed by the same ChIP-seq pipelines.
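
The TF binding criterion described above (an enriched fragment on the positive strand paired with one on the negative strand at a certain distance) can be sketched as a simple pairing step. The sketch assumes enriched fragments have already been reduced to single peak positions per strand; the function name `pair_strand_peaks`, the gap limits, and the coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right

def pair_strand_peaks(plus_peaks, minus_peaks, min_gap, max_gap):
    """Pair each plus-strand peak with minus-strand peaks min_gap..max_gap downstream.

    Returns a list of (plus_pos, minus_pos, midpoint) candidate binding events.
    """
    minus_sorted = sorted(minus_peaks)
    events = []
    for p in sorted(plus_peaks):
        # Index range of minus-strand peaks inside the allowed distance window.
        lo = bisect_left(minus_sorted, p + min_gap)
        hi = bisect_right(minus_sorted, p + max_gap)
        for m in minus_sorted[lo:hi]:
            events.append((p, m, (p + m) // 2))
    return events

# Hypothetical peak positions on one chromosome.
plus = [1000, 5000, 9000]
minus = [1150, 5600, 9120]
events = pair_strand_peaks(plus, minus, min_gap=50, max_gap=300)
print(events)
```

The midpoint of each pair serves as the putative binding position; the peak at 5000 stays unpaired because its nearest negative-strand partner is too far away.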
T-Bioinfo ChIP-seq section
1 Preprocessing of raw data √
2 Mapping of NGS reads by bisulfite mapping algorithms: no penalty for T(read)-to-C(genome) mismatches √
3 Detection of the DNA methylated positions and their scores by the confidence interval method √
4 Allele specificity of the methylation in a position -
5 Detection of over-methylated and under-methylated genomic intervals by the segmentation algorithms ±
6 Detection of differential DNA methylations (individual positions and intervals) between contrasting conditions ±
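
Step 3 above scores methylated positions by a confidence interval method, but the slides do not say which interval is used. As one plausible illustration, the sketch below applies the Wilson score interval to the counts of reads showing C (methylated) and T (unmethylated) at a site. The choice of the Wilson interval, the function name, and the example counts are assumptions.

```python
from math import sqrt

def methylation_interval(c_reads, t_reads, z=1.96):
    """Estimate the per-site methylation level with a Wilson score interval.

    c_reads: reads showing C at the site (methylated, protected from conversion).
    t_reads: reads showing T at the site (unmethylated, converted).
    Returns (level, lower, upper), or None when the site has no coverage.
    """
    n = c_reads + t_reads
    if n == 0:
        return None
    p = c_reads / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)

# Hypothetical site covered by 20 reads, 18 of which still show C.
level, low, high = methylation_interval(18, 2)
print(f"level={level:.2f}, interval=({low:.3f}, {high:.3f})")
```

A site could then be called methylated when the lower bound exceeds a chosen threshold, so that low-coverage sites with a high raw ratio are not over-called.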

Virology Pipeline
Mutation Fitness
Genome-wide fitness calculations enabled by CirSeq,
combined with structural information, can provide
high-definition, bias-free insights into structure-function
relationships, potentially revealing novel functions for
viral proteins and RNA structures, as well as nuanced
insights into a viral genome’s phenotypic space. Such
analyses have the power to reveal protein residues or
domains that directly correspond to viral functional
plasticity and may significantly inform our structural
and mechanistic understanding of host–pathogen
interactions.

Integration of Heterogeneous Data Sets
Concept: mutual association of features across biological datasets is the most substantial part of
integrating several analyses of a biological project into one story. We suggest several
techniques for such associations.
Matching of metabolite and SNP profiles according to LB’s selection of SNPs
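
One simple way to associate features from two datasets measured on the same samples is to correlate their profiles. The sketch below matches each metabolite profile to the SNP genotype profile with the strongest absolute Pearson correlation. It is only an illustration of the general idea: the slides do not specify the association techniques used, and the names and values here are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def match_profiles(metabolites, snps):
    """For each metabolite, return (best SNP, correlation) by absolute correlation."""
    matches = {}
    for m_name, m_values in metabolites.items():
        best = max(snps, key=lambda s: abs(pearson(m_values, snps[s])))
        matches[m_name] = (best, pearson(m_values, snps[best]))
    return matches

# Hypothetical data: six samples, genotypes coded 0/1/2, metabolite abundances.
snps = {"snp_1": [0, 1, 2, 0, 1, 2], "snp_2": [2, 2, 0, 0, 1, 1]}
metabolites = {
    "met_A": [1.0, 2.1, 3.2, 0.9, 2.0, 2.9],
    "met_B": [5.0, 4.8, 1.1, 1.0, 3.0, 3.1],
}
matches = match_profiles(metabolites, snps)
print(matches)
```

With real data, the number of feature pairs is large, so any such matching would need multiple-testing control before the associations are interpreted.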

Patent Pending Technology
for Drug Discovery
Fast screening and clustering of small molecules based on
physico-chemical similarity (70-100 times faster than industry
standard)
Small Molecule Candidate
Identifying a biologically active molecule
(Polio)
Patent Pending: Ref. P-78368-US | App. No. 14/625,785 entitled
SYSTEMS AND METHODS OF IMPROVED MOLECULE SCREENING
Computational analysis of small molecules can be roughly divided into three sections:
pre-processing analysis, virtual screening methods, and clustering. The aim of the conformer
generation process is to build a set of representative conformers that covers the conformational
space of a given molecule. There are two main classes of virtual screening methods:
similarity-based methods (descriptor-based screening; geometric querying; shape-based querying;
fingerprints) and receptor-based methods (docking). One of the greatest challenges of docking
software is to consider protein flexibility. These macromolecules are not static objects and
conformational changes are often key elements in ligand binding. T-Bioinfo provides a number of
proprietary methods that can be combined into pipelines for drug discovery.
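
For the fingerprint class of similarity-based screening mentioned above, a common measure is the Tanimoto coefficient between two fingerprints. The sketch below screens a small library against a query using fingerprints represented as sets of "on" bit indices. It shows the generic textbook technique, not T-Bioinfo's proprietary or patent-pending methods; the molecule names, bit sets, and threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def screen(query_fp, library, threshold=0.5):
    """Return (name, similarity) for library molecules at or above the threshold.

    Results are sorted from most to least similar.
    """
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(
        (h for h in hits if h[1] >= threshold),
        key=lambda h: h[1],
        reverse=True,
    )

# Hypothetical fingerprints.
library = {
    "mol_1": {1, 4, 7, 9},
    "mol_2": {1, 4, 7, 12, 15},
    "mol_3": {2, 3, 20},
}
query = {1, 4, 7, 9, 12}
hits = screen(query, library)
print(hits)
```

The same pairwise similarity can feed a clustering step, for example by grouping molecules whose similarity to a cluster representative exceeds the threshold.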

Tauber Bioinformatics
Research Center
Tauber Bioinformatics Research Center (TBRC) at the University of Haifa
has a proven track record in bioinformatics, with scientific
collaborations with hospitals and top US universities, involvement in
government-funded projects, and multiple publications in
leading journals such as Science and Nature.
Pine Biotech holds an exclusive license for commercialization of
tools developed at the TBRC for research, industry applications
and education. The startup is located at the BioInnovation Center
in New Orleans, LA. In collaboration with TBRC staff, Pine Biotech
is completing several pilot projects to validate our approach.
Early Adopters and Collaborators:
Aleph Therapeutics
