Data retrieval tools
Dedicated to giving molecular biologists access to information.
The most widely used are:
1. Entrez
2. DBGET
3. SRS
Each of these allows:
- Text-based searching of a number of linked databases (DBs).
- Sequence searching.
They differ in:
- The DBs they cover
- How the retrieved information is accessed and presented.
Entrez
- WWW-based data retrieval system.
- Developed by NCBI (National Center for Biotechnology Information).
- Integrates information held in different DBs.
Databases covered by Entrez are:
Nucleic acid sequences - GenBank, RefSeq, PDB
Protein sequences - SWISS-PROT, PIR
3D structures - MMDB
Genomes - Many sources
PopSet - From GenBank
OMIM - OMIM
Taxonomy - NCBI taxonomy database
Books - Bookshelf
ProbeSet - GEO (Gene Expression Omnibus)
Literature - PubMed
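Entrez databases can also be queried from a program. The sketch below only builds a search URL for NCBI's E-utilities `esearch` endpoint and does not send the request. The helper name `build_esearch_url` and the example search term are illustrative assumptions, not taken from the slides, and database identifiers such as `pubmed` and `protein` should be checked against the NCBI documentation before use.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an esearch URL for a text query against one Entrez database.

    db     -- E-utilities database identifier (assumed here, e.g. "pubmed")
    term   -- free-text query; Boolean operators such as AND are allowed
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the query.
    return ESEARCH_BASE + "?" + urlencode(params)


# Hypothetical example query against the literature database.
url = build_esearch_url("pubmed", "hemoglobin AND human", retmax=5)
print(url)
```

The same text query can be pointed at a different database by changing only the `db` argument, which mirrors how Entrez applies one search interface to several linked DBs.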
SRS
- Data retrieval tool developed by EBI
- Integrates 80 molecular biology DBs
- Open-source software (can be installed locally)
SRS has an associated scripting language called Icarus
Central resource for molecular biology data
- More than 250 databanks have been indexed.
- More than 35 SRS servers are available worldwide over the WWW.
Data analysis applications server
- 11 protein applications
- 6 nucleic acid applications
- Uniform query interface on the web
History of SRS
1990 - Main author Dr. Thure Etzold
– Development started in EMBL, Heidelberg
1997
– Moved to EBI in Cambridge. Development work was supported by various
grants, among others from EMBnet.
1998
– Etzold and his group joined LION Bioscience
Information retrieval
– Easy way to retrieve information from sequence and sequence-related
databases
– Possibility to search for multiple words/other criteria
Linkage between different databases
– E.g. Find all primary structures with known three-dimensional structure.
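The linkage example above can be sketched as a cross-reference lookup between two databanks. This is a toy illustration in Python, not SRS's own query mechanism: the record IDs, field names, and the helper `sequences_with_structure` are all made up for the example.

```python
# Toy records standing in for two linked databanks (all IDs are made up).
sequences = {
    "SEQ1": {"name": "hemoglobin alpha", "structure_ids": ["STR1"]},
    "SEQ2": {"name": "hypothetical protein", "structure_ids": []},
    "SEQ3": {"name": "lysozyme", "structure_ids": ["STR2", "STR3"]},
}
structures = {"STR1": "X-ray", "STR2": "X-ray", "STR3": "NMR"}


def sequences_with_structure(seqs, structs):
    """Return IDs of sequence entries linked to at least one structure entry."""
    return sorted(
        seq_id
        for seq_id, entry in seqs.items()
        # Follow each cross-reference and keep the entry if any target exists.
        if any(sid in structs for sid in entry["structure_ids"])
    )


linked = sequences_with_structure(sequences, structures)
print(linked)
```

The point of the sketch is that the link is resolved through cross-references stored in the entries, so the user asks one question across two databases instead of searching each separately.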
Different types of database in SRS
Sequence & structure
– DNA, protein, three-dimensional structures
Sequence-related
Gene-related
– Genome, mapping, mutations, transcription factors
– SNP
Bibliographic
– Medline, enzyme
User-defined
SRS main toolbar tabs:
Top Page: displays databases in different database groups
Query: displays either the standard or extended query form
Results or “the query manager”: maintains a history of all the results obtained
during a session
Projects or “the project manager”: maintains a history of all queries and views
used during a session
Views: allows a user to define a user-specific view for one or more databases
Databanks: contains a list and some facts about the databases available in the
system
Search terms in SRS
SRS indexed fields can be searched using any of the following:
– Single word search
– Multiple word phrases
– Numbers and dates
– Regular expressions
– Wildcards
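The difference between these search term types can be illustrated with a small in-memory index. This is a minimal sketch in Python, assuming a made-up index of entry descriptions; it does not reproduce SRS query syntax, and the function names are hypothetical.

```python
import fnmatch
import re

# Toy index: entry ID -> indexed description field (made-up data).
index = {
    "E1": "human hemoglobin alpha chain",
    "E2": "mouse hemoglobin beta chain",
    "E3": "human insulin precursor",
}


def search_word(word):
    """Single word search: the word must appear as a whole token."""
    return sorted(k for k, v in index.items() if word.lower() in v.lower().split())


def search_phrase(phrase):
    """Multiple word phrase: the words must appear together, in order."""
    return sorted(k for k, v in index.items() if phrase.lower() in v.lower())


def search_wildcard(pattern):
    """Wildcard search: '*' matches any run of characters inside a token."""
    return sorted(
        k for k, v in index.items()
        if any(fnmatch.fnmatchcase(tok, pattern.lower()) for tok in v.lower().split())
    )


def search_regex(pattern):
    """Regular expression search over the whole field, ignoring case."""
    return sorted(k for k, v in index.items() if re.search(pattern, v, re.IGNORECASE))


print(search_word("human"))
print(search_phrase("hemoglobin beta"))
print(search_wildcard("hemo*"))
print(search_regex(r"insulin|beta"))
```

Number and date searches are left out of the sketch; they would compare parsed values or ranges rather than text patterns.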