Talk about STRING and STITCH, given as part of the EMBO Practical Course 'Computational aspects of protein structure determination and analysis: from data to structure to function' at the EBI in Hinxton (Sept. 10, 2010)
56. higher coverage
lower specificity
includes all available evidence
some orthologous groups are too large to be meaningful
57. STRING plans
• next big release (9.0):
  • coming end of 2010 / early 2011
  • more genomes
  • allow users to add more data to the network
58. STITCH plans
• next minor release (2.1):
  • add ChEMBLdb
• next big release (3.0):
  • “zoom” into stereoisomers and salt forms
59. Acknowledgements
STRING: Christian von Mering, Lars Juhl Jensen, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Peer Bork
STITCH: Damian Szklarczyk, Andrea Franceschini, Monica Campillos, Christian von Mering, Lars Juhl Jensen, Andreas Beyer, Peer Bork
In protein mode, STRING predicts interaction partners for one protein in a specific species.
This allows maximum specificity but gives slightly lower coverage. The reason is that, in protein mode, STRING has no precise orthology assignments for other species; instead, it estimates orthology through sequence similarity searches. Interaction information is then transferred between species according to the 'degree of orthology', a measure of how confident STRING is that two proteins are orthologs. This measure is derived from all-against-all similarity searches and takes into account putative paralogs in both species: the fewer paralogs there are, the more confident STRING is about orthology.
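To make the idea of a graded 'degree of orthology' more concrete, here is a toy Python sketch. Both rules in it are illustrative assumptions, not the formulas STRING actually uses: the confidence is taken as the target's share of the query's total similarity to cross-species hits and own-species paralogs, and a transferred interaction score is simply scaled by the confidence of both partners. All function names, protein names and scores are hypothetical.

```python
from typing import Dict


def degree_of_orthology(
    query: str,
    target: str,
    cross_species_hits: Dict[str, float],
    paralog_hits: Dict[str, float],
) -> float:
    """Hypothetical confidence (0..1) that `target` is the ortholog of `query`.

    cross_species_hits: similarity scores of `query` against proteins in the
        other species; competing entries there act as paralogs on that side.
    paralog_hits: similarity scores of `query` against its putative paralogs
        in its own species.
    The target's share of the total similarity shrinks as paralogs on either
    side are added, so fewer paralogs means higher confidence.
    """
    if target not in cross_species_hits:
        return 0.0
    total = sum(cross_species_hits.values()) + sum(paralog_hits.values())
    return cross_species_hits[target] / total if total > 0 else 0.0


def transfer_score(source_score: float, degree_1: float, degree_2: float) -> float:
    """Scale an interaction score seen in one species by the orthology
    confidence of both partners (an assumed product rule)."""
    return source_score * degree_1 * degree_2


if __name__ == "__main__":
    # One clear ortholog and no paralogs: full confidence.
    d1 = degree_of_orthology("a1", "b1", {"b1": 200.0}, {})
    # Two equally similar candidates in species B plus one paralog in species A.
    d2 = degree_of_orthology(
        "a2", "b2", {"b2": 150.0, "b2_para": 150.0}, {"a2_para": 100.0}
    )
    print(d1, d2)  # 1.0 0.375
    print(round(transfer_score(0.9, d1, d2), 4))  # 0.3375
```

In this toy, the paralog-rich pair keeps only a fraction of the original evidence, which mirrors the trade-off described above: orthology is estimated per protein, so the transfer stays specific but some evidence is discounted.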
In COG mode, STRING predicts interaction partners for a group of orthologous proteins.
This generally gives higher coverage but may result in slightly lower specificity. Again, the reason lies in how STRING derives orthologs. In COG mode, orthology information comes from the 'Clusters of Orthologous Groups' (COG) database (Tatusov & Koonin, NCBI). There, orthology is an all-or-nothing decision: all proteins considered orthologous are grouped into a single entity. A prediction made for one protein therefore applies to every protein in its group, which is why STRING shows its predictions at the level of groups. Coverage is higher because the groups are partly based on manual curation and contain orthology assignments that are difficult to derive through an automated procedure. Specificity is lower, however, because some groups are (for technical reasons) relatively inclusive, i.e. they contain a large number of proteins that cannot be resolved further. For example, almost all serine/threonine kinases are grouped into one COG, which makes predictions for a specific subset impossible. Nevertheless, COGs are very powerful and are the first choice for proteins that show little lineage-specific expansion, especially in prokaryotes.
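A toy Python sketch of the all-or-nothing grouping: each protein maps to exactly one group, interactions are collapsed to group pairs, and a single group-level prediction is expanded to every member pair it covers. The group IDs (`COG_KINASE`, `COG_X`), protein names and helper functions are hypothetical, chosen only to show how one inclusive group spreads a single observation across many proteins; this is not STRING's code.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def group_members(protein_to_group: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert an all-or-nothing protein -> group assignment."""
    members: Dict[str, List[str]] = {}
    for protein, group in protein_to_group.items():
        members.setdefault(group, []).append(protein)
    return members


def project_to_groups(
    interactions: List[Tuple[str, str]], protein_to_group: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Collapse protein-level interactions into unordered group-level pairs."""
    pairs: Set[Tuple[str, str]] = set()
    for p1, p2 in interactions:
        g1, g2 = sorted((protein_to_group[p1], protein_to_group[p2]))
        pairs.add((g1, g2))
    return pairs


def implied_protein_pairs(
    group_pair: Tuple[str, str], members: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Expand one group-level prediction to every member pair it covers."""
    g1, g2 = group_pair
    return list(product(members[g1], members[g2]))


if __name__ == "__main__":
    # Hypothetical groups: one inclusive kinase-like group, one small group.
    mapping = {
        "kinase_1": "COG_KINASE",
        "kinase_2": "COG_KINASE",
        "kinase_3": "COG_KINASE",
        "kinase_4": "COG_KINASE",
        "substrate_1": "COG_X",
    }
    members = group_members(mapping)
    groups = project_to_groups([("kinase_1", "substrate_1")], mapping)
    print(groups)  # {('COG_KINASE', 'COG_X')}
    for pair in sorted(groups):
        # Evidence observed for kinase_1 alone now applies to all four kinases.
        print(len(implied_protein_pairs(pair, members)))  # 4
```

With four members in the kinase-like group, one observed interaction becomes a claim about four protein pairs; a larger group inflates this further, which is the specificity cost of inclusive groups.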