Life science databases are sometimes difficult to understand because they lack descriptive information. I'd like to add metadata to these databases to improve search results.
12. In the case of Hyper Estraier (search system)
[Diagram labels: NIBIO AgriTogo; NBDC / DBCLS MEDALS; collaborate by using P2P architecture; JCGGDB; under contemplation]
13. Back to the simple answers for improvement
• Speed (thanks to Johan-san, Mizuguchi-san and many collaborators)
1. Relax limits on access to DBCLS
(Use a little ingenuity in CSS and images)
• Accuracy
[Diagram labels: NIBIO; NBDC / DBCLS]
14. How to improve accuracy?
• What does accuracy mean for cross search over life science databases?
• What does accuracy mean for a life science specialist?
15. • In general, developers emphasize search algorithms and scoring.
• However, general-purpose results and methods for cross search may not be suitable for life science specialists?
• Data (index files) from life science databases are sometimes difficult to understand immediately.
• It is hard to write a separate crawler program for each database and to maintain it.
• (We have no extra …. to make a proper search page like Entrez et al….)
16. To improve accuracy
• Manually select databases
• Assign weights to crawled databases to improve the ranking system
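The weighting idea above can be sketched as follows. This is a minimal illustration, not the scoring used by the search system on these slides: the database names, the weight values, and the `Hit` and `rerank` helpers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-database weights; the names and values are illustrative only.
DATABASE_WEIGHTS = {
    "curated_db": 1.5,
    "general_db": 1.0,
    "unreviewed_db": 0.5,
}


@dataclass
class Hit:
    database: str
    entry_id: str
    score: float  # raw relevance score reported by the search engine


def rerank(hits, weights, default_weight=1.0):
    """Sort hits by raw score multiplied by the weight of their source database."""
    return sorted(
        hits,
        key=lambda h: h.score * weights.get(h.database, default_weight),
        reverse=True,
    )


hits = [
    Hit("general_db", "G1", 0.80),
    Hit("curated_db", "C1", 0.60),
    Hit("unreviewed_db", "U1", 0.90),
]
for hit in rerank(hits, DATABASE_WEIGHTS):
    print(hit.database, hit.entry_id)
```

With these assumed weights, a hit from a manually selected database can outrank a hit that has a higher raw score but comes from a less trusted source.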
17. Metadata!
• One way to solve these problems
[Diagram label: data that are difficult to understand immediately]
18. If metadata are added to data…
Data
Metadata
Disease:Epithelial adenoma
Species:Mouse
Keywords:DNA sequence
Last Modified:2013-01-19
19. Easy to understand for users
• It can be a guide to improve user experience.
20. Easy to understand for crawlers
Metadata
Disease:Epithelial adenoma
Species:Mouse
Keywords:DNA sequence
Last Modified:2013-01-19
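To illustrate why such fields are easy for a crawler to handle, here is a small sketch that reads the "Key:Value" lines shown on this slide into a dictionary. The plain-text layout and the `parse_metadata` helper are assumptions made for illustration; a real crawler would read the fields from markup instead.

```python
from datetime import date

# The metadata lines exactly as they appear on the slide.
RAW_METADATA = """\
Disease:Epithelial adenoma
Species:Mouse
Keywords:DNA sequence
Last Modified:2013-01-19
"""


def parse_metadata(text):
    """Split each 'Key:Value' line at the first colon and return a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


record = parse_metadata(RAW_METADATA)
modified = date.fromisoformat(record["Last Modified"])
print(record["Species"], modified.year)
```

Once the fields are structured like this, a search index can filter by species or sort by modification date without guessing at the meaning of free text.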
21. How to use it?
• Mark up the data with microdata, which is used like a tag
[Screenshot labels: Image; Title; ID; Last Modified]
Example page: http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
22. Is it a practical suggestion?
• Google, Yahoo! and Bing decided to use microdata to make search results more valuable.
• Some vocabularies have already been applied to search results.
• E.g.
23. Schema.org
• Provides a collection of schemas (HTML tags)
• “Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages.” (quoted from schema.org)
• We proposed schema.org extensions for “BiologicalDatabaseEntry” and “BiologicalDatabase”.
• Schema.org proposals:
http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
25. Related Link for our proposal
• WebSchemas proposal ‘Biological Databases’ for schema.org
– http://www.w3.org/wiki/WebSchemas/BioDatabases
• Discussions at BioHackathon
– https://github.com/dbcls/bh12/wiki/Schema.org-extension
• Discussions at BH12.12 (Japanese only)
– http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
26. How to mark up?
Declaration
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
ID
<span itemprop="entryID">1556</span>
Species
<span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
<span itemprop="name">Bacillus subtilis</span>
</span>
Deposition:
<span itemprop="dateCreated">2008-09-08</span>
Last update:
<span itemprop="dateModified">2012-10-24</span>
</div>
Specify each property and mark it up with an ordinary tag
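For database maintainers who generate pages from records, the markup above could be produced programmatically. The following is a minimal sketch under stated assumptions: `to_microdata` is a hypothetical helper, only flat text properties are handled (the nested taxon item is omitted), and `BiologicalDatabaseEntry` is the proposed type from the slide, not an accepted schema.org type.

```python
from html import escape

# Proposed item type as written on the slide (a proposal, not an accepted type).
ITEM_TYPE = "http://schema.org/BiologicalDatabaseEntry"


def to_microdata(properties, item_type=ITEM_TYPE):
    """Render a flat dict of property names and values as microdata markup."""
    lines = [f'<div itemscope itemtype="{escape(item_type, quote=True)}">']
    for name, value in properties.items():
        # Escape names and values so that record content cannot break the HTML.
        lines.append(
            f'  <span itemprop="{escape(name, quote=True)}">{escape(str(value))}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


entry = {
    "entryID": "1556",
    "dateCreated": "2008-09-08",
    "dateModified": "2012-10-24",
}
markup = to_microdata(entry)
print(markup)
```

Generating the attributes from the record itself keeps the visible text and the machine-readable properties in sync.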
27. And then
• Crawl these microdata (at present)
• Reflect them in search results (within the fiscal year; preparing to reflect)
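The crawling step could look roughly like the sketch below, which pulls `itemprop` values out of a page using only the Python standard library. It is a simplification and not the crawler described in these slides: the `ItempropExtractor` class is hypothetical, nested items are flattened, and only properties whose value is plain element text are collected. The sample markup mirrors the example slide.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ItempropExtractor(HTMLParser):
    """Collect (itemprop, text) pairs from simple microdata markup."""

    def __init__(self):
        super().__init__()
        self.properties = []  # (name, text) pairs in document order
        self._stack = []      # one entry per open element: property name or None
        self._buffers = []    # text buffers for properties currently being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        name = attr.get("itemprop")
        # An element carrying itemscope holds a nested item, not a text value.
        if name is not None and "itemscope" in attr:
            name = None
        self._stack.append(name)
        if name is not None:
            self._buffers.append([])

    def handle_data(self, data):
        if self._buffers:
            self._buffers[-1].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        name = self._stack.pop()
        if name is not None:
            text = "".join(self._buffers.pop()).strip()
            self.properties.append((name, text))


SAMPLE = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""

parser = ItempropExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.properties)
```

Because the same extractor works on any page that uses this markup, one crawler can serve many databases instead of requiring a custom program for each one.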
28. Ask for your help
• If this approach has some effect, there may be a chance for it to be reflected in major search engines.
• Please mark up your own site or database and give me feedback.
• If you have any suggestions or comments,
please let me know.
29. Future Perspective
• Continue to focus on accuracy
• Microdata
– Discuss with many scientists and finalize the schema.org extension proposal
– Increase the number of databases
– Build support tools for microdata markup
• Add appropriate data from high-quality
databases