Maori Ito @ NIBIO
Life Science Database
Cross Search and Metadata
Database integration: collaboration
among four ministries with NBDC
• Database Catalog
• Life Science Database Cross Search
• Life Science Database Archive
• Database Reconstructive Integration
Why Cross Search?
• Easy to use
• Familiar to users
• Appropriate for comparing various kinds of
  databases
Sagace
• Search for Biomedical Data & Resources
  in Japan
Skeptical Reputation of
Search Results…
• Useless…
• Slow…
• What is the advantage?
What is the most important
thing in cross search?
Simple Answers
• Speed and accuracy
Mechanism of Search Engine
1. Crawling
2. Indexing
3. Query Processing
4. Scoring
Crawling
• Crawl databases and pages with a program
[Figure: Program]
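A minimal sketch of the crawling step, assuming a breadth-first crawler. To keep it self-contained, the "web" is a hypothetical in-memory dict (`SITE`) mapping each URL to its text and outgoing links; a real crawler would fetch pages over HTTP and extract links from the HTML.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each page over HTTP and parse its links.
SITE = {
    "db/top": ("Example database top page", ["db/entry/1", "db/entry/2"]),
    "db/entry/1": ("Entry 1: Bacillus subtilis", ["db/top"]),
    "db/entry/2": ("Entry 2: Mus musculus", ["db/entry/1", "db/entry/3"]),
}

def crawl(start, fetch, max_pages=100):
    """Breadth-first crawl from `start`; returns {url: text} for fetched pages."""
    seen = {start}
    queue = deque([start])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # dead link (e.g. db/entry/3): skip it
            continue
        text, links = page
        pages[url] = text
        for link in links:
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return pages

pages = crawl("db/top", SITE.get)
print(sorted(pages))
```

`max_pages` is an illustrative politeness limit; per-database crawlers would also need rate limiting, which is omitted here.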
Indexing
     • Split the data into convenient sizes and store
       them on our own server
[Figure: External Data, Internal Server]
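A sketch of the indexing step, assuming each crawled page is split into fixed-size word chunks and an inverted index (term to chunk counts) is kept locally. The chunk size, tokenizer, and in-memory dicts are illustrative assumptions; a production engine stores the index on disk.

```python
import re
from collections import defaultdict

def split_into_chunks(text, size=50):
    """Split a long document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def tokenize(text):
    """Lower-case and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages, chunk_size=50):
    """Return (chunks, index); index maps term -> {chunk_id: term count}."""
    chunks = {}
    index = defaultdict(dict)
    for url, text in pages.items():
        for n, chunk in enumerate(split_into_chunks(text, chunk_size)):
            chunk_id = f"{url}#{n}"
            chunks[chunk_id] = chunk
            for term in tokenize(chunk):
                index[term][chunk_id] = index[term].get(chunk_id, 0) + 1
    return chunks, index

# Hypothetical crawled pages, with a tiny chunk size to show the splitting.
crawled = {
    "db/entry/1": "Bacillus subtilis genome entry",
    "db/entry/2": "Mouse epithelial adenoma DNA sequence",
}
chunks, index = build_index(crawled, chunk_size=3)
print(index["adenoma"])
```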
Query Processing and Scoring
In the case of Hyper Estraier (search
    system)
[Figure: NIBIO, AgriTogo, NBDC / DBCLS, MEDALS, JCGGDB; labels "Collaborate by using P2P architecture" and "Under contemplation"]
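A sketch of query processing and scoring, using a plain TF-IDF sum as a stand-in. This is not Hyper Estraier's own scoring, which is not reproduced here. The `merge_node_results` helper is a hypothetical illustration of combining ranked lists from several nodes in a P2P setup, and assumes their scores are directly comparable.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Query processing: lower-case and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, docs, top_k=10):
    """Rank docs ({doc_id: text}) against `query` with a simple TF-IDF sum.

    score(d) = sum over query terms t of tf(t, d) * log(1 + N / df(t)).
    The smoothed idf keeps terms found in every document from scoring 0.
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())  # document frequency: +1 per doc containing the term
    n_docs = len(docs)
    query_terms = tokenize(query)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = sum(counts[t] * math.log(1 + n_docs / df[t])
                    for t in query_terms if counts[t])
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

def merge_node_results(*node_results, top_k=10):
    """Hypothetical P2P step: merge ranked (doc_id, score) lists from several nodes."""
    merged = [hit for results in node_results for hit in results]
    return sorted(merged, key=lambda kv: (-kv[1], kv[0]))[:top_k]

docs = {
    "entry/1": "Bacillus subtilis genome",
    "entry/2": "Mouse DNA sequence, epithelial adenoma",
    "entry/3": "Mouse genome DNA sequence",
}
print(search("mouse adenoma", docs))
```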
Back to the simple answers for
     improvement
• Speed (thanks to Johan-san, Mizuguchi-san and
 many collaborators)
  1. Relax the access limits at DBCLS
  (with a little ingenuity in CSS and images)
• Accuracy
[Figure: NIBIO, NBDC / DBCLS]
How to improve accuracy?
• What does accuracy mean for life science
  database cross search?
• What does accuracy mean for life science
  specialists?
• In general, developers emphasize search
  algorithms and scoring.
• However, generic results and methods for
  cross search may not be suitable for life
  science specialists.
• Data (index files) from life science
  databases are sometimes difficult to
  understand immediately.
• It is hard to write a crawler program for
  each database and maintain it.
• (We have no extra …. to make a proper
  search page like Entrez et al.…)
To Improve Accuracy
• Manually select databases
• Assign weights to crawled databases to
  improve the ranking system
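A sketch of how manual database selection and per-database weights could feed into ranking, assuming each hit carries its source database and a base relevance score. The database names and weight values below are hypothetical; the slide does not give the actual weights.

```python
# Hypothetical weights for manually selected databases; the slide gives no values.
DB_WEIGHTS = {"DatabaseA": 1.5, "DatabaseB": 1.0}

def rerank(hits, weights=DB_WEIGHTS, default_weight=None):
    """Re-rank (doc_id, database, score) hits.

    Hits from databases that were not selected (absent from `weights`) are
    dropped unless `default_weight` is given; the rest are scaled by the
    weight of their database.
    """
    ranked = []
    for doc_id, database, score in hits:
        weight = weights.get(database, default_weight)
        if weight is None:  # database was not manually selected
            continue
        ranked.append((doc_id, database, score * weight))
    return sorted(ranked, key=lambda h: (-h[2], h[0]))

hits = [("a", "DatabaseB", 2.0), ("b", "DatabaseA", 1.6), ("c", "DatabaseC", 5.0)]
print(rerank(hits))
```

A multiplicative weight keeps the relative order within one database unchanged while letting a trusted database outrank a marginally better match elsewhere.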
Metadata!
  • One way to solve these problems
[Figure: "Difficult to understand data immediately"]
If metadata are added to the data…
[Figure: Data]
Metadata
  Disease:Epithelial adenoma
  Species:Mouse
  Keywords:DNA sequence
  Last Modified:2013-01-19
Easy to understand for users
• It can serve as a guide that improves the user experience.
[Figure: conceptual illustration]
Easy to understand for crawlers
          Metadata
             Disease:Epithelial adenoma
             Species:Mouse
             Keywords:DNA sequence
             Last Modified:2013-01-19
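Why structured metadata is easy for crawlers: a few lines of code turn it into fields that can be filtered or sorted. This sketch assumes the plain "Key:Value" layout shown on the slide; it is an illustration only, and the markup the later slides propose (microdata) would be read differently.

```python
from datetime import date

def parse_metadata(block):
    """Parse 'Key:Value' lines into a dict.

    Splits on the first colon only, so values may themselves contain colons.
    """
    meta = {}
    for line in block.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the 'Metadata' heading
            meta[key.strip()] = value.strip()
    return meta

block = """
Metadata
Disease:Epithelial adenoma
Species:Mouse
Keywords:DNA sequence
Last Modified:2013-01-19
"""
meta = parse_metadata(block)
last_modified = date.fromisoformat(meta["Last Modified"])
print(meta["Species"], last_modified.year)
```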
How to use it?
  • Mark up the data with microdata, like tags
[Figure (conceptual illustration): Title, ID, and Last Modified marked up on the page below]
http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
Is it a practical suggestion?
• Google, Yahoo! and Bing decided to use microdata to
  make search results more informative.
• Some vocabularies have already been applied to
  search results.
• E.g.
Schema.org
• Provides a collection of schemas (HTML tags)
• “Bing, Google, Yahoo! and Yandex rely on
  this markup to improve the display of search
  results, making it easier for people to find
  the right web pages.” (quoted from schema.org)
• We proposed schema.org extensions for
  “BiologicalDatabaseEntry” and
  “BiologicalDatabase”.
• Schema.org proposals:
 http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
Properties for BiologicalDatabaseEntry

entryID     additionalType        dateCreated
isEntryof   description           dateModified
taxon       image                 keywords
seeAlso     url                   provider
reference   alternativeHeadline   breadcrumb
name        inLanguage
Related links for our proposal
• WebSchemas proposal ‘Biological
  Databases’ for schema.org
  – http://www.w3.org/wiki/WebSchemas/BioDatabases
• Discussions at BioHackathon
  – https://github.com/dbcls/bh12/wiki/Schema.org-extension
• Discussions at BH12.12 (Japanese only)
  – http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
How to mark up?
(Declaration)
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
ID
 <span itemprop="entryID">1556</span>
Species
<span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 <span itemprop="name">Bacillus subtilis</span>
</span>
Deposition:
 <span itemprop="dateCreated">2008-09-08</span>
Last update:
 <span itemprop="dateModified">2012-10-24</span>
</div>
(Specify the property and mark it up with a normal tag)
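On the crawler side, the markup above can be read back with the standard library's `html.parser`. This is a minimal sketch, not a full microdata implementation: it handles `itemscope`, `itemtype`, and text-valued `itemprop`, and ignores `itemref`, `itemid`, and attribute-valued properties such as `<meta content>` or `<a href>`. The sample uses the slide's markup with straight ASCII quotes.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class MicrodataParser(HTMLParser):
    """Collect top-level items as {"type": ..., "properties": {name: [values]}}."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items found in the document
        self.stack = []  # one frame per open (non-void) element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get an end tag
            return
        attrs = dict(attrs)
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
        self.stack.append({"tag": tag, "item": item,
                           "prop": attrs.get("itemprop"), "text": []})

    def handle_data(self, data):
        # Text counts toward the value of every enclosing element.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if not any(f["tag"] == tag for f in self.stack):
            return  # stray end tag: ignore it
        while self.stack:  # also closes any unclosed inner elements
            frame = self.stack.pop()
            self._close(frame)
            if frame["tag"] == tag:
                break

    def _close(self, frame):
        if frame["prop"] is None:
            if frame["item"] is not None:
                self.items.append(frame["item"])  # item that is not a property
            return
        if frame["item"] is not None:
            value = frame["item"]  # nested item, e.g. the taxon
        else:
            value = "".join(frame["text"]).strip()
        # Attach the value to the nearest enclosing item still on the stack.
        for parent in reversed(self.stack):
            if parent["item"] is not None:
                for name in frame["prop"].split():
                    parent["item"]["properties"].setdefault(name, []).append(value)
                break

SAMPLE = '''<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
ID <span itemprop="entryID">1556</span>
Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="name">Bacillus subtilis</span>
</span>
Deposition: <span itemprop="dateCreated">2008-09-08</span>
Last update: <span itemprop="dateModified">2012-10-24</span>
</div>'''

parser = MicrodataParser()
parser.feed(SAMPLE)
parser.close()
item = parser.items[0]
print(item["properties"]["entryID"], item["properties"]["dateModified"])
```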
And then
• Crawl this microdata: at present
• Reflect it in search results: within the fiscal year
  (preparing to reflect it)
[Figure: conceptual illustration]
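One way crawled microdata could be reflected in search results is to filter on a property (here `taxon`) and sort by `dateModified`, then show those fields in each result line. The records below are hypothetical and only mimic the shape of extracted properties; apart from entry 1556 from the slide, the IDs and dates are made up for illustration.

```python
from datetime import date

# Hypothetical records shaped like flattened microdata properties.
records = [
    {"entryID": "1556", "taxon": "Bacillus subtilis", "dateModified": "2012-10-24"},
    {"entryID": "X-2", "taxon": "Mus musculus", "dateModified": "2013-01-19"},
    {"entryID": "X-3", "taxon": "Bacillus subtilis", "dateModified": "2013-02-01"},
]

def filter_and_sort(records, taxon=None):
    """Keep records matching `taxon` (if given), most recently modified first."""
    hits = [r for r in records if taxon is None or r.get("taxon") == taxon]
    # Records without dateModified sort last.
    return sorted(hits,
                  key=lambda r: date.fromisoformat(r.get("dateModified", "0001-01-01")),
                  reverse=True)

def snippet(record):
    """Render a one-line search-result snippet from microdata properties."""
    return (f'ID {record["entryID"]} | {record.get("taxon", "unknown taxon")} | '
            f'last modified {record.get("dateModified", "unknown")}')

for r in filter_and_sort(records, taxon="Bacillus subtilis"):
    print(snippet(r))
```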
Ask for your help
• If this approach gains traction, there may be
  a chance for it to be reflected in major search
  engines.
• Please mark up your own site or database
  and give me feedback.
• If you have any suggestions or comments,
  please let me know.
Future Perspective
• Continue to focus on accuracy
• Microdata
  – Discuss with many scientists and finalize the
    proposal for the schema.org extension
  – Increase the number of databases
  – Make support tools for microdata markup
• Add appropriate data from high-quality
  databases
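A sketch of what a support tool for microdata markup might look like: it renders a flat dict of properties as `BiologicalDatabaseEntry` markup and rejects property names outside the proposed list on the "Properties for BiologicalDatabaseEntry" slide. The function and its behavior are hypothetical; nested items such as a structured `taxon` are not handled.

```python
from html import escape

# Property names from the proposed BiologicalDatabaseEntry, as listed on the slide.
PROPOSED_PROPERTIES = {
    "entryID", "isEntryof", "taxon", "seeAlso", "reference", "name",
    "additionalType", "description", "image", "url", "alternativeHeadline",
    "inLanguage", "dateCreated", "dateModified", "keywords", "provider",
    "breadcrumb",
}

def to_microdata(entry, itemtype="http://schema.org/BiologicalDatabaseEntry"):
    """Render a flat {property: value} dict as microdata-annotated HTML.

    Raises ValueError for property names outside the proposed list.
    Values are HTML-escaped; nested items are not supported.
    """
    unknown = set(entry) - PROPOSED_PROPERTIES
    if unknown:
        raise ValueError(f"unknown properties: {sorted(unknown)}")
    lines = [f'<div itemscope itemtype="{escape(itemtype)}">']
    for prop, value in entry.items():
        lines.append(f'  <span itemprop="{prop}">{escape(str(value))}</span>')
    lines.append("</div>")
    return "\n".join(lines)

print(to_microdata({"entryID": "1556", "dateCreated": "2008-09-08",
                    "dateModified": "2012-10-24"}))
```

Validating against a fixed property list catches typos such as `dateModifed` before the page is published, which matters because crawlers silently ignore unknown properties.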
Thank you for
listening!

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Life Science Database Cross Search and Metadata

  • 1. Maori Ito @ NIBIO Life Science Database Cross Search and Metadata
  • 2. Collaboration among four ministries with NBDC on database integration • Database Catalog • Life Science Database Cross Search • Life Science Database Archive • Database Reconstructive Integration
  • 3. Why Cross Search? • Easy to use • Familiar to users • Appropriate for comparing various kinds of databases
  • 4. Sagace • Search for Biomedical Data & Resources in Japan
  • 5. Bad, Skeptical Reputation of Search Results… • Useless… • Slow… • What is the advantage?
  • 6. What is the most important thing in cross search?
  • 8. Mechanism of Search Engine 1. Crawling 2. Indexing 3. Query Processing 4. Scoring
  • 9. Crawling • A program crawls databases and pages
  • 10. Indexing • Split the data into convenient sizes and store them on our own server (figure: External Data → Internal Server)
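The crawling step can be illustrated with a small breadth-first crawler. This is a minimal Python sketch, not the crawler used for Sagace: the in-memory `SITE` dictionary and its URLs are hypothetical stand-ins for real database pages, and `max_pages` is an assumed limit on how much of a site is fetched.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each page over HTTP; this only shows the traversal.
SITE = {
    "db/index": ("Database top page", ["db/entry/1", "db/entry/2"]),
    "db/entry/1": ("Entry 1: Bacillus subtilis", ["db/index"]),
    "db/entry/2": ("Entry 2: Mus musculus", ["db/entry/3"]),
    "db/entry/3": ("Entry 3: Homo sapiens", []),
}


def crawl(start, site, max_pages=100):
    """Breadth-first crawl from `start`, fetching each URL at most once."""
    seen = {start}
    queue = deque([start])
    fetched = {}
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        if url not in site:  # broken link: skip it
            continue
        text, links = site[url]
        fetched[url] = text
        for link in links:
            if link not in seen:  # avoid revisiting pages via cycles
                seen.add(link)
                queue.append(link)
    return fetched


pages = crawl("db/index", SITE)
print(list(pages))  # ['db/index', 'db/entry/1', 'db/entry/2', 'db/entry/3']
```

The `seen` set is what keeps the link back from `db/entry/1` to `db/index` from causing an endless loop.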
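A minimal sketch of indexing, assuming that "convenient size" means fixed-size word chunks: each crawled document is split into chunks, and an inverted index maps every token to the chunks that contain it. The chunk size, the tokenizer and the example documents are hypothetical; Hyper Estraier, the search system named in these slides, has its own index format.

```python
import re
from collections import defaultdict


def split_into_chunks(text, size=5):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, size=5):
    """Inverted index: token -> set of (doc_id, chunk_no) that contain it."""
    index = defaultdict(set)
    chunks = {}
    for doc_id, text in docs.items():
        for n, chunk in enumerate(split_into_chunks(text, size)):
            # In a real system the chunk would be written to the internal server.
            chunks[(doc_id, n)] = chunk
            for token in tokenize(chunk):
                index[token].add((doc_id, n))
    return index, chunks


def lookup(index, query):
    """Chunks that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical crawled documents.
docs = {
    "entry-1": "EM map of Bacillus subtilis protein deposited 2008",
    "entry-2": "Epithelial adenoma in mouse DNA sequence",
}
index, chunks = build_index(docs)
print(lookup(index, "mouse DNA"))       # {('entry-2', 0)}
print(lookup(index, "mouse sequence"))  # set()
```

The second query shows a trade-off of fixed-size chunks: "mouse" and "sequence" land in different chunks of `entry-2`, so they no longer match together. The chunk size therefore affects which queries can succeed.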
  • 12. In the case of Hyper Estraier (search system): NIBIO, AgriTogo, NBDC / DBCLS and MEDALS collaborate by using a P2P architecture; JCGGDB is under contemplation.
  • 13. Back to the simple answers for improvement • Speed (thanks to Johan-san, Mizuguchi-san and many collaborators) 1. Relax the access limits of DBCLS (using a little ingenuity in CSS and images) • Accuracy
  • 14. How to improve accuracy? • What is accuracy for life science database cross search? • What is accuracy for life science specialists?
  • 15. • In general, developers emphasize search algorithms and scoring. • However, generic results and methods for cross search may not be suitable for life science specialists? • Data (index files) from life science databases are sometimes difficult to understand immediately. • It is hard to build a crawler program for each database and to maintain it. • (We have no extra …. to build a proper search page like Entrez et al.)
  • 16. To Improve Accuracy • Manually select databases • Assign weights to crawled databases to improve the ranking system
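The two measures above, manual database selection and per-database weights, can be combined in a simple ranking function. This sketch assumes a plain term-count score multiplied by a hand-assigned weight; the weights, database names and entries are hypothetical, and the slides do not describe the actual scoring used in the cross search.

```python
import re
from collections import Counter

# Hypothetical hand-assigned weights per database (illustrative values only).
DB_WEIGHTS = {"curated_db": 2.0, "general_db": 1.0}

# Hypothetical crawled entries: (database, entry ID, indexed text).
ENTRIES = [
    ("general_db", "g1", "adenoma adenoma mouse"),
    ("curated_db", "c1", "epithelial adenoma in mouse"),
    ("general_db", "g2", "human liver"),
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, entries, weights, selected=None, default_weight=1.0):
    """Rank entries by (query-term occurrences) x (weight of the source database).

    `selected`, if given, is the manually chosen set of databases to search.
    """
    terms = tokenize(query)
    results = []
    for db, entry_id, text in entries:
        if selected is not None and db not in selected:
            continue  # database not in the manual selection
        counts = Counter(tokenize(text))
        base = sum(counts[t] for t in terms)
        if base == 0:
            continue  # no query term occurs in this entry
        results.append((base * weights.get(db, default_weight), db, entry_id))
    # Highest score first; ties broken by database and entry ID for stable output.
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results


print(search("mouse adenoma", ENTRIES, DB_WEIGHTS))
# [(4.0, 'curated_db', 'c1'), (3.0, 'general_db', 'g1')]
```

With all weights equal (`search("mouse adenoma", ENTRIES, {})`), `g1` ranks first simply because it repeats "adenoma"; giving the curated database a higher weight moves `c1` to the top.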
  • 17. Metadata! • One way to solve these problems, such as data that are difficult to understand immediately
  • 18. If metadata are added to data… Data + Metadata (Disease: Epithelial adenoma, Species: Mouse, Keywords: DNA sequence, Last Modified: 2013-01-19)
  • 19. Easy for users to understand • Metadata can serve as a guide that improves the user experience.
  • 20. Easy for crawlers to understand • Metadata: Disease: Epithelial adenoma, Species: Mouse, Keywords: DNA sequence, Last Modified: 2013-01-19
  • 21. How to use it? • Mark up data with microdata, like tags (example fields: image, title, ID, last modified) http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en
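Because the metadata are explicit key/value pairs, a program can filter entries without interpreting free text. A minimal sketch, assuming the `Key:Value` layout shown in the example; the record IDs and the second record are hypothetical.

```python
from datetime import date


def parse_metadata(lines):
    """Parse 'Key:Value' lines into a dict; lines without ':' are skipped."""
    meta = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta


# Hypothetical records; entry-A reuses the metadata example above.
records = {
    "entry-A": parse_metadata(["Disease:Epithelial adenoma", "Species:Mouse",
                               "Keywords:DNA sequence", "Last Modified:2013-01-19"]),
    "entry-B": parse_metadata(["Species:Human", "Last Modified:2012-10-24"]),
}


def filter_records(records, criteria):
    """IDs of records whose metadata equal every key/value in `criteria`."""
    return sorted(rid for rid, meta in records.items()
                  if all(meta.get(k) == v for k, v in criteria.items()))


def modified_since(records, day):
    """IDs of records whose 'Last Modified' date is on or after `day`."""
    return sorted(rid for rid, meta in records.items()
                  if "Last Modified" in meta
                  and date.fromisoformat(meta["Last Modified"]) >= day)


print(filter_records(records, {"Species": "Mouse"}))  # ['entry-A']
print(modified_since(records, date(2013, 1, 1)))      # ['entry-A']
```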
  • 22. Is it a practical suggestion? • Google, Yahoo! and Bing decided to use microdata to make search results more valuable. • Some vocabularies have already been applied to search results. • E.g.
  • 23. Schema.org • Provides a collection of schemas (HTML tags) • "Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages." (quoted from schema.org) • We proposed schema.org extensions for "BiologicalDatabaseEntry" and "BiologicalDatabase". • Schema.org proposals: http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
  • 24. Properties for BiologicalDatabaseEntry: entryID, additionalType, dateCreated, isEntryof, description, dateModified, taxon, image, keywords, seeAlso, url, provider, reference, alternativeHeadline, breadcrumb, name, inLanguage
  • 25. Related links for our proposal • WebSchemas proposal 'Biological Databases' for schema.org: http://www.w3.org/wiki/WebSchemas/BioDatabases • Discussions at BioHackathon: https://github.com/dbcls/bh12/wiki/Schema.org-extension • Discussions at BH12.12 (Japanese only): http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
  • 26. How to mark up? Declare the item, then specify each property and mark it up with a normal tag:
    <div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
      ID <span itemprop="entryID">1556</span>
      Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
        <span itemprop="name">Bacillus subtilis</span>
      </span>
      Deposition: <span itemprop="dateCreated">2008-09-08</span>
      Last update: <span itemprop="dateModified">2012-10-24</span>
    </div>
  • 27. And then • Crawl these microdata (at present) • Reflect them in search results (within the fiscal year; preparations under way)
  • 28. Ask for your help • If this approach proves effective, there may be a chance to have it reflected in major search engines. • Please mark up your own site or database and give me feedback. • If you have any suggestions or comments, please let me know.
  • 29. Future Perspective • Continue to focus on accuracy • Microdata – Discuss with many scientists and finalize the proposal for the schema.org extension – Boost the number of participating databases – Build support tools for marking up microdata • Add appropriate data from high-quality databases
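Crawling microdata means parsing the `itemscope`, `itemtype` and `itemprop` attributes out of each page. The following is a simplified Python sketch built on the standard-library `html.parser`, not a complete implementation of the WHATWG microdata algorithm: it ignores `itemref` and `itemid` and keeps only the last value of a repeated property. The sample input is the markup from slide 26.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; their itemprop value comes from an attribute.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Simplified microdata extractor (see limitations above)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self.scopes = []  # stack of items whose itemscope element is still open
        self.stack = []   # one frame per open non-void element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if tag in VOID:
            if prop and self.scopes:
                value = a.get("content") or a.get("src") or a.get("href") or ""
                self.scopes[-1]["properties"][prop] = value
            return
        frame = {"tag": tag, "prop": prop, "text": [], "item": None}
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and self.scopes:
                # Nested item: becomes the value of the enclosing item's property.
                self.scopes[-1]["properties"][prop] = item
            else:
                self.items.append(item)
            self.scopes.append(item)
            frame["item"] = item
        self.stack.append(frame)

    def handle_data(self, data):
        # Text counts toward every open element, like textContent.
        for frame in self.stack:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        # Find the innermost open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                break
        else:
            return
        while len(self.stack) > i:
            self._close(self.stack.pop())

    def _close(self, frame):
        if frame["item"] is not None:
            self.scopes.pop()
        elif frame["prop"] and self.scopes:
            text = "".join(frame["text"]).strip()
            self.scopes[-1]["properties"][frame["prop"]] = text


def extract_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID <span itemprop="entryID">1556</span>
  Species <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>"""

items = extract_microdata(SAMPLE)
print(items[0]["properties"]["taxon"]["properties"]["name"])  # Bacillus subtilis
```

The result is a nested dictionary per item, which a crawler could then store alongside its ordinary full-text index.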
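As one possible shape for the support tools to mark up microdata, here is a hypothetical helper that renders a flat database record as `BiologicalDatabaseEntry` markup and rejects property names outside the proposed property list. The function name and the flat-record restriction are assumptions; nested items such as `taxon` would need an extra level of `itemscope`.

```python
from html import escape

# Type proposed as a schema.org extension (not part of core schema.org).
ITEMTYPE = "http://schema.org/BiologicalDatabaseEntry"

# Property names as listed in the BiologicalDatabaseEntry proposal.
PROPOSED_PROPERTIES = {
    "entryID", "additionalType", "dateCreated", "isEntryof", "description",
    "dateModified", "taxon", "image", "keywords", "seeAlso", "url", "provider",
    "reference", "alternativeHeadline", "breadcrumb", "name", "inLanguage",
}


def to_microdata(record, itemtype=ITEMTYPE):
    """Render a flat {property: value} dict as microdata-annotated HTML.

    Raises ValueError for property names outside the proposed list, so typos
    are caught before the markup is published.
    """
    unknown = sorted(set(record) - PROPOSED_PROPERTIES)
    if unknown:
        raise ValueError(f"unknown properties: {unknown}")
    lines = [f'<div itemscope itemtype="{escape(itemtype)}">']
    for prop, value in record.items():
        # escape() protects against "<", ">", "&" and quotes in database values.
        lines.append(f'  <span itemprop="{prop}">{escape(str(value))}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(to_microdata({"entryID": 1556, "dateCreated": "2008-09-08",
                    "dateModified": "2012-10-24"}))
```

Validating names against a fixed list is a deliberate choice: a misspelled `itemprop` is silently ignored by consumers, so catching it at generation time is cheaper than debugging missing search results later.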