SWAT4LS Wikidata tutorial, Cambridge, Dec 2015

My 30 minutes of our tutorial on how to use Wikidata for biomedical data and beyond.

  1. 1. Cooking the big soup https://commons.wikimedia.org/wiki/File:Wikidata-logo-en.svg Sebastian Burgstaller-Muehlbacher
  2. 2. Introduction ● Single-value edits are simple thanks to the Wikidata web interface. ● How can data be mass-imported into Wikidata easily? ● Answer: Use bots! ● Combine the Wikidata API and query endpoints. ● Python as the preferred language.
  3. 3. PBB_core ● Data silo: the source of the data. ● Resource-specific code: get data from the silo; clean the data; make the silo-to-Wikidata mapping. ● PBB_core: take the mapped data; look up in WD whether the item already exists; throw an exception if inconsistencies occur; construct or modify a WD item JSON object. ● Auxiliary classes: provide logging capabilities; provide WD login infrastructure; provide settings. ● Workflow: 1. Get data and map to WD. 2. Log in to WD. 3. Provide PBB_core with data. 4. Request write to WD.
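The division of labour on the PBB_core slide can be sketched in plain Python. This is a minimal illustration and not PBB_core's actual code: every function name, the `InconsistencyError` class, the simplified JSON shape and the sample record are assumptions made for the example. The dictionary `known_items` stands in for a lookup against Wikidata, so the sketch runs without network access.

```python
class InconsistencyError(Exception):
    """Hypothetical error for data that cannot be written safely."""


def map_record(record, mapping):
    """Resource-specific step: clean values and map silo fields to property IDs."""
    return {mapping[k]: str(v).strip() for k, v in record.items() if k in mapping}


def find_existing(mapped, id_prop, known_items):
    """Core step: check whether an item with this identifier already exists.

    known_items stands in for a Wikidata query result: {identifier: QID}.
    """
    return known_items.get(mapped[id_prop])


def build_item_json(mapped, label):
    """Core step: build a simplified item object (not the full Wikidata schema)."""
    return {
        "labels": {"en": {"language": "en", "value": label}},
        "claims": {prop: [{"value": value}] for prop, value in mapped.items()},
    }


def prepare_write(record, mapping, id_prop, known_items, label):
    """Return (existing QID or None, item JSON) for one silo record."""
    mapped = map_record(record, mapping)
    if id_prop not in mapped:
        raise InconsistencyError("record has no value for " + id_prop)
    return find_existing(mapped, id_prop, known_items), build_item_json(mapped, label)


# Made-up record; P351 (Entrez Gene ID) and P353 (HGNC gene symbol) are real properties.
record = {"gene_id": " 12345 ", "symbol": "ABC1", "internal_note": "skip me"}
mapping = {"gene_id": "P351", "symbol": "P353"}
qid, item = prepare_write(record, mapping, "P351", {"12345": "Q_PLACEHOLDER"}, "ABC1")
```

A returned QID means the bot should modify that item; `None` means a new item would be created.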
  4. 4. What does an item look like, really? https://goo.gl/Ndbcd4 https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q423111
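The `wbgetentities` URL on this slide can be built and its result inspected with the standard library alone. The sketch below adds `format=json` to the slide's parameters so the API returns machine-readable output. The helper names `entity_url` and `summarize` are invented for illustration, and the `toy_response` dictionary is a hand-written stand-in with the same top-level layout (`entities`, `labels`, `claims`), not the real content of Q423111.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL for one item, requesting JSON output."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API_URL + "?" + urlencode(params)


def summarize(response, qid, lang="en"):
    """Return the item's label and the sorted property IDs used in its claims."""
    entity = response["entities"][qid]
    label = entity.get("labels", {}).get(lang, {}).get("value")
    return label, sorted(entity.get("claims", {}))


# Hand-written stand-in for an API response; values are illustrative only.
toy_response = {
    "entities": {
        "Q423111": {
            "labels": {"en": {"language": "en", "value": "example label"}},
            "claims": {"P353": [], "P351": []},
        }
    }
}

url = entity_url("Q423111")
label, properties = summarize(toy_response, "Q423111")
```

Fetching `url` with any HTTP client and passing the decoded JSON to `summarize` gives a quick overview of which properties an item already carries.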
  5. 5. A Minimal Bot
  6. 6. A Minimal Bot for Mass Data Import
  7. 7. Advantages of PBB_core ● One interface to Wikidata for (your) bots! ● Fast development and deployment of new bots. ● Integrates Wikidata querying and writing. ● Prevents creation of duplicate items. ● Searches for duplicate use of identifiers. ● Compatible with Python 2 and Python 3. ● Execute queries with SPARQL or WDQ. ● Minimizes HTTP traffic, increases throughput.
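The duplicate checks listed on this slide combine a query by identifier with a comparison of the results. The sketch below shows one possible shape for both halves. It is an assumption about how such a check could look, not PBB_core's implementation: `lookup_query`, `lookup_url` and `duplicate_identifiers` are hypothetical names, and the QIDs in the usage lines are placeholders. The `wdt:` prefix is predefined by the Wikidata SPARQL endpoint.

```python
from collections import defaultdict
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def lookup_query(prop, identifier):
    """SPARQL query for all items that carry the given identifier value."""
    escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
    return 'SELECT ?item WHERE { ?item wdt:%s "%s" . }' % (prop, escaped)


def lookup_url(prop, identifier):
    """URL that would run the lookup query and ask for JSON results."""
    params = {"query": lookup_query(prop, identifier), "format": "json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


def duplicate_identifiers(pairs):
    """Given (qid, identifier) pairs, return identifiers used by several items."""
    seen = defaultdict(set)
    for qid, identifier in pairs:
        seen[identifier].add(qid)
    return {ident: sorted(qids) for ident, qids in seen.items() if len(qids) > 1}


query = lookup_query("P351", "12345")
# Placeholder QIDs: two items share identifier "12345", which should be flagged.
dupes = duplicate_identifiers([("Q_A", "12345"), ("Q_B", "12345"), ("Q_C", "67890")])
```

An empty result from the lookup means a new item can be created; more than one result is the kind of inconsistency a bot should refuse to write over.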
  8. 8. All Wikidata data types ● All current Wikidata data types have been implemented. – PBB_core.WDString – PBB_core.WDItemID – PBB_core.WDMonolingualText – PBB_core.WDProperty – PBB_core.WDQuantity – PBB_core.WDTime – PBB_core.WDUrl – PBB_core.WDGlobeCoordinate – PBB_core.WDCommonsMedia
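Each data type on this slide ultimately becomes a differently shaped `datavalue` in the item's JSON. As a rough illustration, the plain functions below build the JSON for a string value and an item reference, following the public Wikibase JSON layout. They are stand-ins written for this example and do not reproduce the constructors of `PBB_core.WDString` or `PBB_core.WDItemID`.

```python
def string_snak(prop, value):
    """Main snak for a string-valued property such as an external identifier."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"value": value, "type": "string"},
    }


def item_snak(prop, qid):
    """Main snak whose value points at another item, given as 'Q<number>'."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:])},
            "type": "wikibase-entityid",
        },
    }


def claim(mainsnak):
    """Wrap a main snak into a statement with normal rank."""
    return {"mainsnak": mainsnak, "type": "statement", "rank": "normal"}


# P31 is "instance of" and Q7187 is "gene"; the identifier value is made up.
claims = [claim(item_snak("P31", "Q7187")), claim(string_snak("P351", "12345"))]
```

Wrapping the shapes in small helpers like these lets a bot validate values before any write request is sent.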
  9. 9. Conclusions ● Mass data imports require scripts, a.k.a. bots. ● Our solution: PBB_core – Python framework for reading from and writing to Wikidata. – Implements all Wikidata data types. – Implements consistency checks on the data to be written. – Get it from: https://bitbucket.org/sulab/wikidatabots/src
  10. 10. Let's hack Wikidata!! culturedigitally.org