DBpedia ♥ Commons 
Gaurav Vaidya - Dimitris Kontokostas - Andrea Di Menna - Jim O'Regan 
2nd DBpedia Meeting Leipzig 03.09.2014
~23M pages like this 
A lot of pages like this 
Many pages like this 
Not very similar to pages like this 
DBpedia Extraction Framework 
✔ “Wiki agnostic”
✔ Pluggable extractors
✔ Out-of-the-box support for common metadata
✗ Tuned for extraction in the main namespace (not File:) 
✗ Many other challenges left
Challenges 
✔ File metadata 
✔ KML files 
✔ Image Galleries 
✔ Image Annotations 
✔ Mappings Wiki 
✔ Bootstrap community mappings 
✔ Template Statistics 
✔ Licensing 
✔ Technical details I'll not go into
Out-of-the-box support 
● Categories (skos) 
● External links 
● Geo-coordinates 
● Raw infobox properties 
● Labels 
● PageIds / Revisions 
● Links (internal / external) 
● Mappings Wiki (with some tweaking / more on that later)
File metadata 
● New Extractor 
● New file class hierarchy
– dbo:File, dbo:Image, dbo:StillImage, dbo:MovingImage and dbo:Sound
Sample Output:
:Aeropetes.JPG a dbo:StillImage, dbo:Image, dbo:Document, dbo:File, dbo:Work ;
    dcterms:type dbo:StillImage ;
    dbo:fileExtension "jpg" ;
    dcterms:format "image/jpeg" ;
    dbo:fileURL commons-path:Aeropetes.JPG ;
    foaf:depiction commons-path:Aeropetes.JPG ;
    dbo:thumbnail commons-path:Aeropetes.JPG?width=300 .
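As a rough illustration of how the file metadata above could be derived from a file title, here is a minimal Python sketch. It is not the extractor from the DBpedia framework: the `FILE_TYPES` table and the `describe_file` helper are hypothetical, and only the `jpg` row is taken from the sample output (the other rows are assumptions).

```python
# Hypothetical lookup table: extension -> (most specific class, MIME type).
# Only the "jpg" row comes from the sample output; the rest are assumptions.
FILE_TYPES = {
    "jpg": ("dbo:StillImage", "image/jpeg"),
    "png": ("dbo:StillImage", "image/png"),
    "webm": ("dbo:MovingImage", "video/webm"),
    "ogg": ("dbo:Sound", "audio/ogg"),
}


def describe_file(title: str) -> dict:
    """Return the lower-cased extension, MIME type and class for a file title."""
    # Titles without a dot have no extension at all.
    ext = title.rpartition(".")[2].lower() if "." in title else ""
    # Unknown extensions fall back to the generic dbo:File class.
    cls, mime = FILE_TYPES.get(ext, ("dbo:File", None))
    return {"extension": ext, "format": mime, "type": cls}


print(describe_file("Aeropetes.JPG"))
```

The extension is lower-cased before the lookup, which matches the sample above where `Aeropetes.JPG` yields the extension "jpg".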
Image Galleries 
● Attach each gallery item to the page resource
:Colorado dbo:hasGalleryItem :Colorado.JPG,
    :Denver_Colorado_Art.jpg,
    :ColoradoCenter1.jpg .
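The gallery step can be sketched in Python as follows. This is only an illustration, assuming galleries are written with MediaWiki's `<gallery>` tag (one file per line, optional caption after `|`); the helper names `gallery_items` and `gallery_triples` are hypothetical and not part of the extraction framework.

```python
import re
from typing import List

# Matches the body of each <gallery ...>...</gallery> block.
GALLERY_RE = re.compile(r"<gallery[^>]*>(.*?)</gallery>", re.DOTALL | re.IGNORECASE)


def gallery_items(wikitext: str) -> List[str]:
    """Collect the file names listed inside <gallery> blocks."""
    items = []
    for body in GALLERY_RE.findall(wikitext):
        for line in body.splitlines():
            # Everything after the first "|" is a caption; ignore it.
            target = line.split("|", 1)[0].strip()
            if not target:
                continue
            # Drop an optional "File:" or "Image:" namespace prefix.
            if ":" in target:
                prefix, rest = target.split(":", 1)
                if prefix.strip().lower() in ("file", "image"):
                    target = rest.strip()
            # Page titles use underscores instead of spaces in resource names.
            items.append(target.replace(" ", "_"))
    return items


def gallery_triples(page: str, wikitext: str) -> List[str]:
    """Emit one dbo:hasGalleryItem statement per gallery entry."""
    return [f":{page} dbo:hasGalleryItem :{item} ." for item in gallery_items(wikitext)]


sample = (
    "<gallery>\n"
    "File:Colorado.JPG|State view\n"
    "File:Denver Colorado Art.jpg\n"
    "ColoradoCenter1.jpg\n"
    "</gallery>"
)
for triple in gallery_triples("Colorado", sample):
    print(triple)
```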
Image Annotations 
● Annotation Gadget
● Boxes with optional description
Image Annotations 
● W3C Media Fragments recommendation
● Embed the box in the URI 
– ?width=15130&height=1886#xywh=pixel:10431,324,1670,1208> . 
● Add descriptions in the new resource 
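To make the "box in the URI" idea concrete, the following Python sketch builds such a URI from an annotation box. The `annotation_uri` helper and the `example.org` base URI are assumptions for illustration; the `width`/`height` query parameters and the `#xywh=pixel:` fragment follow the pattern shown above.

```python
from urllib.parse import urlencode


def annotation_uri(file_uri: str, image_width: int, image_height: int,
                   x: int, y: int, w: int, h: int) -> str:
    """Build a URI whose fragment identifies a rectangular region in pixels."""
    # Size of the image the box coordinates refer to.
    query = urlencode({"width": image_width, "height": image_height})
    # Media Fragments spatial dimension: x, y, width, height in pixels.
    return f"{file_uri}?{query}#xywh=pixel:{x},{y},{w},{h}"


# Hypothetical base URI; the numbers are the ones from the example above.
print(annotation_uri("http://example.org/Example.jpg",
                     15130, 1886, 10431, 324, 1670, 1208))
```

Because each box gets its own URI, the optional description can be attached to that URI as a separate resource.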
Mappings Wiki
Template Statistics 
Licensing 
● Automatically identified and imported ~360 licence templates
● Use the mappings wiki 
● Needed some hacking to make it work 
– e.g. {{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}} 
:Acraea_circeis.JPG dbo:license <http://creativecommons.org/publicdomain/mark/1.0/> .
:Antepipona_deflenda_-_2012-10-17.webm dbo:license <http://creativecommons.org/licenses/by-sa/3.0/> .
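One way to picture the kind of handling a template such as `{{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}` needs is the Python sketch below. It is not the approach used in the project (which goes through the mappings wiki); the helper names are hypothetical, and treating the comma-separated versions as shorthand for several licence names is an assumption made for illustration.

```python
import re
from typing import List, Optional


def expand_versions(name: str) -> List[str]:
    """Expand 'cc-by-sa-3.0,2.5' into ['cc-by-sa-3.0', 'cc-by-sa-2.5']."""
    first, *rest = name.split(",")
    match = re.fullmatch(r"(.*-)(\d+(?:\.\d+)*)", first)
    if not match or not rest:
        return [name]
    prefix = match.group(1)
    return [first] + [prefix + version.strip() for version in rest]


def split_self_template(wikitext: str) -> List[str]:
    """Return the licence names passed to a {{Self|...}} template call."""
    match = re.fullmatch(r"\{\{\s*Self\s*\|(.*)\}\}", wikitext.strip(),
                         re.IGNORECASE | re.DOTALL)
    if not match:
        return []
    names: List[str] = []
    for arg in match.group(1).split("|"):
        arg = arg.strip()
        # Skip empty arguments and named parameters such as author=...
        if not arg or "=" in arg:
            continue
        names.extend(expand_versions(arg))
    return names


def cc_licence_uri(name: str) -> Optional[str]:
    """Map a name like 'cc-by-sa-3.0' to a Creative Commons licence URI."""
    match = re.fullmatch(r"cc-((?:by|nc|nd|sa)(?:-(?:by|nc|nd|sa))*)-(\d+\.\d+)",
                         name.lower())
    if not match:
        return None
    return f"http://creativecommons.org/licenses/{match.group(1)}/{match.group(2)}/"


for licence in split_self_template("{{Self|GFDL|cc-by-sa-3.0,2.5,2.0,1.0}}"):
    print(licence, cc_licence_uri(licence))
```

Names that are not Creative Commons licences (here GFDL) come back as `None` and would need their own mapping.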
KML Annotations attached to media 
Attach raw KML data to resource with custom extractor 
Sample Output: 
:Yellowstone_1871b.jpg dbo:hasKMLData """
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<name>Yorktown, Indiana (1878)</name>
<description>An 1878 map of Yorktown in Tippecanoe County, Indiana. Source: Kingman Brothers&apos; Combination Atlas Map of Tippecanoe County, Indiana, 1878.</description>
<color>99ffffff</color><Icon><href>BIG_LINK_HERE</href>
<viewBoundScale>0.75</viewBoundScale></Icon>
<LatLonBox>
<north>40.26126145890567</north><south>40.25777915632657</south>
<east>-86.77033439383223</east><west>-86.77398493316619</west>
<rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay></kml>"""^^rdf:XMLLiteral .
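Since the KML is attached as a raw literal, a consumer has to parse it to get at the coordinates. The Python sketch below shows one way to read the bounding box out of such a literal with the standard library; the `lat_lon_box` helper and the shortened `SAMPLE` document are illustrative assumptions, not part of the extractor.

```python
import xml.etree.ElementTree as ET

# Namespace used by the KML sample above.
KML_NS = {"kml": "http://earth.google.com/kml/2.2"}

# Shortened version of the sample literal (assumption: same structure).
SAMPLE = """<kml xmlns="http://earth.google.com/kml/2.2">
<GroundOverlay>
<name>Yorktown, Indiana (1878)</name>
<LatLonBox>
<north>40.26126145890567</north><south>40.25777915632657</south>
<east>-86.77033439383223</east><west>-86.77398493316619</west>
<rotation>-1.123009884936565</rotation></LatLonBox>
</GroundOverlay></kml>"""


def lat_lon_box(kml_text: str) -> dict:
    """Return the LatLonBox children (north, south, ...) as floats."""
    root = ET.fromstring(kml_text)
    box = root.find(".//kml:LatLonBox", KML_NS)
    if box is None:
        return {}
    # Tags look like "{namespace}north"; keep only the local name.
    return {child.tag.split("}", 1)[-1]: float(child.text) for child in box}


print(lat_lon_box(SAMPLE))
```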
Remaining TODOs
● Nested templates are commonly used and cannot currently be handled by the mappings wiki
– e.g. Media descriptions (although mapped) are missing
{{Information |Description= {{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} {{fr|Logo du projet [[w:fr:DBpedia|DBpedia]]}}
● Annotation descriptions need some tweaking 
– Need to render wikitext 
● Make it available through a SPARQL endpoint
● Provide Linked Data 
– http://commons.dbpedia.org
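To illustrate the nested-template and wikitext-rendering TODOs, here is a small Python sketch that pulls language-tagged descriptions out of nested templates such as `{{en|...}}` and reduces wiki links to their labels. It is a simplification under stated assumptions: language templates are taken to be two- or three-letter lower-case names with no further nesting, and the helper names are hypothetical.

```python
import re
from typing import Dict

# {{en|text}} style templates whose body contains no further braces.
LANG_TEMPLATE_RE = re.compile(r"\{\{\s*([a-z]{2,3})\s*\|([^{}]*)\}\}")
# [[target|label]] or [[target]]; the captured group is the visible text.
LINK_RE = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def render_links(text: str) -> str:
    """Replace each wiki link with its visible label."""
    return LINK_RE.sub(r"\1", text)


def descriptions(wikitext: str) -> Dict[str, str]:
    """Map language code -> plain-text description."""
    return {
        lang: " ".join(render_links(body).split())
        for lang, body in LANG_TEMPLATE_RE.findall(wikitext)
    }


source = ("{{Information |Description= "
          "{{en|Logo of the [[w:en:DBpedia|DBpedia project]]}} "
          "{{fr| Logo du projet [[w:fr:DBpedia|DBpedia]]}}")
print(descriptions(source))
```

Real wikitext allows deeper nesting and many more constructs, so a production solution would need a proper parser rather than regular expressions.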
Thank You! 
Special thanks to: 
● Alexandru Todor (importing the License templates) 
● Google Summer of Code for sponsoring this project (Gaurav Vaidya)
Questions? 
Dataset: http://nl.dbpedia.org/downloads/commonswiki 
Dataset samples: https://github.com/gaurav/commons-extraction
