5. Roadmap
• Embedded structured data on web pages
• Data in content management systems
• Results of an analysis of structured data on the web
6. A typical web page...
My name is Horst Mustermann, here is my home page:
<a href="http://www.horst.example">
www.horst.example</a>. I live in Berlin, where I
work as a Researcher at Freie Universität Berlin.
7. ... with implicit information
Person: Name, Website, Location, Title, Organization
9. Example: Microdata + schema.org
<div itemscope itemtype="http://data-vocabulary.org/Person">
My name is <span itemprop="name">Horst
Mustermann</span>, here is my homepage:
<a href="http://www.horst.example"
itemprop="url">www.horst.example</a>.
I live in
<span itemprop="address" itemscope
itemtype="http://data-vocabulary.org/Address">
<span itemprop="locality">Berlin</span>
</span>
where I work as a <span
itemprop="title">Researcher</span> at <span
itemprop="affiliation">Freie Universität Berlin
</span>.
</div>
10. Example: Microdata + schema.org
Structured data!
Item
  Type = http://data-vocabulary.org/person
  name = Horst Mustermann
  title = Researcher
  affiliation = Freie Universität Berlin
  url
    text = www.horst.example
    href = http://www.horst.example/
  address = Item
    Type = http://data-vocabulary.org/address
    locality = Berlin
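The mapping from the annotated markup to the item above can be sketched in a few lines of Python. This is only an illustration built on the standard library's `html.parser`, not the tool that produced the slide's output: the names `MicrodataParser`, `extract_items`, and `EXAMPLE_HTML` are made up for this sketch, and it assumes well-formed markup with at most one value per property.

```python
from html.parser import HTMLParser

# Elements without a closing tag; skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

EXAMPLE_HTML = """
<div itemscope itemtype="http://data-vocabulary.org/Person">
My name is <span itemprop="name">Horst
Mustermann</span>, here is my homepage:
<a href="http://www.horst.example"
itemprop="url">www.horst.example</a>.
I live in
<span itemprop="address" itemscope
itemtype="http://data-vocabulary.org/Address">
<span itemprop="locality">Berlin</span>
</span>
where I work as a <span
itemprop="title">Researcher</span> at <span
itemprop="affiliation">Freie Universität Berlin
</span>.
</div>
"""


class MicrodataParser(HTMLParser):
    """Hypothetical helper: collects itemscope/itemtype/itemprop data."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found so far
        self._open = []   # one frame per currently open element

    def _current_item(self):
        # Innermost open element that started an item, if any.
        for frame in reversed(self._open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        frame = {
            "tag": tag,
            "prop": attrs.get("itemprop"),
            "owner": self._current_item(),  # item this property belongs to
            "item": None,
            "href": attrs.get("href"),
            "text": [],
        }
        if "itemscope" in attrs:
            frame["item"] = {"type": attrs.get("itemtype"), "properties": {}}
        self._open.append(frame)

    def handle_data(self, data):
        # Every open element sees the text of its descendants.
        for frame in self._open:
            frame["text"].append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1]["tag"] != tag:
            return  # sketch only: unbalanced markup is ignored
        frame = self._open.pop()
        text = " ".join("".join(frame["text"]).split())  # collapse whitespace
        if frame["item"] is not None:
            value = frame["item"]
        elif frame["href"] is not None:
            value = {"text": text, "href": frame["href"]}
        else:
            value = text
        if frame["prop"] and frame["owner"] is not None:
            frame["owner"]["properties"][frame["prop"]] = value
        elif frame["item"] is not None:
            self.items.append(value)


def extract_items(html):
    """Return the list of top-level microdata items in an HTML string."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


items = extract_items(EXAMPLE_HTML)
```

For the example page, `items` holds a single Person item whose `address` property is itself a nested item with `locality` set to Berlin. Unlike the slide, the sketch reports the `href` exactly as written in the markup, without normalizing the URL.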
13. Semantics at the push of a button?
Semantics       Types                       e.g.
Content         Specialized CMS / manual    Products, quality
Metadata        All CMS                     Author
Page structure  All CMS                     Navigation
35. Summary
• Specialized CMSs offer ideal conditions for publishing structured data ("push of a button")
• Structured data from CM systems, embedded in HTML pages, is widespread
• So far, the range of uses is limited
36. Thank you for your attention!
Questions?
Twitter: @hfmuehleisen
Web: http://webdatacommons.org
http://hannes.muehleisen.org
Editor's notes
First: our use case is not suitable for Hadoop, so EMR is out, since it was too slow.
Input data is split into 100 MB parts, yes.
- EC2 c1.xlarge instances: 8 CPUs, current spot price: ca. 0.17 EUR per hour (most of the time)
- So, with 100 instances we get 800 CPUs, and we could expect to do it in around three days for around 1000 EUR (BIG WIN!), and in 55 hours!
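The figures in this note can be checked with a short back-of-the-envelope calculation. The sketch below assumes a flat spot price of 0.17 EUR per instance-hour and that all 100 instances run for the whole duration; it ignores storage and transfer costs, which the note does not mention. The function name `spot_cost_eur` is made up for illustration.

```python
def spot_cost_eur(instances, price_per_hour_eur, hours):
    """Total cost if every instance runs the full duration at a flat hourly price."""
    return instances * price_per_hour_eur * hours


INSTANCES = 100            # from the note
CPUS_PER_INSTANCE = 8      # c1.xlarge, from the note
PRICE_PER_HOUR_EUR = 0.17  # approximate spot price, from the note

total_cpus = INSTANCES * CPUS_PER_INSTANCE                            # 800 CPUs
cost_55_hours = spot_cost_eur(INSTANCES, PRICE_PER_HOUR_EUR, 55)      # about 935 EUR
cost_three_days = spot_cost_eur(INSTANCES, PRICE_PER_HOUR_EUR, 72)    # about 1224 EUR

print(f"{total_cpus} CPUs")
print(f"55 hours: about {cost_55_hours:.0f} EUR")
print(f"72 hours: about {cost_three_days:.0f} EUR")
```

Under these assumptions, a 55-hour run comes to roughly 935 EUR, in line with the "around 1000 EUR" in the note, while a full three days would cost somewhat more.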