2. WEB AT-A-GLANCE
25 billion web pages in the indexable web 1
1 trillion unique URLs discovered by Google 2
109.5 million web sites 3
2 billion users
1000x more sites in the “deep web” 4
1 Worldwidewebsize.com, March 2009
2 Google Official Blog, July 2008
3 Name Intelligence, May 2009
4 BrightPlanet, November 2010
Data via Scott Brinker, http://www.slideshare.net/sjbrinker/semantic-web-summit
4. [Chart: growth of digital data, timeline 2000 to 2020]
2002: 5 exabytes of data online (total)
2008: 5 exabytes of data flow monthly
2010: 21 exabytes of data flow monthly
2015: 10 zettabytes of digital data online
2020: 25 zettabytes of digital data online
15. <html xmlns:og="http://ogp.me/ns#">
<head>
<meta property="og:title" content="Importika – Thermometer, THERM(W/Clip)" />
<meta property="og:type" content="product" />
<meta property="og:url" content="http://www.bestbuy.com/products/Importika+Thermometer/9972587" />
<meta property="og:image" content="http://images.bestbuy.com/BestBuy_US/images/products/9972587_sb.jpg" />
<meta property="og:site_name" content="Best Buy" />
<meta property="og:description" content="This thermometer features a large dial for easy readability and a clip to keep it in place when not in use" />
</head>
Human-readable
Semantics
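As a rough illustration of how a consumer might read Open Graph markup like the above, here is a minimal Python sketch using only the standard library. The class and function names (`OpenGraphParser`, `extract_open_graph`) and the shortened `SAMPLE` page are assumptions made for this example and do not come from the slides.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def extract_open_graph(html_text):
    """Return a dict mapping og:* property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html_text)
    return parser.properties


# Hypothetical, shortened version of the slide's markup.
SAMPLE = """<html xmlns:og="http://ogp.me/ns#">
<head>
<meta property="og:type" content="product" />
<meta property="og:site_name" content="Best Buy" />
</head>"""

print(extract_open_graph(SAMPLE))
```

The same tags that a browser ignores visually become a small key/value record for a machine reader, which is the point the slide is making.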
18. Simple form / user input → Basic transform engine → Human & machine readable data (RDFa)
Human-readable
Semantics
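The form-to-RDFa flow on this slide could look roughly like the Python sketch below. It is only a guess at the shape of such a transform: the function name `to_rdfa`, the `ex` prefix, and the `http://example.com/ns#` namespace are hypothetical placeholders, not the vocabulary or engine Best Buy actually used.

```python
from html import escape


def to_rdfa(prefix, namespace, type_name, fields):
    """Render simple form fields as an HTML fragment carrying RDFa attributes.

    prefix/namespace/type_name are hypothetical placeholders for whatever
    vocabulary the real transform engine would use.
    """
    lines = [
        f'<div xmlns:{prefix}="{escape(namespace)}" typeof="{prefix}:{type_name}">'
    ]
    for name, value in fields.items():
        # The visible text is for humans; the property attribute is for machines.
        lines.append(
            f'  <span property="{prefix}:{name}">{escape(str(value))}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


form_input = {"name": "Best Buy Brooklyn Center MN", "region": "MN"}
print(to_rdfa("ex", "http://example.com/ns#", "Store", form_input))
```

The user fills in plain fields, and the same values are emitted once as visible text and as `property` annotations, so a single page serves both audiences.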
19. RICH DATA EXTRACT
www.bestbuy.com/products/availability;zip=55428;upc=0036725233539;
Offer
  includesObject = Samsung 40" Class LCD
  hasStockKeepingUnit = 9791235
  hasMPN = LN40C630K1F
  hasEan_UCC-13 = 0036725233539
  amountOfThisGood = 1.0
  availableAtOrFrom
    label = Best Buy Brooklyn Center MN
    street-address = 5925 Earle Brown Dr
    locality = Brooklyn Center
    region = MN
    postal-code = 55430
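The availability URL above carries its query as semicolon-separated `key=value` segments. A minimal Python sketch of splitting such a URL follows; the helper name `parse_path_params` is an assumption for this example, and the sketch only parses the string without contacting the site.

```python
def parse_path_params(url):
    """Split 'path;key=value;key=value;' into the path and a dict of parameters."""
    path, *segments = url.split(";")
    params = {}
    for segment in segments:
        if not segment:
            continue  # tolerate the trailing ';'
        key, _, value = segment.partition("=")
        params[key] = value  # keep values as strings so leading zeros survive
    return path, params


path, params = parse_path_params(
    "www.bestbuy.com/products/availability;zip=55428;upc=0036725233539;"
)
print(path)
print(params)
```

Keeping the values as strings matters here: the `upc` value starts with zeros and matches the `hasEan_UCC-13` value in the extracted offer only if those zeros are preserved.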
20. RAW DATA IS PLENTIFUL
500 Million Facebook users 1
190 Million Twitter users 2
65 Million tweets per day 3
4 Million Foursquare users 4
Customer forums
APIs
Internal sales/ customer data
Product data
And more!
1 Mark Zuckerberg, July 2010
2 Techcrunch, July 2010
3 Twitter blog, June 2009
4 Business Insider, October 2010
Data via Scott Brinker
Machine
Semantics
21. [Diagram: cloud of Best Buy datasets. Node labels include:
Business units: Best Buy US, Best Buy UK, Best Buy Canada, Best Buy Mexico, Best Buy China, Best Buy Turkey, Five Star, Carphone Warehouse, Geek Squad, Magnolia, Pacific Sales, Best Buy Mobile, Reward Zone
Regional datasets (BBY US, UK, CA, MX, CN, TK), such as: Products, Local Stores, Customer Insights, Employee Insights, Site Analytics
Social: BBY US Facebook, BBY UK Facebook, Twitter annotations (@BestBuy, @BestBuy UK, @BestBuy CA, @twelpforce)
Mobile: m.bestbuy.com, BBY Mobile Site, BBY Mobile Apps, BBY US Mobile App Data, BBY QR Code
Global Graph]
22. SPARQL
Global Graph of data
select distinct ?o as ?uri, bif:sprintf("%.2f",?p2) as ?price, ?currency, ?text, ?label, ?thumb, ?ean, ?order_link where
{
?s1 rdfs:comment ?text .
?text bif:contains '"Netbook"' .
23. SPARQL
Global Graph of data
select distinct ?o as ?uri, bif:sprintf("%.2f",?p2) as ?price, ?currency, ?text, ?label, ?thumb, ?ean, ?order_link where
{
?s1 rdfs:comment ?text .
?text bif:contains '"LCD TV"' .
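The two queries differ only in the search phrase passed to `bif:contains` (Virtuoso's full-text extension). A small Python sketch of generating that one triple pattern from a user-supplied phrase is shown below. The helper name `contains_pattern` and its quote-stripping rule are assumptions for illustration; the rest of the query is truncated on the slides and is not reconstructed here.

```python
def contains_pattern(variable, phrase):
    """Build a '?var bif:contains' pattern for an exact-phrase full-text match.

    Quotes and backslashes are replaced with spaces so the phrase cannot
    break out of the quoted literal (a simplification chosen for this sketch).
    """
    cleaned = phrase.replace('"', " ").replace("'", " ").replace("\\", " ")
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    if not cleaned:
        raise ValueError("phrase must contain at least one word")
    # Inner double quotes mark a phrase; outer single quotes delimit the literal.
    return f"?{variable} bif:contains '\"{cleaned}\"' ."


for term in ("Netbook", "LCD TV"):
    print(contains_pattern("text", term))
```

Wrapping the phrase in double quotes inside the single-quoted literal is what lets a multi-word term such as "LCD TV" be matched as a phrase rather than as separate words.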