www.ict.tuwien.ac.at
Institute of
Computer Technology
Extracting Data from the Deep Web with Global-as-View
Mediators Using Rule-Enriched Semantic Annotations
Harold Boley
harold.boley[at]unb.ca
University of New Brunswick
Faculty of Computer Science
Fredericton, NB, Canada
Benjamin Dönz
doenz[at]ict.tuwien.ac.at
Vienna University of Technology
Institute of Computer Technology
Vienna, Austria
The „Deep Web“ – What is it?
▪ Data hidden behind search forms and interfaces
▪ An estimated 400 to 500 times more information than the indexable World Wide Web
▪ 77% of the content is classified as structured information
▪ Template-based, so understanding how to extract one result makes it possible to extract them all
▪ Examples:
• Web shops
• Classified advertising
• Miscellaneous databases
Accessing the Deep Web
Our Approach: Upper-Right Quadrant
▪ Processing of queries using a query forwarding approach
• SPARQL queries as input
• Query transformation and forwarding via mediators
• Global-as-View mapping of local sources
▪ Web form interaction and information extraction
• Extraction process based on an extensible model
• Semantic annotations for mapping real-world Web pages to the model
• Feature-based rules for creating annotations
Model Overview
▪ Query interface for submitting conjunctive queries
▪ Result list: all valid records, but only key attribute/value pairs
▪ Result detail: all attribute/value pairs, but only one record
Query Process
Walkthrough (1)
▪ Query:
“Return details of real estate offers with a rent between 800€ and 1200€ and
either at least 3 rooms or at least 80m² of floor space”
▪ SPARQL:
SELECT ?townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
WHERE { ?object rdf:type realestate:RealEstateOffer ;
          realestate:townname ?townname ;
          realestate:offername ?offername ;
          realestate:description ?description ;
          realestate:rent ?realestaterent ;
          realestate:rooms ?realestaterooms ;
          realestate:floorSpace ?realestatefloorSpace .
        FILTER (?realestaterent >= 800 && ?realestaterent <= 1200 &&
                (?realestaterooms >= 3 || ?realestatefloorSpace >= 80)) }
Walkthrough (2)
▪ Transformation:
• Parse Query
• Transform filter to Disjunctive Normal Form and split into subqueries
• Unfold to include relevant sources via Global-as-View mappings
▪ Result:
SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
UNION
SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
UNION
SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,80)
UNION
SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧
lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,80)
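The transformation steps can be illustrated with a minimal Python sketch. It is not the mediator's actual code: the tuple encoding of the filter, the `GAV_MAPPINGS` dictionary and the function names `to_dnf`, `unfold` and `render` are assumptions made for illustration. The thresholds are those of the SPARQL filter in Walkthrough (1), and the two source URLs are the ones named on this slide.

```python
from itertools import product

# Hypothetical Global-as-View mapping: global concept -> local sources that provide it.
GAV_MAPPINGS = {
    "realestate:RealEstateOffer": ["http://derStandard.at", "http://at.immolive24.com"],
}

# The filter as a nested tuple tree. Inner nodes are ("and", ...) or ("or", ...);
# atoms are (predicate, attribute, value) triples in the slide's notation.
FILTER = ("and",
          ("greaterOrEqual", "realestate:rent", 800),
          ("lessOrEqual", "realestate:rent", 1200),
          ("or",
           ("greaterOrEqual", "realestate:rooms", 3),
           ("greaterOrEqual", "realestate:floorSpace", 80)))


def to_dnf(expr):
    """Return the filter in Disjunctive Normal Form as a list of conjunctions (lists of atoms)."""
    op = expr[0]
    if op == "or":
        # A disjunction contributes the conjunctions of each of its branches.
        return [conj for child in expr[1:] for conj in to_dnf(child)]
    if op == "and":
        # Distribute AND over OR: pick one conjunction per child and concatenate them.
        child_dnfs = [to_dnf(child) for child in expr[1:]]
        return [sum(combo, []) for combo in product(*child_dnfs)]
    return [[expr]]  # an atom is already a one-atom conjunction


def unfold(expr, sources):
    """Pair every conjunctive subquery with every source mapped to the global concept."""
    return [(source, conj) for conj in to_dnf(expr) for source in sources]


def render(subqueries):
    """Print the subqueries in the notation used on the slide."""
    parts = []
    for source, conj in subqueries:
        atoms = " ∧ ".join(f"{pred}({attr},{value})" for pred, attr, value in conj)
        parts.append(f"SELECT FROM <{source}>: {atoms}")
    return "\nUNION\n".join(parts)


if __name__ == "__main__":
    print(render(unfold(FILTER, GAV_MAPPINGS["realestate:RealEstateOffer"])))
```

With two disjuncts and two mapped sources, the sketch yields four conjunctive subqueries joined by UNION, one per combination of disjunct and source.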
Walkthrough (3)
▪ Features are used to identify elements
▪ Features are “properties” of the HTML tags
▪ For example: id, class, value, tag path, associated text, …
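As a rough illustration of what such features could look like, the following Python sketch collects one feature dictionary per HTML element using the standard library's `html.parser`. The class name `FeatureCollector`, the feature keys and the sample form are assumptions for this example, not the mediator's actual representation. "Associated text" is simplified here to the text directly inside the element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and therefore never open a new path level.
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class FeatureCollector(HTMLParser):
    """Collect one feature dictionary per HTML element, in document order."""

    def __init__(self):
        super().__init__()
        self.path = []        # names of the currently open tags
        self.open = []        # feature dictionaries of the currently open elements
        self.elements = []    # all feature dictionaries

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        features = {
            "tag": tag,
            "id": attrs.get("id"),
            "class": attrs.get("class"),
            "value": attrs.get("value"),
            "tagpath": "/".join(self.path + [tag]),
            "text": "",
        }
        self.elements.append(features)
        if tag not in VOID_TAGS:
            self.path.append(tag)
            self.open.append(features)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.path and self.path[-1] == tag:
            self.path.pop()
            self.open.pop()

    def handle_data(self, data):
        # Attach non-blank text to the innermost open element.
        if self.open and data.strip():
            self.open[-1]["text"] += data.strip()


def extract_features(html):
    collector = FeatureCollector()
    collector.feed(html)
    return collector.elements


# Hypothetical search form, loosely modelled on a rent query field.
SAMPLE = """
<form id="search">
  <label>Rent from</label>
  <input id="rentFrom" class="num" value="800">
  <button class="submit">Search</button>
</form>
"""

if __name__ == "__main__":
    for features in extract_features(SAMPLE):
        print(features)
```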
Walkthrough (4)
▪ Object-centered Datalog rules
▪ Conditions: conjunction of required features
▪ Conclusions: concepts of the model
▪ Single, efficient evaluation pass
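The rule structure described here can be approximated in a few lines of Python. This is a sketch rather than Datalog and not the authors' rule engine: the rule set, the concept names such as `SubmitButton` and the sample elements are hypothetical. Each rule pairs a conjunction of required features with the model concept it concludes, and all elements are annotated in one pass.

```python
# Each rule is (required features, concluded concept). In a Datalog-like notation the
# second rule might read: submitButton(E) :- tag(E, "button"), class(E, "submit").
# Rule contents and concept names are hypothetical.
RULES = [
    ({"tag": "input", "id": "rentFrom"}, "QueryField:rentMin"),
    ({"tag": "button", "class": "submit"}, "SubmitButton"),
    ({"tag": "td", "class": "price"}, "Attribute:rent"),
]

# Hypothetical feature dictionaries for three page elements.
SAMPLE_ELEMENTS = [
    {"tag": "input", "id": "rentFrom", "class": "num"},
    {"tag": "button", "class": "submit"},
    {"tag": "div", "class": "price"},
]


def annotate(elements, rules=RULES):
    """Return (element index, concept) pairs for every rule whose conditions all hold."""
    annotations = []
    for index, features in enumerate(elements):      # single pass over the elements
        for required, concept in rules:
            # A rule fires only if every required feature matches (conjunction).
            if all(features.get(key) == value for key, value in required.items()):
                annotations.append((index, concept))
    return annotations


if __name__ == "__main__":
    print(annotate(SAMPLE_ELEMENTS))
```

The third sample element carries the class `price` but is a `div` rather than a `td`, so no rule fires for it: every condition of a rule has to hold.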
Walkthrough (5)
Walkthrough (6)
Walkthrough (7)
▪ SPARQL:
SELECT ?townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
WHERE { ?object rdf:type realestate:RealEstateOffer ;
          realestate:townname ?townname ;
          realestate:offername ?offername ;
          realestate:description ?description ;
          realestate:rent ?realestaterent ;
          realestate:rooms ?realestaterooms ;
          realestate:floorSpace ?realestatefloorSpace .
        FILTER (?realestaterent >= 800 && ?realestaterent <= 1200 &&
                (?realestaterooms >= 3 || ?realestatefloorSpace >= 80)) }
▪ Output
Live Presentation of Example Use Cases
Deep Web Mediator and examples available online:
http://semann.bdoenz.com/default.aspx
#1 Plain list: Extract the average rent per town from a single site.
#2 Search and result list: Extract test results on cars of the brand Audi from a single site and return brand, model and the test
conclusion.
#3 Search, list and detail page: Extract real estate offers from a single site and return details for offers with 3 or more rooms and a
rent of 800€ to 1200€.
#4 Disjunctive query: Extract used car offers from a single site and return details of all offers for cars of the brand “Audi” that are
priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
#5 Union: Extract used car offers from all available sites and return details of offers for cars of the brand “Audi” that are priced
under 12.500€ and have a construction year after 2011.
#6 Disjunctive union: Extract used car offers from all available sites and return details of all offers for cars of the brand “Audi” that
are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
#7 Relations between sources: Extract average rents and real estate offers from all available sites and return those that are
located in a specific town and have a lower rent/m² than the average for that town.
#8 Deep Web and local databases: Extract real estate offers from all available sites and add the type of town and population from a
local dataset.
#9 Deep Web and external databases: Extract real estate offers from all available sites and add a description of the town and the
population from an external SPARQL endpoint (DBpedia).
Conclusion
▪ Processing of queries using a query forwarding approach
• SPARQL queries as input
• Query transformation and forwarding via mediators
• Global-as-View mapping of local sources
▪ Web form interaction and information extraction
• Extraction process based on an extensible model
• Semantic annotations for mapping real-world Web pages to the model
• Feature-based rules for creating annotations
Use case #3
This example is situated in the real estate domain. It asks for the name of the offer, a description, the
number of rooms, the floor space and the rent of all offers with 3 or more rooms and a rent between
800€ and 1200€ from a specific real estate site. To process this query, the mediator accesses the query
interface of the site, sets the parameters in the fields of the web form and triggers the search function to
submit the query. It then iterates over the returned result lists, extracting values from the list itself and
from subpages by following the corresponding link for each record. All extracted facts are collected in a
database and presented to the user as a table that includes a link to the page where the offer was
found.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?source ?realestatetownname ?realestateoffername ?realestatedescription
       ?realestaterent ?realestaterooms ?realestatefloorSpace
FROM <http://derstandard.at/anzeiger/immoweb/Immobilien-suche.aspx>
WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#description> ?realestatedescription}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
FILTER (?realestaterent>=800 &&
        ?realestaterent<=1200 &&
        ?realestaterooms>=3
)
}
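The process described for this use case (submit the form, iterate over the result list, follow each detail link, collect the facts) could be sketched as below. Everything site-specific is a stand-in: `FAKE_SITE` is a hypothetical in-memory substitute for a real web site, and `submit_form`, `fetch_detail` and `run_query` are illustrative names, not the mediator's API. The sketch also assumes that the form only offers rent fields, so the room condition is checked on the extracted facts.

```python
# Hypothetical in-memory stand-in for a real estate site (all values are made up).
FAKE_SITE = {
    # Result list: all matching records, but only key attributes plus a detail link.
    "list": [
        {"offername": "Flat A", "rent": 950, "detail": "/offer/1"},
        {"offername": "Flat B", "rent": 1500, "detail": "/offer/2"},
        {"offername": "Flat C", "rent": 1100, "detail": "/offer/3"},
    ],
    # Result detail: the remaining attributes, one record per page.
    "details": {
        "/offer/1": {"rooms": 3, "floorSpace": 85, "townname": "Wien"},
        "/offer/2": {"rooms": 4, "floorSpace": 120, "townname": "Wien"},
        "/offer/3": {"rooms": 2, "floorSpace": 60, "townname": "Graz"},
    },
}


def submit_form(site, rent_min, rent_max):
    """Stand-in for filling in the web form and triggering the search."""
    return [row for row in site["list"] if rent_min <= row["rent"] <= rent_max]


def fetch_detail(site, link):
    """Stand-in for following the link of one record to its detail page."""
    return site["details"][link]


def run_query(site, base_url, rent_min, rent_max, min_rooms):
    facts = []
    for row in submit_form(site, rent_min, rent_max):
        # Start with the key attributes from the list, then add the detail attributes.
        record = {key: value for key, value in row.items() if key != "detail"}
        record.update(fetch_detail(site, row["detail"]))
        record["source"] = base_url + row["detail"]   # link back to the page of origin
        if record["rooms"] >= min_rooms:              # condition the form could not express
            facts.append(record)
    return facts


if __name__ == "__main__":
    for fact in run_query(FAKE_SITE, "http://example.org", 800, 1200, 3):
        print(fact)
```

In this toy data only "Flat A" survives: "Flat B" is outside the rent range and "Flat C" has too few rooms, which only becomes known after its detail page has been read.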
Use case #6
This example is situated in the domain of used cars and combines the previous two examples:
the query requests the name, model, color, construction year, mileage and price of offers from all
available sites where the brand of the car is “Audi” and that are priced under 12.500€ if the construction
year is after 2010, or under 15.000€ if the construction year is after 2011. The query is split into two
conjunctive subqueries, which are submitted to all three available sites, and the union of the resulting
6 queries is returned.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?source ?carsbrand ?carsmodel ?carsoffername ?carscolor ?carsmileage
       ?carsconstructionYear ?carsofferprice
WHERE {?object rdf:type <http://semannot.bdoenz.com/cars#UsedCarOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#brand> ?carsbrand}.
?object <http://semannot.bdoenz.com/cars#brand> "Audi".
OPTIONAL {?object <http://semannot.bdoenz.com/cars#model> ?carsmodel}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#offername> ?carsoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#color> ?carscolor}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#mileage> ?carsmileage}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#constructionYear> ?carsconstructionYear}.
OPTIONAL {?object <http://semannot.bdoenz.com/cars#offerprice> ?carsofferprice}.
FILTER (
(?carsconstructionYear>=2010 && ?carsofferprice<=12500)
||
(?carsconstructionYear>=2011 && ?carsofferprice<=15000)
)
}
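Because the two disjuncts of the filter overlap, the same offer can be returned by both conjunctive subqueries, which is one reason a duplicate-free union (the DISTINCT in the query) matters. The Python sketch below illustrates this with the thresholds from the FILTER above. The three sites, their offers and the function names are hypothetical.

```python
# The two conjunctive subqueries, using the thresholds from the FILTER above.
SUBQUERIES = [
    lambda offer: offer["year"] >= 2010 and offer["price"] <= 12500,
    lambda offer: offer["year"] >= 2011 and offer["price"] <= 15000,
]

# Hypothetical sites with one made-up Audi offer each.
SITES = {
    "site-a": [{"source": "site-a/1", "brand": "Audi", "year": 2012, "price": 12000}],
    "site-b": [{"source": "site-b/7", "brand": "Audi", "year": 2010, "price": 14000}],
    "site-c": [{"source": "site-c/3", "brand": "Audi", "year": 2011, "price": 14500}],
}


def run_all(sites, subqueries):
    """Submit every subquery to every site: 2 subqueries x 3 sites = 6 result sets."""
    return [[offer for offer in offers if matches(offer)]
            for matches in subqueries
            for offers in sites.values()]


def union_distinct(result_sets):
    """Union of all result sets with duplicate rows removed, keeping first-seen order."""
    seen = set()
    merged = []
    for rows in result_sets:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


if __name__ == "__main__":
    result_sets = run_all(SITES, SUBQUERIES)
    print(len(result_sets), "result sets")
    for row in union_distinct(result_sets):
        print(row)
```

The offer from `site-a` satisfies both subqueries and would appear twice in a plain union. The offer from `site-b` satisfies neither, since it is too expensive for the first disjunct and too old for the second.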
Use case #7
This example is situated in the real estate domain. It asks for the name of the offer, a description, the
number of rooms, the floor space and the rent of all offers with 3 or more rooms in the town of
"Klosterneuburg" where the rent per square meter is lower than the average rent per square meter for
that town. No specific site is referenced in the query, so the mediator includes all sites with real estate
offers as well as a site containing average rents. Each of these is accessed in turn, and the intermediate
results are collected in a database before the filter is applied and the results are returned. Results only
become available once the average rents have been extracted and can be compared to the offers in the
defined manner. Note that this type of query cannot be generated by the query wizard and must be
entered directly as SPARQL:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?source ?realestatetownname ?realestateoffername ?realestaterent ?realestaterooms
       ?realestatefloorSpace
WHERE {?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
?community rdf:type <http://semannot.bdoenz.com/realestate#Community>.
?community <http://semannot.bdoenz.com/realestate#averageRent> ?avrent.
?community <http://semannot.bdoenz.com/realestate#communityname> ?cname.
FILTER (REGEX(?realestatetownname,"Klosterneuburg","i") &&
?realestaterent>=800 && ?realestaterent<=1200 &&
?realestaterooms>=3 && REGEX(?cname,"Klosterneuburg","i") &&
?realestaterent/?realestatefloorSpace <(?avrent))}
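Once offers and average rents are both available as intermediate results, the final filter amounts to a join between the two sets. The following Python sketch mirrors the FILTER conditions of the query above. The function name `below_average` and all sample offers and average rents are made up for illustration.

```python
import re

# Hypothetical intermediate results, as they might sit in the mediator's database.
OFFERS = [
    {"offername": "A", "townname": "Klosterneuburg", "rent": 900, "rooms": 3, "floorSpace": 90},
    {"offername": "B", "townname": "Klosterneuburg", "rent": 1100, "rooms": 3, "floorSpace": 80},
    {"offername": "C", "townname": "Wien", "rent": 900, "rooms": 3, "floorSpace": 90},
]
COMMUNITIES = [
    {"communityname": "Klosterneuburg", "averageRent": 11.5},   # rent per m², made up
    {"communityname": "Wien", "averageRent": 14.0},
]


def below_average(offers, communities, town, rent_min=800, rent_max=1200, min_rooms=3):
    """Return offers in `town` whose rent per m² is below that town's average rent."""
    pattern = re.compile(town, re.IGNORECASE)   # counterpart of REGEX(..., "i")
    result = []
    # Like the SPARQL query, consider every offer/community pair and filter afterwards.
    for offer in offers:
        for community in communities:
            if (pattern.search(offer["townname"])
                    and pattern.search(community["communityname"])
                    and rent_min <= offer["rent"] <= rent_max
                    and offer["rooms"] >= min_rooms
                    and offer["rent"] / offer["floorSpace"] < community["averageRent"]):
                result.append(offer)
    return result


if __name__ == "__main__":
    for offer in below_average(OFFERS, COMMUNITIES, "Klosterneuburg"):
        print(offer["offername"], offer["rent"] / offer["floorSpace"])
```

Offer "A" costs 10 per m² and passes, offer "B" costs 13.75 per m² and fails the comparison, and offer "C" is in a different town.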
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

RuleML Challenge: Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations

  • 1. www.ict.tuwien.ac.at Institute of Computer Technology Extracting Data from the Deep Web with Global-as-View Mediators Using Rule-Enriched Semantic Annotations Harold Boley harold.boley[at]unb.ca University of New Brunswick Faculty of Computer Science Fredericton, NB, Canada Benjamin Dönz doenz[at]ict.tuwien.ac.at Vienna University of Technology Institute of Computer Technology Vienna, Austria
  • 2. The "Deep Web": What is it?
      • Data hidden behind search forms and interfaces
      • Estimated 400-500 times more information than the indexable World Wide Web
      • 77% of the content classified as structured information
      • Template-based, so understanding how to extract one result makes it possible to extract them all
      • Examples:
          • Web shops
          • Classified advertising
          • Miscellaneous databases
  • 4. Our Approach: Upper-Right Quadrant
      • Processing of queries using a query forwarding approach
          • SPARQL queries as input
          • Query transformation and forwarding via mediators
          • Global-as-View mapping of local sources
      • Web form interaction and information extraction
          • Extraction process based on an extensible model
          • Semantic annotations for mapping real-world Web pages to the model
          • Feature-based rules for creating annotations
  • 5-7. Model Overview
      • Query interface for submitting conjunctive queries
      • Result list: all valid records, but only key attribute/value pairs
      • Result detail: all attribute/value pairs, but only one record
  • 9. Walkthrough (1)
      • Query: "Return details of real estate offers with a rent between 800€ and 1200€ and either at least 3 rooms or at least 80 m² of floor space"
      • SPARQL:
          SELECT ?townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
          WHERE {
            ?object rdf:type realestate:RealEstateOffer;
                    realestate:townname ?townname;
                    realestate:offername ?offername;
                    realestate:description ?description;
                    realestate:rent ?realestaterent;
                    realestate:rooms ?realestaterooms;
                    realestate:floorSpace ?realestatefloorSpace.
            FILTER (?realestaterent>=800 && ?realestaterent<=1200 && (?realestaterooms>=3 || ?realestatefloorSpace>=80))
          }
  • 10-11. Walkthrough (2)
      • Transformation:
          • Parse query
          • Transform filter to Disjunctive Normal Form and split into subqueries
          • Unfold to include relevant sources via Global-as-View mappings
      • Result:
          SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
          UNION
          SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:rooms,3)
          UNION
          SELECT FROM <http://derStandard.at>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,80)
          UNION
          SELECT FROM <http://at.immolive24.com>: greaterOrEqual(realestate:rent,800) ∧ lessOrEqual(realestate:rent,1200) ∧ greaterOrEqual(realestate:floorSpace,80)
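
The transformation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mediator's actual code: the tuple encoding of the filter, the function names `to_dnf` and `unfold`, and the `GAV_MAPPINGS` dictionary are all hypothetical, and a plain concept-to-sources dictionary is a simplification of real Global-as-View mappings. Negation is not handled.

```python
from itertools import product

# Atoms are (predicate, attribute, value) triples; compound filters are
# ("and", [children]) or ("or", [children]). This encoding is hypothetical.
FILTER = ("and", [
    ("greaterOrEqual", "realestate:rent", 800),
    ("lessOrEqual", "realestate:rent", 1200),
    ("or", [
        ("greaterOrEqual", "realestate:rooms", 3),
        ("greaterOrEqual", "realestate:floorSpace", 80),
    ]),
])

# Simplified stand-in for Global-as-View mappings: concept -> sources.
GAV_MAPPINGS = {
    "realestate:RealEstateOffer": [
        "http://derStandard.at",
        "http://at.immolive24.com",
    ],
}


def to_dnf(expr):
    """Return the filter as a list of conjunctions, each a list of atoms."""
    op = expr[0]
    if op == "or":
        # A disjunction contributes the conjunctions of every child.
        return [conj for child in expr[1] for conj in to_dnf(child)]
    if op == "and":
        # Distribute AND over OR: pick one conjunction per child and merge.
        child_dnfs = [to_dnf(child) for child in expr[1]]
        return [sum(combo, []) for combo in product(*child_dnfs)]
    return [[expr]]  # a single atom


def unfold(conjunctions, concept, gav_mappings):
    """Pair every conjunctive subquery with every source mapped to the concept."""
    return [(source, conj)
            for conj in conjunctions
            for source in gav_mappings[concept]]


subqueries = to_dnf(FILTER)
plans = unfold(subqueries, "realestate:RealEstateOffer", GAV_MAPPINGS)
for source, conj in plans:
    print(source, conj)
```

With two conjunctive subqueries and two mapped sources, the sketch yields four source-specific subqueries whose union answers the original query, matching the shape of the result on the slide.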
  • 12. Walkthrough (3)
      • Features used for identifying elements
      • "Properties" of the HTML tags
      • For example id, class, value, tag path, associated text, …
  • 13. Walkthrough (4)
      • Object-centered Datalog rules
      • Conditions: conjunction of required features
      • Conclusions: concepts of the model
      • Single efficient evaluation pass
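
The idea of feature-based rules can be illustrated with a small Python stand-in. The slides use object-centered Datalog rules; the sketch below swaps in plain dictionaries instead, and every feature value, rule, and concept name in it is invented for illustration. Each rule pairs a conjunction of required features with a concept of the model, and all elements are annotated in one pass.

```python
# Each HTML element is described by its features (hypothetical sample data).
ELEMENTS = [
    {"tag": "input", "id": "rentFrom", "associated_text": "Rent from"},
    {"tag": "input", "id": "rentTo", "associated_text": "Rent to"},
    {"tag": "a", "class": "offer-link", "tag_path": "html/body/div/ul/li/a"},
    {"tag": "span", "class": "footer"},
]

# Rule = (required features, concluded concept). All names are illustrative.
RULES = [
    ({"tag": "input", "id": "rentFrom"}, "QueryField(rent, min)"),
    ({"tag": "input", "id": "rentTo"}, "QueryField(rent, max)"),
    ({"tag": "a", "class": "offer-link"}, "DetailLink"),
]


def annotate(elements, rules):
    """Return (element index, concept) for every rule whose conditions all hold."""
    annotations = []
    for index, features in enumerate(elements):  # single pass over the page
        for conditions, concept in rules:
            # A rule fires only if every required feature has the required value.
            if all(features.get(name) == value
                   for name, value in conditions.items()):
                annotations.append((index, concept))
    return annotations


annotations = annotate(ELEMENTS, RULES)
print(annotations)
```

Elements that match no rule, such as the footer span in the sample data, simply receive no annotation.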
  • 16. Walkthrough (7)
      • SPARQL:
          SELECT ?townname ?offername ?description ?realestaterent ?realestaterooms ?realestatefloorSpace
          WHERE {
            ?object rdf:type realestate:RealEstateOffer;
                    realestate:townname ?townname;
                    realestate:offername ?offername;
                    realestate:description ?description;
                    realestate:rent ?realestaterent;
                    realestate:rooms ?realestaterooms;
                    realestate:floorSpace ?realestatefloorSpace.
            FILTER (?realestaterent>=800 && ?realestaterent<=1200 && (?realestaterooms>=3 || ?realestatefloorSpace>=80))
          }
      • Output
  • 17. Live Presentation of Example Use Cases
      Deep Web Mediator and examples available online: http://semann.bdoenz.com/default.aspx
      • #1 Plain list: Extract the average rent per town from a single site.
      • #2 Search and result list: Extract test results on cars of the brand Audi from a single site and return brand, model and the test conclusion.
      • #3 Search, list and detail page: Extract real estate offers from a single site and return details for offers with 3 or more rooms and a rent of 800€ to 1200€.
      • #4 Disjunctive query: Extract used car offers from a single site and return details of all offers for cars of the brand "Audi" that are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
      • #5 Union: Extract used car offers from all available sites and return details of offers for cars of the brand "Audi" that are priced under 12.500€ and have a construction year after 2011.
      • #6 Disjunctive union: Extract used car offers from all available sites and return details of all offers for cars of the brand "Audi" that are priced under 12.500€ if the construction year is after 2011 or under 15.000€ if the construction year is after 2012.
      • #7 Relations between sources: Extract average rents and real estate offers from all available sites and return those that are located in a specific town and have a lower rent/m² than the average for that town.
      • #8 Deep Web and local databases: Extract real estate offers from all available sites and add the type of town and population from a local dataset.
      • #9 Deep Web and external databases: Extract real estate offers from all available sites and add a description of the town and the population from an external SPARQL endpoint (DBpedia).
  • 18. Conclusion
      • Processing of queries using a query forwarding approach
          • SPARQL queries as input
          • Query transformation and forwarding via mediators
          • Global-as-View mapping of local sources
      • Web form interaction and information extraction
          • Extraction process based on an extensible model
          • Semantic annotations for mapping real-world Web pages to the model
          • Feature-based rules for creating annotations
  • 20. Use case #3
      This example is situated in the domain of real estate. It asks for the name of the offer, a description, the number of rooms, the floor space and the rent of all offers with 3 or more rooms and a rent between 800€ and 1200€ from a specific real estate site. To process this query, the mediator accesses the query interface of the site, sets the parameters in the fields of the web form and triggers the search function to submit the query. The returned result lists are iterated to extract the values from the list itself, and also from subpages by following the corresponding link for each record. All extracted facts are collected in a database and presented to the user in tabular form, including a link to the page where the offer was found.
          SELECT ?source ?realestatetownname ?realestateoffername ?realestatedescription ?realestaterent ?realestaterooms ?realestatefloorSpace
          FROM <http://derstandard.at/anzeiger/immoweb/Immobilien-suche.aspx>
          WHERE {
            ?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
            OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#description> ?realestatedescription}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
            FILTER (?realestaterent>=800 && ?realestaterent<=1200 && ?realestaterooms>=3)
          }
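
The search, list and detail workflow described for use case #3 can be sketched as follows. The sketch replaces the real web site with in-memory sample data so that it runs offline; the data, the function names, and the assumption that the form filters by rent while the room count is checked after reading the detail page are all hypothetical and not taken from the mediator.

```python
# Hypothetical stand-in for a Deep Web site: the result list holds only key
# attributes plus a link; detail pages hold the remaining attributes.
RESULT_LIST = [
    {"offername": "Flat A", "rent": 950, "link": "/offer/1"},
    {"offername": "Flat B", "rent": 1500, "link": "/offer/2"},
    {"offername": "Flat C", "rent": 1100, "link": "/offer/3"},
]
DETAIL_PAGES = {
    "/offer/1": {"rooms": 3, "floorSpace": 75, "description": "Bright flat"},
    "/offer/2": {"rooms": 4, "floorSpace": 120, "description": "Large flat"},
    "/offer/3": {"rooms": 2, "floorSpace": 60, "description": "Small flat"},
}


def submit_form(params):
    """Simulate the search form; this sample site can only filter by rent."""
    return [row for row in RESULT_LIST
            if params["rent_min"] <= row["rent"] <= params["rent_max"]]


def fetch_detail(link):
    """Simulate following a result link to its detail page."""
    return DETAIL_PAGES[link]


def extract_offers(params, min_rooms):
    """Submit the query, walk the result list, and merge in detail-page values."""
    records = []
    for row in submit_form(params):
        record = dict(row)                       # key attributes from the list
        record.update(fetch_detail(row["link"]))  # remaining attributes
        record["source"] = record.pop("link")     # keep where the offer was found
        if record["rooms"] >= min_rooms:          # condition the form cannot express
            records.append(record)
    return records


offers = extract_offers({"rent_min": 800, "rent_max": 1200}, min_rooms=3)
print(offers)
```

Keeping the link as `source` mirrors the tabular output described above, where each row points back to the page on which the offer was found.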
  • 22. Use case #6
      This example is situated in the domain of used cars and is the combination of the previous two examples. The query requests the name, model, color, construction year, mileage and price of offers from all available sites, where the brand of the car is "Audi" and the price is at most 12.500€ if the construction year is 2010 or later, or at most 15.000€ if the construction year is 2011 or later. This query is split into two conjunctive subqueries and submitted to all three available sites, returning the union of a total of 6 queries.
          SELECT DISTINCT ?source ?carsbrand ?carsmodel ?carsoffername ?carscolor ?carsmileage ?carsconstructionYear ?carsofferprice
          WHERE {
            ?object rdf:type <http://semannot.bdoenz.com/cars#UsedCarOffer>.
            OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#brand> ?carsbrand}.
            ?object <http://semannot.bdoenz.com/cars#brand> "Audi".
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#model> ?carsmodel}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#offername> ?carsoffername}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#color> ?carscolor}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#mileage> ?carsmileage}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#constructionYear> ?carsconstructionYear}.
            OPTIONAL {?object <http://semannot.bdoenz.com/cars#offerprice> ?carsofferprice}.
            FILTER ((?carsconstructionYear>=2010 && ?carsofferprice<=12500) || (?carsconstructionYear>=2011 && ?carsofferprice<=15000))
          }
  • 24. Use case #7
      This example is situated in the domain of real estate. It asks for the name of the offer, a description, the number of rooms, the floor space and the rent of all offers with 3 or more rooms in the town of "Klosterneuburg" where the rent per square meter is lower than the average rent per square meter for that town. No specific site is referenced in the query; the mediator therefore includes all sites with real estate offers and also a site containing average rents. Each of these is accessed in turn, collecting the intermediate results in a database before applying the filter and returning the results. Intermediate results are only available after the average rents have been extracted and can be compared to the offers in the defined manner. Note that this type of query cannot be generated by the query wizard, but is entered directly as SPARQL:
          SELECT ?source ?realestatetownname ?realestateoffername ?realestaterent ?realestaterooms ?realestatefloorSpace
          WHERE {
            ?object rdf:type <http://semannot.bdoenz.com/realestate#RealEstateOffer>.
            OPTIONAL {?object <http://semannot.bdoenz.com/mediatorvocabulary#sourceURL> ?source}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#townname> ?realestatetownname}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#offername> ?realestateoffername}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rent> ?realestaterent}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#rooms> ?realestaterooms}.
            OPTIONAL {?object <http://semannot.bdoenz.com/realestate#floorSpace> ?realestatefloorSpace}.
            ?community rdf:type <http://semannot.bdoenz.com/realestate#Community>.
            ?community <http://semannot.bdoenz.com/realestate#averageRent> ?avrent.
            ?community <http://semannot.bdoenz.com/realestate#communityname> ?cname.
            FILTER (REGEX(?realestatetownname,"Klosterneuburg","i") && ?realestaterent>=800 && ?realestaterent<=1200 && ?realestaterooms>=3 && REGEX(?cname,"Klosterneuburg","i") && ?realestaterent/?realestatefloorSpace<(?avrent))
          }
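
The comparison performed by this filter, once offers and average rents have both been collected, can be sketched in Python. The records and average values below are invented sample data, and the function name `below_average_offers` is hypothetical. One deliberate simplification: the SPARQL query would return an offer once per matching community, whereas this sketch returns each qualifying offer once.

```python
import re

# Invented intermediate results, as they might sit in the mediator's database.
OFFERS = [
    {"offername": "A", "townname": "Klosterneuburg", "rent": 900, "rooms": 3, "floorSpace": 90},
    {"offername": "B", "townname": "Klosterneuburg", "rent": 1100, "rooms": 4, "floorSpace": 80},
    {"offername": "C", "townname": "Wien", "rent": 1000, "rooms": 3, "floorSpace": 100},
]
COMMUNITIES = [
    {"communityname": "Klosterneuburg", "averageRent": 12.0},  # per m², invented
    {"communityname": "Wien", "averageRent": 14.0},            # per m², invented
]


def below_average_offers(offers, communities, town):
    """Keep offers in `town` whose rent per m² is below the town's average."""
    pattern = re.compile(town, re.IGNORECASE)  # mirrors REGEX(..., "i")
    averages = [c["averageRent"] for c in communities
                if pattern.search(c["communityname"])]
    results = []
    for offer in offers:
        if not pattern.search(offer["townname"]):
            continue
        if not (800 <= offer["rent"] <= 1200 and offer["rooms"] >= 3):
            continue
        rent_per_m2 = offer["rent"] / offer["floorSpace"]
        if any(rent_per_m2 < average for average in averages):
            results.append(offer)
    return results


cheap = below_average_offers(OFFERS, COMMUNITIES, "Klosterneuburg")
print([offer["offername"] for offer in cheap])
```

In the sample data, offer A costs 10 €/m² and passes, offer B costs 13.75 €/m² and is dropped, and offer C is in a different town.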