AMBER presentation
1. Little Knowledge Rules The Web:
Domain-Centric Result Page Extraction
Tim Furche, Georg Gottlob, Giovanni Grasso,
Giorgio Orsi, Christian Schallhart, Cheng Wang
Department of Computer Science
University of Oxford
Cheng.wang@trinity.ox.ac.uk
4. AMBER: System Overview
Adaptable Model-Based Extraction of Result Pages
• Needs only one clue
• Very high precision & recall
• Implemented in rules
• Domain-parameterized tool, currently aimed at UK real-estate
• Part of DIADEM (Domain-centric Intelligent Automated Data Extraction Methodology)
6. Fact Generation & Annotation
• Live browser (Mozilla XUL-Runner)
• Extract DOM tree
• CSS box information
• Textual annotation with GATE (domain-dependent)
– Gazetteers
– Regular-expression-like rules
• All represented as facts in the Page Model
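
As a rough illustration of the annotation step above, the following Python sketch turns text nodes into facts using a tiny gazetteer and one regular-expression-like rule. It is not AMBER's implementation (which uses GATE and a live browser); the gazetteer entries, the price pattern, the fact predicates (`child`, `price`, `location`), and the function names are all assumptions made for this example.

```python
import re

# Hypothetical gazetteer of location names (illustrative only).
LOCATION_GAZETTEER = {"oxford", "london", "reading"}

# Hypothetical regular-expression-like rule for UK prices such as "£250,000".
PRICE_RULE = re.compile(r"£\s?\d{1,3}(?:,\d{3})*")


def annotate(node_id, text):
    """Return annotation facts (predicate, node_id, value) for one text node."""
    facts = []
    for match in PRICE_RULE.finditer(text):
        facts.append(("price", node_id, match.group()))
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() in LOCATION_GAZETTEER:
            facts.append(("location", node_id, word))
    return facts


def page_model(text_nodes):
    """Collect structural and annotation facts from (node_id, parent_id, text) triples."""
    facts = []
    for node_id, parent_id, text in text_nodes:
        facts.append(("child", parent_id, node_id))
        facts.extend(annotate(node_id, text))
    return facts


nodes = [
    ("n1", "root", "3 bed house in Oxford"),
    ("n2", "root", "£250,000"),
]
for fact in page_model(nodes):
    print(fact)
```

Keeping structure and annotations in one flat list of facts mirrors the idea of a single Page Model that later rules can query.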
8. Segmentation Mapping: Identification
• Attribute → Data area
• From bottom phenomena to data area
• Little knowledge rules the web
• Only one domain concept (mandatory attribute)
– Price
– Location
– Title
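
One way to picture going from a single mandatory attribute to a data area is sketched below in Python. This is a simplified heuristic written for illustration, not AMBER's actual rules: it assumes the page is given as a hypothetical child-to-parent map, groups the annotated nodes (for example, price nodes) by depth, and returns the least common ancestor of the largest group.

```python
from collections import Counter


def ancestors(node, parent):
    """Path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def data_area(parent, annotated):
    """Least common ancestor of the largest same-depth group of annotated nodes."""
    depth = {n: len(ancestors(n, parent)) for n in annotated}
    if not depth:
        return None
    common_depth, _ = Counter(depth.values()).most_common(1)[0]
    group = [n for n in annotated if depth[n] == common_depth]
    # Walk up from the first node until an ancestor covers the whole group.
    for candidate in ancestors(group[0], parent):
        if all(candidate in ancestors(n, parent) for n in group):
            return candidate
    return None


# Hypothetical page: three listing prices under "list", one stray price in an ad.
parent = {
    "list": "body", "ad": "body",
    "r1": "list", "r2": "list", "r3": "list",
    "p1": "r1", "p2": "r2", "p3": "r3",
    "p_ad": "ad",
}
print(data_area(parent, ["p1", "p2", "p3", "p_ad"]))
```

The stray price in the advertisement sits at a different depth, so it does not pull the data area up to the page body.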
10. Segmentation Mapping: Understanding
• Data area → Record
• Domain independent
• Identify leading nodes
• Two problems
– Superfluous nodes
– Correct shift
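
The two problems named above can be illustrated with a small Python sketch. It is an assumption-laden simplification, not AMBER's algorithm: the children of a data area are given as hypothetical `(tag, has_mandatory)` pairs, the record length is taken as the most common distance between nodes carrying the mandatory attribute, and each possible shift of the record boundary is tried, keeping the one that yields the most records with identical tag sequences. Children that end up in no record are reported as superfluous; ties between shifts go to the smallest shift.

```python
from collections import Counter


def segment(children):
    """Split data-area children into records.

    children: list of (tag, has_mandatory) pairs in document order.
    Returns (records, superfluous) as lists of child indices.
    """
    tags = [tag for tag, _ in children]
    marks = [i for i, (_, has) in enumerate(children) if has]
    if len(marks) < 2:
        records = [[i] for i in marks]
    else:
        # Record length: most common distance between mandatory nodes.
        gaps = Counter(b - a for a, b in zip(marks, marks[1:]))
        length = gaps.most_common(1)[0][0]
        records = []
        # Try every shift of the leading node before the first mandatory node.
        for shift in range(min(length, marks[0] + 1)):
            start = marks[0] - shift
            pattern = tags[start:start + length]
            candidate = []
            pos = start
            while len(pattern) == length and tags[pos:pos + length] == pattern:
                window = list(range(pos, pos + length))
                if sum(i in marks for i in window) != 1:
                    break
                candidate.append(window)
                pos += length
            if len(candidate) > len(records):
                records = candidate
    used = {i for record in records for i in record}
    superfluous = [i for i in range(len(children)) if i not in used]
    return records, superfluous


# Hypothetical data area: a header, three records (h2, div with price, hr), a footer.
children = [("header", False)] + [("h2", False), ("div", True), ("hr", False)] * 3 + [("footer", False)]
records, superfluous = segment(children)
print(records)
print(superfluous)
```

Here the price sits in the middle of each record, so starting records at the price node would misalign them; trying the shifts recovers the `h2` as the leading node and leaves the header and footer as superfluous.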
14. Summary
• AMBER - Adaptable Model-based Extraction of
Result Pages
– Domain knowledge: simple heuristic
– Using DLV: compact & easy implementation
– Understanding phase: only one domain clue, quickly adaptable to new domains
– Very high precision (99.4%) and recall (99.0%)
15. Current Work
• Testing AMBER on another domain
• Integrate visual information in understanding
phase
• Use probabilistic logic programming to improve
the whole system