Transcendence: Enabling a Personal View of the Deep Web. Jeffrey P. Bigham, Anna C. Cavender, Ryan S. Kaminsky, Craig M. Prince, and Tyler S. Robison. University of Washington, Computer Science and Engineering.
What is the Deep Web? [1] Bergman, M. K. “The deep web: Surfacing hidden value.” 2001. Introduction
Deep Web Resources Introduction
Problems Introduction
Outline
Scenario Transcending Craigslist
Generalize a Form Field
Add a Value
Add Another Value
Automatically Generate More Values
Results only for “University Village”
Fields Automatically Chosen
Extract for All Inputs
Review Extractions in Place
Extractions Sorted by Price
Transcending Craigslist
Crawling the Deep Web [1] Madhavan et al. “Structured data meets the web: A few observations.” 2006. [2] Ntoulas et al. “Downloading textual hidden web content through keyword queries.” 2005. Related Work
User Interfaces for the Web [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006. [2] Fujima et al. “Clip, connect, clone: combining application elements to build custom interfaces for information access.” UIST 2004. [3] Faaborg et al. “A goal-oriented web browser.” CHI 2006. Related Work
1. Generalize Form 3 Steps of Transcendence
1-a) Finding Values with UIE [1] http://labs.google.com/sets/ [2] Etzioni et al. “Methods for domain-independent information extraction from the web: an experimental comparison.” 2008. 3 Steps of Transcendence
2. Choose Fields & Extract [1] Huynh et al. “Enabling web browsers to augment web sites’ filtering and sorting functionalities.” UIST 2006. 3 Steps of Transcendence
3. Visualize Data 3 Steps of Transcendence
Examples: IMDB [1] Rating Distribution. Entered: “Scent of a Woman,” “Rocky,” “Star Wars,” and “The Matrix.” Generate > 7000 more titles. [1] http://www.imdb.com Additional Examples
Examples: Directory Diving Additional Examples
User Evaluation
User Reaction & Comments User Evaluation
Future Work
Conclusion
Transcendence Jeffrey P. Bigham www.cs.washington.edu/homes/jbigham/ Thanks to: Mira Dontcheva, UW Turing Center, anonymous reviewers, and our study participants. The End
Some Extra Slides
Show Those Resulting from Specific Inputs
Show Wedgewood Results
System Description
3 Steps of Transcendence: Generalize, Choose Fields & Extract, Visualize
Examples:  Mapping Stores Additional Examples
Examples:  Kayak Flights Additional Examples

Transcendence: Enabling A Personal View of the Deep Web

Editor's Notes

  1. Transcendence is a web browser extension that enables a personal view of the deep web. It makes web forms more flexible, so users can run queries that the original interface does not support and more easily find the information they really want from deep web resources. It lets users enter multiple values for form input fields that may originally have been restricted to one, automatically submits all combinations of form input, and merges the results for easy visualization. It uses unsupervised information extraction to supply inputs automatically, enabling users to partially reconstruct the databases underlying deep web resources and to run aggregate queries that were previously impossible. Transcendence is joint work with fellow graduate students Anna Cavender, Ryan Kaminsky, Craig Prince, and Tyler Robison.
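
The "submit all combinations, then merge" idea in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the extension's actual code: the field names (`neighborhood`, `bedrooms`), the helper names (`expand_inputs`, `merge_results`), and the `fake_fetch` stand-in for a real form submission are all hypothetical. The neighborhood values and the sort by price echo the Craigslist scenario in the slides.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Inputs = Dict[str, str]
Row = Dict[str, Any]


def expand_inputs(field_values: Dict[str, List[str]]) -> List[Inputs]:
    """Return one form submission per combination of the supplied values."""
    names = list(field_values)
    value_lists = [field_values[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def merge_results(
    submissions: List[Inputs],
    fetch: Callable[[Inputs], List[Row]],
) -> List[Row]:
    """Run every submission and merge the rows, tagging each with its inputs."""
    merged: List[Row] = []
    for inputs in submissions:
        for row in fetch(inputs):
            merged.append({**row, "_inputs": inputs})
    return merged


def fake_fetch(inputs: Inputs) -> List[Row]:
    """Hypothetical stand-in for submitting the form and extracting results."""
    price = 900 if inputs["neighborhood"] == "Wedgewood" else 1200
    title = f"{inputs['bedrooms']}br in {inputs['neighborhood']}"
    return [{"title": title, "price": price}]


# Two values for each of two fields yield 2 x 2 = 4 submissions.
submissions = expand_inputs({
    "neighborhood": ["University Village", "Wedgewood"],
    "bedrooms": ["1", "2"],
})

# Merge all result rows, then sort them by price for review.
rows = sorted(merge_results(submissions, fake_fetch), key=lambda r: r["price"])
```

Tagging each merged row with the inputs that produced it is one way to support views such as showing only the results for a specific input value.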