There’s a universe of data on the Deep Web. And you’re missing most of it.
CONTENT
• INTRODUCTION
• HISTORY
• WHAT MAKES IT DEEP?
• DEEP WEB RESOURCES
• WHY DEEP WEB?
• WHEN TO USE THE DEEP WEB?
• HOW TO SEARCH THE DEEP WEB?
• WHAT TOOLS TO USE?
• CONCLUSION
• REFERENCES
INTRODUCTION
• The deep Web[1] is the content on the Web that cannot be found through a search on general-purpose search engines.
• This content is sometimes also referred to as the invisible
or hidden web.
• Deep Web content includes information in private
databases that are accessible over the Internet but not
intended to be crawled by search engines.
INTRODUCTION (Contd.)
• Most search engines are designed to search only the surface of the Web, so they deliver less than 10% of the information available on the Internet[2].
DEEP WEB V/S SURFACE WEB
[Figure: surface Web vs. deep Web]
DEEP WEB V/S SURFACE WEB (Contd.)[3]
• In 2001, public information on the deep Web was estimated to be 400 to 550 times larger than the commonly defined World Wide Web.
• The deep Web contained 7,500 terabytes of information, compared with 19 terabytes on the surface Web.
• The deep Web contained nearly 550 billion individual documents, compared with 1 billion on the surface Web.
• More than 200,000 deep Web sites existed at the time.
DEEP WEB V/S SURFACE WEB (Contd.)
• The 60 largest deep Web sites together contain about 750 terabytes of information, enough on their own to exceed the size of the surface Web 40 times.
• On average, deep Web sites receive 50% more monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical deep Web site is not well known to the Internet-searching public.
• The total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
DEEP WEB V/S SURFACE WEB (Contd.)
• Deep Web content is highly relevant to every information need, market and domain.
• A full 95% of the deep Web is publicly accessible information, not subject to fees or subscriptions.
HISTORY OF DEEP WEB[4,5]
• Jill Ellsworth used the term "invisible Web" in 1994 to refer to websites that were not registered with any search engine.
• In a January 1996 article, Frank Garcia wrote:
"It would be a site that's possibly reasonably designed, but they didn't bother to register it with any of the search engines. So, no one can find them! You're hidden. I call that the invisible Web."
• Another early use of the term "invisible Web" was by Bruce Mount and Matthew B., in a description of the @1 deep Web tool in a December 1996 press release.
• The specific term "deep Web", now generally accepted, was first used in Michael K. Bergman's 2001 study[4].
WHAT MAKES IT DEEP?[6]
Search engines typically do not index the following types of
Web sites:
• Proprietary sites
• Sites requiring registration
• Sites with scripts
• Dynamic sites
WHAT MAKES IT DEEP? (Contd.)
• Ephemeral sites
• Sites blocked by local webmasters
• Sites blocked by search engine policy
• Sites with special formats
• Searchable databases
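Blocking by local webmasters is commonly done with a robots.txt file, which well-behaved crawlers consult before fetching a page. The sketch below is a minimal illustration, assuming a hypothetical robots.txt and the placeholder domain example.org; it uses Python's standard urllib.robotparser module to show which URLs such a crawler would skip, and therefore never index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a webmaster might publish to keep all
# crawlers out of a site's search interface and its archive.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /archive/
"""


def crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.org/about.html",
            "https://example.org/search?q=weather",
            "https://example.org/archive/1996/report.pdf"):
    print(url, "->", "indexable" if crawler_may_fetch(url) else "blocked")
```

Only the /about.html page is reported as indexable; the search results and archive pages are blocked. Compliance with robots.txt is voluntary, so this describes only crawlers that honor it.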
DEEP WEB RESOURCES
• Dynamic content
• Unlinked content
• Private Web
• Contextual Web
• Limited access content
• Scripted content
• Non-HTML/text content
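Dynamic and unlinked content stay hidden largely because a basic crawler discovers pages only by following hyperlinks; it does not fill in search forms. The toy sketch below models this with a hypothetical four-page site held in a Python dictionary (no network access). It is a simplified model under these assumptions, not how any particular search engine works.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "site": URL path -> HTML. The /results page is
# only produced when a visitor submits the search form on the home page.
SITE = {
    "/": '<a href="/about">About</a> <form action="/results"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
    "/results": "<p>Records returned by the database query</p>",
    "/orphan": "<p>A page that nothing links to</p>",
}


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags; <form> actions are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start="/"):
    """Breadth-first crawl that follows hyperlinks only, like a basic spider."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        extractor = LinkExtractor()
        extractor.feed(SITE.get(page, ""))
        for link in extractor.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl()
print("indexed:", sorted(indexed))
print("hidden:", sorted(set(SITE) - indexed))
```

Running it prints indexed: ['/', '/about'] and hidden: ['/orphan', '/results']. The results page is reachable only by submitting the form (dynamic content), and the orphan page has no inbound links (unlinked content).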
WHY DEEP WEB?
• Quality of content / higher level of authority
• Comprehensiveness
• Focus
• Timeliness
• Material that isn’t available elsewhere on the Web
WHEN TO USE THE DEEP WEB?
• Standard search engines aren’t working.
• A precise answer is needed.
• Data or statistics are needed.
• High-quality or authoritative results are needed.
• Timeliness is important.
WHEN TO USE THE DEEP WEB? (Contd.)
• You know the subject area well.
• You are looking for collections (images, sounds, manuscripts, etc.).
• You need online reference books (handbooks, guides, dictionaries, encyclopedias, directories, etc.).
HOW TO SEARCH THE DEEP WEB?
• Determine the specific topic you need to find.
• Categorize your topic.
• Decide what type of source you'd like to search.
• Choose your starting point based on your objective.
WHAT TOOLS TO USE?
• WorldWideScience.org
A global science gateway to national and international scientific databases.
• Infomine
Built by a group of libraries in the United States, it lets you search by subject category and then refine the search using the search options.
• Complete Planet
It calls itself the "front door" to the deep Web. This free, well-designed directory makes it easy to reach the mass of dynamic databases that are hidden from general-purpose search. Complete Planet indexes around 70,000 databases on subjects ranging from Agriculture to Weather, including categories such as Food & Drink and Military.
WHAT TOOLS TO USE? (Contd.)
• TechXtra
It concentrates on engineering, mathematics and computing, providing industry news, job announcements, technical reports, technical data, full-text content, teaching and learning resources, articles and relevant website information.
CONCLUSION
The deep Web contains valuable resources that automated search engines cannot easily reach but that informed searchers can readily find.
Searching it makes online research more efficient and productive, because it holds the resources that surface Web searches miss.
REFERENCES
1. www.internettutorials.net
2. http://www.releseek.com
3. http://beta.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp
4. Bergman, Michael K. (August 2001). "The Deep Web: Surfacing Hidden Value". The Journal of Electronic Publishing, 7.
5. Garcia, Frank (January 1996). "Business and Marketing on the Internet".
6. www.computerworld.com
7. http://www.infomine.ucr.edu/
8. http://www.completeplanet.com
Thank You
Questions?