Presentation Deep Web Technology.pptx
1. There’s a universe of data on the Deep Web. And you’re missing most of it.
2. CONTENT
• INTRODUCTION
• HISTORY
• WHAT MAKES IT DEEP?
• DEEP WEB RESOURCES
• WHY DEEP WEB?
• WHEN TO USE THE DEEP WEB?
• HOW TO SEARCH THE DEEP WEB?
• WHAT TOOL TO BE USED?
• CONCLUSION
• REFERENCES
3. INTRODUCTION
• The Deep Web[1] is defined as content on the Web that is
not accessible through a search on general search
engines.
• This content is sometimes also referred to as the invisible
or hidden web.
• Deep Web content includes information in private
databases that are accessible over the Internet but not
intended to be crawled by search engines.
4. INTRODUCTION(Contd.)
• Most search engines are designed to search only the
surface of the Web, and they deliver less than 10% of the
available Internet information[2].
6. DEEP WEB V/S SURFACE
WEB(Contd.)[3]
• Public information on the deep Web is currently 400 to 550
times larger than the commonly defined World Wide
Web.
• The deep Web contains 7,500 terabytes of information
compared to 19 terabytes of information on the surface
Web.
• The deep Web contains nearly 550 billion individual
documents compared to the 1 billion of the surface Web.
• More than 200,000 deep Web sites presently exist.
7. DEEP WEB V/S SURFACE WEB
(Contd.)
• 60 of the largest deep-Web sites collectively contain about
750 terabytes of information - sufficient by themselves to
exceed the size of the surface Web 40 times.
• On average, deep Web sites receive 50% greater monthly
traffic than surface sites and are more highly linked than
surface sites; however, the typical Deep Web site is not well
known to the Internet-searching public.
• Total quality content of the deep Web is 1,000 to 2,000 times
greater than that of the surface Web.
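The size comparisons on the two slides above can be cross-checked with simple arithmetic. The sketch below only divides the figures quoted on the slides; the variable names are illustrative and the figures themselves are taken as given, not independently verified.

```python
# Figures as quoted on the slides (terabytes and document counts).
DEEP_WEB_TB = 7_500
SURFACE_WEB_TB = 19
DEEP_WEB_DOCS = 550_000_000_000
SURFACE_WEB_DOCS = 1_000_000_000
TOP_60_SITES_TB = 750

# Storage ratio: deep Web versus surface Web.
size_ratio = DEEP_WEB_TB / SURFACE_WEB_TB

# Document-count ratio: deep Web versus surface Web.
doc_ratio = DEEP_WEB_DOCS / SURFACE_WEB_DOCS

# The 60 largest deep-Web sites versus the whole surface Web.
top_60_ratio = TOP_60_SITES_TB / SURFACE_WEB_TB

print(f"Size ratio:      {size_ratio:.0f}x")
print(f"Document ratio:  {doc_ratio:.0f}x")
print(f"Top-60 ratio:    {top_60_ratio:.1f}x")
```

The storage ratio comes out at roughly 395 and the document ratio at 550, which brackets the "400 to 550 times larger" claim, and the 60 largest sites come out at about 39.5 times the surface Web, which the slide rounds to 40.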
8. DEEP WEB V/S SURFACE WEB
(Contd.)
• Deep Web content is highly relevant to every information
need, market and domain.
• A full 95% of the deep Web is publicly accessible information,
not subject to fees or subscriptions.
9. HISTORY OF DEEP WEB[4,5]
• Jill Ellsworth used the term invisible Web in 1994 to refer
to websites that are not registered with any search engine.
• In 1996, Frank Garcia wrote in an article:
"It would be a site that's possibly reasonably designed, but they
didn't bother to register it with any of the search engines. So, no one can
find them! You're hidden. I call that the invisible Web."
• Another early use of the term invisible Web was by Bruce
Mount and Matthew B., in a description of the @1 deep
Web tool found in a December 1996 press release.
• The first use of the specific term deep Web, now
generally accepted, dates to 2001.
10. WHAT MAKES IT DEEP?[6]
Search engines typically do not index the following types of
Web sites:
• Proprietary sites
• Sites requiring a registration
• Sites with scripts
• Dynamic sites
11. WHAT MAKES IT DEEP?
(Contd.)
• Ephemeral sites
• Sites blocked by local webmasters
• Sites blocked by search engine policy
• Sites with special formats
• Searchable databases
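One common way a webmaster blocks crawlers is a robots.txt file, which well-behaved search engines consult before indexing a page. The slides do not name this mechanism, so treat the following as an illustrative sketch: the robots.txt content, the example.org URLs and the "ExampleBot" user agent are all hypothetical. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides one directory from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A page outside the blocked directory stays on the surface Web.
print(is_crawlable(ROBOTS_TXT, "https://example.org/index.html"))

# A page under /private/ is skipped by compliant crawlers.
print(is_crawlable(ROBOTS_TXT, "https://example.org/private/report.html"))
```

Pages excluded this way remain reachable by anyone who knows the address, yet they never appear in a general search engine's index.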
12. DEEP WEB RESOURCES
• Dynamic content
• Unlinked content
• Private Web
• Contextual Web
• Limited access content
• Scripted content
• Non-HTML/text content
13. WHY DEEP WEB?
• Quality of content / higher level of authority
• Comprehensiveness
• Focused
• Timeliness
• The material isn’t available elsewhere on the Web
14. WHEN TO USE THE DEEP
WEB?
• Standard search engines aren’t working.
• A precise answer is needed.
• Data or statistics are needed.
• High quality or authoritative results are needed.
• Timeliness is important.
15. WHEN TO USE THE DEEP
WEB?(Contd.)
• You know the subject area well.
• You are looking for collections (images, sounds, manuscripts, etc.).
• You need online reference books (handbooks, guides, dictionaries,
encyclopedias, directories, etc.).
16. HOW TO SEARCH THE DEEP
WEB?
• Determine the specific topic you need to find.
• Categorize your topic.
• Decide what type of source you'd like to search.
• Choose your starting point based on your objective.
17. What Tools To Be Used?
• WorldWideScience.org
Global Science gateway to national and international
scientific databases.
• Infomine
It has been built by a pool of libraries in the United
States. You can search by subject category and
further tweak your search using the search options.
20. What Tools To Be Used?
(Contd.)
• Complete Planet
Calls itself the 'front door' to the Deep Web. This free
and well-designed directory resource makes it easy
to access the mass of dynamic databases that are
cloaked from a general-purpose search. The
databases indexed by Complete Planet number
around 70,000 and range from Agriculture to
Weather. Also thrown in are databases like Food &
Drink and Military.
22. What Tools To Be Used?
(Contd.)
• TechXtra
It concentrates on engineering, mathematics and
computing. It offers industry news, job announcements,
technical reports, technical data, full text, teaching and
learning resources, along with articles and relevant website
information.
24. CONCLUSION
The Deep Web contains valuable resources that are not
easily accessible by automated search engines but are
readily available to enlightened searchers.
Using it makes the online search process more efficient and
productive because it covers the resources missed on the
Surface Web.
25. REFERENCES
1. www.internettutorials.net
2. http://www.releseek.com
3. http://beta.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp
4. Bergman, Michael K. (August 2001). "The Deep Web: Surfacing Hidden
Value". The Journal of Electronic Publishing 7
5. Garcia, Frank (January 1996). "Business and Marketing on the Internet"
6. www.computerworld.com
7. http://www.infomine.ucr.edu/
8. http://www.completeplanet.com