2. WHAT IS THE DEEP WEB?
• Also known as:
• Undernet
• Invisible Web
• Hidden Web
• All data that does not appear in search
engines (e.g. Google, Yahoo, Bing, etc.)
• Data found in search engines is
called the surface web
• The surface web is estimated to hold only 0.03% of available information
• Deep web is estimated to be 500x
bigger than the surface web
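The two estimates above can be compared with a little arithmetic. The sketch below assumes that "500x bigger" means the deep web holds 500 times as much data as the surface web, and that the two together make up all available information. Both readings are assumptions, not statements from the slide's sources.

```python
# Assumptions (not from the slide's sources):
#   deep web size = ratio * surface web size
#   surface web + deep web = all available information

def surface_share(deep_to_surface_ratio: float) -> float:
    """Fraction of all information on the surface web for a given ratio."""
    return 1.0 / (1.0 + deep_to_surface_ratio)

def implied_ratio(surface_fraction: float) -> float:
    """Deep-to-surface size ratio implied by a given surface-web fraction."""
    return (1.0 - surface_fraction) / surface_fraction

share_from_500x = surface_share(500)         # about 0.002 of the total
ratio_from_003 = implied_ratio(0.03 / 100)   # about 3332

print(f"500x bigger -> surface web is {share_from_500x:.2%} of the total")
print(f"0.03% surface -> deep web is about {ratio_from_003:.0f}x the surface web")
```

Under these assumptions the two figures differ by roughly a factor of seven, so they are best read as separate rough estimates rather than one precise measurement.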
3. HOW AND WHY DOES IT EXIST?
• It cannot be found by current
search engine technology
• Search engines use robots,
or Web “spiders”, to index
websites through metadata
(e.g. page title, page
location (URL), and
repeated keywords used in
the text)
• spiders collect page data by
following hyperlinks from page to page
• Many sites are dynamically
generated: their data cannot be
indexed because the information is
created on request and is not hyperlinked
• so it is not immediately accessible
to web spiders
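A minimal sketch of what a link-following spider sees on a single page, using only Python's standard library. The page content, the `SpiderParser` class, and the URLs are hypothetical and chosen for illustration. Real search engine crawlers are far more elaborate.

```python
from html.parser import HTMLParser

class SpiderParser(HTMLParser):
    """Collects what a simple link-following spider can see on one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []   # hyperlinks the spider can follow
        self.forms = []   # search forms the spider does not fill in
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page: two ordinary links plus a catalog search form.
PAGE = """
<html><head><title>City Library</title></head>
<body>
  <a href="/about.html">About</a>
  <a href="/hours.html">Opening hours</a>
  <form action="/catalog/search" method="post">
    <input name="q"><input type="submit" value="Search">
  </form>
</body></html>
"""

parser = SpiderParser()
parser.feed(PAGE)
print("title:", parser.title)
print("links to follow:", parser.links)
print("forms (not followed):", parser.forms)
```

The spider records the title and can move on to the two linked pages, but the catalog behind the search form produces pages only in response to a query, so nothing in it is reached by following links.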
Deep web information may or may not be
purposely hidden. It includes:
• Data that needs to be accessed
by a search interface
• Results of database queries
• Subscription-only information
and other password-protected
data
• Pages that are not linked to by
any other page
• Technically limited content,
such as pages that
require solving a CAPTCHA
• Content served outside
of the conventional http:// and https://
protocols
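To illustrate the "results of database queries" item in the list above, here is a small sketch in which a results page exists only at the moment someone submits a query. The in-memory database, the table contents, and the `search_page` function are hypothetical stand-ins for a site's back end.

```python
import sqlite3
from html import escape

# Hypothetical in-memory catalog standing in for a site's back-end database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, topic TEXT)")
conn.executemany(
    "INSERT INTO papers VALUES (?, ?)",
    [
        ("Malaria vaccine trial results", "medicine"),
        ("Crop yields under drought", "agriculture"),
        ("Antibiotic resistance survey", "medicine"),
    ],
)

def search_page(topic: str) -> str:
    """Build an HTML results page on demand for one query."""
    rows = conn.execute(
        "SELECT title FROM papers WHERE topic = ? ORDER BY title", (topic,)
    ).fetchall()
    items = "".join(f"<li>{escape(title)}</li>" for (title,) in rows)
    return f"<h1>Results for {escape(topic)}</h1><ul>{items}</ul>"

# The page below is generated only when a visitor asks for it.
print(search_page("medicine"))
```

No stored page lists these titles, so a spider that only follows hyperlinks has nothing to index unless the site also publishes links to its results.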
4. HOW TO ACCESS THE DEEP WEB
• Tor (The Onion Router)
• makes tracking difficult by
“routing connections
through servers around
the world”
• gives access to websites whose
addresses end in .onion
• grew out of onion routing
research at the US Naval
Research Laboratory; the Tor
network was publicly deployed
in 2003 to protect “political
dissidents and whistleblowers”
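The idea of routing a connection through several servers can be sketched with a toy "onion" of nested layers. This is not Tor's actual protocol: the XOR keystream below is an insecure stand-in for real encryption, and the relay names, keys, and helper functions are hypothetical. Real Tor negotiates keys with each relay and uses vetted ciphers.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable byte stream from a key (toy construction, insecure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with the key's stream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(inner: bytes, key: bytes, next_hop: str) -> bytes:
    """Add one layer: hide the next hop and the inner packet under this key."""
    return xor(next_hop.encode() + b"|" + inner, key)

def peel(packet: bytes, key: bytes):
    """Remove one layer: reveal only the next hop and the still-wrapped rest."""
    next_hop, inner = xor(packet, key).split(b"|", 1)
    return next_hop.decode(), inner

# Hypothetical path; in this toy the sender shares one key with each relay.
hops = [("entry", b"key-entry"), ("middle", b"key-middle"), ("exit", b"key-exit")]
message = b"GET /index.html"

# The sender builds the onion from the inside out.
packet, next_hop = message, "destination"
for name, key in reversed(hops):
    packet = wrap(packet, key, next_hop)
    next_hop = name

# Each relay peels exactly one layer and learns only where to forward.
seen = []
for name, key in hops:
    hop, packet = peel(packet, key)
    seen.append(hop)

print(seen)
print(packet)
```

In this sketch the entry relay learns only that the next stop is the middle relay, and only the exit relay sees the destination, which is the property that makes tracking a connection difficult.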
5. IMPLICATIONS OF THE DEEP WEB
• Negative: Illegal activities
• “illicit drugs, child pornography,
stolen credit card
numbers, human trafficking,
weapons, exotic animals,
copyrighted media” (HowStuffWorks)
• Transactions in the deep
web are often made in
Bitcoin, a cryptographic digital
currency that is pseudonymous:
payments are publicly recorded but
not directly tied to real names
• Positive
• Education: finding research
papers that can help different
fields and industries, e.g.
research for diseases
• Privacy: Increased privacy in “e-mail,
file storage and sharing,
social media, news outlets, and
whistleblowing sites”
• Free speech: Can be used by
citizens to get around
online censorship in countries
with oppressive regimes
6. WHAT’S NEXT?
• The deep web grows each day
• The challenge is for programmers to improve search engine algorithms to
manage big data
• Big data: data sets so large or complex that they are hard to
process with conventional tools
• Companies that learn how to manage data gain a competitive
advantage; those that rely only on the surface web will not
• Make content more accessible (tips can be found here:
http://oedb.org/ilibrarian/invisible-web/)