Lecture 39: …and the World Wide Web




cs1120 Fall 2011
David Evans
http://www.cs.virginia.edu/evans
Announcements
Exam 2 due 61 seconds ago!

Friday: we will return graded Exam 2, along with
  guidance about the Final

  Must be present (or email me in advance) to win!

 If you want to present your PS8 in class Monday, remember to email me!



Plan
The World Wide Web
Building Web Applications
How Google Works
  (or, going back to pre-PS5 to make things
      really fast again!)
cs1120 recap in one (heavily animated) slide!



The World Wide Web
The “Desk Wide Web”




            Memex Machine
Vannevar Bush, As We May Think, LIFE, 1945
WorldWideWeb




Sir Tim Berners-Lee (CERN, Switzerland; MIT)
First web server and client, 1990 (this picture, 1993)
Overview:
Many of the discussions of the future at CERN and the LHC era end with the question – “Yes, but how will we ever keep track of such a large project?” This proposal provides an answer to such questions. Firstly, it discusses the problem of information access at CERN. Then, it introduces the idea of linked information systems, and compares them with less flexible ways of finding information.


http://www.w3.org/History/1989/proposal-msw.html
A Practical Project




WorldWideWeb
Established a common language for sharing
  information on computers

Lots of previous attempts (Gopher, WAIS,
  Archie, Xanadu, etc.) failed




Why the World Wide Web?
World Wide Web succeeded because it was simple!

Didn’t attempt to maintain links, just a common
  way to name things
Uniform Resource Locators (URL)
   http://www.cs.virginia.edu/cs1120/index.html
     Service: http
     Hostname: www.cs.virginia.edu
     File Path: /cs1120/index.html
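
As a minimal sketch of how the three parts of a URL can be pulled apart, the following uses `urlsplit` from Python's standard `urllib.parse` module. The helper name `url_parts` and the dictionary keys are chosen only for illustration; they are not from the lecture.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the three pieces named on the slide."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,     # the protocol, such as "http"
        "hostname": parts.hostname,  # the machine to contact
        "file_path": parts.path,     # what to ask that machine for
    }

example = url_parts("http://www.cs.virginia.edu/cs1120/index.html")
```

Here `example["hostname"]` is `"www.cs.virginia.edu"` and `example["file_path"]` is `"/cs1120/index.html"`.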


HyperText Transfer Protocol
  Client (Browser) to Server:
     GET /cs1120/index.html HTTP/1.0
  Server to Client (Browser): contents of file, in HTML (HyperText Markup Language)
     <html>
     <head>
     …
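
The request itself is just a line of text. The sketch below builds the bytes a client could send for an HTTP/1.0 GET and shows how a server might split the first line back apart. The function names `build_get_request` and `parse_request_line` are hypothetical, and no network connection is made.

```python
def build_get_request(path, version="HTTP/1.0"):
    """Return the bytes a client would send to ask for `path`."""
    # The request line is followed by a blank line that ends the
    # (here empty) header section.
    return f"GET {path} {version}\r\n\r\n".encode("ascii")

def parse_request_line(raw):
    """Server side: split the first line into method, path, and version."""
    first_line = raw.decode("ascii").split("\r\n", 1)[0]
    method, path, version = first_line.split(" ")
    return method, path, version

request = build_get_request("/cs1120/index.html")
```

A real browser sends extra header lines after the request line, but the shape of the exchange is the same: a short text request goes out and the contents of a file come back.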
HTML: HyperText Markup Language

Language for controlling display of web pages
Uses formatting tags: between < and >
        Document ::= <html> Header Body </html>
        Header ::= <head> HeadElements </head>
        HeadElements ::= ε | HeadElement HeadElements
        HeadElement ::= <title> Element </title>
        Body ::= <body> Elements </body>
        Elements ::= ε | Element Elements
        Element ::= <p> Element </p>
        Element ::= <center> Element </center>
        …
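
One property the grammar enforces is that tags are properly nested: every opening tag is closed before its enclosing tag closes. A rough sketch of checking just that property (not the full grammar) with Python's standard `html.parser` module follows. The class and function names are illustrative, and the checker assumes every tag has an explicit closing tag, which real-world HTML does not always do.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Tracks open tags on a stack, as a recursive grammar implies."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags opened but not yet closed
        self.ok = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.ok = False

def is_well_nested(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.ok and not checker.stack
```

For example, `is_well_nested("<html><body><p>Hi</p></body></html>")` is `True`, while swapping the order of `</p>` and `</body>` makes it `False`.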
Popular Web Site: Strategy 1
          Static, Authored Web Site
Drawbacks:
• Have to do all the work yourself
• The world may already have enough Twinkie-experiment websites

Content Producer




                   http://www.twinkiesproject.com/
Popular Web Site: Strategy 2
Dynamic Web Applications
Web Programmer → Seed content and function → Attracts users → Produce more content
eBay in 1997
http://web.archive.org/web/19970614001443/http://www.ebay.com/
Popular Web Site: Strategy 2
Dynamic Web Applications
Seed content and function → Attracts users → Produce more content

Advantages:
• Users do most of the work
• If you’re lucky, they might even pay you for the privilege!

Disadvantages:
• Lose control over the content (you might get sued for things your users do)
• Have to know how to program a web application

reddit.com in 2005, reddit.com today
Dynamic Web Sites
Programs that run on the web server
   Can be written in any language (often in Python or Java), just
     need a way to connect the web server to the program
   Program generates HTML (often JavaScript also now)
   Every useful web site does this
Programs that run on the client’s machine
   Java, JavaScript (aka, “Scheme for the Web”), Flash, etc.:
     language must be supported by the client’s browser
   Responsive interface: limited round-trips to server
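
As a minimal sketch of a program that runs on the web server and generates HTML, here is a tiny WSGI application. WSGI is Python's standard interface between a web server and a program; the page content and the name `application` are just for illustration.

```python
from datetime import datetime
from html import escape

def application(environ, start_response):
    """Called by the web server once per request; returns the page body."""
    path = environ.get("PATH_INFO", "/")
    # The HTML is generated fresh for each request instead of read from a file.
    body = (
        "<html><head><title>Dynamic page</title></head>"
        f"<body><p>You asked for {escape(path)} "
        f"at {datetime.now():%H:%M:%S}.</p></body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("localhost", 8000, application)` returns a server object whose `serve_forever()` method serves this function on port 8000.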
Searching the Web




Building a Web Search Engine
Database of web pages
  Crawling the web collecting pages and links
  Indexing them efficiently
Responding to Searches
  Spell checking – edit distance
  How to find documents that match a query
  How to rank the “best” documents
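
For the spell-checking item, edit distance can be computed with the classic dynamic-programming approach. This is a sketch of the Levenshtein distance (insertions, deletions, and substitutions each cost 1); the `suggest` helper and its tiny vocabulary are hypothetical and say nothing about how any real search engine picks suggestions.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between a[:i-1] and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def suggest(word, vocabulary):
    """Return the vocabulary entry closest to `word`."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))
```

With the vocabulary `["google", "yahoo", "reddit"]`, `suggest("gogle", ...)` picks `"google"`, which is one insertion away.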
Crawler
activeURLs = ["www.yahoo.com"]
while len(activeURLs) > 0:
    newURLs = []
    for URL in activeURLs:
        page = downloadPage(URL)
        newURLs += extractLinks(page)
    activeURLs = newURLs
Problems:
• Will keep revisiting the same pages
• Will take very long to get a good view of the web
• Will annoy web server admins
• downloadPage and extractLinks must be very robust
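
One way to address the first problem is to remember which URLs have already been seen. The sketch below does this with a set and a queue, and adds a page limit so the crawl always stops. To keep it runnable without a network, `FAKE_WEB` and `fetch_links` are hypothetical stand-ins for downloadPage and extractLinks.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> list of URLs that page links to.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
}

def fetch_links(url):
    """Stand-in for downloadPage + extractLinks; unknown pages have no links."""
    return FAKE_WEB.get(url, [])

def crawl(seed_urls, max_pages=1000):
    """Breadth-first crawl that visits each URL at most once."""
    visited = []             # pages in the order they were fetched
    seen = set(seed_urls)    # every URL ever added to the queue
    queue = deque(seed_urls)
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:   # skip pages already queued or fetched
                seen.add(link)
                queue.append(link)
    return visited
```

Starting from `["a.example"]`, each of the four pages is fetched exactly once even though they link back to one another. The other problems on the slide (speed, politeness to servers, robustness) are not addressed here.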
Building a Web Search Engine
Database of web pages
  Crawling the web collecting pages and links
  Indexing them efficiently
Responding to Searches
  How to find documents that match a query
  How to rank the “best” documents
Building an Index
What if we just stored all the pages?
Answering a query would take time linear in the size of the database
      (need to look at all characters in database)

Google: about 40 Billion pages (1 Trillion URLs, but the number
actually indexed is a closely kept corporate secret)
   40 Billion pages * 60 KB (average web page size)
       = ~2.4 Quadrillion bytes to search!

Linear is not nearly good enough when n is Quadrillions
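
The arithmetic can be checked directly. The page count and page size below are the slide's figures; the scan rate is a made-up number, chosen only to make the scale concrete.

```python
PAGES = 40 * 10**9             # about 40 billion pages (slide's figure)
BYTES_PER_PAGE = 60 * 10**3    # 60 KB average page size (slide's figure)

total_bytes = PAGES * BYTES_PER_PAGE   # 2.4 * 10**15, i.e. 2.4 quadrillion

# Hypothetical scan rate: 1 GB per second on a single machine.
ASSUMED_BYTES_PER_SECOND = 10**9

seconds = total_bytes / ASSUMED_BYTES_PER_SECOND
days = seconds / (60 * 60 * 24)        # a little under 28 days per query
```

Under that assumed rate, a single linear scan would take weeks, which is why the index structures on the following slides matter.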
Hash Table
  Index    Key-Value Pairs
    0      { <“Colleen”, ?>, <“virginia”, ?>, … }
    1      { <“Bob”, ?>, … }
    2
    3
    …
  [about a million bins?]

def lookup(key, table): return searchEntries(table[H(key, len(table))])

Finding a good H is difficult
   You can download Google’s from
   http://code.google.com/p/google-sparsehash/
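
A small runnable sketch of the same idea: each bin is a list of key-value pairs, and `H` picks the bin. The hash function here is a deliberately simple toy and is an assumption for illustration only; it is not Google's, and a good `H` spreads keys far more evenly.

```python
def H(key, nbins):
    """Toy hash function: sum of character codes, modulo the number of bins."""
    return sum(ord(ch) for ch in key) % nbins

def make_table(nbins):
    """Create a table of empty bins."""
    return [[] for _ in range(nbins)]

def insert(key, value, table):
    """Add or replace the entry for key in its bin."""
    bin_entries = table[H(key, len(table))]
    for i, (k, _) in enumerate(bin_entries):
        if k == key:
            bin_entries[i] = (key, value)
            return
    bin_entries.append((key, value))

def lookup(key, table):
    """Search only the one bin that key hashes to."""
    for k, v in table[H(key, len(table))]:
        if k == key:
            return v
    return None
```

Lookup time depends on how many entries share a bin rather than on the total number of entries, which is the point of hashing.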
Google’s Lexicon
1998: 14 million words (billions today?)
Lookup word in H(word, nbins): maps to WordID

    Key         Words
      0         [<“aardvark”, 1024235>, ...]
      1         [<“aaa”, 224155>, ..., <“zzz”, 29543>]
     ...        ...
  nbins – 1     [<“abba”, 25583>, ..., <“zeit”, 50395>]
Google’s Reverse Index
 (Based on the 1998 paper; it has definitely changed some since then, but now they are secretive!)

  WordId      ndocs    pointer
  00000000        3    → Inverted Barrels
  00000001       15    → Inverted Barrels
  ...
  16777215      105    → Inverted Barrels

  Lexicon: 293 MB (1998); today: many GB?
  “Inverted Barrels”: 41 GB (1998); today: many TB?
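
The core idea of a reverse (inverted) index is to map each word to the documents that contain it, so a query never has to scan the pages themselves. This is a toy sketch using Python dictionaries, with made-up documents; it is not the barrel layout described in the 1998 paper.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """documents: dict mapping doc id -> text.
    Returns a dict mapping word -> sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search(index, query):
    """Doc ids that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))  # keep docs that also have this word
    return sorted(result)
```

The length of each word's list corresponds to the ndocs column above: it says how many documents mention the word.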
Inverted Barrels
docid (27 bits)    nhits (5 bits)    hits (16 bits each)
7630486927         23
...

plain hit:
   capitalized: 1 bit
   font size: 3 bits
   position: 12 bits
      (first 4095 chars, everything else)
extra info for anchors, titles (fewer position bits)

Suggested experiment for winter break:
is the position field still only 12 bits?
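
Packing a plain hit into 16 bits can be sketched with shifts and masks. The field widths come from the slide; the order of the fields within the 16 bits, and the choice to clamp large positions to the maximum value, are assumptions made here for illustration.

```python
CAP_BITS, FONT_BITS, POS_BITS = 1, 3, 12   # 1 + 3 + 12 = 16 bits
MAX_POSITION = (1 << POS_BITS) - 1         # 4095, the largest 12-bit value

def pack_hit(capitalized, font_size, position):
    """Pack one plain hit into a 16-bit integer."""
    if not 0 <= font_size < (1 << FONT_BITS):
        raise ValueError("font_size must fit in 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    # Assumption: positions too large for 12 bits all share the top value.
    position = min(position, MAX_POSITION)
    return ((int(capitalized) << (FONT_BITS + POS_BITS))
            | (font_size << POS_BITS)
            | position)

def unpack_hit(hit):
    """Recover (capitalized, font_size, position) from a packed hit."""
    capitalized = bool(hit >> (FONT_BITS + POS_BITS))
    font_size = (hit >> POS_BITS) & ((1 << FONT_BITS) - 1)
    position = hit & MAX_POSITION
    return capitalized, font_size, position
```

Storing each hit in two bytes instead of three separate values is what keeps the barrels compact when there are billions of hits.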
Building a Web Search Engine
Database of web pages
  Crawling the web collecting pages and links
  Indexing them efficiently
Responding to Searches
  Spell checking – edit distance
  How to find documents that match a query
  How to rank the “best” documents
Finding the “Best” Documents
Humans rate them
  “Jerry and David’s Guide to the World Wide Web”
    (became Yahoo!)
Machines rate them
  Count number of occurrences of keyword
     Easy for sites to rig this
  Machine language understanding not good enough
Business Model
  Whoever pays you the most is listed first
PageRank
If a site is important and interesting, other sites
   will link to it.
 Don’t ever take <a href=http://www.cs.virginia.edu/cs1120>cs1120</a>!



But…not all links are equal:
  if a lot of highly-ranked sites link to this site,
  this site should be highly-ranked.


PageRank
def pageRank(u):
    rank = 0
    for b in linksToPage(u):
        rank = rank + pageRank(b) / Links(b)
    return rank


                                Would this work?
Converging PageRank
Ranks of all pages depend on ranks of all other
  pages
Keep recalculating ranks until they converge
def CalculatePageRanks (urls):
 initially, every rank is 1
 for as many times as necessary
    calculate a new rank for each page (using old ranks)
    replace the old ranks with the new ranks
                                  How do initial ranks affect results?
                                  How many iterations are necessary?
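
The recalculation loop can be sketched as follows. It follows the simplified update from these slides (each page splits its old rank evenly among the pages it links to) and runs a fixed number of iterations. It assumes every page has at least one outgoing link and that every link target is itself a key of `links`; the published PageRank algorithm also includes a damping factor, which is left out here.

```python
def calculate_page_ranks(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    ranks = {page: 1.0 for page in links}        # initially, every rank is 1
    for _ in range(iterations):
        new_ranks = {page: 0.0 for page in links}
        for page, targets in links.items():
            share = ranks[page] / len(targets)   # old rank split over its links
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks                        # replace old ranks with new
    return ranks

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
example_ranks = calculate_page_ranks(example_links)
```

On this small graph the ranks settle near A = 1.2, B = 0.6, C = 1.2: the total stays at 3, and B ends up lowest because its only incoming link carries half of A's rank.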
PageRank: 1998
Crawlable web (1998):
  150 million pages, 1.7 Billion links
Database of 322 million links
  Converges in about 50 iterations
Initialization matters
  All pages = 1: very democratic, models browser
    equally likely to start on random page
  www.yahoo.com = 1, ..., all others = 0
     More like what Google probably uses
Do we have a
  search engine?


Theoretician: Sure!

Ali G: No way! It’ll blow up.




                                Google’s First Server
How do we make our service fast
enough to index the whole web
 and serve billions of requests?




Counting Word Occurrences

“When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, …”
   → [ <“When”, 1>, <“in”, 1>, <“the”, 2>, … ]

“We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the …”
   → [ <“We”, 1>, <“in”, 1>, <“the”, 2>, … ]

map(doc, countWords)
If we have enough machines, can we do this fast for the whole web?


[ <“When”, 1>, <“in”, 1>, <“the”, 2>, … ]
[ <“We”, 1>, <“in”, 1>, <“the”, 2>, … ]
   reduce → [ <“We”, 1>, <“in”, 2>, … ]

[ <“a”, 5>, <“in”, 3>, <“the”, 2>, … ]
[ <“apple”, 1>, <“in”, 1>, <“the”, 7>, … ]
   reduce → [ <“a”, 5>, <“in”, 4>, … ]

reduce (of the two results above) → [ <“a”, 5>, <“in”, 6>, … ]
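
A single-machine sketch of the same map and reduce steps, using `Counter` and `functools.reduce` from Python's standard library. The function names are illustrative; a real MapReduce system would run the map calls and the pairwise merges on many machines.

```python
from collections import Counter
from functools import reduce

def count_words(doc):
    """Map step: one document in, one table of word counts out."""
    return Counter(doc.split())

def merge_counts(left, right):
    """Reduce step: combine two tables into a new one.
    Neither input is modified, so merges can run in any order."""
    return left + right

def word_count(docs):
    """Count words across all documents: map, then reduce."""
    return reduce(merge_counts, map(count_words, docs), Counter())

totals = word_count(["the web in the world", "we the people in order"])
```

Here `totals["the"]` is 3 and `totals["in"]` is 2. Because `merge_counts` is associative and touches no shared state, pairs of tables can be merged in a tree, as in the diagram.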
MapReduce




Key to Massive Parallel Execution


   Get rid of state and mutation!




cs1120 recap (one animated slide; layers from top to bottom):

Objects: SimObject, PhysicalObject, Place, MobileObject
State and Mutation: m1: 1 2 3
Functional Programming (PS 1-4):
        (define (count-matches p b)
          (list-sum (map (lambda (v) (if (eq? v b) 1 0)) p)))
Interpreters:
        def meval(expr, env):
          … return evalApplication(expr, env)
Any Mechanical Computation: Turing Machine (diagram: 1, 2, 3)
        ... # 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 # ...
Any Discrete Function:
        A B C | R1 R0
        0 0 0 |  0  0
        0 0 1 |  0  1
        … … … |  …  …
Mechanical Logic: AND, NOT
        (or a b)
        (not (and (not a) (not b)))
“Magic” Transistors
Charge
Layers: Objects; State and Mutation; Functional Programming (PS 1-4); Interpreters; Any Mechanical Computation; Any Discrete Function; Mechanical Logic; “Magic” Transistors
Themes: Recursive Definitions; Universality; Abstraction

Now, you know almost everything you need to build the next reddit or Google!

 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Class 39: ...and the World Wide Web

  • 1. Lecture 39: …and the World Wide Web. cs1120, Fall 2011. David Evans, http://www.cs.virginia.edu/evans
  • 2. Announcements. Exam 2 due 61 seconds ago! Friday: we will return graded Exam 2, along with guidance about the Final. Must be present (or email me in advance) to win! If you want to present your PS8 in class Monday, remember to email me!
  • 3. Plan: The World Wide Web; Building Web Applications; How Google Works (or, going back to pre-PS5 to make things really fast again!); cs1120 recap in one (heavily animated) slide!
  • 5. The “Desk Wide Web” Memex Machine Vannevar Bush, As We May Think, LIFE, 1945
  • 6. WorldWideWeb. Sir Tim Berners-Lee (CERN, Switzerland; MIT). First web server and client, 1990. (This picture, 1993.)
  • 7. Overview: Many of the discussions of the future at CERN and the LHC era end with the question – “Yes, but how will we ever keep track of such a large project?” This proposal provides an answer to such questions. Firstly, it discusses the problem of information access at CERN. Then, it introduces the idea of linked information systems, and compares them with less flexible ways of finding information. (Source: http://www.w3.org/History/1989/proposal-msw.html)
  • 10. WorldWideWeb. Established a common language for sharing information on computers. Lots of previous attempts (Gopher, WAIS, Archie, Xanadu, etc.) failed.
  • 11. Why the World Wide Web? The World Wide Web succeeded because it was simple! It didn’t attempt to maintain links, just provided a common way to name things: Uniform Resource Locators (URL). Example: http://www.cs.virginia.edu/cs1120/index.html, where http is the Service (HyperText Transfer Protocol), www.cs.virginia.edu is the Hostname, and /cs1120/index.html is the File Path.
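The three parts of the URL on this slide can be pulled apart with Python's standard library. This is a minimal sketch; the helper name `split_url` is made up for illustration, and the slide's labels (Service, Hostname, File Path) are mapped onto what `urllib.parse` calls scheme, netloc, and path.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return (service, hostname, file_path) for a URL, using the slide's labels."""
    parts = urlparse(url)
    # urlparse names these fields scheme, netloc and path.
    return parts.scheme, parts.netloc, parts.path


service, hostname, file_path = split_url("http://www.cs.virginia.edu/cs1120/index.html")
print("Service:  ", service)
print("Hostname: ", hostname)
print("File Path:", file_path)
```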
  • 12. HyperText Transfer Protocol. The Client (Browser) sends “GET /cs1120/index.html HTTP/1.0” to the Server; the Server replies with the contents of the file (<html> <head> …), written in HTML (HyperText Markup Language).
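The exchange on this slide can be sketched without touching the network: build the text of the request line the browser sends, and split a server reply into its status line and the file contents. The function names and the canned reply below are illustrative assumptions, not part of the lecture.

```python
def build_get_request(path, version="HTTP/1.0"):
    """Text of a minimal GET request; a blank line ends the request."""
    return f"GET {path} {version}\r\n\r\n"


def split_response(raw):
    """Split a raw reply into (status_line, body) at the first blank line."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line = head.split("\r\n")[0]
    return status_line, body


request = build_get_request("/cs1120/index.html")

# A made-up reply, standing in for what a server might send back.
canned_reply = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><head></head></html>"
)
status, html = split_response(canned_reply)
print(repr(request))
print(status)
print(html)
```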
  • 13. HTML: HyperText Markup Language. Language for controlling display of web pages. Uses formatting tags: between < and >. Grammar:
      Document ::= <html> Header Body </html>
      Header ::= <head> HeadElements </head>
      HeadElements ::= HeadElement HeadElements
      HeadElements ::= ε | <title> Element </title>
      Body ::= <body> Elements </body>
      Elements ::= ε | Element Elements
      Element ::= <p> Element </p>
      Element ::= <center> Element </center>
      …
  • 14. Popular Web Site: Strategy 1: Static, Authored Web Site. A Content Producer writes everything (example: http://www.twinkiesproject.com/). Drawbacks: you have to do all the work yourself; the world may already have enough Twinkie-experiment websites.
  • 15. Popular Web Site: Strategy 2: Dynamic Web Applications. The Web Programmer provides seed content and function, which attracts users, who produce more content, which attracts more users. Example: eBay in 1997, http://web.archive.org/web/19970614001443/http://www.ebay.com/
  • 16. Popular Web Site: Strategy 2: Dynamic Web Applications. Seed content and function attracts users, who produce more content. Examples: reddit.com in 2005, reddit.com today. Advantages: users do most of the work; if you’re lucky, they might even pay you for the privilege! Disadvantages: you lose control over the content (you might get sued for things your users do); you have to know how to program a web application.
  • 17. Dynamic Web Sites. Programs that run on the web server: can be written in any language (often in Python or Java), they just need a way to connect the web server to the program; the program generates HTML (now often JavaScript also); every useful web site does this. Programs that run on the client’s machine: Java, JavaScript (aka, “Scheme for the Web”), Flash, etc.; the language must be supported by the client’s browser; responsive interface with limited round-trips to the server.
  • 20. Building a Web Search Engine. Database of web pages: crawling the web collecting pages and links; indexing them efficiently. Responding to Searches: spell checking (edit distance); how to find documents that match a query; how to rank the “best” documents.
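The slide names edit distance for spell checking but does not say which variant. As a sketch, assuming the common Levenshtein definition (insertions, deletions, and substitutions each cost 1), it can be computed one table row at a time:

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed variant)."""
    # prev[j] holds the distance between the first i-1 chars of a and first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,                      # delete ca
                curr[j - 1] + 1,                  # insert cb
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[len(b)]


print(edit_distance("gogle", "google"))
print(edit_distance("kitten", "sitting"))
```

A spell checker could then suggest the dictionary words with the smallest distance to the query term.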
  • 21. Crawling. Crawler:
      activeURLs = ["www.yahoo.com"]
      while len(activeURLs) > 0:
          newURLs = []
          for URL in activeURLs:
              page = downloadPage(URL)
              newURLs += extractLinks(page)
          activeURLs = newURLs
    Problems: will keep revisiting the same pages; will take very long to get a good view of the web; will annoy web server admins; downloadPage and extractLinks must be very robust.
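One way to address the first problem (revisiting the same pages) is to remember which URLs have already been fetched. The sketch below is not the lecture's code: it runs against a small in-memory stand-in for the web (`FAKE_WEB`, with made-up hostnames) instead of downloading anything, and `max_pages` is an added, hypothetical safety limit.

```python
from collections import deque

# Hypothetical link graph standing in for downloadPage + extractLinks.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": ["a.example"],
}


def crawl(seed_urls, get_links, max_pages=1000):
    """Breadth-first crawl that visits each URL at most once."""
    visited = set()
    frontier = deque(seed_urls)
    order = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue  # already fetched: skip instead of revisiting
        visited.add(url)
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                frontier.append(link)
    return order


pages = crawl(["a.example"], lambda url: FAKE_WEB.get(url, []))
print(pages)
```

The other problems on the slide (politeness toward servers, robustness to malformed pages) are not handled here.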
  • 22. Building a Web Search Engine. Database of web pages: crawling the web collecting pages and links; indexing them efficiently. Responding to Searches: how to find documents that match a query; how to rank the “best” documents.
  • 23. Building an Index. What if we just stored all the pages? Answering a query would be linear in the size of the database (need to look at all characters in the database). Google: about 40 Billion pages (1 Trillion URLs, but the number actually indexed is a closely kept corporate secret) * 60 KB (average web page size) = ~2.4 Quadrillion bytes to search! Linear is not nearly good enough when n is in the Quadrillions.
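The arithmetic on this slide is easy to check. The page count and average page size come from the slide (taking 1 KB as 1000 bytes, which is what the 2.4 Quadrillion figure implies); the scan rate is a purely hypothetical number added here to give a feel for why a linear scan per query is hopeless.

```python
PAGES_INDEXED = 40_000_000_000    # about 40 Billion pages (from the slide)
AVG_PAGE_BYTES = 60_000           # 60 KB average page size (from the slide)
ASSUMED_SCAN_BYTES_PER_SEC = 1_000_000_000  # hypothetical: 1 GB/s on one machine

total_bytes = PAGES_INDEXED * AVG_PAGE_BYTES
scan_seconds = total_bytes / ASSUMED_SCAN_BYTES_PER_SEC
scan_days = scan_seconds / 86_400  # seconds per day

print(f"{total_bytes:.1e} bytes to search")
print(f"about {scan_days:.0f} days per query at the assumed scan rate")
```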
  • 24. Hash Table Index. Key-Value Pairs, stored in bins [about a million bins?]:
      0: { <“Colleen”, ?>, <“virginia”, ?>, … }
      1: { <“Bob”, ?>, … }
      2: …
      3: …
      def lookup(key, table):
          return searchEntries(table[H(key, len(table))])
    Finding a good H is difficult. You can download Google’s from http://code.google.com/p/google-sparsehash/
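A runnable sketch of the bin structure on this slide follows. The hash function `H` here is a deliberately naive polynomial hash chosen only for illustration (the slide's point is that finding a good `H` is difficult), and `make_table` and `insert` are hypothetical helpers that the slide does not show.

```python
def H(key, nbins):
    """Naive string hash (illustrative assumption, not a good H)."""
    total = 0
    for ch in key:
        total = (total * 31 + ord(ch)) % nbins
    return total


def make_table(nbins):
    """A table is a list of bins; each bin is a list of (key, value) pairs."""
    return [[] for _ in range(nbins)]


def insert(key, value, table):
    entries = table[H(key, len(table))]
    for i, (existing_key, _) in enumerate(entries):
        if existing_key == key:
            entries[i] = (key, value)  # overwrite an existing entry
            return
    entries.append((key, value))


def lookup(key, table):
    """Search only the one bin the key hashes to."""
    for existing_key, value in table[H(key, len(table))]:
        if existing_key == key:
            return value
    return None


table = make_table(8)
insert("Colleen", 1, table)
insert("virginia", 2, table)
insert("Bob", 3, table)
print(lookup("virginia", table), lookup("missing", table))
```

With enough bins and a well-distributed `H`, each lookup inspects only a handful of entries regardless of how many keys are stored.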
  • 25. Google’s Lexicon. 1998: 14 million words (billions today?). Lookup word in H(word, nbins): maps to WordID.
      Key          Words
      0            [<“aardvark”, 1024235>, ...]
      1            [<“aaa”, 224155>, ..., <“zzz”, 29543>]
      ...          ...
      nbins - 1    [<“abba”, 25583>, ..., <“zeit”, 50395>]
  • 26. Google’s Reverse Index (based on the 1998 paper; it has definitely changed some since then, but now they are secretive!). Lexicon: 293 MB (1998); today: many GB? Each Lexicon entry has a WordId, ndocs, and a pointer into the “Inverted Barrels”: 41 GB (1998); today: many TB?
      WordId      ndocs
      00000000    3
      00000001    15
      ...
      16777215    105
  • 27. Inverted Barrels. Each entry: docid (27 bits), nhits (5 bits), hits (16 bits each); example values shown on the slide: 7630486927, 23. Plain hit: capitalized: 1 bit; font size: 3 bits; position: 12 bits (first 4095 chars, everything else). Extra info for anchors, titles (fewer position bits). Suggested experiment for winter break: is the position field still only 12 bits?
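The 16-bit plain hit (1 + 3 + 12 bits) can be illustrated with ordinary bit operations. The field order within the 16 bits and the choice to clamp large positions to 4095 are assumptions made for this sketch; the slide only gives the field widths.

```python
def pack_hit(capitalized, font_size, position):
    """Pack a plain hit into 16 bits: [capitalized:1][font_size:3][position:12]."""
    if not 0 <= font_size < 8:
        raise ValueError("font_size must fit in 3 bits")
    if position < 0:
        raise ValueError("position must be non-negative")
    position = min(position, 4095)  # assumed: clamp anything past 12 bits
    return (int(capitalized) << 15) | (font_size << 12) | position


def unpack_hit(hit):
    """Inverse of pack_hit: returns (capitalized, font_size, position)."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF


hit = pack_hit(True, 3, 100)
print(hit, unpack_hit(hit))
```

Packing each hit into two bytes is what keeps the per-occurrence cost of the index small.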
  • 28. Building a Web Search Engine. Database of web pages: crawling the web collecting pages and links; indexing them efficiently. Responding to Searches: spell checking (edit distance); how to find documents that match a query; how to rank the “best” documents.
  • 29. Finding the “Best” Documents. Humans rate them: “Jerry and David’s Guide to the World Wide Web” (became Yahoo!). Machines rate them: count number of occurrences of keyword (easy for sites to rig this); machine language understanding not good enough. Business Model: whoever pays you the most is listed first.
  • 30. PageRank. If a site is important and interesting, other sites will link to it. Don’t ever take <a href=http://www.cs.virginia.edu/cs1120>cs1120</a>! But…not all links are equal: if a lot of highly-ranked sites link to this site, this site should be highly-ranked.
  • 31. PageRank
      def pageRank(u):
          rank = 0
          # linksToPage(u): the pages that link to u; Links(b): number of links on page b
          for b in linksToPage(u):
              rank = rank + pageRank(b) / Links(b)
          return rank
    Would this work?
  • 32. Converging PageRank. Ranks of all pages depend on ranks of all other pages. Keep recalculating ranks until they converge:
      def CalculatePageRanks(urls):
          initially, every rank is 1
          for as many times as necessary:
              calculate a new rank for each page (using old ranks)
              replace the old ranks with the new ranks
    How do initial ranks affect results? How many iterations are necessary?
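The pseudocode on this slide can be turned into a short runnable sketch. It follows the slide's simplified rule (each page shares its old rank equally among the pages it links to) and leaves out the damping factor used in the published PageRank formulation. The three-page link graph, the function name, and the fixed iteration count are assumptions for illustration, and every link target is assumed to appear as a key in the graph.

```python
def calculate_page_ranks(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    ranks = {url: 1.0 for url in links}  # initially, every rank is 1
    for _ in range(iterations):
        new_ranks = {url: 0.0 for url in links}
        for source, targets in links.items():
            if not targets:
                continue  # a page with no links passes on no rank in this sketch
            share = ranks[source] / len(targets)  # uses the old ranks only
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks  # replace the old ranks with the new ranks
    return ranks


# Hypothetical graph: a links to b and c, b links to c, c links to a.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
example_ranks = calculate_page_ranks(example_links)
print({url: round(rank, 3) for url, rank in example_ranks.items()})
```

On this small graph the ranks settle near a = 1.2, b = 0.6, c = 1.2: page b, which receives only half of a's rank, ends up ranked lowest.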
  • 33. PageRank: 1998. Crawlable web (1998): 150 million pages, 1.7 Billion links. Database of 322 million links. Converges in about 50 iterations. Initialization matters. All pages = 1: very democratic, models a browser equally likely to start on a random page. www.yahoo.com = 1, ..., all others = 0: more like what Google probably uses.
  • 34. Do we have a search engine? Theoretician: Sure! Ali G: No way! It’ll blow up. (Picture: Google’s First Server.)
  • 35. How do we make our service fast enough to index the whole web and serve billions of requests?
  • 36. Counting Word Occurrences. map(doc, countWords):
      “When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, …” maps to [<“When”, 1>, <“in”, 1>, <“the”, 2>, …]
      “We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the …” maps to [<“We”, 1>, <“in”, 1>, <“the”, 2>, …]
    If we have enough machines, can we do this fast for the whole web?
  • 37. [Diagram: a tree of reduce steps that merges per-document word counts pairwise.]
      [<“When”, 1>, <“in”, 1>, <“the”, 2>, …] and [<“We”, 1>, <“in”, 1>, <“the”, 2>, …] reduce to [<“We”, 1>, <“in”, 2>, …]
      [<“a”, 5>, <“in”, 3>, <“the”, 2>, …] and [<“apple”, 1>, <“in”, 1>, <“the”, 7>, …] reduce to [<“a”, 5>, <“in”, 4>, …]
      those two results reduce to [<“a”, 5>, <“in”, 6>, …]
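The map and reduce steps from the last two slides can be sketched on a single machine with Python's built-in `map` and `functools.reduce`. This only illustrates the shape of the computation, not a distributed system; the two sample documents and the whitespace tokenizer are assumptions. Both functions are pure (they return new values and mutate nothing), which is what would let the work be split across machines.

```python
from collections import Counter
from functools import reduce


def count_words(doc):
    """Map step: one document in, its word counts out (naive whitespace split)."""
    return Counter(doc.split())


def merge_counts(left, right):
    """Reduce step: combine two count tables into a new one without mutating either."""
    return left + right


docs = ["the cat in the hat", "in the beginning"]
per_doc_counts = list(map(count_words, docs))
total_counts = reduce(merge_counts, per_doc_counts, Counter())
print(total_counts)
```

Because `merge_counts` is associative, the pairwise merges can be grouped in any tree shape, as in the diagram, without changing the result.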
  • 38. MapReduce
  • 39. Key to Massive Parallel Execution: Get rid of state and mutation!
  • 40. [cs1120 recap diagram, layers listed from top to bottom:]
      Functional Programming (PS 1-4): (define (count-matches p b) (list-sum (map (lambda (v) (if (eq? v b) 1 0)) p)))
      Interpreters: def meval(expr, env): … return evalApplication(expr, env) ...
      Any Mechanical Computation: Turing Machine (tape: ... # 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 # ...; states 1, 2, 3)
      Any Discrete Function: truth table with columns A B C R1 R0 (rows 0 0 0 0 0; 0 0 1 0 1; …); (or a b) is (not (and (not a) (not b)))
      Mechanical Logic: AND, NOT
      “Magic” Transistors
  • 42. [cs1120 recap diagram with two layers added above those shown on slide 40:]
      Objects: SimObject, PhysicalObject, Place, MobileObject
      State and Mutation: m1: 1 2 3
      Functional Programming (PS 1-4): (define (count-matches p b) (list-sum (map (lambda (v) (if (eq? v b) 1 0)) p)))
      Interpreters: def meval(expr, env): … return evalApplication(expr, env) ...
      Any Mechanical Computation: Turing Machine (tape: ... # 1 0 1 1 0 1 1 1 0 1 1 0 1 1 1 # ...; states 1, 2, 3)
      Any Discrete Function: truth table with columns A B C R1 R0; (or a b)
  • 44. Recap layers, top to bottom: Objects; State and Mutation; Functional Programming (PS 1-4); Interpreters; Any Mechanical Computation; Any Discrete Function; Mechanical Logic; “Magic” Transistors. Alongside: Recursive Definitions, Universality, Abstraction. Charge: Now, you know almost everything you need to build the next reddit or google!