Challenges in Building Large-Scale Information
              Retrieval Systems


                  Jeff Dean
                Google Fellow
              jeff@google.com
Why Work on Retrieval Systems?
• Challenging blend of science and engineering
  – Many interesting, unsolved problems
  – Spans many areas of CS:
     • architecture, distributed systems, algorithms, compression,
       information retrieval, machine learning, UI, etc.
  – Scale far larger than most other systems


• Small teams can create systems used by
  hundreds of millions
Retrieval System Dimensions
• Must balance engineering tradeoffs between:
   –   number of documents indexed
   –   queries / sec
   –   index freshness/update rate
   –   query latency
   –   information kept about each document
   –   complexity/cost of scoring/retrieval algorithms

• Engineering difficulty roughly equal to the
  product of these parameters

• All of these affect overall performance, and
  performance per $
1999 vs. 2009
•   # docs: ~70M to many billion         ~100X
•   queries processed/day:              ~1000X
•   per doc info in index:                 ~3X
•   update latency: months to minutes   ~10000X
•   avg. query latency: <1s to <0.2s       ~5X


• More machines * faster machines:      ~1000X
Constant Change
• Parameters change over time
  – often by many orders of magnitude


• Right design at X may be very wrong at 10X or 100X
  – ... design for ~10X growth, but plan to rewrite before ~100X


• Continuous evolution:
  – 7 significant revisions in last 10 years
  – often rolled out without users realizing we’ve made
    major changes
Rest of Talk
• Evolution of Google’s search systems
  – several gens of crawling/indexing/serving systems
  – brief description of supporting infrastructure

  – Joint work with many, many people

• Interesting directions and challenges
“Google” Circa 1997 (google.stanford.edu)
Research Project, circa 1997
Ways of Index Partitioning
• By doc: each shard has index for subset of docs
   –   pro: each shard can process queries independently
   –   pro: easy to keep additional per-doc information
   –   pro: network traffic (requests/responses) small
   –   con: query has to be processed by each shard
   –   con: O(K*N) disk seeks for K word query on N shards

• By word: shard has subset of words for all docs
   – pro: K word query => handled by at most K shards
   – pro: O(K) disk seeks for K word query
   – con: much higher network bandwidth needed
        • data about each word for each matching doc must be collected in
          one place
   – con: harder to have per-doc information
In our computing environment, by doc makes more sense
Basic Principles
• Documents assigned small integer ids (docids)
  – good if smaller for higher quality/more important docs
• Index Servers:
  –   given (query) return sorted list of (score, docid, ...)
  –   partitioned (“sharded”) by docid
  –   index shards are replicated for capacity
  –   cost is O(# queries * # docs in index)

• Doc Servers
  –   given (docid, query) generate (title, snippet)
  –   map from docid to full text of docs on disk
  –   also partitioned by docid
  –   cost is O(# queries)
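
A minimal sketch of these two roles and the frontend fan-out, with names and signatures of our own invention (C++; not Google's actual interfaces):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using DocId = uint32_t;            // small integer id; lower ids ~ more important docs

struct Hit { float score; DocId docid; };

// Index server: holds one docid-partitioned ("sharded") slice of the index.
class IndexShard {
 public:
  virtual ~IndexShard() = default;
  // Given a query, return a sorted list of (score, docid).
  virtual std::vector<Hit> Search(const std::string& query, int max_results) = 0;
};

// Doc server: also partitioned by docid; maps docid to the full document text.
struct Snippet { std::string title; std::string text; };
class DocShard {
 public:
  virtual ~DocShard() = default;
  // Given (docid, query), generate a query-specific (title, snippet).
  virtual Snippet Generate(DocId docid, const std::string& query) = 0;
};

// Frontend fan-out: with by-doc partitioning, every shard sees every query,
// and the frontend merges the per-shard top-k lists.
std::vector<Hit> FanOutSearch(const std::vector<IndexShard*>& shards,
                              const std::string& query, int k) {
  std::vector<Hit> merged;
  for (IndexShard* shard : shards) {
    std::vector<Hit> part = shard->Search(query, k);
    merged.insert(merged.end(), part.begin(), part.end());
  }
  std::sort(merged.begin(), merged.end(),
            [](const Hit& a, const Hit& b) { return a.score > b.score; });
  if (merged.size() > static_cast<size_t>(k)) merged.resize(k);
  return merged;
}
```

Shard replication for capacity and the doc-server phase for the final top results sit behind the same pattern.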
“Corkboards” (1999)
Serving System, circa 1999
Caching
• Cache servers:
  – cache both index results and doc snippets
  – hit rates typically 30-60%
     • depends on frequency of index updates, mix of query traffic,
       level of personalization, etc

• Main benefits:
  – performance! 10s of machines do work of 100s or 1000s
  – reduce query latency on hits
     • queries that hit in cache tend to be both popular and
       expensive (common words, lots of documents to score, etc.)

• Beware: big latency spike/capacity drop when
  index updated or cache flushed
Crawling (circa 1998-1999)
• Simple batch crawling system
  –   start with a few URLs
  –   crawl pages
  –   extract links, add to queue
  –   stop when you have enough pages (see the sketch after this list)

• Concerns:
  – don’t hit any site too hard
  – prioritizing among uncrawled pages
       • one way: continuously compute PageRank on changing graph
  – maintaining uncrawled URL queue efficiently
       • one way: keep in a partitioned set of servers
  – dealing with machine failures
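
A minimal sketch of that batch crawl loop (C++). The fetcher is an injected callback and link extraction is a crude regex, both our assumptions; the politeness, prioritization, and failure-handling concerns above are deliberately left out:

```cpp
#include <deque>
#include <functional>
#include <regex>
#include <string>
#include <unordered_set>
#include <vector>

// Crawl breadth-first from a seed set until page_budget pages are fetched.
std::vector<std::string> BatchCrawl(
    const std::vector<std::string>& seeds,
    const std::function<std::string(const std::string&)>& fetch,
    size_t page_budget) {
  std::deque<std::string> queue(seeds.begin(), seeds.end());
  std::unordered_set<std::string> seen(seeds.begin(), seeds.end());
  std::vector<std::string> crawled;
  const std::regex href("href=\"(http[^\"]+)\"");   // crude link extraction

  while (!queue.empty() && crawled.size() < page_budget) {
    std::string url = queue.front();
    queue.pop_front();
    std::string page = fetch(url);                  // fetch one page
    crawled.push_back(url);
    // Extract links and enqueue ones we have not seen yet.
    for (auto it = std::sregex_iterator(page.begin(), page.end(), href);
         it != std::sregex_iterator(); ++it) {
      std::string link = (*it)[1];
      if (seen.insert(link).second) queue.push_back(link);
    }
  }
  return crawled;
}
```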
Indexing (circa 1998-1999)
• Simple batch indexing system
  – Based on simple unix tools
  – No real checkpointing, so machine failures painful
  – No checksumming of raw data, so hardware bit errors
    caused problems
     • Exacerbated by early machines having no ECC, no parity
     • Sort 1 TB of data without parity: ends up "mostly sorted"
     • Sort it again: "mostly sorted" another way


• “Programming with adversarial memory”
  – Led us to develop a file abstraction that stored
    checksums of small records and could skip and
    resynchronize after corrupted records
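
A sketch of what such a record abstraction can look like, under our own framing (the marker value, header layout, and FNV-1a placeholder checksum are illustrative, not the actual format):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Each record: [4-byte sync marker][4-byte payload length][4-byte checksum][payload].
// A reader that hits a bad checksum scans forward to the next sync marker and
// resynchronizes there, skipping the corrupted record instead of aborting.
constexpr uint32_t kSync = 0x9e3779b9;   // arbitrary illustrative marker

// Placeholder checksum (FNV-1a); a real system would use CRC32 or similar.
uint32_t Checksum(const uint8_t* data, size_t n) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < n; ++i) { h ^= data[i]; h *= 16777619u; }
  return h;
}

void Put32(std::vector<uint8_t>* out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out->push_back((v >> (8 * i)) & 0xff);
}
uint32_t Get32(const std::vector<uint8_t>& in, size_t pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  return v;
}

void AppendRecord(std::vector<uint8_t>* file, const std::string& payload) {
  Put32(file, kSync);
  Put32(file, static_cast<uint32_t>(payload.size()));
  Put32(file, Checksum(reinterpret_cast<const uint8_t*>(payload.data()),
                       payload.size()));
  file->insert(file->end(), payload.begin(), payload.end());
}

// Returns only the records whose checksums verify; corrupted bytes are skipped.
std::vector<std::string> ReadRecords(const std::vector<uint8_t>& file) {
  std::vector<std::string> out;
  size_t pos = 0;
  while (pos + 12 <= file.size()) {
    if (Get32(file, pos) != kSync) { ++pos; continue; }      // resynchronize
    uint32_t len = Get32(file, pos + 4);
    uint32_t sum = Get32(file, pos + 8);
    if (pos + 12 + len > file.size()) { ++pos; continue; }
    const uint8_t* payload = file.data() + pos + 12;
    if (Checksum(payload, len) != sum) { ++pos; continue; }  // skip bad record
    out.emplace_back(reinterpret_cast<const char*>(payload), len);
    pos += 12 + len;
  }
  return out;
}
```

The point is that a flipped bit corrupts one record rather than the whole file: the reader simply scans forward to the next sync marker and keeps going.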
Index Updates (circa 1998-1999)
• 1998-1999: Index updates (~once per month):
  –   Wait until traffic is low
  –   Take some replicas offline
  –   Copy new index to these replicas
  –   Start new frontends pointing at updated index and
      serve some traffic from there
Index Updates (circa 1998-1999)
• Index server disk:
  – outer part of disk gives higher disk bandwidth
Google Data Center (2000)
Google (new data center 2001)
Google Data Center (3 days later)
Increasing Index Size and Query Capacity
• Huge increases in index size in ’99, ’00, ’01, ...
   – From ~50M pages to more than 1000M pages

• At same time as huge traffic increases
   – ~20% growth per month in 1999, 2000, ...
   – ... plus major new partners (e.g. Yahoo in July 2000
     doubled traffic overnight)

• Performance of index servers was paramount
   – Deploying more machines continuously, but...
   – Needed ~10-30% software-based improvement every
     month
Dealing with Growth
Implications
• Shard response time influenced by:
  – # of disk seeks that must be done
  – amount of data to be read from disk


• Big performance improvements possible with:
  – better disk scheduling
  – improved index encoding
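
A rough cost model makes the two levers explicit (our framing; the ~10 ms seek time is an illustrative figure for disks of that era, not from the slides):

    shard response time ≈ (# disk seeks × ~10 ms) + (bytes read / disk bandwidth)

A query touching four posting lists on a shard pays ~40 ms in seeks alone before any transfer: better disk scheduling attacks the first term, and tighter index encoding shrinks the second.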
Index Encoding circa 1997-1999
• Original encoding (’97) was very simple:
[figure: on-disk posting-list layout for WORD]

   – hit: position plus attributes (font size, title, etc.)

   – Eventually added skip tables for large posting lists


• Simple, byte aligned format
   – cheap to decode, but not very compact
   – ... required lots of disk bandwidth
Encoding Techniques
• Bit-level encodings:
  – Unary: N ‘1’s followed by a ‘0’
  – Gamma: log2(N) in unary, then floor(log2(N)) bits
  – Rice-K: floor(N / 2^K) in unary, then N mod 2^K in K bits
      • special case of Golomb codes where base is power of 2
  – Huffman-Int: like Gamma, except log2(N) is Huffman
    coded instead of encoded w/ Unary


• Byte-aligned encodings:
  – varint: 7 bits per byte with a continuation bit
      • 0-127: 1 byte, 128-16383: 2 bytes, ...
  – ...
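
As a concrete illustration of the bit-level codes, here is a minimal Rice-K encoder (C++ sketch; the MSB-first bit packing and the BitWriter helper are our choices, not prescribed above):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simple bit writer: packs bits MSB-first into a growing byte vector.
class BitWriter {
 public:
  void PutBit(int bit) {
    if (nbits_ % 8 == 0) bytes_.push_back(0);
    if (bit) bytes_.back() |= 1u << (7 - nbits_ % 8);
    ++nbits_;
  }
  void PutBits(uint32_t value, int count) {   // high bit first
    for (int i = count - 1; i >= 0; --i) PutBit((value >> i) & 1);
  }
  const std::vector<uint8_t>& bytes() const { return bytes_; }
 private:
  std::vector<uint8_t> bytes_;
  size_t nbits_ = 0;
};

// Rice-K: write floor(N / 2^K) in unary (that many 1s, then a 0),
// then the low K bits of N. Assumes 0 < k < 32.
void RiceEncode(uint32_t n, int k, BitWriter* out) {
  uint32_t q = n >> k;                    // quotient, written in unary
  for (uint32_t i = 0; i < q; ++i) out->PutBit(1);
  out->PutBit(0);
  out->PutBits(n & ((1u << k) - 1), k);   // remainder in K bits
}
```

With K chosen near log2 of the average delta, most values cost only a few unary bits plus the fixed K-bit remainder, which is why Rice/Golomb codes compress docid and position deltas well.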
Block-Based Index Format
• Block-based, variable-len format reduced both
  space and CPU
[figure: block-based posting-list layout for WORD]




• Reduced index size by ~30%, plus much faster
  to decode
Implications of Ever-Wider Sharding

• Must add shards to keep response time low as
  index size increases

• ... but query cost increases with # of shards
  – typically >= 1 disk seek / shard / query term
  – even for very rare terms

• As # of replicas increases, total amount of
  memory available increases
  – Eventually, have enough memory to hold an entire
    copy of the index in memory
     • radically changes many design parameters
Early 2001: In-Memory Index
In-Memory Indexing Systems
• Many positives:
  – big increase in throughput
  – big decrease in latency
     • especially at the tail: expensive queries that previously
       needed GBs of disk I/O became much faster
               e.g. [ “circle of life” ]


• Some issues:
  – Variance: touch 1000s of machines, not dozens
     • e.g. randomized cron jobs caused us trouble for a while
  – Availability: 1 or few replicas of each doc’s index data
     • Queries of death that kill all the backends at once: very bad
     • Availability of index data when machine failed (esp for
       important docs): replicate important docs
Larger-Scale Computing
Current Machines


• In-house rack design
• PC-class motherboards
• Low-end storage and
  networking hardware
• Linux
• + in-house software
Serving Design, 2004 edition
[diagram: requests flow through cache servers and a root server, which fans out to parent servers and then to leaf servers holding repository shards; a repository manager and file loaders build the shards from data in GFS]
New Index Format
• Block index format used two-level scheme:
   – Each hit was encoded as (docid, word position in doc) pair
   – Docid deltas encoded with Rice encoding
   – Very good compression (originally designed for disk-based
     indices), but slow/CPU-intensive to decode

• New format: single flat position space
   –   Data structures on side keep track of doc boundaries
   –   Posting lists are just lists of delta-encoded positions
   –   Need to be compact (can’t afford 32 bit value per occurrence)
   –   … but need to be very fast to decode
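
A sketch of how that flat position space can be represented (our reconstruction in C++; the names and the binary-search mapping from position to docid are assumptions, not the production design):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Every word occurrence in the shard gets one global position; a posting list
// is just the delta-encoded sorted positions for that word, and a side array
// of document start positions maps a position back to its docid for scoring.
struct FlatPositionIndex {
  std::vector<uint64_t> doc_start;   // doc_start[d] = first flat position of doc d;
                                     // assumed sorted, with doc_start[0] == 0

  uint32_t DocIdForPosition(uint64_t pos) const {
    // Last document whose start position is <= pos.
    auto it = std::upper_bound(doc_start.begin(), doc_start.end(), pos);
    return static_cast<uint32_t>(it - doc_start.begin()) - 1;
  }
};

// Decode a delta-encoded posting list into absolute flat positions.
// On disk the deltas would themselves be compressed (e.g. group varint).
std::vector<uint64_t> DecodePostings(const std::vector<uint32_t>& deltas) {
  std::vector<uint64_t> positions;
  positions.reserve(deltas.size());
  uint64_t pos = 0;
  for (uint32_t d : deltas) {
    pos += d;
    positions.push_back(pos);
  }
  return positions;
}
```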
Byte-Aligned Variable-length Encodings
• Varint encoding:
   – 7 bits per byte with continuation bit
   – Con: Decoding requires lots of branches/shifts/masks
  0 0000001 0 0001111 1 1111111 00000011 1 1111111 11111111 0 0000111

      1        15             511                     131071


• Idea: Encode byte length as low 2 bits
   – Better: fewer branches, shifts, and masks
   – Con: Limited to 30-bit values, still some shifting to decode


  00 000001 00 001111 01 111111 00000011 10 111111 11111111 00000111

      1        15             511                      131071
Group Varint Encoding
• Idea: encode groups of 4 values in 5-17 bytes
    – Pull out 4 2-bit binary lengths into single byte prefix

00 000001 00 001111 01 111111 00000011 10 111111 11111111 00000111



00000110 00000001 00001111 11111111 00000001 11111111 11111111 00000001

  Tags         1         15               511                   131071

 • Decode: Load prefix byte and use value to lookup in 256-entry table:
         00000110        Offsets: +1,+2,+3,+5; Masks: ff, ff, ffff, ffffff


 • Much faster than alternatives:
     – 7-bit-per-byte varint: decode ~180M numbers/second
     – 30-bit Varint w/ 2-bit length: decode ~240M numbers/second
     – Group varint: decode ~400M numbers/second
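
A minimal group-varint decoder along these lines (C++ sketch, not the production code). It follows the slide's layout, with the first value's 2-bit length in the high bits of the tag byte and each value's payload bytes least-significant first, but uses per-value lengths instead of the precomputed offset/mask table for brevity:

```cpp
#include <cstdint>
#include <cstring>

// g_len[tag][i] = byte length (1..4) of the i-th value in a group with this tag.
static uint8_t g_len[256][4];

static void BuildGroupVarintTable() {
  for (int tag = 0; tag < 256; ++tag)
    for (int i = 0; i < 4; ++i)
      g_len[tag][i] = static_cast<uint8_t>(((tag >> (6 - 2 * i)) & 3) + 1);
}

// Decodes one group of 4 values starting at p; returns a pointer just past the
// group. Assumes a little-endian host and that the caller has verified at
// least 17 readable bytes remain (the maximum group size).
static const uint8_t* DecodeGroup(const uint8_t* p, uint32_t out[4]) {
  const uint8_t* len = g_len[*p++];
  for (int i = 0; i < 4; ++i) {
    uint32_t v = 0;
    std::memcpy(&v, p, len[i]);   // copy the 1..4 low-order payload bytes
    out[i] = v;
    p += len[i];
  }
  return p;
}
```

Run on the slide's example (tag 00000110 followed by the seven payload bytes), this decodes 1, 15, 511, 131071. Replacing the inner loop with the 256-entry table of offsets and masks shown above removes the remaining per-value branching.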
2007: Universal Search
[diagram: universal search serving, organized around an Indexing Service]
Index that? Just a minute!
• Low-latency crawling and indexing is tough
  – crawl heuristics: what pages should be crawled?
  – crawling system: need to crawl pages quickly
  – indexing system: depends on global data
     • PageRank, anchor text of pages that point to the page, etc.
     • must have online approximations for these global properties
  – serving system: must be prepared to accept updates
    while serving requests
     • very different data structures than batch update serving
       system
Flexibility & Experimentation in IR Systems
• Ease of experimentation hugely important
  – faster turnaround => more exps => more improvement

• Some experiments are easy
  – e.g. just weight existing data differently


• Others are more difficult to perform: need data
  not present in production index
  – Must be easy to generate and incorporate new data
    and use it in experiments
Infrastructure for Search Systems
• Several key pieces of infrastructure:
   – GFS: large-scale distributed file system
   – MapReduce: makes it easy to write/run large scale jobs
      • generate production index data more quickly
      • perform ad-hoc experiments more rapidly
      • ...
   – BigTable: semi-structured storage system
      • online, efficient access to per-document information at any
        time
      • multiple processes can update per-doc info asynchronously
      • critical for updating documents in minutes instead of hours

     http://labs.google.com/papers/gfs.html
     http://labs.google.com/papers/mapreduce.html
     http://labs.google.com/papers/bigtable.html
Experimental Cycle, Part 1
• Start with new ranking idea

• Must be easy to run experiments, and to do so
  quickly:
   – Use tools like MapReduce, BigTable, to generate data...
   – Initially, run off-line experiment & examine effects
       • ...on human-rated query sets of various kinds
       • ...on random queries, to look at changes to existing ranking


   – Latency and throughput of this prototype don't matter


• ...iterate, based on results of experiments ...
Experimental Cycle, Part 2
• Once off-line experiments look promising, want
  to run live experiment
   – Experiment on tiny sliver of user traffic
   – Random sample, usually
      • but sometimes a sample of specific class of queries
          – e.g. English queries, or queries with place names, etc.



• For this, throughput not important, but latency matters
   – Experimental framework must operate at close to
     production latencies!
Experiment Looks Good: Now What?
• Launch!


• Performance tuning/reimplementation to make
  feasible at full load
   – e.g. precompute data rather than computing at runtime
   – e.g. approximate if "good enough" but much cheaper


• Rollout process important:
   – Continuously make quality vs. cost tradeoffs
   – Rapid rollouts at odds with low latency and site stability
      • Need good working relationships between search quality and
        groups chartered to make things fast and stable
Future Directions & Challenges
• A few closing thoughts on interesting directions...
Cross-Language Information Retrieval
• Translate all the world’s documents to all the
  world’s languages
  – increases index size substantially
  – computationally expensive
  – ... but huge benefits if done well


• Challenges:
  – continuously improving translation quality
  – large-scale systems work to deal with larger and more
    complex language models
      • to translate one sentence: ~1M lookups in a multi-TB model
ACLs in Information Retrieval Systems
• Retrieval systems with mix of private, semi-
  private, widely shared and public documents
   – e.g. e-mail vs. shared doc among 10 people vs.
     messages in group with 100,000 members vs. public
     web pages

• Challenge: building retrieval systems that
  efficiently deal with ACLs that vary widely in size
   – best solution for doc shared with 10 people is different
     than for doc shared with the world
   – sharing patterns of a document might change over
     time
Automatic Construction of Efficient IR Systems

• Currently use several retrieval systems
   – e.g. one system for sub-second update latencies, one for
     very large # of documents but daily updates, ...
   – common interfaces, but very different implementations
     primarily for efficiency
   – works well, but lots of effort to build, maintain and extend
     different systems

• Challenge: can we have a single parameterizable
  system that automatically constructs efficient
  retrieval system based on these parameters?
Information Extraction from Semi-structured Data

• Data with clearly labelled semantic meaning is a tiny
  fraction of all the data in the world
• But there’s lots of semi-structured data
   – books & web pages with tables, data behind forms, ...


• Challenge: algorithms/techniques for improved
  extraction of structured information from
  unstructured/semi-structured sources
   – noisy data, but lots of redundancy
   – want to be able to correlate/combine/aggregate info from
     different sources
In Conclusion...

• Designing and building large-scale retrieval systems
  is a challenging, fun endeavor
   – new problems require continuous evolution
   – work benefits many users
   – new retrieval techniques often require new
     systems



• Thanks for your attention!
Thanks! Questions...?
• Further reading:
Ghemawat, Gobioff, & Leung. Google File System, SOSP 2003.

Barroso, Dean, & Hölzle. Web Search for a Planet: The Google Cluster Architecture, IEEE
Micro, 2003.

Dean & Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004.

Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, & Gruber. Bigtable: A
Distributed Storage System for Structured Data, OSDI 2006.

Brants, Popat, Xu, Och, & Dean. Large Language Models in Machine Translation, EMNLP
2007.


• These and many more available at:
       http://labs.google.com/papers.html

Mais conteúdo relacionado

Mais procurados

Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneErik Krogen
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
Skew Mitigation For Facebook PetabyteScale Joins
Skew Mitigation For Facebook PetabyteScale JoinsSkew Mitigation For Facebook PetabyteScale Joins
Skew Mitigation For Facebook PetabyteScale JoinsDatabricks
 
presentation_Hadoop_File_System
presentation_Hadoop_File_Systempresentation_Hadoop_File_System
presentation_Hadoop_File_SystemBrett Keim
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersJulien Anguenot
 
SQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopSQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopJan Pieter Posthuma
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoverySteven Francia
 
Real-time data analytics with Cassandra at iland
Real-time data analytics with Cassandra at ilandReal-time data analytics with Cassandra at iland
Real-time data analytics with Cassandra at ilandJulien Anguenot
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCErik Krogen
 
Scalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDBScalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDBAlluxio, Inc.
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0Heiko Loewe
 
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 

Mais procurados (20)

Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Hadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of OzoneHadoop Meetup Jan 2019 - Overview of Ozone
Hadoop Meetup Jan 2019 - Overview of Ozone
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
Skew Mitigation For Facebook PetabyteScale Joins
Skew Mitigation For Facebook PetabyteScale JoinsSkew Mitigation For Facebook PetabyteScale Joins
Skew Mitigation For Facebook PetabyteScale Joins
 
Hadoop - How It Works
Hadoop - How It WorksHadoop - How It Works
Hadoop - How It Works
 
presentation_Hadoop_File_System
presentation_Hadoop_File_Systempresentation_Hadoop_File_System
presentation_Hadoop_File_System
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
SQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopSQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - Hadoop
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster Recovery
 
Real-time data analytics with Cassandra at iland
Real-time data analytics with Cassandra at ilandReal-time data analytics with Cassandra at iland
Real-time data analytics with Cassandra at iland
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GCHadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
Hadoop Meetup Jan 2019 - Dynamometer and a Case Study in NameNode GC
 
Scalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDBScalable Filesystem Metadata Services with RocksDB
Scalable Filesystem Metadata Services with RocksDB
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
What's new in hadoop 3.0
What's new in hadoop 3.0What's new in hadoop 3.0
What's new in hadoop 3.0
 
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 

Semelhante a Jeff Dean: WSDM09 Keynote

Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learnedparallellabs
 
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed SystemsSoftware Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systemsadrianionel
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesAlexandra Sasha Blumenfeld
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...Amazon Web Services
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptxdhivyak49
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBMongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best PracticesLewis Lin 🦊
 
Building Scalable Aggregation Systems
Building Scalable Aggregation SystemsBuilding Scalable Aggregation Systems
Building Scalable Aggregation SystemsJared Winick
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDaehyeok Kim
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Anubhav Kale
 
Memory: The New Disk
Memory: The New DiskMemory: The New Disk
Memory: The New DiskTim Lossen
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB ClusterMongoDB
 
InfiniFlux vs_RDBMS
InfiniFlux vs_RDBMSInfiniFlux vs_RDBMS
InfiniFlux vs_RDBMSInfiniFlux
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Maya Lumbroso
 

Semelhante a Jeff Dean: WSDM09 Keynote (20)

Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learned
 
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed SystemsSoftware Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
Software Engineering Advice from Google's Jeff Dean for Big, Distributed Systems
 
WW Historian 10
WW Historian 10WW Historian 10
WW Historian 10
 
Optimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 MinutesOptimize Your Reporting In Less Than 10 Minutes
Optimize Your Reporting In Less Than 10 Minutes
 
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016: [REPEAT] How EA Leveraged Amazon Redshift and AWS Partner...
 
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
AWS re:Invent 2016| GAM301 | How EA Leveraged Amazon Redshift and AWS Partner...
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptx
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
try
trytry
try
 
Building Scalable Aggregation Systems
Building Scalable Aggregation SystemsBuilding Scalable Aggregation Systems
Building Scalable Aggregation Systems
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Designs, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed SystemsDesigns, Lessons and Advice from Building Large Distributed Systems
Designs, Lessons and Advice from Building Large Distributed Systems
 
Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark Solving Office 365 Big Challenges using Cassandra + Spark
Solving Office 365 Big Challenges using Cassandra + Spark
 
Memory: The New Disk
Memory: The New DiskMemory: The New Disk
Memory: The New Disk
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDB
 
Sizing Your MongoDB Cluster
Sizing Your MongoDB ClusterSizing Your MongoDB Cluster
Sizing Your MongoDB Cluster
 
InfiniFlux vs_RDBMS
InfiniFlux vs_RDBMSInfiniFlux vs_RDBMS
InfiniFlux vs_RDBMS
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 

Último

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Jeff Dean: WSDM09 Keynote

  • 1. Challenges in Building Large-Scale Information Retrieval Systems Jeff Dean Google Fellow jeff@google.com
  • 2. Why Work on Retrieval Systems? • Challenging blend of science and engineering – Many interesting, unsolved problems – Spans many areas of CS: • architecture, distributed systems, algorithms, compression, information retrieval, machine learning, UI, etc. – Scale far larger than most other systems • Small teams can create systems used by hundreds of millions
  • 3. Retrieval System Dimensions • Must balance engineering tradeoffs between: – number of documents indexed – queries / sec – index freshness/update rate – query latency – information kept about each document – complexity/cost of scoring/retrieval algorithms • Engineering difficulty roughly equal to the product of these parameters • All of these affect overall performance, and performance per $
  • 4. 1999 vs. 2009 • # docs: ~70M to many billion • queries processed/day: • per doc info in index: • update latency: months to minutes • avg. query latency: <1s to <0.2s • More machines * faster machines:
  • 5. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: • per doc info in index: • update latency: months to minutes • avg. query latency: <1s to <0.2s • More machines * faster machines:
  • 6. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: ~1000X • per doc info in index: • update latency: months to minutes • avg. query latency: <1s to <0.2s • More machines * faster machines:
  • 7. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: ~1000X • per doc info in index: ~3X • update latency: months to minutes • avg. query latency: <1s to <0.2s • More machines * faster machines:
  • 8. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: ~1000X • per doc info in index: ~3X • update latency: months to minutes ~10000X • avg. query latency: <1s to <0.2s • More machines * faster machines:
  • 9. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: ~1000X • per doc info in index: ~3X • update latency: months to minutes ~10000X • avg. query latency: <1s to <0.2s ~5X • More machines * faster machines:
  • 10. 1999 vs. 2009 • # docs: ~70M to many billion ~100X • queries processed/day: ~1000X • per doc info in index: ~3X • update latency: months to minutes ~10000X • avg. query latency: <1s to <0.2s ~5X • More machines * faster machines: ~1000X
  • 11. Constant Change • Parameters change over time – often by many orders of magnitude • Right design at X may be very wrong at 10X or 100X – ... design for ~10X growth, but plan to rewrite before ~100X • Continuous evolution: – 7 significant revisions in last 10 years – often rolled out without users realizing we’ve made major changes
  • 12. Rest of Talk • Evolution of Google’s search systems – several gens of crawling/indexing/serving systems – brief description of supporting infrastructure – Joint work with many, many people • Interesting directions and challenges
  • 13. “Google” Circa 1997 (google.stanford.edu)
  • 15. Ways of Index Partitioning • By doc: each shard has index for subset of docs – pro: each shard can process queries independently – pro: easy to keep additional per-doc information – pro: network traffic (requests/responses) small – con: query has to be processed by each shard – con: O(K*N) disk seeks for K word query on N shards
  • 16. Ways of Index Partitioning • By doc: each shard has index for subset of docs – pro: each shard can process queries independently – pro: easy to keep additional per-doc information – pro: network traffic (requests/responses) small – con: query has to be processed by each shard – con: O(K*N) disk seeks for K word query on N shards • By word: shard has subset of words for all docs – pro: K word query => handled by at most K shards – pro: O(K) disk seeks for K word query – con: much higher network bandwidth needed • data about each word for each matching doc must be collected in one place – con: harder to have per-doc information
  • 17. Ways of Index Partitioning In our computing environment, by doc makes more sense
  • 18. Basic Principles • Documents assigned small integer ids (docids) – good if smaller for higher quality/more important docs • Index Servers: – given (query) return sorted list of (score, docid, ...) – partitioned (“sharded”) by docid – index shards are replicated for capacity – cost is O(# queries * # docs in index) • Doc Servers – given (docid, query) generate (title, snippet) – map from docid to full text of docs on disk – also partitioned by docid – cost is O(# queries)
  • 21. Caching • Cache servers: – cache both index results and doc snippets – hit rates typically 30-60% • depends on frequency of index updates, mix of query traffic, level of personalization, etc • Main benefits: – performance! 10s of machines do work of 100s or 1000s – reduce query latency on hits • queries that hit in cache tend to be both popular and expensive (common words, lots of documents to score, etc.) • Beware: big latency spike/capacity drop when index updated or cache flushed
  • 22. Crawling (circa 1998-1999) • Simple batch crawling system – start with a few URLs – crawl pages – extract links, add to queue – stop when you have enough pages • Concerns: – don’t hit any site too hard – prioritizing among uncrawled pages • one way: continuously compute PageRank on changing graph – maintaining uncrawled URL queue efficiently • one way: keep in a partitioned set of servers – dealing with machine failures
  • 23. Indexing (circa 1998-1999) • Simple batch indexing system – Based on simple unix tools – No real checkpointing, so machine failures painful – No checksumming of raw data, so hardware bit errors caused problems • Exacerbated by early machines having no ECC, no parity • Sort 1 TB of data without parity: ends up "mostly sorted" • Sort it again: "mostly sorted" another way • “Programming with adversarial memory” – Led us to develop a file abstraction that stored checksums of small records and could skip and resynchronize after corrupted records
  • 24. Index Updates (circa 1998-1999) • 1998-1999: Index updates (~once per month): – Wait until traffic is low – Take some replicas offline – Copy new index to these replicas – Start new frontends pointing at updated index and serve some traffic from there
  • 25. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 26. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 27. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 28. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 29. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 30. Index Updates (circa 1998-1999) • Index server disk: – outer part of disk gives higher disk bandwidth
  • 34. Google (new data center 2001)
  • 35. Google Data Center (3 days later)
  • 36. Increasing Index Size and Query Capacity • Huge increases in index size in ’99, ’00, ’01, ... – From ~50M pages to more than 1000M pages • At same time as huge traffic increases – ~20% growth per month in 1999, 2000, ... – ... plus major new partners (e.g. Yahoo in July 2000 doubled traffic overnight) • Performance of index servers was paramount – Deploying more machines continuously, but... – Needed ~10-30% software-based improvement every month
  • 44. Implications • Shard response time influenced by: – # of disk seeks that must be done – amount of data to be read from disk • Big performance improvements possible with: – better disk scheduling – improved index encoding
  • 45. Index Encoding circa 1997-1999 • Original encoding (’97) was very simple: WORD – hit: position plus attributes (font size, title, etc.) – Eventually added skip tables for large posting lists • Simple, byte aligned format – cheap to decode, but not very compact – ... required lots of disk bandwidth
  • 46. Encoding Techniques • Bit-level encodings: – Unary: N ‘1’s followed by a ‘0’ – Gamma: log2(N) in unary, then floor(log2(N)) bits – RiceK: floor(N / 2K) in unary, then N mod 2K in K bits • special case of Golomb codes where base is power of 2 – Huffman-Int: like Gamma, except log2(N) is Huffman coded instead of encoded w/ Unary • Byte-aligned encodings: – varint: 7 bits per byte with a continuation bit • 0-127: 1 byte, 128-4095: 2 bytes, ... – ...
  • 47. Block-Based Index Format • Block-based, variable-len format reduced both space and CPU WORD • Reduced index size by ~30%, plus much faster to decode
  • 48. Implications of Ever-Wider Sharding • Must add shards to keep response time low as index size increases • ... but query cost increases with # of shards – typically >= 1 disk seek / shard / query term – even for very rare terms • As # of replicas increases, total amount of memory available increases – Eventually, have enough memory to hold an entire copy of the index in memory • radically changes many design parameters
  • 50. In-Memory Indexing Systems • Many positives: – big increase in throughput – big decrease in latency • especially at the tail: expensive queries that previously needed GBs of disk I/O became much faster e.g. [ “circle of life” ] • Some issues: – Variance: touch 1000s of machines, not dozens • e.g. randomized cron jobs caused us trouble for a while – Availability: 1 or few replicas of each doc’s index data • Queries of death that kill all the backends at once: very bad • Availability of index data when machine failed (esp for important docs): replicate important docs
  • 52. Current Machines • In-house rack design • PC-class motherboards • Low-end storage and networking hardware • Linux • + in-house software
  • 53. Serving Design, 2004 edition Requests Cache servers Root … Parent Servers … Repository … Manager … Leaf Servers … Repository Shards File Loaders GFS
  • 54. New Index Format • Block index format used two-level scheme: – Each hit was encoded as (docid, word position in doc) pair – Docid deltas encoded with Rice encoding – Very good compression (originally designed for disk-based indices), but slow/CPU-intensive to decode • New format: single flat position space – Data structures on side keep track of doc boundaries – Posting lists are just lists of delta-encoded positions – Need to be compact (can’t afford 32 bit value per occurrence) – … but need to be very fast to decode
• 55-56. Byte-Aligned Variable-length Encodings • Varint encoding: – 7 bits per byte with continuation bit – Con: decoding requires lots of branches/shifts/masks – Example (values 1, 15, 511, 131071): 0 0000001 | 0 0001111 | 1 1111111 0 0000011 | 1 1111111 1 1111111 0 0000111 • Idea: Encode byte length as low 2 bits – Better: fewer branches, shifts, and masks – Con: limited to 30-bit values, still some shifting to decode – Same values: 00 000001 | 00 001111 | 01 111111 00000111 | 10 111111 11111111 00000111 (see the decode sketch below)
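To show why the length-prefixed variant needs fewer branches, here is a sketch of both decoders. The bit placement in the second one follows the byte diagram above (length drawn at the front of the first byte, values little-endian); the slide text describes it as the low 2 bits, and the production layout may differ, so treat this as one plausible arrangement:

// Classic 7-bit varint decode: one mask, one shift, and one branch per byte.
#include <cstdint>

uint32_t DecodeVarint(const uint8_t* p, int* len) {
  uint32_t v = 0;
  int shift = 0, i = 0;
  while (true) {
    uint8_t b = p[i++];
    v |= static_cast<uint32_t>(b & 0x7F) << shift;  // mask off continuation bit
    if ((b & 0x80) == 0) break;                     // branch on every byte
    shift += 7;
  }
  *len = i;
  return v;
}

// Length-prefixed variant: the first byte carries a 2-bit length (1-4 bytes
// total) plus 6 value bits; remaining bytes are little-endian continuation.
// Limited to 30-bit values, but decoding branches only on the length.
uint32_t DecodePrefixVarint(const uint8_t* p, int* len) {
  int n = (p[0] >> 6) + 1;             // 2-bit length field, as drawn above
  uint32_t v = p[0] & 0x3F;            // low 6 value bits of the first byte
  for (int i = 1; i < n; ++i)
    v |= static_cast<uint32_t>(p[i]) << (6 + 8 * (i - 1));
  *len = n;
  return v;
}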
• 57-63. Group Varint Encoding • Idea: encode groups of 4 values in 5-17 bytes – Pull out the four 2-bit byte lengths into a single tag-byte prefix – Example (values 1, 15, 511, 131071): Tags: 00000110; Data: 00000001 | 00001111 | 11111111 00000001 | 11111111 11111111 00000001 • Decode: load the tag byte and use its value to look up in a 256-entry table: – tag 00000110 → Offsets: +1, +2, +3, +5; Masks: ff, ff, ffff, ffffff • Much faster than alternatives: – 7-bit-per-byte varint: decode ~180M numbers/second – 30-bit varint w/ 2-bit length: decode ~240M numbers/second – Group varint: decode ~400M numbers/second (a decode sketch follows below)
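Here is a sketch of group-varint decoding with the 256-entry table described above. The struct names, the field order in the tag byte (first value in the top two bits, matching the diagram), and the memcpy-based load are my assumptions, not the production implementation:

// Group varint: one tag byte holds four 2-bit lengths, followed by four
// little-endian values of 1-4 bytes each. A 256-entry table maps the tag
// byte to offsets and lengths so the inner loop has no branches.
#include <cstdint>
#include <cstring>

struct GroupVarintEntry {
  uint8_t offset[4];  // offset of each value from the start of the data bytes
  uint8_t length[4];  // byte length (1-4) of each value
  uint8_t total;      // total data bytes in the group (4-16)
};
static GroupVarintEntry kTable[256];

// Call once at startup to build the lookup table.
static void BuildGroupVarintTable() {
  for (int tag = 0; tag < 256; ++tag) {
    int off = 0;
    for (int i = 0; i < 4; ++i) {
      int len = ((tag >> (6 - 2 * i)) & 0x3) + 1;  // first value in top 2 bits
      kTable[tag].offset[i] = static_cast<uint8_t>(off);
      kTable[tag].length[i] = static_cast<uint8_t>(len);
      off += len;
    }
    kTable[tag].total = static_cast<uint8_t>(off);
  }
}

// Decodes one group starting at the tag byte p; fills out[0..3] and returns
// the bytes consumed (5-17). Assumes a little-endian host, matching the
// little-endian byte order of the encoded values.
static int GroupVarintDecode(const uint8_t* p, uint32_t out[4]) {
  const GroupVarintEntry& e = kTable[p[0]];
  const uint8_t* data = p + 1;
  for (int i = 0; i < 4; ++i) {
    uint32_t v = 0;
    std::memcpy(&v, data + e.offset[i], e.length[i]);
    out[i] = v;
  }
  // The slide's table expresses the same lookup as offsets relative to the
  // tag byte (+1,+2,+3,+5 for tag 00000110) plus per-length masks
  // (ff, ff, ffff, ffffff); copying exactly length[i] bytes has the same effect.
  return 1 + e.total;
}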
  • 64. 2007: Universal Search Indexing Service
• 65. Index that? Just a minute! • Low-latency crawling and indexing is tough – crawl heuristics: which pages should be crawled? – crawling system: need to crawl pages quickly – indexing system: depends on global data • PageRank, anchor text of pages that point to the page, etc. • must have online approximations for these global properties – serving system: must be prepared to accept updates while serving requests • very different data structures from a batch-update serving system
• 66. Flexibility & Experimentation in IR Systems • Ease of experimentation is hugely important – faster turnaround => more experiments => more improvement • Some experiments are easy – e.g. just weight existing data differently • Others are more difficult to perform: they need data not present in the production index – Must be easy to generate and incorporate new data and use it in experiments
  • 67. Infrastructure for Search Systems • Several key pieces of infrastructure: – GFS: large-scale distributed file system – MapReduce: makes it easy to write/run large scale jobs • generate production index data more quickly • perform ad-hoc experiments more rapidly • ... – BigTable: semi-structured storage system • online, efficient access to per-document information at any time • multiple processes can update per-doc info asynchronously • critical for updating documents in minutes instead of hours http://labs.google.com/papers/gfs.html http://labs.google.com/papers/mapreduce.html http://labs.google.com/papers/bigtable.html
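As a flavor of how MapReduce fits index-building jobs like these, here is a toy, single-process emulation of the map/shuffle/reduce phases that builds term → docid posting lists. It deliberately avoids the real MapReduce API and only mimics the programming model:

// Toy in-memory emulation of map/shuffle/reduce for inverted-index building.
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Doc { uint32_t docid; std::string text; };

// Map phase: emit a (term, docid) pair for every term occurrence in a doc.
// The multimap stands in for the shuffle, which groups pairs by key.
void MapDoc(const Doc& doc, std::multimap<std::string, uint32_t>* shuffle) {
  std::istringstream words(doc.text);
  std::string w;
  while (words >> w) shuffle->emplace(w, doc.docid);
}

// Reduce phase: for each term, collect its (already grouped) docids into a
// posting list. A real reducer would also sort, dedup, and compress them.
std::map<std::string, std::vector<uint32_t>> Reduce(
    const std::multimap<std::string, uint32_t>& shuffle) {
  std::map<std::string, std::vector<uint32_t>> postings;
  for (const auto& kv : shuffle) postings[kv.first].push_back(kv.second);
  return postings;
}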
  • 68. Experimental Cycle, Part 1 • Start with new ranking idea • Must be easy to run experiments, and to do so quickly: – Use tools like MapReduce, BigTable, to generate data... – Initially, run off-line experiment & examine effects • ...on human-rated query sets of various kinds • ...on random queries, to look at changes to existing ranking – Latency and throughput of this prototype don't matter • ...iterate, based on results of experiments ...
  • 69. Experimental Cycle, Part 2 • Once off-line experiments look promising, want to run live experiment – Experiment on tiny sliver of user traffic – Random sample, usually • but sometimes a sample of specific class of queries – e.g. English queries, or queries with place names, etc. • For this, throughput not important, but latency matters – Experimental framework must operate at close to production latencies!
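One simple way to carve out a tiny, stable sliver of live traffic is to hash a stable request key into buckets and divert only the lowest ones; the sketch below is a generic illustration of that idea, not Google's actual experiment framework, and the key, bucket count, and class predicate are all assumptions:

// Decide whether a request falls into an experiment's traffic sliver.
#include <cstdint>
#include <functional>
#include <string>

bool InExperiment(const std::string& request_key,  // e.g. a stable cookie id
                  double fraction,                 // e.g. 0.001 for 0.1% of traffic
                  bool restrict_to_class,          // e.g. English queries only
                  bool query_matches_class) {
  if (restrict_to_class && !query_matches_class) return false;
  const uint64_t kBuckets = 1000000;
  uint64_t bucket = std::hash<std::string>{}(request_key) % kBuckets;
  // Hashing gives a deterministic, roughly uniform assignment, so the same
  // user consistently sees (or doesn't see) the experiment.
  return bucket < static_cast<uint64_t>(fraction * kBuckets);
}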
• 70. Experiment Looks Good: Now What? • Launch! • Performance tuning/reimplementation to make it feasible at full load – e.g. precompute data rather than computing at runtime – e.g. approximate if "good enough" but much cheaper • Rollout process important: – Continuously make quality vs. cost tradeoffs – Rapid rollouts are at odds with low latency and site stability • Need good working relationships between search quality teams and the groups chartered to make things fast and stable
  • 71. Future Directions & Challenges • A few closing thoughts on interesting directions...
• 72. Cross-Language Information Retrieval • Translate all the world’s documents into all the world’s languages – increases index size substantially – computationally expensive – ... but huge benefits if done well • Challenges: – continuously improving translation quality – large-scale systems work to deal with larger and more complex language models • translating one sentence requires ~1M lookups in a multi-TB model
• 73. ACLs in Information Retrieval Systems • Retrieval systems with a mix of private, semi-private, widely shared and public documents – e.g. e-mail vs. a doc shared among 10 people vs. messages in a group with 100,000 members vs. public web pages • Challenge: building retrieval systems that efficiently deal with ACLs that vary widely in size – the best solution for a doc shared with 10 people is different than for a doc shared with the world – sharing patterns of a document might change over time
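A rough sketch of the size-dependent strategy hinted at here: a small ACL can be stored with the document and checked directly at result time, while widely shared documents fall back to group membership (and public documents need no check at all). All types, fields, and thresholds below are hypothetical:

// Per-document visibility check for documents with very different ACL sizes.
#include <algorithm>
#include <cstdint>
#include <vector>

struct DocAcl {
  bool is_public = false;
  std::vector<uint64_t> members;  // kept inline only when the list is small; sorted
  std::vector<uint64_t> groups;   // widely shared docs carry group ids instead
};

// users_groups: the querying user's group memberships, sorted, resolved by a
// separate (possibly cached) group-membership service.
bool CanSee(uint64_t user, const DocAcl& acl,
            const std::vector<uint64_t>& users_groups) {
  if (acl.is_public) return true;
  if (std::binary_search(acl.members.begin(), acl.members.end(), user))
    return true;
  for (uint64_t g : acl.groups)
    if (std::binary_search(users_groups.begin(), users_groups.end(), g))
      return true;
  return false;
}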
• 74. Automatic Construction of Efficient IR Systems • Currently use several retrieval systems – e.g. one system for sub-second update latencies, one for very large # of documents but daily updates, ... – common interfaces, but very different implementations, primarily for efficiency – works well, but lots of effort to build, maintain and extend different systems • Challenge: can we have a single parameterizable system that automatically constructs an efficient retrieval system based on these parameters?
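As a concrete but entirely hypothetical strawman for what "parameterizable" could mean, the input to such a constructor might look like the struct below; the fields simply restate the kinds of parameters mentioned in the bullet above (corpus size, update latency, and so on), and none of this is an existing API:

// Hypothetical parameter set an auto-constructed retrieval system might take.
#include <cstdint>

struct RetrievalSystemSpec {
  uint64_t num_documents;                 // e.g. 1e7 vs 1e11
  double   queries_per_second;            // expected peak query load
  double   update_latency_seconds;        // sub-second vs daily batch
  double   target_query_latency_seconds;  // e.g. 0.2
  uint32_t bytes_of_metadata_per_doc;     // per-document info kept in the index
  // An automatic constructor would choose sharding, replication, storage
  // medium (RAM / flash / disk), and index format from these numbers.
};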
• 75. Information Extraction from Semi-structured Data • Data with clearly labelled semantic meaning is a tiny fraction of all the data in the world • But there’s lots of semi-structured data – books & web pages with tables, data behind forms, ... • Challenge: algorithms/techniques for improved extraction of structured information from unstructured/semi-structured sources – noisy data, but lots of redundancy – want to be able to correlate/combine/aggregate info from different sources
  • 76. In Conclusion... • Designing and building large-scale retrieval systems is a challenging, fun endeavor – new problems require continuous evolution – work benefits many users – new retrieval techniques often require new systems • Thanks for your attention!
• 77. Thanks! Questions...? • Further reading: Ghemawat, Gobioff, & Leung. The Google File System, SOSP 2003. Barroso, Dean, & Hölzle. Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, 2003. Dean & Ghemawat. MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004. Chang, Dean, Ghemawat, Hsieh, Wallach, Burrows, Chandra, Fikes, & Gruber. Bigtable: A Distributed Storage System for Structured Data, OSDI 2006. Brants, Popat, Xu, Och, & Dean. Large Language Models in Machine Translation, EMNLP 2007. • These and many more available at: http://labs.google.com/papers.html