AltaVista Indexing and Search Engine - By Mike Burrows - Recreated by Changshu Liu [ http://xcybercloud.blogspot.com ]
Goals
Non-Goals
Structure of Inverted Files
Documents
Inverted File Format
Parsing a Delta
Index Stream Readers (ISRs)
ISR Implementations
ISR And - constraint solver
Solver Algorithm
Some Metrics
Breakdown of user CPU time
Postmortem
AltaVista Site Architecture - By Mike Burrows - Recreated by Changshu Liu [ http://xcybercloud.blogspot.com ]
Structure of the Site
[Diagram labels: Broad Routers 0, Broad Routers 1, FDDI Router, Front End 0, Front End 1, Front End N-1, FDDI Router, Front End N]
Handling Failures
RAID
File System
Back-Ends
Front-Ends
HTTP Server
Load Balance
Overload Handling
Reference

Editor's Notes

  1. Conditional branch instructions handle 1-, 2-, and 3-byte delta values. The common case is 2 bytes, but it accounts for only about 2/3 of all cases. Seeking is a frequent operation.
  2. To speed up the algorithm, choose the constraint that will move your ISR farthest. Make the correctness of the query logic trivial to verify.
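
The second note can be illustrated with a minimal Python sketch. It is not the AltaVista implementation: the slide bodies are not available here, so the ListISR class, its seek method, and the intersect function are hypothetical names, and the query is simplified to a plain AND that looks for locations present in every stream. The sketch only shows the general idea of seeking every stream and then jumping to the farthest location any stream reported, so that short streams let long ones skip ahead.

```python
from bisect import bisect_left


class ListISR:
    """Hypothetical in-memory stand-in for an index stream reader (ISR)."""

    def __init__(self, locations):
        self._locs = sorted(locations)
        self._i = 0

    def seek(self, target):
        """Advance to the first location >= target; return it, or None if exhausted."""
        self._i = bisect_left(self._locs, target, self._i)
        return self._locs[self._i] if self._i < len(self._locs) else None


def intersect(isrs):
    """Yield every location found in all streams (assumes non-negative integers)."""
    if not isrs:
        return
    target = 0
    while True:
        farthest = target
        for isr in isrs:
            loc = isr.seek(target)
            if loc is None:
                return  # one stream is exhausted, so no further matches exist
            farthest = max(farthest, loc)
        if farthest == target:
            yield target      # every stream sits exactly on target
            target += 1
        else:
            target = farthest  # jump as far as any stream forces us to


streams = [ListISR([1, 3, 5, 9]), ListISR([3, 4, 9, 12]), ListISR([2, 3, 9])]
matches = list(intersect(streams))
print(matches)
```

Because each round restarts from the farthest location seen, no stream is ever asked to stop at a location that another stream has already ruled out.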