SlideShare uma empresa Scribd logo
1 de 56
Baixar para ler offline
Solr@
Things I’m not going to
     talk about:
     A/B Testing
        i18n
Continuos Deployment
About
 Us
10+ Million Listings
     500 qps
Architecture
 Overview
Architecture Overview
Thrift
Architecture Overview
Thrift
      struct Listing {
         1: i64 listing_id
     }

     struct ListingResults {
         1: i64 count,
         2: list<Listing> listings
     }

     service Search {
         ListingResults search(1:string query)
     }
Architecture Overview
Thrift
Generated Java server code:
 public class Search {

   public interface Iface {

     public ListingResults search(String query) throws TException;

     }


Generated PHP client code:
 class SearchClient implements SearchIf {

    /**...**/
    public function search($query)
    {
      $this->send_search($query);
      return $this->recv_search();
    }
Architecture Overview
Thrift
Why use Thrift?
    • Service Encapsulation
    • Reduced Network Traffic
Architecture Overview
Thrift
Why only return IDs?
    • Index Size
    • Easy to scale PK lookups
The Search Server
Architecture Overview
Search Server


 • Identical Code + Hardware
 • Roles/Behavior controlled by Env variables
 • Single Java Process
 • Solr running as a Jetty Servlet
 • Thrift Servers
 • Smoker
Architecture Overview
Search Server




Master-specific processes:
 • Incremental Indexer
 • External File Field Updaters
Load Balancing
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Thrift TSocketPool
Load Balancing
Server Affinity
Load Balancing
    Server Affinity Algorithm
$serversNew = array();
                                              [“host2”, “host3”, “host1”, “host4”]
$numServers = count($servers);

while($numServers > 0) {
   // Take the first 4 chars of the md5sum of the server count
   // and the query, mod the available servers
   $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%($numServers);
   $keySet = array_keys($servers);
   $serverId = $keySet[$key];

    // Push the chosen server onto the new list and remove it
    // from the initial list
    array_push($serversNew, $servers[$serverId]);
    unset($servers[$serverId]);
    --$numServers;
}
Load Balancing
Server Affinity Algorithm
              $key = hexdec(substr(md5($query),0,4))




  “jewelry”                 [“host2”, “host3”, “host1”, “host4”]

  “scarf”                   [“host2”, “host3”, “host1”, “host4”]
Load Balancing
Server Affinity Algorithm
      $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers));




  “jewelry”                            [“host2”, “host3”, “host1”, “host4”]

  “scarf”                              [“host2”, “host1”, “host4”, “host3”]
Load Balancing
Server Affinity Results


      2%        20%
Load Balancing
Server Affinity Caveats

     • Stemming / Analysis
     • Be wary of query distribution
Replication
Replication
The Problem
Replication
The Problem
Replication
Multicast Rsync?
Replication
Multicast Rsync?
[15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes
from host1 to host2 and host3 in prod. I'll be watching the graphs and
what not, but let me know if you see anything funky with the network
[15:26]  <patrick> ok
....

[15:31]  <keyur> is the site down?
Replication
Multicast Rsync?
Hmm...Bit Torrent?
Replication
Bit Torrent POC
Using BitTornado:
Replication
Bit Torrent + Solr
Fork of TTorent: https://github.com/etsy/ttorrent

                            Multi-File Support
                       Performance Enhancements
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Replication
Bit Torrent + Solr
Solr InterOp
QParsers
“writing query strings
   is for suckers”
Solr InterOp
QParsers

  http://host:8393/solr/person/select/?q=_query_:%22{!dismax
  %20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf
  %20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}
  %22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=
  %22giovanni%20fernandez-kincade
 %22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_s
 yn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-
 kincade&lqf=last_name^3
Solr InterOp
QParsers

   http://host:8393/solr/person/select/?q={!personrealqp}giovanni
  %20fernandez-kincade
Solr InterOp
QParsers

 class PersonNameRealQParser extends QParser {
   public PersonNameRealQParser(String qstr, SolrParams localParams,
       SolrParams params, SolrQueryRequest req) {
     super(qstr, localParams, params, req);
   }
Solr InterOp
   QParsers
  @Override
  public Query parse() throws ParseException {
    TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
    exactFullNameQuery.setBoost(4.0f);

    String[] userQueryTerms = qstr.split("s+");
    Query firstLastQuery = null;

    if (2 == userQueryTerms.length)
      firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
    else
      firstLastQuery = parseAsFirstOrLast(userQueryTerms);

    DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
    realNameQuery.add(exactFullNameQuery);
    realNameQuery.add(firstLastQuery);

    return realNameQuery;
  }
Solr InterOp
QParsers
The QParserPlugin that returns our new QParser:
  public class PersonNameRealQParserPlugin extends QParserPlugin {
   public static final String NAME = "personrealqp";

   @Override
   public void init(NamedList args) {}

   @Override
   public QParser createParser(String qstr, SolrParams localParams,
       SolrParams params, SolrQueryRequest req) {
     return new PersonNameRealQParser(qstr, localParams, params, req);
   }
 }
Solr InterOp
QParsers

Registering the plugin in solrconfig.xml:

   <queryParser name="personrealqp"
      class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
Custom Stemmer
Solr InterOp
Custom Stemmer
Solr InterOp
Custom Stemmer

 banded, banding, birding, bouldering, bounded, buffing, bundler, canning,
carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned,
lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter,
roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher,
                      strapped, threaded, yellowing
Solr InterOp
Custom Stemmer
First we extend KStemmer and intercept stem calls:
  public class LStemmer extends KStemmer {

     /**.....**/

      @Override
      String stem(String term) {
          String override = overrideStemTransformations.get(term);
          if(override != null) return override;
          return super.stem(term);
      }
  }
Solr InterOp
 Custom Stemmer
Then create a TokenFilter that uses the new Stemmer:
 final class LStemFilter extends TokenFilter {

   /**.....**/        
   protected LStemFilter(TokenStream input, int cacheSize) {
     super(input);
     stemmer = new LStemmer(cacheSize);
   }
        
   @Override
   public boolean incrementToken() throws IOException {
     /**....**/
   }
Solr InterOp
Custom Stemmer
Create a FilterFactory that exposes it:
       public class LStemFilterFactory extends BaseTokenFilterFactory {
        private int cacheSize = 20000;
        
        @Override
        public void init(Map<String, String> args) {
          super.init(args);
         String cacheSizeStr = args.get("cacheSize");
         if (cacheSizeStr != null) {
          cacheSize = Integer.parseInt(cacheSizeStr);
         }
       }
        
        @Override
       public TokenStream create(TokenStream in) {
        return new LStemFilter(in, cacheSize);
       }
     }
Solr InterOp
Custom Stemmer
And finally plug it into your analysis chain:

 <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
       words="solr/common/conf/stopwords.txt"/>
    <filter class="com.etsy.solr.analysis.LStemFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 </analyzer>
Thanks!

Mais conteúdo relacionado

Mais procurados

[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기NAVER D2
 
New SPL Features in PHP 5.3
New SPL Features in PHP 5.3New SPL Features in PHP 5.3
New SPL Features in PHP 5.3Matthew Turland
 
Redis for the Everyday Developer
Redis for the Everyday DeveloperRedis for the Everyday Developer
Redis for the Everyday DeveloperRoss Tuck
 
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBScala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBjorgeortiz85
 
Trading with opensource tools, two years later
Trading with opensource tools, two years laterTrading with opensource tools, two years later
Trading with opensource tools, two years laterclkao
 
Invertible-syntax 入門
Invertible-syntax 入門Invertible-syntax 入門
Invertible-syntax 入門Hiromi Ishii
 
SPL: The Undiscovered Library - DataStructures
SPL: The Undiscovered Library -  DataStructuresSPL: The Undiscovered Library -  DataStructures
SPL: The Undiscovered Library - DataStructuresMark Baker
 
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext
Solr Anti-Patterns: Presented by Rafał Kuć, SematextSolr Anti-Patterns: Presented by Rafał Kuć, Sematext
Solr Anti-Patterns: Presented by Rafał Kuć, SematextLucidworks
 
From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)Night Sailer
 
CS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBCS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBjorgeortiz85
 
はじめてのMongoDB
はじめてのMongoDBはじめてのMongoDB
はじめてのMongoDBTakahiro Inoue
 
Things I Believe Now That I'm Old
Things I Believe Now That I'm OldThings I Believe Now That I'm Old
Things I Believe Now That I'm OldRoss Tuck
 
How to write rust instead of c and get away with it
How to write rust instead of c and get away with itHow to write rust instead of c and get away with it
How to write rust instead of c and get away with itFlavien Raynaud
 

Mais procurados (20)

[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기
 
Nubilus Perl
Nubilus PerlNubilus Perl
Nubilus Perl
 
Perl Web Client
Perl Web ClientPerl Web Client
Perl Web Client
 
dotCloud and go
dotCloud and godotCloud and go
dotCloud and go
 
New SPL Features in PHP 5.3
New SPL Features in PHP 5.3New SPL Features in PHP 5.3
New SPL Features in PHP 5.3
 
Redis for the Everyday Developer
Redis for the Everyday DeveloperRedis for the Everyday Developer
Redis for the Everyday Developer
 
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDBScala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
Scala Days 2011 - Rogue: A Type-Safe DSL for MongoDB
 
Trading with opensource tools, two years later
Trading with opensource tools, two years laterTrading with opensource tools, two years later
Trading with opensource tools, two years later
 
Invertible-syntax 入門
Invertible-syntax 入門Invertible-syntax 入門
Invertible-syntax 入門
 
Your code is not a string
Your code is not a stringYour code is not a string
Your code is not a string
 
Intro to The PHP SPL
Intro to The PHP SPLIntro to The PHP SPL
Intro to The PHP SPL
 
SPL: The Undiscovered Library - DataStructures
SPL: The Undiscovered Library -  DataStructuresSPL: The Undiscovered Library -  DataStructures
SPL: The Undiscovered Library - DataStructures
 
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext
Solr Anti-Patterns: Presented by Rafał Kuć, SematextSolr Anti-Patterns: Presented by Rafał Kuć, Sematext
Solr Anti-Patterns: Presented by Rafał Kuć, Sematext
 
From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)From mysql to MongoDB(MongoDB2011北京交流会)
From mysql to MongoDB(MongoDB2011北京交流会)
 
CS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDBCS442 - Rogue: A Scala DSL for MongoDB
CS442 - Rogue: A Scala DSL for MongoDB
 
はじめてのMongoDB
はじめてのMongoDBはじめてのMongoDB
はじめてのMongoDB
 
Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09
 
Things I Believe Now That I'm Old
Things I Believe Now That I'm OldThings I Believe Now That I'm Old
Things I Believe Now That I'm Old
 
groovy & grails - lecture 2
groovy & grails - lecture 2groovy & grails - lecture 2
groovy & grails - lecture 2
 
How to write rust instead of c and get away with it
How to write rust instead of c and get away with itHow to write rust instead of c and get away with it
How to write rust instead of c and get away with it
 

Destaque

Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudgfodor
 
Data mining for_product_search
Data mining for_product_searchData mining for_product_search
Data mining for_product_searchAaron Beppu
 
Transforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceTransforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceJason Davis
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages MaturelyJohn Allspaw
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMatt Graham
 
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Gregg Donovan
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex SystemsJohn Allspaw
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorJohn Allspaw
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightRoss Snyder
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012Nick Galbreath
 
Code as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyCode as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyChad Dickerson
 

Destaque (12)

Emphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloudEmphemeral hadoop clusters in the cloud
Emphemeral hadoop clusters in the cloud
 
Data mining for_product_search
Data mining for_product_searchData mining for_product_search
Data mining for_product_search
 
Transforming Search in the Digital Marketplace
Transforming Search in the Digital MarketplaceTransforming Search in the Digital Marketplace
Transforming Search in the Digital Marketplace
 
Responding to Outages Maturely
Responding to Outages MaturelyResponding to Outages Maturely
Responding to Outages Maturely
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without Downtime
 
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
Living with Garbage by Gregg Donovan at LuceneSolr Revolution 2013
 
DevTools at Etsy
DevTools at EtsyDevTools at Etsy
DevTools at Etsy
 
Resilient Response In Complex Systems
Resilient Response In Complex SystemsResilient Response In Complex Systems
Resilient Response In Complex Systems
 
Outages, PostMortems, and Human Error
Outages, PostMortems, and Human ErrorOutages, PostMortems, and Human Error
Outages, PostMortems, and Human Error
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
 
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
DevOpsSec: Appling DevOps Principles to Security, DevOpsDays Austin 2012
 
Code as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at EtsyCode as Craft: Building a Strong Engineering Culture at Etsy
Code as Craft: Building a Strong Engineering Culture at Etsy
 

Semelhante a Solr Architecture Overview: Load Balancing, Replication, Custom Stemming and QParsers

Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayJacob Park
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopSages
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemSages
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Painless Persistence with Realm
Painless Persistence with RealmPainless Persistence with Realm
Painless Persistence with RealmChristian Melchior
 
Ft10 de smet
Ft10 de smetFt10 de smet
Ft10 de smetnkaluva
 
Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastJorge Lopez-Malla
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraDeependra Ariyadewa
 
Parboiled explained
Parboiled explainedParboiled explained
Parboiled explainedPaul Popoff
 
服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScriptQiangning Hong
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)Wesley Beary
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsMatt Warren
 
Protocol handler in Gecko
Protocol handler in GeckoProtocol handler in Gecko
Protocol handler in GeckoChih-Hsuan Kuo
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the CloudWesley Beary
 

Semelhante a Solr Architecture Overview: Load Balancing, Replication, Custom Stemming and QParsers (20)

Java
JavaJava
Java
 
Developing a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and SprayDeveloping a Real-time Engine with Akka, Cassandra, and Spray
Developing a Real-time Engine with Akka, Cassandra, and Spray
 
What is new in Java 8
What is new in Java 8What is new in Java 8
What is new in Java 8
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Painless Persistence with Realm
Painless Persistence with RealmPainless Persistence with Realm
Painless Persistence with Realm
 
Shooting the Rapids
Shooting the RapidsShooting the Rapids
Shooting the Rapids
 
Ft10 de smet
Ft10 de smetFt10 de smet
Ft10 de smet
 
Kerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit eastKerberizing spark. Spark Summit east
Kerberizing spark. Spark Summit east
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and Cassandra
 
Parboiled explained
Parboiled explainedParboiled explained
Parboiled explained
 
服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript
 
Apache Thrift
Apache ThriftApache Thrift
Apache Thrift
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-Optimisations
 
Protocol handler in Gecko
Protocol handler in GeckoProtocol handler in Gecko
Protocol handler in Gecko
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
 

Último

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Solr Architecture Overview: Load Balancing, Replication, Custom Stemming and QParsers

  • 2. Things I’m not going to talk about: A/B Testing i18n Continuos Deployment
  • 4.
  • 5.
  • 9. Architecture Overview Thrift struct Listing { 1: i64 listing_id } struct ListingResults { 1: i64 count, 2: list<Listing> listings } service Search { ListingResults search(1:string query) }
  • 10. Architecture Overview Thrift Generated Java server code: public class Search { public interface Iface { public ListingResults search(String query) throws TException; } Generated PHP client code: class SearchClient implements SearchIf { /**...**/ public function search($query) { $this->send_search($query); return $this->recv_search(); }
  • 11. Architecture Overview Thrift Why use Thrift? • Service Encapsulation • Reduced Network Traffic
  • 12. Architecture Overview Thrift Why only return IDs? • Index Size • Easy to scale PK lookups
  • 14. Architecture Overview Search Server • Identical Code + Hardware • Roles/Behavior controlled by Env variables • Single Java Process • Solr running as a Jetty Servlet • Thrift Servers • Smoker
  • 15. Architecture Overview Search Server Master-specific processes: • Incremental Indexer • External File Field Updaters
  • 21. Load Balancing Server Affinity Algorithm $serversNew = array(); [“host2”, “host3”, “host1”, “host4”] $numServers = count($servers); while($numServers > 0) { // Take the first 4 chars of the md5sum of the server count // and the query, mod the available servers $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%($numServers); $keySet = array_keys($servers); $serverId = $keySet[$key]; // Push the chosen server onto the new list and remove it // from the initial list array_push($serversNew, $servers[$serverId]); unset($servers[$serverId]); --$numServers; }
  • 22. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($query),0,4)) “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host3”, “host1”, “host4”]
  • 23. Load Balancing Server Affinity Algorithm $key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers)); “jewelry” [“host2”, “host3”, “host1”, “host4”] “scarf” [“host2”, “host1”, “host4”, “host3”]
  • 25. Load Balancing Server Affinity Caveats • Stemming / Analysis • Be wary of query distribution
  • 30. Replication Multicast Rsync? [15:25]  <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network [15:26]  <patrick> ok .... [15:31]  <keyur> is the site down?
  • 34. Replication Bit Torrent + Solr Fork of TTorent: https://github.com/etsy/ttorrent Multi-File Support Performance Enhancements
  • 41. “writing query strings is for suckers”
  • 42.
  • 43. Solr InterOp QParsers http://host:8393/solr/person/select/?q=_query_:%22{!dismax %20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf %20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq} %22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq= %22giovanni%20fernandez-kincade %22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_s yn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez- kincade&lqf=last_name^3
  • 44. Solr InterOp QParsers http://host:8393/solr/person/select/?q={!personrealqp}giovanni %20fernandez-kincade
  • 45. Solr InterOp QParsers class PersonNameRealQParser extends QParser {    public PersonNameRealQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      super(qstr, localParams, params, req);    }
  • 46. Solr InterOp QParsers @Override   public Query parse() throws ParseException { TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));     exactFullNameQuery.setBoost(4.0f);     String[] userQueryTerms = qstr.split("s+");     Query firstLastQuery = null;     if (2 == userQueryTerms.length)       firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);     else       firstLastQuery = parseAsFirstOrLast(userQueryTerms);     DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);     realNameQuery.add(exactFullNameQuery);     realNameQuery.add(firstLastQuery);     return realNameQuery;   }
  • 47. Solr InterOp QParsers The QParserPlugin that returns our new QParser: public class PersonNameRealQParserPlugin extends QParserPlugin {    public static final String NAME = "personrealqp";    @Override    public void init(NamedList args) {}    @Override    public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {      return new PersonNameRealQParser(qstr, localParams, params, req);    } }
  • 48. Solr InterOp QParsers Registering the plugin in solrconfig.xml: <queryParser name="personrealqp" class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
  • 51. Solr InterOp Custom Stemmer banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
  • 52. Solr InterOp Custom Stemmer First we extend KStemmer and intercept stem calls: public class LStemmer extends KStemmer { /**.....**/      @Override      String stem(String term) {          String override = overrideStemTransformations.get(term);          if(override != null) return override;          return super.stem(term);      } }
  • 53. Solr InterOp Custom Stemmer Then create a TokenFilter that uses the new Stemmer: final class LStemFilter extends TokenFilter { /**.....**/         protected LStemFilter(TokenStream input, int cacheSize) { super(input); stemmer = new LStemmer(cacheSize); }          @Override public boolean incrementToken() throws IOException { /**....**/ }
  • 54. Solr InterOp Custom Stemmer Create a FilterFactory that exposes it: public class LStemFilterFactory extends BaseTokenFilterFactory { private int cacheSize = 20000;      @Override public void init(Map<String, String> args) { super.init(args);      String cacheSizeStr = args.get("cacheSize");      if (cacheSizeStr != null) {       cacheSize = Integer.parseInt(cacheSizeStr);      }    }      @Override    public TokenStream create(TokenStream in) {     return new LStemFilter(in, cacheSize);    } }
  • 55. Solr InterOp Custom Stemmer And finally plug it into your analysis chain: <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ASCIIFoldingFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="solr/common/conf/stopwords.txt"/> <filter class="com.etsy.solr.analysis.LStemFilterFactory" /> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer>