Lucene Revolution Barcelona 2011
Using SolrCloud for real

Thursday, October 20, 2011
Whoggly?
        • I’m Jon, a happy Lucene hacker since 2004
        • We do Logging as a Service (SaaS with a single focus)
                 ‣    Consolidation, Archiving, Search, Alerting (soon!)
                 ‣    You stream your logs to us, we index them, you search
        • Full public launch on Feb 2, 2011
        • Every customer has their own index
                 ‣    ~8k shards, ~7B docs, ~3TB index, 3-5k events/sec, 1-1.5 MB/sec
        • Search finds the splinter in the log-jam
                 ‣    What happened? When? How often?
                             All your logs are belong to you   2


Time is on our side...
        • We’re not solving a typical search problem
                 ‣    Endless stream of data (all your logs)
                 ‣    Time is our “score”
                 ‣    Write once, update never
                 ‣    Large number of very small documents (events)
                 ‣    Search is mostly about what just happened
        • So, simple index life-cycle with “natural” sharding
                 ‣    For us, a shard is a slice of a single customer’s index, based on start
                      and end time
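
To make the “natural” sharding concrete, here is a minimal Python sketch. It assumes each shard records a start and end time (epoch seconds) and that a search only needs the shards whose slice overlaps the query window. The names Shard and shards_for_query are hypothetical illustrations, not Loggly's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shard:
    """One time slice of a single customer's index (hypothetical model)."""
    customer: str
    start: int  # inclusive, epoch seconds
    end: int    # exclusive, epoch seconds


def shards_for_query(shards: List[Shard], customer: str,
                     q_start: int, q_end: int) -> List[Shard]:
    """Return the customer's shards overlapping [q_start, q_end), newest first."""
    hits = [s for s in shards
            if s.customer == customer and s.start < q_end and s.end > q_start]
    # Newest first, since search is mostly about what just happened.
    return sorted(hits, key=lambda s: s.start, reverse=True)
```

Because documents are written once and never updated, a slice's time range is enough to decide whether it can contain matches, so older shards can be skipped without being opened.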

Why SolrCloud?
        • The wheel existed, and keeps getting better
                 ‣    Thanks to the community. Big Thanks to Mark & Yonik
        • Solr: Multi-core, Plugins, Facets, migration, caching, ...
                 ‣    Our Solr instances have hundreds of cores (one per shard)
                 ‣    We’ve made very few changes to core Solr code
        • ZooKeeper: state of all nodes & shards, so...
                 ‣    Automatic cluster config
                 ‣    Cluster & Shard changes visible everywhere, almost instantly
                 ‣    Any node can service any request (except for indexing)

Cluster Management
          • Solr instances register & deregister themselves in ZooKeeper
                   ‣   always know what Solr instances are available
          • All Solr configs in ZK except for solr.xml
                   ‣   all instances use the same schema, etc., so management is simple
                   ‣   solr.xml is “special”, instance specific
          • We’ve added our own persistent data
                   ‣   Loggly-specific config, and some performance data for Solr
                   ‣   Other app configs, “live” status


Index Management
          • One Collection (“index”) per customer
                   ‣   Sharded by time, multiple shards per customer
          • Shards migrated from node to node using replication
                   ‣   Minor changes to existing Solr replication code
          • Shards merged by us, never automatically
                   ‣   We merge to create longer time-slices
                   ‣   Doing it manually makes merge load more predictable
          • Completely distributed management, no “Master”
                   ‣   Simple, Robust, “need to know”
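
One way to picture merging into longer time-slices is the Python sketch below. It assumes slices are (start, end) pairs and that only contiguous slices are merged, up to a chosen maximum span. plan_merges and max_span are hypothetical names, and the slides do not describe the actual merge policy.

```python
from typing import List, Tuple

Slice = Tuple[int, int]  # (start, end) in epoch seconds, end exclusive


def plan_merges(slices: List[Slice], max_span: int) -> List[List[Slice]]:
    """Group contiguous slices into merge candidates no longer than max_span.

    Only groups with more than one slice are returned; a caller would decide
    when to actually run each merge.
    """
    groups: List[List[Slice]] = []
    current: List[Slice] = []
    for s in sorted(slices):
        contiguous = bool(current) and current[-1][1] == s[0]
        fits = bool(current) and s[1] - current[0][0] <= max_span
        if contiguous and fits:
            current.append(s)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [s]
    if len(current) > 1:
        groups.append(current)
    return groups
```

Returning a plan rather than merging immediately matches the point above: whoever runs the plan controls when the merge load lands.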


SolrCloud, meet reality
          • Day 1 (patch to trunk, 18 months ago), mostly JFW’ed. Since then...
                   ‣   We had to fix a couple of TODO’s
                   ‣   We changed shard selection for search
                   ‣   We hit a ZK performance wall
                   ‣   We’ve pulled (most of) the Solr ZK code into an external jar, and
                             ‣   added some utilities to the ZK controller/client
                             ‣   extended the shard node to include custom data, including “sleep/
                                 wake” state and S3 archive state
                             ‣   added non-Solr application configuration to ZK

TODO’s
          • Very little missing, even 18 months ago...
          • FacetComponent.countFacets()
              ‣ // TODO: facet dates

                   ‣   We did it. It has since been added to trunk (not our code)
          • ZkController.unregister()
              ‣ // TODO : perhaps mark the core down in zk?

                   ‣   We did it: remove the shard from ZK, walk up the tree removing its
                       parents if they’re empty
          • One more, but no spoilers...
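
The unregister cleanup can be sketched in Python against an in-memory stand-in for the ZooKeeper tree. This is not the ZkController code: the set of paths, the /collections root, and the name remove_and_prune are assumptions made for illustration.

```python
from typing import Set


def remove_and_prune(nodes: Set[str], path: str, root: str = "/collections") -> None:
    """Delete `path`, then delete each ancestor below `root` with no children left."""
    nodes.discard(path)
    parent = path.rsplit("/", 1)[0]
    # Walk up the tree, stopping at the root or at the first non-empty parent.
    while parent and parent != root and parent.startswith(root + "/"):
        has_children = any(n.startswith(parent + "/") for n in nodes)
        if has_children:
            break
        nodes.discard(parent)
        parent = parent.rsplit("/", 1)[0]
```

Checking for children before deleting mirrors ZooKeeper itself, which refuses to delete a node that still has children.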

Shard selection
          • QueryComponent.checkDistributed() selects shards for each
            “slice”, based on the Collection of the core being queried.
          • We changed some things for version 1...
                   ‣   use a “loggly” core, and pass in the collection
                   ‣   select the biggest shard when overlaps occur
                   ‣   select the “best” copy of a shard on multiple machines
          • Now we use a plugin variant of the admin handler
                   ‣   avoids the special-case core
                   ‣   lets us do shard-level short-circuiting for search (not facets)
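
A rough Python sketch of the two selection rules from version 1 follows. It makes two assumptions the slides do not spell out: “biggest” is read as the longest time slice, and the “best” copy is the one with the lowest load score. Copy, load, and select_copies are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Copy:
    shard: str    # shard name
    start: int    # slice start, inclusive
    end: int      # slice end, exclusive
    node: str     # host serving this copy
    load: float   # hypothetical "lower is better" score


def select_copies(copies: List[Copy]) -> List[Copy]:
    # Pick the "best" (here: least loaded) copy of each shard.
    best: Dict[str, Copy] = {}
    for c in copies:
        if c.shard not in best or c.load < best[c.shard].load:
            best[c.shard] = c
    # Prefer the biggest (longest) slices; skip any slice overlapping a chosen one.
    chosen: List[Copy] = []
    for c in sorted(best.values(), key=lambda c: c.end - c.start, reverse=True):
        if all(c.end <= k.start or c.start >= k.end for k in chosen):
            chosen.append(c)
    return sorted(chosen, key=lambda c: c.start)
```

Overlaps would arise, for example, while a merged shard and its source shards both exist; preferring the longer slice means the query touches fewer shards and counts each event once.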

Performance Whack-a-Mole
          • In the last few months...
                   ‣   Default SolrCloud ZooKeeper state implementation
                             ‣   Not happy with 1000’s of shards
                   ‣   Default CoreContainer multi-core behaviour
                             ‣   Not very happy with 100’s (or 1000’s) of cores
          • Next...
                   ‣   Hot-spots in migrations/merging
                             ‣   current code is too complex, needs cleanup/dumbing down



SolrCloud - CloudState
          • ZkStateReader... (the TODO that bit us, hard!)
                   ‣   // TODO: - possibly: incremental update rather than reread everything?

                   ‣   Yep, EVERYTHING in ZooKeeper, every time we update anything
                             ‣   to be fair, we’re kind of a pathological case
                   ‣   Watchers for every collection, every shard
          • The Perfect storm
                   ‣   Full read of ZK with 1000’s of shards ain’t fast
                   ‣   We weren’t using scheduled updates correctly
                   ‣   We’re updating frequently, triggering lots of Watcher updates
          • Up to 20 second wait for CloudState


“Just avoid holding it in that way”
          • Watch fewer things
                   ‣   Every node doesn’t have to know about every shard change
          • Incremental updates
                   ‣   When a watch triggers, we rebuild only that data
          • On-demand Collection updates
                   ‣   We read individual Collection info whenever it’s needed
          • Wait for CloudState is now usually < 10ms
                   ‣   100’s of CloudState updates / minute
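
The incremental and on-demand ideas above can be sketched as a small Python cache. It is a simplified model, not the ZkStateReader changes themselves: read_collection stands in for a ZooKeeper read, and CollectionCache and on_watch_fired are hypothetical names.

```python
from typing import Callable, Dict


class CollectionCache:
    """Caches per-collection state; re-reads only what a watch invalidated."""

    def __init__(self, read_collection: Callable[[str], dict]):
        self._read = read_collection   # stand-in for a ZooKeeper read
        self._cache: Dict[str, dict] = {}
        self.reads = 0                 # counts backend reads, for illustration

    def on_watch_fired(self, collection: str) -> None:
        # Incremental update: drop only the entry that changed.
        self._cache.pop(collection, None)

    def get(self, collection: str) -> dict:
        # On-demand: read a collection only when someone asks for it.
        if collection not in self._cache:
            self._cache[collection] = self._read(collection)
            self.reads += 1
        return self._cache[collection]
```

With this shape, a change to one collection costs one small read on the nodes that actually use it, instead of a full re-read of the tree on every node.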



SolrCloud - CoreContainer
          • Default multi-core behaviour works really well
                   ‣   We’ve had 1000’s of shards on a quad core 16GB box
          • But...
                   ‣   Context switching like crazy, especially on indexers
                   ‣   Lots of RAM, especially when merging & warming
                   ‣   Lots of GC, especially when servicing lots of search requests, or
                       handling large result sets, or merging
                   ‣   Startup times get very very very long (minutes)



Goodnight Solr...
          • We now sleep/wake shards (~50% of shards are asleep)
                   ‣   solr.xml & ZK extended to include state
                   ‣   sleeping shards are close()’ed, removing them from CoreContainer
          • Changes to CoreContainer
                   ‣   Added LogglyCore - simple wrapper around SolrCore, manages state
                   ‣   Replaced “cores” (Map of SolrCores) with “liveCores” and
                       “allCores” (Maps of LogglyCores)
          • On startup we start up Jetty first, then open the shards
                   ‣   Solr “available” in seconds, shards come online as needed
          • Same mechanism used to manage shard archiving
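
The liveCores / allCores split can be sketched in Python as follows. This is only a model of the idea, not the LogglyCore or CoreContainer code: LazyCores, open_core, and the "awake"/"asleep" strings are hypothetical.

```python
from typing import Callable, Dict, Iterable


class LazyCores:
    """Tracks all known cores but keeps only the awake ones open."""

    def __init__(self, names: Iterable[str], open_core: Callable[[str], object]):
        self._open = open_core
        self.all_cores: Dict[str, str] = {n: "asleep" for n in names}
        self.live_cores: Dict[str, object] = {}

    def wake(self, name: str) -> object:
        """Open the core on first use and return it."""
        if name not in self.all_cores:
            raise KeyError(name)
        if name not in self.live_cores:
            self.live_cores[name] = self._open(name)
            self.all_cores[name] = "awake"
        return self.live_cores[name]

    def sleep(self, name: str) -> None:
        """Close the core if it is open, but keep it in all_cores."""
        core = self.live_cores.pop(name, None)
        if core is not None:
            close = getattr(core, "close", None)
            if callable(close):
                close()
            self.all_cores[name] = "asleep"
```

Since nothing is opened in the constructor, startup cost no longer grows with the number of shards; each shard pays its open cost the first time it is needed.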


ZK Utilities
          • Cleanup
                   ‣   nuke: rm -rf for the ZooKeeper tree
                   ‣   purgeShards: delete all shards for collection X
                   ‣   restore: restore shards for collection X from disk
                   ‣   purgeNode: delete all references to node Y
                   ‣   balance: balance shards across all available nodes
          • upload: upload a file or directory
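
As an illustration of what a balance utility might compute, here is a minimal Python sketch. It assumes balancing means evening out shard counts per node, ignoring shard size and load; the slides do not say how the real utility decides. plan_balance is a hypothetical name.

```python
from typing import Dict, List, Tuple


def plan_balance(assignment: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (shard, from_node, to_node) moves that even out shard counts.

    Works on a copy; the input mapping of node -> shard names is not modified.
    """
    state = {node: list(shards) for node, shards in assignment.items()}
    moves: List[Tuple[str, str, str]] = []
    if not state:
        return moves
    while True:
        busiest = max(state, key=lambda n: len(state[n]))
        idlest = min(state, key=lambda n: len(state[n]))
        # Stop once no node holds more than one shard above any other.
        if len(state[busiest]) - len(state[idlest]) <= 1:
            return moves
        shard = state[busiest].pop()
        state[idlest].append(shard)
        moves.append((shard, busiest, idlest))
```

Each planned move would then be carried out by migrating the shard between nodes, which the deck says is done using replication.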




Loggly magic
          • We’ve spent lots of time on Shard & Cluster Management
                   ‣   Plugins make extending functionality easy
                   ‣   ZK makes cluster management easy, for Solr and associated apps
          • Minimal changes to Solr itself. Biggest are...
                   ‣   0MQ streaming data input
                   ‣   Automatic sharding, based on time
                   ‣   LogglyCore for Sleep/Wake
          • Lots of custom admin API extensions, to tie Solr into overall system


Loggly ZK magic
          • We use ZK to store
                   ‣   indexing performance data (load balancing indexers)
                   ‣   shard state (awake / asleep, archived)
          • We use ZK in our other apps, in /cluster
                   ‣   we have the equivalent of “live_nodes” for ALL apps
                   ‣   one-offs are easy when the entire state of the system is available
                             ‣   json output + ruby/python + REST = what happened?
          • ZK is very robust
                   ‣   multiple ZK host failures, no downtime for ZK quorum

Other Stuff We Use
          • Amazon’s AWS for infrastructure (EC2, S3)
          • syslog-ng for syslog/TLS input services
          • 0MQ for event queuing & work distribution
          • MongoDB for statistics and API /stat methods
          • Node.js for HTTP/HTTPS input services
          • Django/Python for middleware/app




ACK/FIN
          One Last Thing... http://logg.ly/jobs

Using Solr Cloud, For Real! - Jon Gifford

  • 1. Lucene Revolution Barcelona 2011 Using SolrCloud for real 1 Thursday, October 20, 2011
  • 2. Whoggly? • I’m Jon, a happy lucene hacker since 2004 • We do Logging as a Service (SAAS with a single focus) ‣ Consolidation, Archiving, Search, Alerting (soon!) ‣ You stream your logs to us, we index them, you search • Full public launch on Feb 2, 2011 • Every customer has their own index ‣ ~8k shards, ~7B docs, ~3TB index, 3-5k events/sec, 1-1.5 MB/sec • Search finds the splinter in the log-jam ‣ What happened? When? How often? All your logs are belong to you 2 Thursday, October 20, 2011
  • 4. Time is on our side... • We’re not solving a typical search problem ‣ Endless stream of data (all your logs) ‣ Time is our “score” ‣ Write once, update never ‣ Large number of very small documents (events) ‣ Search is mostly about what just happened • So, simple index life-cycle with “natural” sharding ‣ For us, a shard is slice of a single customers index, based on start and end time All your logs are belong to you 4 Thursday, October 20, 2011
  • 5. Why SolrCloud?
      • The wheel existed, and keeps getting better
          ‣ Thanks to the community. Big Thanks to Mark & Yonik
      • Solr: Multi-core, Plugins, Facets, migration, caching, ...
          ‣ Our Solr instances have hundreds of cores (one per shard)
          ‣ We’ve made very few changes to core Solr code
      • ZooKeeper: state of all nodes & shards, so...
          ‣ Automatic cluster config
          ‣ Cluster & Shard changes visible everywhere, almost instantly
          ‣ Any node can service any request (except for indexing)
  • 6. Cluster Management
      • Solr instances register & deregister themselves in ZooKeeper
          ‣ always know what Solr instances are available
      • All Solr configs in ZK except for solr.xml
          ‣ all instances use same schema, etc, so simple management
          ‣ solr.xml is “special”, instance specific
      • We’ve added our own persistent data
          ‣ Loggly-specific config, and some performance data for Solr
          ‣ Other app configs, “live” status
  • 7. Index Management
      • One Collection (“index”) per customer
          ‣ Sharded by time, multiple shards per customer
      • Shards migrated from node to node using replication
          ‣ Minor changes to existing Solr replication code
      • Shards merged by us, never automatically
          ‣ We merge to create longer time-slices
          ‣ Doing it manually makes merge load more predictable
      • Completely distributed management, no “Master”
          ‣ Simple, Robust, “need to know”
  • 8. SolrCloud, meet reality
      • Day 1 (patch to trunk, 18 months ago), mostly JFW’ed. Since then...
          ‣ We had to fix a couple of TODO’s
          ‣ We changed shard selection for search
          ‣ We hit a ZK performance wall
          ‣ We’ve pulled (most of) the Solr ZK code into an external jar, and
          ‣ added some utilities to the ZK controller/client
          ‣ extended the shard node to include custom data, including “sleep/wake” state and S3 archive state
          ‣ added non-Solr application configuration to ZK
  • 9. TODO’s
      • Very little missing, even 18 months ago...
      • FacetComponent.countFacets()
          ‣ // TODO: facet dates
          ‣ We did it. Since been added to trunk (not our code)
      • ZkController.unregister()
          ‣ // TODO : perhaps mark the core down in zk?
          ‣ We did it: remove the shard from ZK, walk up the tree removing its parents if they’re empty
      • One more, but no spoilers...
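The unregister behaviour described on slide 9 (remove the shard, then walk up the tree removing empty parents) can be sketched as follows. This is an in-memory illustration only: a plain set of paths stands in for the ZooKeeper tree, and the path layout, the `root` parameter, and the function name are assumptions rather than the actual `ZkController` change.

```python
def remove_and_prune(nodes, path, root="/collections"):
    """Delete `path`, then delete each ancestor left without children.

    `nodes` is a set of absolute paths standing in for the ZooKeeper tree.
    Pruning stops at `root`, which is never removed.
    """
    nodes.discard(path)
    parent = path.rsplit("/", 1)[0]
    while parent and parent != root:
        # A parent still has children if any remaining path lives below it.
        if any(n.startswith(parent + "/") for n in nodes):
            break
        nodes.discard(parent)
        parent = parent.rsplit("/", 1)[0]
    return nodes


tree = {
    "/collections",
    "/collections/acme",
    "/collections/acme/shards",
    "/collections/acme/shards/s1",
    "/collections/acme/shards/s2",
    "/collections/other",
    "/collections/other/shards",
    "/collections/other/shards/s1",
}
remove_and_prune(tree, "/collections/acme/shards/s1")  # s2 remains, so no pruning
```

Removing the last shard of a collection would also remove the now-empty `shards` node and the collection node above it, while leaving sibling collections untouched.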
  • 10. Shard selection
      • QueryComponent.checkDistributed() selects shards for each “slice”, based on the Collection of the core being queried.
      • We changed some things for version 1...
          ‣ use a “loggly” core, and pass in the collection
          ‣ select the biggest shard when overlaps occur
          ‣ select the “best” copy of a shard on multiple machines
      • Now we use a plugin variant of the admin handler
          ‣ avoids special case core
          ‣ lets us do shard-level short-circuiting for search (not facets)
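The two selection rules on slide 10 can be sketched under some loudly stated assumptions: the slides do not say what makes a copy “best”, so this example uses a hypothetical per-node `load` score, and it resolves overlaps greedily by document count. `ShardCopy` and `select_shards` are illustrative names, not part of Solr or the Loggly code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardCopy:
    shard: str   # shard name
    start: int   # inclusive start of the time-slice
    end: int     # exclusive end of the time-slice
    docs: int    # number of documents in the shard
    node: str    # machine holding this copy
    load: float  # hypothetical node load score, lower is better


def select_shards(copies):
    # Step 1: keep one copy per shard, here the one on the least-loaded node.
    best = {}
    for c in copies:
        if c.shard not in best or c.load < best[c.shard].load:
            best[c.shard] = c
    # Step 2: walk shards from biggest to smallest, skipping any shard that
    # overlaps one already chosen.
    chosen = []
    for c in sorted(best.values(), key=lambda c: c.docs, reverse=True):
        if all(c.end <= k.start or c.start >= k.end for k in chosen):
            chosen.append(c)
    return sorted(chosen, key=lambda c: c.start)


copies = [
    ShardCopy("merged-0-200", 0, 200, 900, "node-a", 0.7),
    ShardCopy("merged-0-200", 0, 200, 900, "node-b", 0.2),
    ShardCopy("slice-0-100", 0, 100, 400, "node-a", 0.1),
    ShardCopy("slice-200-300", 200, 300, 300, "node-c", 0.5),
]
selected = select_shards(copies)
```

In this example the merged shard wins over the smaller slice it overlaps, which matches the case where a pre-merge slice still exists alongside its merged replacement.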
  • 11. Performance Whack-a-Mole
      • In the last few months...
          ‣ Default SolrCloud ZooKeeper state implementation
          ‣ Not happy with 1000’s of shards
          ‣ Default CoreContainer multi-core behaviour
          ‣ Not very happy with 100’s (or 1000’s) of cores
      • Next...
          ‣ Hot-spots in migrations/merging
          ‣ current code is too complex, needs cleanup/dumbing down
  • 12. SolrCloud - CloudState
      • ZkStateReader... (the TODO that bit us, hard!)
          ‣ // TODO: - possibly: incremental update rather than reread everything?
          ‣ Yep, EVERYTHING in ZooKeeper, every time we update anything
          ‣ to be fair, we’re kind of a pathological case
          ‣ Watchers for every collection, every shard
      • The Perfect storm
          ‣ Full read of ZK with 1000’s of shards ain’t fast
          ‣ We weren’t using scheduled updates correctly
          ‣ We’re updating frequently, triggering lots of Watcher updates
      • Up to 20 second wait for CloudState
  • 13. “Just avoid holding it in that way”
      • Watch fewer things
          ‣ Every node doesn’t have to know about every shard change
      • Incremental updates
          ‣ When a watch triggers, we rebuild only that data
      • On-demand Collection updates
          ‣ We read individual Collection info whenever it’s needed
      • Wait for CloudState is now usually < 10ms
          ‣ 100’s of CloudState updates / minute
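The combination of incremental and on-demand updates on slide 13 can be sketched with a small cache. This is a toy model, not the modified `ZkStateReader`: the `fetch` callable stands in for reading one collection from ZooKeeper, and the class and method names are hypothetical.

```python
class CloudStateCache:
    """Toy cluster-state cache; `fetch` reads the state of one collection."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: collection name -> state
        self._state = {}     # collection name -> cached state
        self.reads = 0       # backing-store reads, counted for illustration

    def get(self, collection):
        # On-demand: read a collection only the first time it is asked for.
        if collection not in self._state:
            self._state[collection] = self._read(collection)
        return self._state[collection]

    def on_watch_fired(self, collection):
        # Incremental: refresh only the entry that changed, and only if a
        # caller has already asked for it.
        if collection in self._state:
            self._state[collection] = self._read(collection)

    def _read(self, collection):
        self.reads += 1
        return self._fetch(collection)


store = {"acme": ["s1"], "other": ["s1", "s2"]}
cache = CloudStateCache(lambda name: list(store[name]))
cache.get("acme")              # first use triggers one read
store["acme"].append("s2")
cache.on_watch_fired("acme")   # rebuilds only the "acme" entry
cache.on_watch_fired("other")  # never requested, so nothing is read
```

The cost of a change then scales with the size of the one collection that changed rather than with the thousands of shards in the whole tree.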
  • 14. SolrCloud - CoreContainer
      • Default multi-core behaviour works really well
          ‣ We’ve had 1000’s of shards on a quad core 16GB box
      • But...
          ‣ Context switching like crazy, especially on indexers
          ‣ Lots of RAM, especially when merging & warming
          ‣ Lots of GC, especially when servicing lots of search requests, or handling large result sets, or merging
          ‣ Startup times get very very very long (minutes)
  • 15. Goodnight Solr...
      • We now sleep/wake shards (~50% of shards are asleep)
          ‣ solr.xml & ZK extended to include state
          ‣ sleeping shards are close()’ed and removed from CoreContainer
      • Changes to CoreContainer
          ‣ Added LogglyCore - simple wrapper around SolrCore, manages state
          ‣ Replaced “cores” (Map of SolrCores) with “liveCores” and “allCores” (Maps of LogglyCores)
      • On startup we start up Jetty first, then open the shards
          ‣ Solr “available” in seconds, shards come online as needed
      • Same mechanism used to manage shard archiving
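The sleep/wake idea on slide 15 can be sketched with two maps, one for every known shard and one for the shards that are currently open. This is an illustration in Python rather than the Java `CoreContainer` change; `ShardContainer`, `FakeCore`, and the `"asleep"`/`"awake"` strings are hypothetical stand-ins.

```python
class FakeCore:
    """Stand-in for an opened index core."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class ShardContainer:
    def __init__(self, shard_names, open_core):
        self._open_core = open_core  # callable: shard name -> opened core
        self.all_cores = {n: "asleep" for n in shard_names}  # every known shard
        self.live_cores = {}         # only the shards that are open

    def get(self, name):
        """Return an open core, waking the shard on first use."""
        if name not in self.all_cores:
            raise KeyError(name)
        if name not in self.live_cores:
            self.live_cores[name] = self._open_core(name)
            self.all_cores[name] = "awake"
        return self.live_cores[name]

    def sleep(self, name):
        """Close the core and drop it from the live map; the shard stays known."""
        core = self.live_cores.pop(name, None)
        if core is not None:
            core.close()
            self.all_cores[name] = "asleep"


container = ShardContainer(["s1", "s2", "s3"], FakeCore)
core = container.get("s2")  # only s2 is opened; s1 and s3 stay asleep
container.sleep("s2")
```

Starting with every shard asleep is what lets the container become available immediately, with each shard paying its open cost only when a request first needs it.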
  • 16. ZK Utilities
      • Cleanup
          ‣ nuke: rm -rf for the ZooKeeper tree
          ‣ purgeShards: delete all shards for collection X
          ‣ restore: restore shards for collection X from disk
          ‣ purgeNode: delete all references to node Y
          ‣ balance: balance shards across all available nodes
      • upload: upload a file or directory
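The slides do not describe how the `balance` utility decides where shards go. As one possible approach, the sketch below uses a greedy rule: take shards from largest to smallest and always place the next one on the node with the smallest total so far. The function signature and the size units are assumptions, and a real tool would also weigh the cost of migrating shards that are already placed.

```python
import heapq


def balance(shard_sizes, nodes):
    """Assign shards to nodes, largest first, always onto the lightest node."""
    heap = [(0, node) for node in sorted(nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    # Sort by size descending, then by name so the result is deterministic.
    for shard, size in sorted(shard_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, node = heapq.heappop(heap)
        assignment[node].append(shard)
        heapq.heappush(heap, (total + size, node))
    return assignment


plan = balance({"s1": 50, "s2": 30, "s3": 20, "s4": 20}, ["node-a", "node-b"])
```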
  • 17. Loggly magic
      • We’ve spent lots of time on Shard & Cluster Management
          ‣ Plugins make extending functionality easy
          ‣ ZK makes cluster management easy, for Solr and associated apps
      • Minimal changes to Solr itself. Biggest are...
          ‣ 0MQ streaming data input
          ‣ Automatic sharding, based on time
          ‣ LogglyCore for Sleep/Wake
      • Lots of custom admin API extensions, to tie Solr into overall system
  • 18. Loggly ZK magic
      • We use ZK to store
          ‣ indexing performance data (load balancing indexers)
          ‣ shard state (awake / asleep, archived)
      • We use ZK in our other apps, in /cluster
          ‣ we have ~= “live_nodes” for ALL apps
          ‣ one-off’s easy when entire state of the system is available
          ‣ json output + ruby/python + REST = what happened?
      • ZK is very robust
          ‣ multiple ZK host failures, no downtime for ZK quorum
  • 19. Other Stuff We Use
      • Amazon’s AWS for infrastructure (EC2, S3)
      • Syslog-NG for syslog/TLS input services
      • 0MQ for event queuing & work distribution
      • MongoDB for statistics and API /stat methods
      • Node.js for HTTP/HTTPS input services
      • Django/Python for middleware/app
  • 20. ACK/FIN
      One Last Thing... http://logg.ly/jobs