SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
SolrCloud and Shard Splitting
Shalin Shekhar Mangar
Bangalore Lucene/Solr Meetup
8th
June 2013
Who am I?
●
Apache Lucene/Solr Committer and PMC member
●
Contributor since January 2008
●
Currently: Engineer at LucidWorks
●
Formerly with AOL
●
Email: shalin@apache.org
●
Twitter: shalinmangar
●
Blog: http://shal.in
Bangalore Lucene/Solr Meetup
8th
June 2013
SolrCloud: Overview
●
Distributed searching/indexing
●
No single points of failure
●
Near Real Time Friendly (push replication)
●
Transaction logs for durability and recovery
●
Real-time get
●
Atomic Updates
●
Optimistic Concurrency
●
Request forwarding from any node in cluster
●
A strong contender for your NoSQL needs as well
Bangalore Lucene/Solr Meetup
8th
June 2013
Bangalore Lucene/Solr Meetup
8th
June 2013
Document Routing
80000000-bfffffff
00000000-3fffffff
40000000-7fffffff
c0000000-ffffffff
shard1shard4
shard3 shard2
1f27
3c7
1
(MurmurHash
3)
1f27
000
0
1f27 ffffto
(hash)
shard
1
q=my_query
shard.keys=BigCo!
numShards=4
router=compositeId
id = BigCo!doc5
Bangalore Lucene/Solr Meetup
8th
June 2013
SolrCloud Collections API
●
/admin/collections?action=CREATE&name=mycollection
– &numShards=3
– &replicationFactor=4
– &maxShardsPerNode=2
– &createNodeSet=node1:8080,node2:8080,node3:8080,...
– &collection.configName=myconfigset
●
/admin/collections?action=DELETE&name=mycollection
●
/admin/collections?action=RELOAD&name=mycollection
●
/admin/collections?action=CREATEALIAS&name=south
– &collections=KA,TN,AP,KL,...
●
Coming soon: Shard aliases
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Background
●
Before Solr 4.3, number of shards had to fixed at the time
of collection creation
●
Forced people to start with large number of shards
●
If a shard ran too hot, the only fix was to re-index and
therefore re-balance the collection
●
Each shard is assigned a hash range
●
Each shard also has a state which defaults to 'ACTIVE'
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Features
●
Seamless on-the-fly splitting – no downtime required
●
Retried on failures
●
/admin/collections?
action=SPLITSHARD&collection=mycollection
– &shard=shardId
●
A lower-level CoreAdmin API comes free!
– /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
– /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard2_0
Shard1
replic
a
leade
r
Shard2
replic
a
leade
r
Shard3
replic
a
leade
r
Shard2_1
update
Shard Splitting
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Mechanism
●
New sub-shards created in “construction” state
●
Leader starts forwarding applicable updates, which are buffered
by the sub-shards
●
Leader index is split and installed on the sub-shards
●
Sub-shards apply buffered updates
●
Replicas are created for sub-shards and brought up to speed
●
Sub-shard becomes “active” and old shard becomes “inactive”
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Tips and Gotchas
●
Supports collections with a hash based router i.e. “plain”
or “compositeId” routers
●
Operation is executed by the Overseer node, not by the
node you requested
●
HTTP request is synchronous but operation is async. A
read timeout does not mean failure!
●
Operation is retried on failure. Check parent leader's logs
before you re-issue the command or you may end with
more shards than you want
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Tips and gotchas
●
Solr Admin GUI is not aware of shard states yet so the
inactive parent shard is also shown in “green”
●
The CoreAdmin split command can be used against non-
cloud deployments. It will spread docs alternately among
the sub-indexes
●
Inactive shards have to be cleaned up manually. Solr 4.4
will have a delete shard API
●
Shard splitting in 4.3 release is buggy. Wait for 4.3.1
Bangalore Lucene/Solr Meetup
8th
June 2013
Shard Splitting: Looking towards the future
●
GUI integration and better progress reporting/monitoring
●
Better support for custom sharding use-cases
●
More flexibility towards number of sub-shards, hash
ranges, number of replicas etc
●
Store replication factor per shard
●
Suggest splits to admins based on cluster state and load
Confidential and Proprietary
© 2012 LucidWorks14
About LucidWorks
• Intro to LucidWorks (formerly Lucid Imagination)
– Follow: @lucidworks, @lucidimagineer
– Learn: http://www.lucidworks.com
• Check out SearchHub: http://www.searchhub.org
• Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
– Older versions: http://bit.ly/12t1Egq
• Our Products
– LucidWorks Search
– LucidWorks Big Data
• Lucene Revolution
– http://www.lucenerevolution.com
Bangalore Lucene/Solr Meetup
8th
June 2013
Thank you
Shalin Shekhar Mangar
LucidWorks

Mais conteúdo relacionado

Mais procurados

Alfresco node lifecyle, services and zones
Alfresco node lifecyle, services and zonesAlfresco node lifecyle, services and zones
Alfresco node lifecyle, services and zonesSanket Mehta
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Armeriaworkshop2019 openchat julie
Armeriaworkshop2019 openchat julieArmeriaworkshop2019 openchat julie
Armeriaworkshop2019 openchat julieLINE+
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1Maruf Hassan
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQLRodrigo Prates
 
Dawid Weiss- Finite state automata in lucene
 Dawid Weiss- Finite state automata in lucene Dawid Weiss- Finite state automata in lucene
Dawid Weiss- Finite state automata in luceneLucidworks (Archived)
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentHostedbyConfluent
 
An intro to GraphQL
An intro to GraphQLAn intro to GraphQL
An intro to GraphQLvaluebound
 
Programación Reactiva con Spring WebFlux
Programación Reactiva con Spring WebFluxProgramación Reactiva con Spring WebFlux
Programación Reactiva con Spring WebFluxParadigma Digital
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache JackrabbitJukka Zitting
 
How to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterpriseHow to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterpriseAngel Borroy López
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingHortonworks
 
Introduction to jmeter
Introduction to jmeterIntroduction to jmeter
Introduction to jmetertest test
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Introduction to Python Celery
Introduction to Python CeleryIntroduction to Python Celery
Introduction to Python CeleryMahendra M
 

Mais procurados (20)

Alfresco node lifecyle, services and zones
Alfresco node lifecyle, services and zonesAlfresco node lifecyle, services and zones
Alfresco node lifecyle, services and zones
 
Rest API
Rest APIRest API
Rest API
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Armeriaworkshop2019 openchat julie
Armeriaworkshop2019 openchat julieArmeriaworkshop2019 openchat julie
Armeriaworkshop2019 openchat julie
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
Dawid Weiss- Finite state automata in lucene
 Dawid Weiss- Finite state automata in lucene Dawid Weiss- Finite state automata in lucene
Dawid Weiss- Finite state automata in lucene
 
Introduction to Celery
Introduction to CeleryIntroduction to Celery
Introduction to Celery
 
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, ConfluentKafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
Kafka’s New Control Plane: The Quorum Controller | Colin McCabe, Confluent
 
An intro to GraphQL
An intro to GraphQLAn intro to GraphQL
An intro to GraphQL
 
Programación Reactiva con Spring WebFlux
Programación Reactiva con Spring WebFluxProgramación Reactiva con Spring WebFlux
Programación Reactiva con Spring WebFlux
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache Jackrabbit
 
How to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterpriseHow to migrate from Alfresco Search Services to Alfresco SearchEnterprise
How to migrate from Alfresco Search Services to Alfresco SearchEnterprise
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query ProcessingApache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Introduction to jmeter
Introduction to jmeterIntroduction to jmeter
Introduction to jmeter
 
Governor limits
Governor limitsGovernor limits
Governor limits
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Introduction to Python Celery
Introduction to Python CeleryIntroduction to Python Celery
Introduction to Python Celery
 

Destaque

Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Electionravikgiitk
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudthelabdude
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and TestingMark Miller
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloudVarun Thacker
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Shalin Shekhar Mangar
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupShalin Shekhar Mangar
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scalethelabdude
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksShalin Shekhar Mangar
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataShalin Shekhar Mangar
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Viet-Dung TRINH
 

Destaque (20)

Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
 
Introduction to SolrCloud
Introduction to SolrCloudIntroduction to SolrCloud
Introduction to SolrCloud
 
Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6Cross Datacenter Replication in Apache Solr 6
Cross Datacenter Replication in Apache Solr 6
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Scaling search with SolrCloud
Scaling search with SolrCloudScaling search with SolrCloud
Scaling search with SolrCloud
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
Search engine ppt
Search engine pptSearch engine ppt
Search engine ppt
 
Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016Apache zookeeper seminar_trinh_viet_dung_03_2016
Apache zookeeper seminar_trinh_viet_dung_03_2016
 

Semelhante a SolrCloud and Shard Splitting

Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Jitendra Chittoda
 
Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Olivier Dobberkau
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101Itiel Shwartz
 
apachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfapachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfssuserbb9f511
 
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoPostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoUri Savelchev
 
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsOracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsChris Muir
 
BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010fschupp
 
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013Andrew Morgan
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Claus Ibsen
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - CopenhagenClaus Ibsen
 
Akka Clustering And Sharding
Akka Clustering And ShardingAkka Clustering And Sharding
Akka Clustering And ShardingKnoldus Inc.
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Rails 3 : Cool New Things
Rails 3 : Cool New ThingsRails 3 : Cool New Things
Rails 3 : Cool New ThingsY. Thong Kuah
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!PGConf APAC
 
MySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdfMySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdfYunusShaikh49
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 

Semelhante a SolrCloud and Shard Splitting (20)

Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???Sequential Concurrency ... WHAT ???
Sequential Concurrency ... WHAT ???
 
ForkJoinPools and parallel streams
ForkJoinPools and parallel streamsForkJoinPools and parallel streams
ForkJoinPools and parallel streams
 
Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101Apache Solr for TYPO3 CMS 101
Apache Solr for TYPO3 CMS 101
 
Intro to openfaas
Intro to openfaasIntro to openfaas
Intro to openfaas
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
apachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdfapachecamelk-april2019-190409093034.pdf
apachecamelk-april2019-190409093034.pdf
 
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in ZalandoPostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
PostgreSQL Finland October meetup - PostgreSQL monitoring in Zalando
 
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation OptionsOracle ADF Architecture TV - Design - Task Flow Navigation Options
Oracle ADF Architecture TV - Design - Task Flow Navigation Options
 
BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010BlackRay FOSS Asia 2010
BlackRay FOSS Asia 2010
 
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
NoSQL & SQL - Best of both worlds - BarCamp Berkshire 2013
 
Advance Features of Hibernate
Advance Features of HibernateAdvance Features of Hibernate
Advance Features of Hibernate
 
Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2Apache Camel K - Copenhagen v2
Apache Camel K - Copenhagen v2
 
Apache Camel K - Copenhagen
Apache Camel K - CopenhagenApache Camel K - Copenhagen
Apache Camel K - Copenhagen
 
Akka Clustering And Sharding
Akka Clustering And ShardingAkka Clustering And Sharding
Akka Clustering And Sharding
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Rails 3 : Cool New Things
Rails 3 : Cool New ThingsRails 3 : Cool New Things
Rails 3 : Cool New Things
 
Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
 
MySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdfMySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdf
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 

Último

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Último (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

SolrCloud and Shard Splitting

  • 1. SolrCloud and Shard Splitting Shalin Shekhar Mangar
  • 2. Bangalore Lucene/Solr Meetup 8th June 2013 Who am I? ● Apache Lucene/Solr Committer and PMC member ● Contributor since January 2008 ● Currently: Engineer at LucidWorks ● Formerly with AOL ● Email: shalin@apache.org ● Twitter: shalinmangar ● Blog: http://shal.in
  • 3. Bangalore Lucene/Solr Meetup 8th June 2013 SolrCloud: Overview ● Distributed searching/indexing ● No single points of failure ● Near Real Time Friendly (push replication) ● Transaction logs for durability and recovery ● Real-time get ● Atomic Updates ● Optimistic Concurrency ● Request forwarding from any node in cluster ● A strong contender for your NoSQL needs as well
  • 5. Bangalore Lucene/Solr Meetup 8th June 2013 Document Routing 80000000-bfffffff 00000000-3fffffff 40000000-7fffffff c0000000-ffffffff shard1shard4 shard3 shard2 1f27 3c7 1 (MurmurHash 3) 1f27 000 0 1f27 ffffto (hash) shard 1 q=my_query shard.keys=BigCo! numShards=4 router=compositeId id = BigCo!doc5
  • 6. Bangalore Lucene/Solr Meetup 8th June 2013 SolrCloud Collections API ● /admin/collections?action=CREATE&name=mycollection – &numShards=3 – &replicationFactor=4 – &maxShardsPerNode=2 – &createNodeSet=node1:8080,node2:8080,node3:8080,... – &collection.configName=myconfigset ● /admin/collections?action=DELETE&name=mycollection ● /admin/collections?action=RELOAD&name=mycollection ● /admin/collections?action=CREATEALIAS&name=south – &collections=KA,TN,AP,KL,... ● Coming soon: Shard aliases
  • 7. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Background ● Before Solr 4.3, number of shards had to fixed at the time of collection creation ● Forced people to start with large number of shards ● If a shard ran too hot, the only fix was to re-index and therefore re-balance the collection ● Each shard is assigned a hash range ● Each shard also has a state which defaults to 'ACTIVE'
  • 8. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Features ● Seamless on-the-fly splitting – no downtime required ● Retried on failures ● /admin/collections? action=SPLITSHARD&collection=mycollection – &shard=shardId ● A lower-level CoreAdmin API comes free! – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2 – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
  • 9. Bangalore Lucene/Solr Meetup 8th June 2013 Shard2_0 Shard1 replic a leade r Shard2 replic a leade r Shard3 replic a leade r Shard2_1 update Shard Splitting
  • 10. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Mechanism ● New sub-shards created in “construction” state ● Leader starts forwarding applicable updates, which are buffered by the sub-shards ● Leader index is split and installed on the sub-shards ● Sub-shards apply buffered updates ● Replicas are created for sub-shards and brought up to speed ● Sub-shard becomes “active” and old shard becomes “inactive”
  • 11. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Tips and Gotchas ● Supports collections with a hash based router i.e. “plain” or “compositeId” routers ● Operation is executed by the Overseer node, not by the node you requested ● HTTP request is synchronous but operation is async. A read timeout does not mean failure! ● Operation is retried on failure. Check parent leader's logs before you re-issue the command or you may end with more shards than you want
  • 12. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Tips and gotchas ● Solr Admin GUI is not aware of shard states yet so the inactive parent shard is also shown in “green” ● The CoreAdmin split command can be used against non- cloud deployments. It will spread docs alternately among the sub-indexes ● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API ● Shard splitting in 4.3 release is buggy. Wait for 4.3.1
  • 13. Bangalore Lucene/Solr Meetup 8th June 2013 Shard Splitting: Looking towards the future ● GUI integration and better progress reporting/monitoring ● Better support for custom sharding use-cases ● More flexibility towards number of sub-shards, hash ranges, number of replicas etc ● Store replication factor per shard ● Suggest splits to admins based on cluster state and load
  • 14. Confidential and Proprietary © 2012 LucidWorks14 About LucidWorks • Intro to LucidWorks (formerly Lucid Imagination) – Follow: @lucidworks, @lucidimagineer – Learn: http://www.lucidworks.com • Check out SearchHub: http://www.searchhub.org • Solr 4.1 Reference Guide: http://bit.ly/11KSiMN – Older versions: http://bit.ly/12t1Egq • Our Products – LucidWorks Search – LucidWorks Big Data • Lucene Revolution – http://www.lucenerevolution.com
  • 15. Bangalore Lucene/Solr Meetup 8th June 2013 Thank you Shalin Shekhar Mangar LucidWorks