In this presentation we aim to show how to build a high-availability SolrCloud cluster with Solr 4.1 using only Solr and a few bash scripts. The goal is to present a self-healing infrastructure built on cheap instances with ephemeral storage. We start with a comprehensive overview of the relationships between collections, Solr cores, shards, and cluster nodes. We continue with an introduction to Solr 4.x clustering using ZooKeeper, with particular emphasis on cluster state monitoring and Solr collection configuration. The core of our presentation is demonstrated on a live cluster. We show how to use cron and bash to monitor the state of the cluster and the state of its nodes. We then show how to extend that monitoring to automatically spawn new nodes, attach them to the cluster, and assign them shards (choosing between filling missing shards or adding replicas for HA). We show that with a high replication factor it is possible to keep shards on ephemeral storage without risk of data loss, greatly reducing the cost and management overhead of the architecture. Future work, which might be pursued as an open-source effort, includes monitoring the activity of individual nodes so the cluster can scale with traffic and usage.
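The cron-and-bash monitoring described above can be sketched roughly as follows. The node list and output format are illustrative assumptions; the ping handler itself is standard Solr:

```shell
#!/usr/bin/env bash
# Minimal node health check, meant to be run from cron.
# NODES is an illustrative assumption; adapt to your cluster.
NODES="node1:8983 node2:8983 node3:8983"

# Returns success when an HTTP status code indicates a healthy node.
node_ok() {
  [ "$1" = "200" ]
}

check_node() {
  # Solr's standard ping handler returns HTTP 200 when the core is healthy.
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' "http://$1/solr/admin/ping")
  if node_ok "$status"; then
    echo "$1 OK"
  else
    echo "$1 DOWN (HTTP $status)"
  fi
}

# From cron, e.g. every minute:
# for n in $NODES; do check_node "$n"; done
```

The `DOWN` branch is where a real script would alert or trigger the self-healing steps demonstrated later in the talk.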
5. Lucene revolution 2013
Solr Clustering After 4.x:
• Nodes
• Automatic Routing
• Simple Provisioning
• Node Monitoring
[Diagram: two load balancers in front of six Solr nodes, coordinated by a three-node ZooKeeper ensemble]
Thank you to the SolrCloud team!!!
6. Lucene revolution 2013
What is SolrCloud?
Backward compatibility
• Plain old Solr (with Lucene 4.x)
• Same schema
• Same solrconfig
• Same plugins (some plugins might need an update for distributed mode)
9. Lucene revolution 2013
What is SolrCloud?
Automatic Routing
[Diagram: two load balancers in front of six Solr nodes, coordinated by a three-node ZooKeeper ensemble]
• Smart clients connect to ZK
• Any node can forward a request to a node that can process it
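Because any node can forward a request to one that can serve it, a client-side sketch only needs to pick some live node and query it. The host list, collection name, and the trivial picker below are illustrative assumptions, not SolrCloud's own load balancing:

```shell
#!/usr/bin/env bash
# A "dumb" client: send the query to any live node and let SolrCloud route it.
NODES=(node1:8983 node2:8983 node3:8983)

# Pick one node from a list using a numeric seed (poor man's load balancing).
pick_node() {
  local seed=$1; shift
  local idx=$(( seed % $# ))
  shift "$idx"
  echo "$1"
}

# node=$(pick_node "$$" "${NODES[@]}")
# curl -s "http://$node/solr/mycollection/select?q=*:*&wt=json"
```

Smart clients go one step further and read the live node list from ZooKeeper instead of a static array, so dead nodes drop out automatically.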
10. Lucene revolution 2013
What is SolrCloud?
Collection API
• Abstraction level
• An index is a collection
• A collection is a set of shards
• A shard is a set of cores
• CRUD API for collections
"A collection represents a set of cores with identical configuration. The set of cores of a collection covers the entire index."
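The CRUD side of the Collection API is plain HTTP. Creating a collection in Solr 4.x looks roughly like this; the host, collection name, and shard/replica counts are illustrative:

```shell
#!/usr/bin/env bash
# Build the Collections API CREATE call.
collection_create_url() {
  local host=$1 name=$2 shards=$3 replicas=$4
  echo "http://${host}/solr/admin/collections?action=CREATE&name=${name}&numShards=${shards}&replicationFactor=${replicas}"
}

# curl -s "$(collection_create_url localhost:8983 books 3 2)"
# Deleting is analogous: action=DELETE&name=books
```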
11. Lucene revolution 2013
What is SolrCloud?
• Collection: abstraction level of interaction & config
• Shard: scaling factor for collection size (numShards)
• Core: scaling factor for QPS (replicationFactor)
• Node: scaling factor for cluster size (liveNodes)
=> SolrCloud is highly geared toward horizontal scaling
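These scaling factors combine multiplicatively: a collection occupies numShards × replicationFactor cores, spread across liveNodes machines. A tiny helper makes the arithmetic explicit; this is a capacity-planning sketch, not Solr's own placement logic:

```shell
#!/usr/bin/env bash
# Total cores a collection occupies, and cores per node if spread evenly.
total_cores() {
  echo $(( $1 * $2 ))   # numShards * replicationFactor
}

cores_per_node() {
  local total
  total=$(total_cores "$1" "$2")
  echo $(( (total + $3 - 1) / $3 ))   # ceiling division over liveNodes
}

# e.g. 3 shards, replication factor 2, 4 live nodes:
# total_cores 3 2        -> 6
# cores_per_node 3 2 4   -> 2
```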
18. Lucene revolution 2013
SolrCloud - Core Sizing
Heuristically inferred from "experience":
• Size on the shard, not the collection
• Do NOT starve resources on nodes
• Settle on a JVM/disk sizing
• Keep a large amount of spare disk (optimize)
Per core: 3 GB RAM, 60 GB disk
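Applied to the 3 GB RAM figure above, a node start script might pin the heap explicitly. The path and anything beyond the heap flags are illustrative assumptions; Solr 4.x ships as a Jetty `start.jar`:

```shell
#!/usr/bin/env bash
# Start a Solr 4.x node with a heap matched to the sizing heuristic.
HEAP_GB=3   # from the 3 GB per-core RAM figure; adjust per node

# Fixed min = max heap avoids resize pauses and makes sizing predictable.
solr_start_cmd() {
  echo "java -Xms${1}g -Xmx${1}g -jar start.jar"
}

# cd /opt/solr/example && $(solr_start_cmd "$HEAP_GB")
```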
21. Lucene revolution 2013
SolrCloud - Provisioning
Stand-by nodes
• Automatically assigned as replicas
• Provide a measure of HA
Node addition* (self-healing)
• Scheduled check on cluster congestion
• Automatically spawn new nodes as needed
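The scheduled check can stay as simple as comparing the live node count against a target and spawning the difference. The decision logic might look like this; the target value and the spawn command are placeholders for whatever your cloud provider's CLI offers:

```shell
#!/usr/bin/env bash
# Cron-driven self-healing sketch: keep the cluster at TARGET nodes.
TARGET=6   # illustrative

# How many nodes to spawn given the current live count (never negative).
nodes_to_spawn() {
  local live=$1 target=$2
  local diff=$(( target - live ))
  if [ "$diff" -gt 0 ]; then echo "$diff"; else echo 0; fi
}

# live=$(…count live nodes, e.g. children of ZooKeeper's /live_nodes…)
# n=$(nodes_to_spawn "$live" "$TARGET")
# for i in $(seq 1 "$n"); do your-cloud-cli spawn-solr-node; done
```

New nodes then register themselves in ZooKeeper and are assigned as replicas, as described above.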
22. Lucene revolution 2013
SolrCloud - Conclusion
Using SolrCloud is like juggling:
• It gets better with practice
• There is always some magic left
• It can become very overwhelming
• When it fails, you lose your balls
Test -> Test -> Test -> some more tests -> Test
23. Lucene revolution 2013
Next Steps
What would make our current SolrCloud cluster even more awesome:
• Balance/distribute cores based on machine load
• Stand-by cores (replicas not serving requests, auto-shutting down)
24. Lucene revolution 2013
Further Information
Requirements for SolrCloud:
• Solr mailing list: solr-user@lucene.apache.org
Further information:
• Blog & feed: http://www.searchbox.com/blog/
• Searchbox email: contact@searchbox.com
25. Lucene revolution 2013
CONFERENCE PARTY
The Tipsy Crow: 770 5th Ave
Starts after Stump The Chump
Your conference badge gets you in the door
TOMORROW
Breakfast starts at 7:30
Keynotes start at 8:30
CONTACT
Stephane Gamard
stephane.gamard@searchbox.com