Solr Architecture

By: Ramez Ibrahim AL Fayez

Agenda

- Introductions
- What is Solr?
- Main Solr Features and Attributes
- Content, Query, Facet, API, Scalability
- Interface and useful commands
- Live Demo

Introduction

- Search has become mission critical for most enterprises
  - Intranet
  - Web presence
  - E-commerce
- Exponential growth of data
- Cost of not finding information
  - Knowledge (sharing)
  - Time
  - Money
- Information blackhole

What is Solr?

Official definition:

“Solr is an open source enterprise search platform based on the Lucene Java search library, with an HTTP interface using XML, JSON or other formats. It provides hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Apache Tomcat.”

Source: http://lucene.apache.org/solr

What is Solr?

- Created in 2004 by Yonik Seeley at CNET Networks as an in-house project to add search capability to the company website
- Open-source search engine with no license fees (released under the Apache License)
- Built on top of the Apache Lucene library, adding enterprise search server features and capabilities
- Web-based application that processes requests and returns responses via HTTP and APIs

Why choose Solr?

- Customizable
- High-quality and easily modifiable relevancy
- Very fast query and indexing performance
- Open source software is free
- Highly flexible data processing/transformation
- Easy scalability and great performance
- Modern solution architecture based on XML and Java
- Well integrated with the ecosystem around Big Data, such as Hadoop (also Nutch, Tika)

Solr’s Main Features

- Full text search
- Field search
- Number and date searching
- Facets
- Spelling assistance: “Did you mean…?”
- Related hits
- Query completion
- Admin GUI
- Data Import Handler
  - Index databases, mail, RSS, XML, etc.
- Rich document support
  - PDF, MS Office, images, etc.
- Replication for high query volume
- Distributed search for large indexes
  - Production systems with 1B+ documents
- Very extensible and customizable
- Embedded in commercial search products from LucidWorks, DataStax, Cloudera, Hortonworks, Amazon CloudSearch and Riak

Main Attributes

- Index(ing)
- Inverted index
- Document
- Field
- Stored and/or indexed fields
- Analysis
- Tokenization
- Filters
- Terms
- Query
- Filter
- Function
- Facet
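To make the terms above concrete, here is a minimal sketch of how analysis (tokenization plus filters) turns document fields into terms, and how those terms populate an inverted index. It is plain Python, not Solr or Lucene code; the stop-word list, the `analyze` and `build_inverted_index` helpers, and the sample documents are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stop-word list; real analyzers are configured per field type.
STOPWORDS = {"a", "an", "the", "is", "of", "on"}


def analyze(text):
    """Tokenize on whitespace, then apply punctuation, lowercase and stop-word filters."""
    tokens = text.split()
    terms = [t.strip(".,!?").lower() for t in tokens]
    return [t for t in terms if t and t not in STOPWORDS]


def build_inverted_index(docs, field):
    """Map each term to the sorted list of document ids whose field contains it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for term in analyze(fields[field]):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Each document is a set of named fields; only "title" is indexed here.
docs = {
    1: {"title": "Solr is built on Lucene"},
    2: {"title": "Lucene is a Java search library"},
    3: {"title": "The Solr admin interface"},
}

index = build_inverted_index(docs, "title")
print(index["solr"])    # [1, 3]
print(index["lucene"])  # [1, 2]
```

Looking up a term is then a single dictionary access, which is why an inverted index makes term queries fast regardless of how long the documents are.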

Content

- Out-of-the-box support for JSON
- Solr handles CSV, XML and rich content out of the box, without having to install plugins
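As an illustration of the JSON support, the sketch below prepares an HTTP request that would send a JSON array of documents to a collection's `/update` handler. It assumes a Solr instance on the default port 8983 and a collection named `demo`; both the collection name and the `build_update_request` helper are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_update_request(base_url, collection, docs):
    """Prepare a POST of JSON documents to the collection's /update handler."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "demo" is an assumed collection name; the document fields are made up.
req = build_update_request(
    "http://localhost:8983",
    "demo",
    [{"id": "1", "title": "Solr Architecture"}],
)
print(req.full_url)
# To send it to a running Solr instance: urllib.request.urlopen(req)
```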

Indexing and Ranking

- Solr uses an inverted index
- For ranking, Solr uses TF-IDF and Similarity
- Similarity is a combination of the Boolean model (BM) and the Vector Space Model (VSM): the Boolean model selects the matching documents and the VSM scores them
- Users can also re-rank the results of a query
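The ranking idea can be sketched with a textbook TF-IDF score. This is a simplified illustration, not Lucene's actual Similarity formula (which adds normalization and other factors): a Boolean step first selects documents containing at least one query term, then each match is scored by summing term frequency times inverse document frequency. The sample documents and function names are assumptions.

```python
import math
from collections import Counter

# Tiny made-up corpus, already lowercased for simplicity.
docs = {
    "d1": "solr search server",
    "d2": "lucene search library",
    "d3": "solr solr admin",
}

tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
n_docs = len(tokenized)
# Document frequency: in how many documents each term occurs.
doc_freq = Counter(term for terms in tokenized.values() for term in set(terms))


def idf(term):
    """Inverse document frequency: rarer terms get a higher weight."""
    return math.log(n_docs / doc_freq[term]) + 1.0


def score(query, doc_id):
    """Sum of tf * idf over the query terms that exist in the corpus."""
    counts = Counter(tokenized[doc_id])
    return sum(counts[t] * idf(t) for t in query.split() if t in doc_freq)


def search(query):
    """Boolean step selects matching documents, TF-IDF step orders them."""
    terms = set(query.split())
    matches = [d for d, toks in tokenized.items() if terms & set(toks)]
    return sorted(matches, key=lambda d: score(query, d), reverse=True)


print(search("solr"))  # ['d3', 'd1']
```

Document `d3` ranks first because it contains the query term twice, while `d2` is never scored because the Boolean step excludes it.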

Query

- Common parameters
  - start, rows, fl, fq, sort

  ?q=*:*&start=0&rows=10&fl=title&fq=collection:popular&sort=title asc

- Slightly more advanced
  - &facet
  - &qf (query fields with boosts, used by the DisMax and eDisMax query parsers)

  &qf=keyword^4 content1^8 content2^3 content3^2 stem1^1.5 stem2^1.2 stem3^0.5
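The query strings above contain characters such as `*`, `:`, `^` and spaces that need URL encoding when sent over HTTP. The sketch below builds the same parameters with Python's standard library; the host, port and the `demo` collection name are assumptions, and the boost values are copied from the slide.

```python
from urllib.parse import parse_qs, urlencode

# Common parameters from the slide.
params = {
    "q": "*:*",
    "start": 0,
    "rows": 10,
    "fl": "title",
    "fq": "collection:popular",
    "sort": "title asc",
}
query_string = urlencode(params)

# Field boosts for qf, joined into the "field^boost field^boost" form.
boosts = {"keyword": 4, "content1": 8, "content2": 3, "content3": 2}
qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
boosted_query = urlencode({"q": "solr", "defType": "edismax", "qf": qf})

# "demo" is a hypothetical collection name on a local Solr instance.
base = "http://localhost:8983/solr/demo/select?"
print(base + query_string)
print(base + boosted_query)
```

Decoding the query string with `parse_qs` returns the original values, which is a quick way to check that nothing was mangled by the encoding.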

Facet

“Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field.”

- Navigation/discovery technique
- Tally of docs for each distinct field value
- Parameters
  - &facet=true
  - &facet.field=category
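The "tally of docs for each distinct field value" can be sketched in a few lines. This is plain Python over an in-memory result list, not Solr's implementation; the documents, the `category` values and the helper names are made up for illustration.

```python
from collections import Counter

# Hypothetical search results, each with a "category" field.
results = [
    {"id": 1, "category": "books"},
    {"id": 2, "category": "music"},
    {"id": 3, "category": "books"},
    {"id": 4, "category": "film"},
]


def facet_counts(docs, field):
    """Count documents per distinct value of the field, most frequent first."""
    return Counter(doc[field] for doc in docs if field in doc).most_common()


def drill_down(docs, field, value):
    """Keep only documents with the chosen facet value (what a filter query does)."""
    return [doc for doc in docs if doc.get(field) == value]


print(facet_counts(results, "category"))
print([doc["id"] for doc in drill_down(results, "category", "books")])  # [1, 3]
```

In a search interface the counts are shown as clickable categories, and selecting one narrows the result set in the way `drill_down` does.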

API

- REST API for adding field types and dynamic fields
- Managing request handlers through the API
- Improved APIs for managing collections
- Implicit registration of Replication, Real Time Get and Administration handlers
- Out-of-the-box support for JSON
- Solr handles CSV, XML and rich content out of the box, without having to install plugins
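As a sketch of the REST API for field types and dynamic fields, the block below prepares a request for a collection's `/schema` endpoint with an `add-field-type` and an `add-dynamic-field` command. It assumes a collection named `demo` that uses a managed (API-editable) schema; the names `text_demo` and `*_demo` are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

# Schema commands; "text_demo" and "*_demo" are made-up names for illustration.
commands = {
    "add-field-type": {
        "name": "text_demo",
        "class": "solr.TextField",
        "analyzer": {"tokenizer": {"class": "solr.StandardTokenizerFactory"}},
    },
    "add-dynamic-field": {
        "name": "*_demo",
        "type": "text_demo",
        "stored": True,
    },
}

# Assumed local instance and collection name.
schema_request = urllib.request.Request(
    "http://localhost:8983/solr/demo/schema",
    data=json.dumps(commands).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(schema_request.full_url)
# To apply the change on a running Solr: urllib.request.urlopen(schema_request)
```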

Scalability

- Architecture goals:
  - More queries per second (qps)
  - Faster query execution
  - Bigger indexes
  - Faster indexing
- Scaling options
  - Multicore
  - Replication
  - Sharding
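The sharding option can be illustrated with a toy model: each document is routed to one shard by hashing its id, and a query is fanned out to every shard and the partial results merged. This is a conceptual sketch only; the MD5-based routing, the shard count and the helper names are assumptions, not Solr's actual routing logic.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count for the example
shards = [[] for _ in range(NUM_SHARDS)]


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard deterministically from a hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def index_doc(doc):
    """Store the document on the shard its id hashes to."""
    shards[shard_for(doc["id"])].append(doc)


def distributed_search(term):
    """Query every shard, then merge and order the partial results."""
    hits = []
    for shard in shards:
        hits.extend(d for d in shard if term in d["title"].lower())
    return sorted(hits, key=lambda d: d["id"])


for i in range(1, 7):
    index_doc({"id": f"doc{i}", "title": f"Solr slide {i}"})

print([len(shard) for shard in shards])
print([d["id"] for d in distributed_search("solr")])
```

Sharding addresses the "bigger indexes" goal because each node holds only part of the data, while replication (copies of the same shard) addresses the "more queries per second" goal.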

Useful commands

- ./bin/solr {start|stop}
- ./bin/solr create -c <COLL_NAME>
- ./bin/post -c <COLL_NAME> <Files to index>
- ./bin/solr delete -c <COLL_NAME>
