Lucidworks Senior Search Engineer, Evan Sayer, and Enterprise Content Management and Big Data Architect for the County of Sacramento, Guy Sperry, explore the benefits of replacing Google Search Appliance with Lucidworks Fusion.
Webinar: Replace Google Search Appliance with Lucidworks Fusion
2. Replacing GSA with Lucidworks Fusion
Evan Sayer, Senior Search Engineer, Lucidworks
Guy Sperry, Enterprise Content Management & Big Data Architect, County of Sacramento
4. Introduction
• Lucidworks
  – Founded in 2007
  – Contributes ~70% of the open-source code committed to the Apache Lucene/Solr project
• Lucidworks Fusion: our enterprise search platform built on top of Apache Solr
• Apache Solr: the most popular open-source enterprise search engine on Earth
5. Google Search Appliance (GSA)
• Google’s enterprise search solution offered from 2002-2016
• One-stop shopping: a complete enterprise-search solution in one box
• EoL as of February 2016, support phased out completely by 2018
6. GSA Strengths
• Easy to set up and configure: “plug and play”
  – Lower start-up cost and lower time-to-value than many other contemporary solutions
  – Relatively straightforward to operate on an ongoing basis
  – Achieve a decent search experience quite quickly and easily
• Takeaway: GSA minimized necessary investment in technical expertise
7. Replacing GSA with Fusion
• Easy to set up and configure, “plug and play”
  – Fusion Index Workbench
    • Quickly connect to and ingest data
    • Intuitively iterate on improving search results
    • Easily A/B test tweaks to ETL logic
  – Dashboards and Log Analytics
  – Monitoring/alerting APIs that integrate with common tools to ease ongoing maintenance
8. GSA Strengths
• Out-of-box search UI
  – Highly useful during development, iterating on relevancy improvements, etc.
  – Customizable enough to use as an end-user search UI
• Takeaway: GSA minimized necessary investment in technical expertise
9. Replacing GSA with Fusion
• Out-of-box search UI
  – Lucidworks View
    • Highly customizable/“skin-able”
    • Fully open-source: https://github.com/lucidworks/lucidworks-view
    • Built on top of a modern stack (AngularJS)
10. GSA Strengths
• Broad support for connecting to, ingesting, and securing data
  – Many out-of-box connectors to common sources: CRM, Wikis, databases, etc.
  – Extensible connector framework
• Takeaway: GSA minimized necessary investment in technical expertise
11. Replacing GSA with Fusion
• Broad support for connecting to, ingesting, and securing data
  – Fusion ships with ~40 connectors to common sources
    • JDBC, Web, Alfresco, Box, Dropbox, Drupal, GitHub, Google Drive, Jive, JIRA, SharePoint, MongoDB, Hadoop/HDFS, Salesforce, Slack, lots more…
    • Fusion connectors’ security-trimming functionality secures content/searches out-of-box
  – Fusion Index Pipelines also make it easy to push data into the index via a REST API
  – Custom connector development via Fusion’s Connectors API
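The security-trimming idea on this slide can be pictured with a minimal sketch. It is not Fusion's connector code or API: the function name `security_trim`, the field name `acl_groups`, and the sample documents are all hypothetical, and the sketch assumes each indexed document carries the list of groups allowed to see it.

```python
from typing import Dict, Iterable, List


def security_trim(results: List[Dict], user_groups: Iterable[str],
                  acl_field: str = "acl_groups") -> List[Dict]:
    """Keep only results whose ACL shares at least one group with the user.

    `acl_field` is a hypothetical field name; documents without it are
    treated as visible to nobody.
    """
    groups = set(user_groups)
    return [doc for doc in results if groups & set(doc.get(acl_field, []))]


# Hypothetical search results, each tagged with the groups allowed to read it.
sample_results = [
    {"id": "a", "acl_groups": ["hr"]},
    {"id": "b", "acl_groups": ["legal", "it"]},
    {"id": "c"},  # no ACL recorded, so it is hidden from everyone
]
visible_to_it = security_trim(sample_results, {"it"})
```

In a real deployment this kind of check is normally applied as a filter inside the query itself rather than on the returned page, so that result counts and facets already reflect what the user is allowed to see.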
12. GSA Weaknesses
• Broad theme: insufficient control over the search experience
  – Relevancy tuning and controls are exceedingly opaque
    • “Source Biasing”: +/- [strong|medium|weak]
  – Lack of control over indexing workflow
    • Custom metadata processing was a chore, if feasible
  – Often referred to as a “black box” design
• Non-trivial to scale
  – Appliance packaging restricts freedom in scaling up
  – Per-document pricing model
• Incorrect facet counts!?
14. Fusion: Fine-grained Control over *Everything*
• Fusion Index Pipelines
  – True fine-grained control over ETL; as much or as little as desired
    • For content from source X, I want to redact this set of keywords
    • For content from source Y, I want to extract the title from this HTML tag
    • For content from source Z, I want to look up the authorized groups from another database, and add them to a field in each document
• Fusion Query Pipelines
  – True fine-grained control over request/response logic at query-time
    • For queries containing keyword X, I want to rewrite the query to be something else
    • For queries in language Y, I want to boost results matching in this separate set of fields
    • For matching documents containing keyword Z, I want to redact all occurrences of Z before returning the results
  – Fusion signals: collect users’ queries+clicks and aggregate them over time
    • Utilize this knowledge to dynamically boost the most commonly-clicked item(s) for a given query
    • Continually improve relevancy without manual human input
• If you’re already familiar with Solr/Lucene, hack away! :)
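Two of the ideas on this slide, per-source index-time stages and click-signal aggregation, can be sketched in plain Python. This is an illustration only and does not use or mirror Fusion's pipeline or signals APIs: every name here (`redact_stage`, `title_from_tag_stage`, `run_pipeline`, `top_clicked`, and the `source`, `body`, `raw_html`, and `title` fields) is hypothetical, and a regular expression stands in for real HTML parsing.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, str]
Stage = Callable[[Doc], Doc]


def redact_stage(keywords: Iterable[str]) -> Stage:
    """Index-time stage: mask every occurrence of the keywords in 'body'."""
    words = [re.escape(k) for k in keywords]
    if not words:
        return lambda doc: doc  # nothing to redact
    pattern = re.compile("|".join(words), re.IGNORECASE)

    def stage(doc: Doc) -> Doc:
        return {**doc, "body": pattern.sub("[REDACTED]", doc.get("body", ""))}
    return stage


def title_from_tag_stage(tag: str) -> Stage:
    """Index-time stage: copy the text of the first <tag>...</tag> into 'title'."""
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b[^>]*>(.*?)</{name}>",
                         re.IGNORECASE | re.DOTALL)

    def stage(doc: Doc) -> Doc:
        match = pattern.search(doc.get("raw_html", ""))
        return {**doc, "title": match.group(1).strip()} if match else doc
    return stage


def run_pipeline(doc: Doc, stages_by_source: Dict[str, List[Stage]]) -> Doc:
    """Apply, in order, the stages configured for the document's source."""
    for stage in stages_by_source.get(doc.get("source", ""), []):
        doc = stage(doc)
    return doc


def top_clicked(signals: Iterable[Tuple[str, str]], query: str,
                n: int = 1) -> List[str]:
    """Aggregate (query, clicked_doc_id) pairs; return the n most-clicked ids."""
    counts = Counter(doc_id for q, doc_id in signals if q == query)
    return [doc_id for doc_id, _ in counts.most_common(n)]
```

A query-time step could then move the ids returned by `top_clicked` toward the top of the results for that query, which is the "boost the most commonly-clicked item(s)" behaviour the slide describes.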
15. Fusion: Fine-grained Control over *Everything*
• Scaling
  – Fusion utilizes best-in-class Apache Solr as the backend search engine
    • Scale to billions of documents linearly
  – Fusion services scale independently
    • As opposed to GSA, which scaled in units of entire appliances
    • If you want to ingest content faster, add additional connector nodes
    • If you want to enable greater query throughput, add additional query-processing nodes
  – Straightforward APIs/processes for provisioning additional nodes
    • Just spin up a new node, install Fusion, and point it at the central cluster manager (Apache ZooKeeper)
• Easily overlay Fusion on top of any existing Solr cluster
16. Fusion as a platform
• Get started with ease: https://lucidworks.com/products/fusion/download/
  1. Point Fusion at your data
  2. Set up a simple baseline search app with Lucidworks View
  3. Iterate on the actual search experience to your heart’s content :)
• Delve into the details (or don’t!)
  – Fusion provides the necessary framework to tackle tough and/or use-case-specific search problems
  – Anything but a “black box” design
  – Most components are customizable and extensible
    • Implement your own Fusion components in Java using our APIs
• Scale with minimal effort, maximal flexibility
  – Scale linearly up to billions of docs with Apache Solr
  – Self-service APIs for setting up additional nodes to expand capacity
  – Per-node instead of per-doc pricing means fewer surprises when it’s time to renew licenses
17. “Fusion gave us the features we needed to replace
Google Search Appliance in a matter of weeks. With
Fusion’s out-of-the-box capabilities, we skipped months
in our dev cycle so we could focus our team where they
would have the most impact. We cut our licensing costs
by 50% and improved application usability. The
Lucidworks professional services team amplified our
success even further.
“We’re all Fusion from here on out!”
Lourduraju Pamishetty
Senior IT Application Architect
Infoblox
19. Fusion as a platform
• Accurate facet counts
  – What a concept! :)
• Take Fusion for a spin: https://lucidworks.com/products/fusion/download/
20. Agenda
• Introduction to County of Sacramento
• Why Sacramento County is search first for data delivery
• How Fusion helps us meet our data delivery challenges
• How Fusion has helped us fill gaps left by GSA retirement
21. Sacramento County
• 34 departments and affiliated organizations serving 1.5 million people
• Commitment to open government and transparency
• Citizen engagement
22. Why Sacramento County is Search First
• Enterprise apps, data snackers and LOB apps
  – ADABAS (Mainframe)
  – RDBMS
  – CDH
  – ECM
• Diverse, heterogeneous data environment
• Our challenge: securely deliver prompt access to relevant data
23. Fusion/Solr in Sacramento County
• Documents and content
  – Cross-repository search
  – Source repository security
• GIS
• Cross-Source Data Processing and Analytics
  – Fusion connectors
  – Spark in Fusion
• Log Analysis
• NoSQL
  – Why be MEAN when you can be SANE?
25. AgendaSearch.saccounty.net
• The Brown Act
  – Make public meetings accessible to citizens
  – Maintain transparency
• AgendaSearch
  – Search and consume public documents
  – Integrate with agenda management
  – Lucidworks View
  – Has reduced PRAs
26. Immediate Win with View
• County Legal Counsel
• ~2 million document archive
• Document level security
• Intuitive and feature rich UI
• Search solution delivered before lunch