Good Information
Is Hard to Find:
Guidelines for Managers
Considering
Open Source
Enterprise Search
A Lucid Imagination White Paper
Abstract
Enterprise search helps your employees, customers, and partners find the most relevant
and timely information they need to make smart, efficient decisions about doing
business with and in your company. Open source has delivered great benefits to enterprise
software customers, with innovative operating systems, databases, and middleware and a
broad range of applications; now the open source model can unleash this value for your
enterprise search needs. Lucid Imagination brings market-leading expertise to open source
enterprise search, and can help any organization quickly design and optimize search
solutions based on Lucene and Solr.




Good Information Is Hard to Find: Considering Open Source for Enterprise Search
A Lucid Imagination White Paper • April 2009                                       Page i
Table of Contents
Introduction and Overview
The Advantages of Open Source
   Lower Costs
   Pay at the Point of Value
   Transparent Development
   Re-tool the Employees, Retire the Software
   Lower Overall Risk
About Lucid Imagination
Engagement Scenarios
   Considering Alternatives to Legacy Packaged Search Applications
   Building on In-house Lucene/Solr Expertise
Next Steps
Appendix: About Apache Lucene and Solr




Introduction and Overview
Raising the collective intelligence of company employees can make them smarter and more
efficient—but how do you enable them to keep up with the vast, ever-changing amount of
data your organization produces? Many operations seem to be better at creating data than
using it to operate more productively. Using search tools designed for the Web can make it
difficult to find relevant, timely corporate information, mostly because corporate data is
not much like Web data:
    •    Corporate data can be stored in a variety of different and unstructured formats,
         including documents and database records.
    •    A document’s popularity is not necessarily what makes it useful to a specific search.
    •    Information may require controlled access, yet still be discoverable to those users
         with the appropriate permissions.

Two state-of-the-art, open source search technologies—Lucene and Solr—are available for
free from the Apache Software Foundation. Lucene is a powerful search engine and library;
Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based
applications.¹ Rich, flexible text query tools and sophisticated ranking capabilities of
Lucene/Solr enable users to quickly find the most useful documents or records.
Either of these full-featured technologies delivers excellent performance, relevancy
ranking, and scalability. They are used today by thousands of organizations, powering
substantial and diverse search applications for AOL, CNET, Comcast Interactive Media, IBM,
Netflix, LinkedIn, MySpace, and many others. For these companies, Lucene/Solr solutions
regularly index and search hundreds of millions of documents with subsecond response
time, all without incurring any licensing fees.
These solutions excel at quickly and effectively searching large volumes of unstructured
text (documents or other records containing freeform text) and returning results based
on how well they match the user’s query. At most companies, this means digesting and
searching through dozens of different file formats (including documents, spreadsheets,
presentations, e-mail, and records stored in databases, to name just a few) and delivering
relevant results to authorized users. Incremental update capabilities mean that
Lucene/Solr searches can track document collections easily as they grow and change,
finding information nearly as fast as it is created.

1 Most organizations use Solr today as their search development platform. Because Lucene serves as the core of
Solr’s search capabilities, this paper refers to them as Lucene/Solr. For more information about these technologies,
see the Appendix.

Good Information Is Hard to Find: Considering Open Source for Enterprise Search
A Lucid Imagination White Paper • March 2010                                                                  Page 1

Solr can speedily facet, or categorize, data and search results based on specific field values.
An excellent example of this function is Zappos.com, the popular shoe e-tailer, where users
can quickly refine searches based on product criteria such as price or features.
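
For readers who want to see what a faceted request looks like, the following is a minimal Python sketch rather than a definitive implementation. It builds a Solr query URL that asks for facet counts and converts the facet section of a response into a simple dictionary. The `q`, `rows`, `wt`, `facet`, and `facet.field` parameters are standard Solr request parameters; the host, the path, the `brand` field, and the sample response are assumptions made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and path for a real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(text, facet_field, rows=10):
    """Return a Solr select URL that asks for results plus facet counts."""
    params = [
        ("q", text),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


def facet_counts(response, facet_field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape of a Solr JSON response (not real data).
sample_response = {
    "response": {"numFound": 3, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {"brand": ["acme", 2, "zenith", 1]}},
}

url = build_facet_query("running shoes", "brand")
counts = facet_counts(sample_response, "brand")
print(url)
print(counts)
```

An application would typically render each facet value and its count as a clickable refinement, which is the behavior described for product searches above.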
For most application development teams, building a search application is not an everyday
project. By definition, enterprise search technology processes unstructured data, which can
change frequently. Expert guidance on architectural considerations, such as index
optimization, result relevance, deployment configuration, and retrieval performance, can
make a tremendous difference in deploying a successful solution. By taking advantage of
expert, experienced personnel to assist with application design, development, and
deployment, organizations can leverage the full benefit of Lucene/Solr search technologies
without the cost of licensing proprietary software.



For these reasons, Lucid Imagination provides commercial-grade support, training, and
professional consulting services that are essential to designing and installing successful
enterprise applications.
This paper is intended for business decision makers who are considering options for
powerful, flexible enterprise search solutions. It provides guidelines for understanding:
•   Advantages of open source software, including ways it can lower costs and risks,

•     Why Lucid Imagination’s service and support is a key ingredient in achieving successful
      Lucene/Solr solutions,
•     Engagement scenarios—the types of situations where Lucid Imagination can help, and
•     The capabilities of Lucene/Solr, which are provided in an appendix.



The Advantages of Open Source
Open source has changed the IT landscape. Gartner reports that 85 percent of the companies
it polled are already using open source software, and calls that use
“pervasive.”² Most organizations are now familiar with free and open source products such
as Linux, MySQL, Apache, and SugarCRM because of their many benefits, including:
      •    Lower costs
      •    Pay at the point of value
      •    Transparent development
      •    Control and flexibility – investing in people instead of software licenses
      •    Lower overall risk

With Lucene/Solr’s broad, successful adoption across markets and deployments, these
advantages are now available for enterprise search applications. Let’s take a closer look at
how open source pays off.

Lower Costs
While proprietary software vendors must try to recover their development costs, this is not
the case with open source software, because it does not have capital costs associated with
source code IP. The cost of talent is less, too. Community development, adherence to
standards, and lower barriers to adoption all help increase the number of developers who
become proficient in the use of a product or technology. Together, these factors combine to
reduce upward pricing pressure.

2 http://www.theregister.co.uk/2008/11/18/gartner_open_source/

The high license fees associated with proprietary and closed source development can
discourage developers and customers from adopting a product or technology. In contrast,
open source communities help lower costs by encouraging participation and allowing
anyone to download the source code and try it out. Most open source communities release
updated binaries on a periodic basis, so users can easily try the software on their own
timetables.



Many commercial solutions combine proprietary software with service and support, and
customers may believe that buying a software license is sufficient to get a search
application up and running. In most cases, however, the technology’s purchase price makes
up less than half of the implementation cost, with the balance going to services. Both open
source and proprietary software usually require a significant amount of customization,
which means some service and support costs are inevitable.

Pay at the Point of Value
Open source project code is freely available for any use. If a company can become proficient
with the code, it can make productive use of the code at any phase from evaluation to
production. Only in those areas where an open source customer sees value—for support
and integration services, or for additional functionality or expertise—does money need to
be spent. There are no restrictions on when open source software can be used.
In contrast, proprietary products typically must be purchased before they can be used, or
in some cases, even evaluated. Some vendors offer evaluation or trial versions, but these
often have reduced functionality or restrictive licenses. Because the software must be
purchased before the customer can see any value from the product, return on investment is
delayed.

Transparent Development
Community-developed software enables everyone to see what is being built and which
features are included as early as possible. Developers and customers do not need to wait
for a vendor to publish a roadmap, or for a vendor product launch, to know what is being
readied for release. As a result, prospective users can make better, faster, and more
informed decisions relating to their software infrastructure.
Compare this to proprietary software, where customers have little if any insight into
upcoming products until very late in the product life cycle. This is typically no sooner than
the software’s beta release, when it is too late to provide input on features and
functionality. This delays assessment and adoption of innovations.

Re-tool the Employees, Retire the Software
In this tough economic climate, managers who own budgets need to review every expense
with a critical eye. Many software applications that made sense a few years back may have
outlived their fit with business needs.
Any application development effort generates significant learning. The work of
development imbues in-house developers with deep knowledge and understanding of the
company, its IT infrastructure, culture, and usage requirements. Given that software
applications must keep up with an organization’s changing goals and requirements as the
needs of its market and constituents evolve, the expertise that the technical staff
develops becomes a vital competitive asset.
This is a key corollary benefit of the open source model: by retiring old software packages
and investing in staff expertise, companies combine innovative technology with their most
valuable asset, their people, and establish a vital competitive advantage.
Companies that use the savings from not purchasing software licenses to build
development talent in-house reduce the cost of addressing inevitable change. What’s more,
increasing a technical team’s ability to translate company business objectives into
technology solutions increases the likelihood that the software they build will continue to
fit that inevitable change. This is particularly true for an enterprise search solution. In
addition, compared to closed source implementations, in-house developers can work with
open source code and supplement it with additional functions or expertise by relying on the
community and marketplace of readily available resources, again capturing a unique
competitive advantage.



Supplementing open source development with training, consulting, and reliable support
from established industry experts reinforces a company’s competitive advantage – with the
control and flexibility needed to survive and thrive.

Lower Overall Risk
Proprietary vendors often use closed interfaces and components to lock in customers. In
contrast, the source code for open source software is freely available and widely supported
by the community, based on standardized, free public interfaces. If a commercial vendor goes
out of business, is purchased by another company, or tries to increase fees for a commercial
product, open source vendors may be able to step in to meet the needs of customers at
market-competitive prices.
Open source software can reduce security and operational risks, too. Widely used open
source software is essentially under constant peer review. Technical or security issues,
once exposed in the community, are readily addressed, resulting in a safer and more
reliable product.


About Lucid Imagination
The benefits of open source have unlocked tremendous value in many software categories:
Red Hat’s Enterprise Linux in operating systems, MySQL in database software, Sugar in
CRM software—all have benefited from matching the efficiencies of open source with deep,
robust commercial resources to ensure successful applications. Today, Lucid Imagination’s
capabilities and expertise bring that same approach to unlocking enterprise search with
Lucene and Solr.
Lucid Imagination’s mission is to enable customers to achieve business objectives for
optimal search performance and accuracy, with lower total cost of ownership and faster
time to market. The company’s founding team consists of many key contributors and
committers to the Lucene/Solr project, as well as other experts in enterprise search
application development. Our skills, acquired across hundreds of deployments, including
best practices and technical know-how, can enhance and optimize any phase of an open
source search implementation.
Lucid Imagination’s team has a deep understanding of indexing, which is the foundation of
any search solution; it captures all the content and location of searched documents for
quick lookup, much as a book index does. We have broad experience indexing:
   •   Documents of widely varying sizes and formats within a very large collection,
   •   Documents with diverse metadata requirements, and
   •   Multilingual documents.
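
The book-index analogy can be illustrated with a toy inverted index. The Python sketch below is a simplification written for this paper's readers, not a description of Lucene's actual index format, which also stores positions, frequencies, and other data used for ranking. The document ids and text are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Invented sample documents.
docs = {
    "memo-1": "Quarterly sales report for the shoe division",
    "memo-2": "Travel policy and expense report",
    "memo-3": "Shoe inventory spreadsheet",
}

index = build_index(docs)
print(lookup(index, "report"))
print(lookup(index, "shoe report"))
```

Because the index is built once and consulted many times, a lookup touches only the entries for the query terms instead of scanning every document.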

The team is also skilled at applying business rules such as boosting documents and fields,
indexing dates, or other attributes of terms and data. Lucid Imagination has developed best
practices for indexing and metadata management, and can help establish and refine
policies to meet business and technical search requirements, such as:
   •   How and when to add documents to an index,
   •   Removing documents from an index,
   •   Results relevancy and document/data findability,
   •   Undeleting documents, and
   •   Batch and real-time updates.
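
As a rough illustration of what adding and removing documents involves, the Python sketch below builds the XML update messages that Solr accepts (`<add>`, `<delete>`, and `<commit>`). It only constructs the messages; sending them to a Solr update handler is left out. The field names (`id`, `title`) and document values are hypothetical, and a real schema would define its own fields.

```python
import xml.etree.ElementTree as ET


def add_message(docs):
    """Build one <add> message holding a batch of documents."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def delete_message(doc_id):
    """Build a <delete> message that removes one document by its id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def commit_message():
    """Build the <commit> message that makes pending changes searchable."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


# Hypothetical documents; the field names depend on the deployment's schema.
batch = [
    {"id": "memo-1", "title": "Travel policy"},
    {"id": "memo-2", "title": "Expense report"},
]

add_xml = add_message(batch)
delete_xml = delete_message("memo-1")
commit_xml = commit_message()
print(add_xml)
print(delete_xml)
print(commit_xml)
```

How many documents go into each batch, and how often a commit is issued, are the kinds of policy decisions listed above: fewer, larger batches favor throughput, while frequent commits favor freshness.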

The Lucid Imagination team has extensive experience with large-scale search applications,
including engagements with:
   •   Large collections—more than one billion documents,
   •   High query volumes and large user populations,
   •   High document growth rates,
   •   Distributed indexing and searching,
   •   Replication and high availability, and
   •   Cloud environments.

In addition to fine-tuning search technology machinery, the Lucid Imagination team has
significant expertise in natural language processing, which optimizes the interaction of
compute resources with human-created content. Key considerations include:

   •   Developing structured methods for characterizing how well a set of results meets user
       needs,
   •   Establishing a tradeoff between overall net gain in the quality of results across the whole
       application, versus a single improvement for one query or user, and
   •   Improving the ability to find accurate answers by leveraging a balanced mix of content
       analysis and query interpretation algorithms.
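
One common way to characterize how well a set of results meets user needs is to compare ranked results against human relevance judgments using precision and recall. The Python sketch below is an assumed example of that general technique, not a description of Lucid Imagination's own methods; the queries, document ids, and judgments are invented.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that were judged relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mean_precision_at_k(judged_queries, k):
    """Average precision at k over many (ranked results, relevant set) pairs."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in judged_queries]
    return sum(scores) / len(scores)


# Invented judgments for two queries.
query_one = (["a", "b", "c", "d"], {"a", "c", "e"})
query_two = (["x", "y"], {"x", "y"})

p_one = precision_at_k(query_one[0], query_one[1], 2)
r_one = recall_at_k(query_one[0], query_one[1], 4)
overall = mean_precision_at_k([query_one, query_two], 2)
print(p_one, r_one, overall)
```

Averaging over a whole set of judged queries, rather than tuning for a single one, reflects the tradeoff described above between application-wide quality and a one-off improvement.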

The breadth of expertise offered by Lucid is available in a variety of forms suited to a range
of different business needs and deployment requirements. This enables customers to
create even more powerful and successful search applications.


Engagement Scenarios
Virtually every company and organization uses some form of enterprise search to help
customers, employees, and partners find the information they need. Many companies use
packaged commercial software applications, but over time their requirements evolve
beyond the original platform’s limits. Also, licensing or customization costs may grow
too high, or the number and type of documents may expand beyond the original design’s
capacity. As companies evaluate the ongoing fit of their current search applications to an
ever-changing market and organizational landscape, they naturally ask, “Is there a faster,
cheaper, more effective way to do this?”
Today, thousands of companies and organizations, each with unique search and retrieval
requirements, have answered this question with Lucene/Solr. The essential value of
combining Lucid Imagination with open source Lucene/Solr technology is commercial
support that adapts to specific requirements. Whether a company is evaluating
Lucene/Solr for a new implementation, considering replacement of a commercial search
product, or enhancing an existing Lucene/Solr implementation, Lucid Imagination offers
skills and resources to help at every phase of the project life cycle.

Considering Alternatives to Legacy Packaged Search Applications
Change happens quickly, but taking advantage of new opportunities can be limited by
existing applications and traditional ways of doing things. Organizations with legacy search
applications often realize that they are paying too much to align packaged enterprise
search applications with evolving business requirements. In other cases, they discover it is
too difficult to integrate existing software with new services, or it takes too long to meet
new corporate goals. With the power of Lucene/Solr, Lucid Imagination supplies the
expertise organizations need to deliver successful search solutions more quickly
and less expensively than the alternatives, both now and going forward.
   •   Consulting services are highly customized and able to engage quickly to shorten
       cycles and ramp times, minimize errors and design pitfalls, and improve production
       results. Lucid Imagination’s consulting team consists of senior search technologists
       who are intimately familiar with Lucene/Solr technologies and have extensive
       experience in field-tested search solutions for diverse deployment scenarios.



       Open source software is ideally suited to low-cost prototyping, because it can
       reduce time to deployment and refine the user experience. For customers striving to
       integrate a highly diverse base of data and documents, Lucid Imagination offers
       prototyping services to assist with the process.
   •   Technical training can bring everyone in the IT department up to speed on best
       practices and the elements of good search design—establishing a solid base of skills
       before coding begins. This can greatly reduce downstream problems and reduce
       overall costs. Lucid Imagination works with in-house application and system
       administration teams to provide the knowledge transfer, guidance, training, and
       support required to implement an enterprise search solution that fits the
       organization’s specific needs.
   •   When dependable, predictable support is required to accompany an organization’s
       efforts on a regular basis over time, Lucid Imagination’s support subscriptions
       provide reliable access to domain experts during the entire application life cycle
       process.
           •   Technical Support features the latest tested versions and timely,
               predictable support turnaround times.
           •   Advanced Development Support provides expert architectural design,
               development, and testing guidance for building search applications using
               Lucene and Solr.
           •   Advanced Production Support provides expert advice on configuration,
               performance tuning, and optimization for applications deployed to a
               production operation environment with live users and service-level
               attainment regimes.
           •   Search Health Check, included with Advanced Support, is a comprehensive
               set of services that ensures applications are designed to meet recommended
               best practices for search configuration, optimization, and effectiveness.
           •   Custom Support packages are also available for unique situations.

   •   Lucid Imagination’s free 30-Day Get Started Program is available with downloads of
       LucidWorks, our certified distributions of Lucene and Solr. The Get Started Program
       complements LucidWorks with added guidance for questions on first-time
       installation, configuration, and basic usage, as well as evaluation of Lucene/Solr and
       included utilities. LucidWorks for Solr is the logical starting point for most
       developers building search applications with Lucene/Solr technology for websites,
       products, or internal organizational use, because it bundles the most recent and
       stable Apache Solr capabilities, along with other tools and utilities.




Building on In-house Lucene/Solr Expertise
Many organizations with in-house Lucene/Solr expertise have achieved considerable
sophistication in their deployments. Still, they may reach a point where it is difficult to
move the architecture or implementation past a particular design, deployment, or
optimization constraint. There can be many reasons for this, such as limitations on staff
expertise, design, or architecture. Configurations and policies may not have kept pace with
current best practices. A dependent part of the IT environment may have changed—
anything from upgraded complementary applications to new middleware, or expanded
data volume and variety.
For organizations that are ready to gain the required knowledge to move ahead, address
the current situation, and make sure that a deployment stays at peak performance, Lucid
Imagination recommends an in-depth engagement. Typically delivered in a consultative format,
the engagement begins with a thorough assessment and review, continues with best-practice
design recommendations, and ends with a strategy proposal for achieving long-term,
sustainable innovation for search solutions.



Another key area where Lucid Imagination stands ready to help is in optimizing
performance, both in application response time and in the use of hardware and software
resources. Lucid Imagination experts work with in-house teams to diagnose and improve
search application efficiency.
As mentioned earlier, a significant benefit of open source software is its ability to provide
fast, low-cost prototyping as a means to reduce time to deployment and refine the user
experience. For customers that seek to integrate highly diverse bases of data and
documents, or accelerate evaluations of open source search solutions, Lucid Imagination
offers prototyping services.

While community support has always been a significant benefit of open source projects,
tough issues may not always be answered in a timely fashion or with the discretion
necessary to prevent exposure of confidential organizational knowledge. That’s when Lucid
Imagination’s expert teams can help.
Some companies are already skilled in open source technologies in general and
Lucene/Solr in particular. For these, Lucid Imagination offers Technical Support and
Advanced Support. Technical Support can provide answers within defined response times
for users encountering problems with Lucene/Solr projects or production
implementations.
Different levels of support address most situations. For example, an e-commerce startup
may find that community forums provide suitable answers, but not always as quickly as
needed. Basic Technical Support provides Web-based and e-mail support at competitive
rates for customers that do not require same-day response or direct telephone support.
Lucid Imagination also offers various levels of Technical Support for larger or mission-
critical installations, including fast turnaround, diagnosis, and bug fixes. Finally, Enterprise
Technical Support includes Search Health Checks by Lucid Imagination domain experts to
help ensure optimal runtime effectiveness.



Next Steps
For more information on how Lucid Imagination can help employees, customers, and
partners find the information they need, please visit http://www.lucidimagination.com to
access blog posts, articles, and reviews of dozens of successful implementations. Please e-
mail specific questions to:
Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com
Or call: 1.650.353.4057




Appendix: Lucene/Solr Features and Benefits
Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In
choosing a search solution that is best suited for your requirements, key factors to consider are
application scope, development environment, and software development preferences.
Lucene is a Java technology-based search library that offers speed, relevancy ranking, complete
query capabilities, portability, scalability, low-overhead indexes, and rapid incremental
indexing.
Solr is the Lucene Search Server. It presents a web service layer built atop Lucene using the Lucene
search library and extending it to provide application users with a ready-to-use search platform.
Solr brings with it operational and administrative capabilities like web services, faceting,
configurable schema, caching, replication, and administrative tools for configuration, data loading,
statistics, logging, cache management, and more.
Lucene presents a collection of directly callable Java libraries and requires coding and solid
information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-
ready search platform, eliminating the need for extensive programming.
Solr provides the starting point for most developers who are building a Lucene-based search
application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to
scale in a production Java environment.
With convenient REST-like web service interfaces callable over HTTP, and transparent XML-based
configuration files, Solr can greatly accelerate application development and maintenance. In fact,
Lucene programmers have often reported that they find Solr contains “the same features I was
going to build myself as a framework for Lucene, but already very well implemented.” Using Solr,
enterprises can customize the search application according to their requirements without
incurring the cost and risk of writing the code from scratch.
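As a rough illustration of what "callable over HTTP" means in practice, the sketch below builds a search request URL in Python. The host, port, and path in `SOLR_SELECT_URL` and the helper `build_search_url` are assumptions for illustration; the actual endpoint depends on the Solr version and how it is installed. The parameters `q` (query), `rows` (number of results), and `wt` (response format) are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real host, port, and path depend on the installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(text, rows=10):
    """Return a GET URL asking Solr for the top `rows` matches for `text`."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("enterprise search", rows=5)
```

Any application or tool that can issue an HTTP GET to such a URL can run a search, which is why no Java coding is needed on the client side.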
Lucene provides greater control of your source code and works best in development environments
where resources need to be controlled exclusively by Java API calls. It is best suited to
constructing and embedding a state-of-the-art search engine that programmers assemble and
compile inside a native Java application. With Lucene, programmers have direct, low-level
control over a large set of sophisticated features, including data and state manipulation.
Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it
provides ease of use and scalable search power out of the box.
As functional siblings, Lucene and Solr have become popular alternatives for search applications;
the two differ mainly in the style of application development used. Key benefits of search with
Lucene/Solr include:
   •   Search Quality: Speed, Relevance, and Precision. Lucene/Solr provides near-real-time search
       and strong relevance ranking to deliver contextually relevant and accurate results very quickly.
       Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search
       help users in sorting, organizing, classifying, and structuring retrieved information to ensure that
       search delivers desired results. Search with Lucene/Solr also provides proximity operators,
       wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking,
       multilingual search, and much more.
   •   Lower Cost and Greater Flexibility, Plug and Play Architecture. Lucene/Solr reduces
       recurring and nonrecurring costs, lowering your TCO. As open source software, it does not
       require purchase of a license and is freely available for use. The open source code can be used as
       is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in
       your enterprise’s existing infrastructure, reducing costs of installation, configuration, and
       management.
   •   Open Source Platform for Portability and Easy Deployment. Because Lucene/Solr is an
       open source software solution, it is based on open standards and community-driven development
       processes. It is highly portable and can run on any platform that supports Java. For instance, you
       can build an index on Linux and copy it to a Microsoft Windows machine and search there. This
       unsurpassed portability enables you to keep your search application and your company’s evolving
       infrastructure in tandem. Lucene, in turn, has been implemented in other environments, including
       C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily
       deployed on a single server as well as on distributed, multiserver systems.
   •   Largest Installed Base of Applications, Increasing Customer Base. Lucene/Solr is the most
       widely used open source search system and is installed in around 4,000 organizations worldwide.
       Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg,
       Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron
       Mountain, and Los Alamos National Laboratory.
   •   Large Developer Base and Adaptability. As community-developed software, Lucene/Solr
       provides transparent development and easy access to updates and releases. Developers can work
       with open source code and customize the software according to business-specific needs and
       objectives. Its open source paradigm lets Lucene/Solr provide developers with the freedom and
       flexibility to evolve the software with changing requirements, liberating them from the
       constraints of commercial vendors.
   •   Commercial-Grade Support for Mission-Critical Search Applications from Lucid
       Imagination. Lucid Imagination provides the expertise, resources, and services that are needed to
       help enterprises deploy and develop Lucene-based search solutions efficiently and cost-
       effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its
       broad range of expertise, which includes indexing and metadata management, content analysis,
       business rule application, and natural language processing. Lucid Imagination also offers certified
       distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level
       consulting and value-added software extensions to enable customers to create powerful and
       successful search applications.
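Several of the query features listed under Search Quality above (fielded searching, wildcards, proximity operators, term weights, and faceted search) can be sketched as follows. The query strings use Lucene's classic query syntax, and `facet` and `facet.field` are standard Solr request parameters; the field names `title`, `body`, and `category` and the helper `faceted_params` are hypothetical and would depend on your own schema.

```python
from urllib.parse import urlencode, parse_qs

# Example queries in Lucene's classic query syntax; field names are hypothetical.
EXAMPLE_QUERIES = {
    "fielded": "title:search",               # match only within the title field
    "wildcard": "title:sear*",               # any title term starting with "sear"
    "proximity": '"open source"~5',          # both words within 5 positions
    "term_weight": "title:solr^2 body:solr", # title matches count double
}


def faceted_params(query, facet_field):
    """Build Solr request parameters that also ask for counts per facet_field value."""
    return urlencode({"q": query, "facet": "true", "facet.field": facet_field})


params = faceted_params(EXAMPLE_QUERIES["proximity"], "category")
decoded = parse_qs(params)  # decode again to show what the server would receive
```

A faceted response lets an application show how many matching documents fall under each value of the chosen field, which is what powers "refine by price or feature" navigation on retail sites.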
  • 10. capabilities and expertise brings that same approach to unlocking enterprise search with Lucene and Solr. Lucid Imagination’s mission is to enable customers to achieve business objectives for optimal search performance and accuracy, with lower total cost of ownership and faster time to market. The company’s founding team consists of many key contributors and committers to the Lucene/Solr project, as well as other experts in enterprise search application development. Our skills, acquired across hundreds of deployments, including best practices and technical know-how, can enhance and optimize any phase of an open source search implementation. Lucid Imagination’s team has a deep understanding of indexing, which is the foundation of any search solution; it captures all the content and location of searched documents for quick lookup, much as a book index does. We have broad experience indexing: • Documents of widely varying sizes and formats within a very large collection, • Documents with diverse metadata requirements, and • Multilingual documents. The team is also skilled at applying business rules such as boosting documents and fields, indexing dates, or other attributes of terms and data. Lucid Imagination has developed best practices for indexing and metadata management, and can help establish and refine policies to meet business and technical search requirements, such as: • How and when to add documents to an index, • Removing documents from an index, • Results relevancy and document/data findability • Undeleting documents, and • Batch and real-time updates. The Lucid Imagination team has extensive experience with large-scale search applications, including engagements with: • Large collections—more than one billion documents, • High query volumes and large user populations, • High document growth rates, Good Information Is Hard to Find: Considering Open Source for Enterprise Search A Lucid Imagination White Paper • March 2010 Page 7
  • 11. Distributed indexing and searching, • Replication and high availability, and • Cloud environments. In addition to fine-tuning search technology machinery, the Lucid Imagination team has significant expertise in natural language processing, which optimizes the interaction of compute resources with human-created content. Key considerations include: • Developing structured methods for characterizing how well a set of results meets user needs, • Establishing a tradeoff between overall net gain in the quality of results across the whole application, versus a single improvement for one query or user, and • Improving the ability to find accurate answers by leveraging a balanced mix of content analysis and query interpretation algorithms. The breadth of expertise offered by Lucid is available in a variety of forms suited to a range of different business needs and deployment requirements. This enables customers to create even more powerful and successful search applications. Engagement Scenarios Virtually every company and organization uses some form of enterprise search, to help customers, employees, and partners find the information they need. Many companies use packaged commercial software applications; but, over time, their requirements evolve beyond the original platform’s limitations. Also, licensing or customization costs may grow too high, or the number and type of documents may expand beyond the original design’s capacity. As companies evaluate the ongoing fit of their current search applications to an ever changing market and organizational landscape, they naturally ask “Is there a faster, cheaper, more effective way to do this?” Today, thousands of companies and organizations—each with unique search and retrieval requirements—answered this question with Lucene/Solr. The essential value of Lucid Imagination and open source Lucene/Solr technology is that it provides commercial support that adapts to specific requirements. 
Whether a company is evaluating Lucene/Solr for a new implementation, considering replacement of a commercial search product, or enhancing an existing Lucene/Solr implementation, Lucid Imagination offers skills and resources to help at every phase of the project life cycle.

Considering Alternatives to Legacy Packaged Search Applications

Change happens quickly, but taking advantage of new opportunities can be limited by existing applications and traditional ways of doing things. Organizations with legacy search applications often realize that they are paying too much to align packaged enterprise search applications with evolving business requirements. In other cases, they discover it is too difficult to integrate existing software with new services, or it takes too long to meet new corporate goals.

With the power of Lucene/Solr, Lucid Imagination supplies the expertise organizations need to deliver successful search solutions more quickly and less expensively than other approaches, now and going forward.

“Organizations with legacy search applications often realize that they are paying too much to align packaged enterprise search applications with evolving business requirements.”

• Consulting services are highly customized and can engage quickly to shorten cycles and ramp times, minimize errors and design pitfalls, and improve production results. Lucid Imagination’s consulting team consists of senior search technologists who are intimately familiar with Lucene/Solr technologies and have extensive experience in field-tested search solutions for diverse deployment scenarios.
  Open source software is ideally suited to low-cost prototyping, because it can reduce time to deployment and refine the user experience. For customers striving to integrate a highly diverse base of data and documents, Lucid Imagination offers prototyping services to assist with the process.
• Technical training can bring everyone in the IT department up to speed on best practices and the elements of good search design, establishing a solid base of skills before coding begins. This can greatly reduce downstream problems and overall costs. Lucid Imagination works with in-house application and system administration teams to provide the knowledge transfer, guidance, training, and support required to implement an enterprise search solution that fits the organization’s specific needs.
• When dependable, predictable support is required to accompany an organization’s efforts on a regular basis over time, Lucid Imagination’s support subscriptions provide reliable access to domain experts during the entire application life cycle. Technical Support features the latest tested versions and timely, predictable support turnaround times. Advanced Development Support provides expert architectural design, development, and testing guidance for building search applications using Lucene and Solr. Advanced Production Support provides expert advice on configuration, performance tuning, and optimization for applications deployed to a production environment with live users and service-level attainment regimes. Search Health Check, included with Advanced Support, is a comprehensive set of services that checks whether applications are designed to meet recommended best practices for search configuration, optimization, and effectiveness. Custom Support packages are also available for unique situations.
• Lucid Imagination’s free 30-Day Get Started Program is available with downloads of LucidWorks, our certified distributions of Lucene and Solr. The Get Started Program complements LucidWorks with added guidance for questions on first-time installation, configuration, and basic usage, as well as evaluation of Lucene/Solr and included utilities. LucidWorks for Solr is the logical starting point for most developers building search applications with Lucene/Solr technology for websites, products, or internal organizational use, because it bundles the most recent and stable Apache Solr capabilities, along with other tools and utilities.
Building on In-house Lucene/Solr Expertise

Many organizations with in-house Lucene/Solr expertise have achieved considerable sophistication in their deployments. Still, they may reach a point where it is difficult to move the architecture or implementation past a particular design, deployment, or optimization constraint. There can be many reasons for this, such as limitations on staff expertise, design, or architecture. Configurations and policies may not have kept pace with current best practices. A dependent part of the IT environment may have changed: anything from upgraded complementary applications to new middleware, or expanded data volume and variety.

For organizations that are ready to gain the required knowledge to move ahead, address the current situation, and make sure that a deployment stays at peak performance, Lucid Imagination recommends an in-depth engagement. Typically delivered in a consultative format, the engagement begins with an in-depth assessment and review, continues with best-practice design recommendations, and ends with a strategy proposal for achieving long-term, sustainable innovation for search solutions.

“A significant benefit of open source software is its ability to provide fast, low-cost prototyping as a means to reduce time to deployment and refine the user experience.”

Another key area where Lucid Imagination stands ready to help is in optimizing performance, both in application response time and in utilization of hardware and software resources. Lucid Imagination experts work with in-house teams to diagnose and improve search application efficiencies.

As mentioned earlier, a significant benefit of open source software is its ability to provide fast, low-cost prototyping as a means to reduce time to deployment and refine the user experience. For customers that seek to integrate highly diverse bases of data and documents, or accelerate evaluations of open source search solutions, Lucid Imagination offers prototyping services.
While community support has always been a significant benefit of open source projects, tough issues may not always be answered in a timely fashion or with the discretion necessary to prevent exposure of confidential organizational knowledge. That’s when Lucid Imagination’s expert teams can help.

Some companies are already skilled in open source technologies in general and Lucene/Solr in particular. For these, Lucid Imagination offers Technical Support and Advanced Support. Technical Support can provide answers within defined response times for users encountering problems with Lucene/Solr projects or production implementations.

Different levels of support address most situations. For example, an e-commerce startup may find that community forums provide suitable answers, but not always as quickly as needed. Basic Technical Support provides Web-based and e-mail support at competitive rates for customers that do not require same-day response or direct telephone support.

Lucid Imagination also offers various levels of Technical Support for larger or mission-critical installations, including fast turnaround, diagnosis, and bug fixes. Finally, Enterprise Technical Support includes Search Health Checks by Lucid Imagination domain experts to help ensure optimal runtime effectiveness.

Next Steps

For more information on how Lucid Imagination can help employees, customers, and partners find the information they need, please visit http://www.lucidimagination.com to access blog posts, articles, and reviews of dozens of successful implementations. Please e-mail specific questions to:

Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com
Or call: 1.650.353.4057
Appendix: Lucene/Solr Features and Benefits

Lucene and Solr are complementary technologies that offer very similar underlying capabilities. In choosing a search solution that is best suited for your requirements, key factors to consider are application scope, development environment, and software development preferences.

Lucene is a Java-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing.

Solr is the Lucene search server. It presents a web service layer built atop the Lucene search library, extending it to provide application users with a ready-to-use search platform. Solr brings with it operational and administrative capabilities such as web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.

Lucene presents a collection of directly callable Java libraries and requires coding and solid information retrieval experience. Solr extends the capabilities of Lucene to provide an enterprise-ready search platform, eliminating the need for extensive programming.

Solr provides the starting point for most developers who are building a Lucene-based search application. It comes ready to run in a servlet container such as Tomcat or Jetty, making it ready to scale in a production Java environment. With convenient REST-like web-service interfaces callable over HTTP, and transparent XML-based configuration files, Solr can greatly accelerate application development and maintenance.
In fact, Lucene programmers have often reported that they find Solr contains “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Using Solr, enterprises can customize the search application according to their requirements, without the cost and risk of writing the code from scratch.

Lucene provides greater control of your source code and works best in development environments where resources need to be controlled exclusively by Java API calls. It works best when constructing and embedding a state-of-the-art search engine, allowing programmers to assemble and compile it inside a native Java application. While working with Lucene, programmers can directly control the large set of sophisticated features through low-level access and data or state manipulation. Enterprises that do not require strict control of low-level Java libraries generally prefer Solr, as it provides ease of use and scalable search power out of the box.
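As a rough illustration of what calling Solr over HTTP can look like, the sketch below builds a search URL with faceting enabled. It is a minimal sketch under stated assumptions: the base URL http://localhost:8983/solr, the handler path /select, and the field names title and category are placeholders that depend on the Solr version and on how a given deployment is configured, and the helper name build_solr_query_url is hypothetical. The parameters q, rows, wt, facet, and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, facet_fields=(), rows=10):
    """Return a Solr search URL for `query`, optionally faceting on fields.

    Hypothetical helper for illustration; it only builds the URL and does
    not contact a server.
    """
    # q (query), rows (page size), and wt (response format) are standard
    # Solr request parameters.
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Faceting asks Solr to return value counts for each listed field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    # Assumption: the search handler is exposed at <base_url>/select.
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance and field names, for illustration only.
    print(build_solr_query_url("http://localhost:8983/solr",
                               "title:lucene", ["category"]))
```

Any HTTP client can then fetch the resulting URL. The point of the sketch is that a search, including facet counts, is expressed entirely as request parameters rather than as Java code written against the Lucene API.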
As functional siblings, Lucene and Solr have become popular alternatives for search applications; the two differ mainly in the style of application development used. Key benefits of search with Lucene/Solr include:

• Search Quality: Speed, Relevance, and Precision
  Lucene/Solr provides near-real-time search and strong relevance ranking to deliver contextually relevant and accurate results very quickly. Tailor-made coding for relevancy ranking and sophisticated search capabilities like faceted search help users sort, organize, classify, and structure retrieved information so that search delivers the desired results. Search with Lucene/Solr also provides proximity operators, wildcards, fielded searching, term/field/document weights, find-similar functions, spell checking, multilingual search, and much more.
• Lower Cost and Greater Flexibility, Plug and Play Architecture
  Lucene/Solr reduces recurring and nonrecurring costs, lowering your TCO. As open source software, it does not require purchase of a license and is freely available for use. The open source code can be used as is, modified, customized, and updated as appropriate to your needs. Solr is easily embedded in your enterprise’s existing infrastructure, reducing costs of installation, configuration, and management.
• Open Source Platform for Portability and Easy Deployment
  Because Lucene/Solr is an open source software solution, it is based on open standards and community-driven development processes. It is highly portable and can run on any platform that supports Java. For instance, you can build an index on Linux, copy it to a Microsoft Windows machine, and search there. This portability enables you to keep your search application and your company’s evolving infrastructure in tandem. Lucene, in turn, has been implemented in other environments, including C#, C, Python, and PHP. At deployment time, Solr offers very flexible options; it can be easily deployed on a single server as well as on distributed, multiserver systems.
• Largest Installed Base of Applications, Increasing Customer Base
  Lucene/Solr is the most widely used open source search system and is installed in around 4,000 organizations worldwide. Publicly visible search sites that use Lucene/Solr include CNET, LinkedIn, Monster, Digg, Zappos, MySpace, Netflix, and Wikipedia. Lucene/Solr is also in use at Apple, HP, IBM, Iron Mountain, and Los Alamos National Laboratory.
• Large Developer Base and Adaptability
  As community-developed software, Lucene/Solr provides transparent development and easy access to updates and releases. Developers can work with open source code and customize the software according to business-specific needs and objectives. Its open source paradigm gives developers the freedom and flexibility to evolve the software with changing requirements, liberating them from the constraints of commercial vendors.
• Commercial-Grade Support for Mission-Critical Search Applications from Lucid Imagination
  Lucid Imagination provides the expertise, resources, and services that are needed to help enterprises deploy and develop Lucene-based search solutions efficiently and cost-effectively. Lucid helps enterprises achieve optimal search performance and accuracy with its broad range of expertise, which includes indexing and metadata management, content analysis, business rule application, and natural language processing. Lucid Imagination also offers certified distributions of Lucene and Solr, commercial-grade SLA-based support, training, high-level consulting, and value-added software extensions to enable customers to create powerful and successful search applications.
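Several of the query features named under Search Quality in the list above (fielded searching, wildcards, proximity operators, and term weights) can be illustrated with query strings. The sketch below assumes the syntax of Lucene's classic query parser; the helper functions and the field names title and body are hypothetical, and the helpers do not escape special characters in user input.

```python
def fielded(field, term):
    """Fielded search: match `term` only within `field`."""
    return f"{field}:{term}"


def wildcard(field, prefix):
    """Wildcard search: match any term in `field` that starts with `prefix`."""
    return f"{field}:{prefix}*"


def proximity(field, phrase, distance):
    """Proximity search: the words of `phrase` within `distance` positions."""
    return f'{field}:"{phrase}"~{distance}'


def boosted(clause, weight):
    """Term weighting: raise the clause's contribution to the relevance score."""
    return f"({clause})^{weight}"


def any_of(*clauses):
    """Combine clauses so that a document matching any one of them is returned."""
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Assumed field names, for illustration only.
    print(any_of(
        boosted(fielded("title", "solr"), 2),
        wildcard("body", "luc"),
        proximity("body", "open source", 5),
    ))
```

A string built this way could be passed as the query to Lucene or Solr. Boosting the title clause reflects a common choice: a match in a document's title usually signals more relevance than a match in its body.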